[ CHRONICLE ]
How has our literary heritage managed to survive through the ages? The content of a book - a succession of characters (forgetting for a moment the illuminations, textures, paper’s odour, etc.) - is printed on media kept in libraries. Copies can “disappear”, libraries can be flooded, tyrants can decide to burn every copy of a work in a given geographical area... as long as one copy survives, the book is safe; we can replicate it. The only limiting factor is the cost of duplication. If we lose the content of a book, it is only through negligence or a lack of interest. Digital information, on the other hand (a book, an image, a video, etc.), can be stored and reproduced on a massive scale at almost no cost. It can be dispersed geographically to protect it from fire, water, tyrants, etc. Why then does it seem so difficult to safeguard our digital memory?
Firstly, formats evolve. We all have videos of our kids’ first steps on VHS that we can’t watch anymore. Solutions do exist, however, such as software that converts old formats into more recent ones that we can read. Another problem is that today’s standard storage media have a relatively short lifespan (from a few years to a few decades), typically shorter than that of Sumerian tablets or paper. This difficulty is easy to overcome: just copy the information regularly onto new media, so that at least one complete copy always exists. This requires effort, however. While not entirely free, preserving digital data remains possible, and it is less costly than preserving its physical counterpart.
So where is the problem exactly? In the avalanche of data that forces us to choose what to keep; we can’t keep everything. It is said that if we emptied every disk and storage medium on the planet on 31st December and started filling them the next day without deleting anything, the space would run out before the end of the year. Hyperthymesia is not an option for humanity. Nor is it an option for any of us individually: we would ultimately drown in an ocean of data. We have to choose... and this is a laborious task.
In the past, we would sort. Perhaps we used a shoebox, where we kept our most precious photos. The most organised of us put them into albums. Where are our photos now? Somewhere on an Instagram or Facebook account, on our phone, a computer or an external hard drive, etc. Equipment gets damaged, stolen, hacked. The cloud may help by removing the dependency on specific equipment, but we get lost when we use several cloud services. And a provider may decide, without our knowing, not to archive our data for more than a few years. We change computers, close an account, time passes... and we lose whole sections of our memory.
What can we do? Let us put everything we are attached to in a digital shoebox - our favourite digital photos, films, texts, books, etc. The cost: pay a service provider to guarantee its continued existence, or keep several copies of the box and check from time to time that a sufficient number of them are still intact. Above all, however, we have to choose what we keep. Companies, governments and archivists all face the same problem: the real issue, for them and for us, is deciding exactly what we want to keep. Ultimately, with the explosion in the volume of data, we haven’t a hope of dealing with this without the support of algorithms. Salvation will come in the form of digital assistants responsible for preserving our memories. In the meantime, we will just have to manage our digital shoeboxes using our own brainpower.
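For readers who want to try the "several copies, checked from time to time" approach, here is a minimal sketch of what such a check might look like. It assumes the shoebox is a single archive file and that a SHA-256 fingerprint of it was recorded when it was created. The function names and the threshold of two intact copies are illustrative assumptions, not an established tool or a recommendation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def healthy_copies(copies: list[Path], expected_digest: str) -> list[Path]:
    """Return the copies that still exist and match the reference digest."""
    return [p for p in copies if p.is_file() and sha256_of(p) == expected_digest]


def enough_copies(copies: list[Path], expected_digest: str, minimum: int = 2) -> bool:
    """True if at least `minimum` intact copies remain (threshold is an assumption)."""
    return len(healthy_copies(copies, expected_digest)) >= minimum
```

Run periodically against the places where the box is stored (an external drive, a second computer, a mounted cloud folder), a check like this reports when too few intact copies remain, which is the moment to make a new one from a surviving copy.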
Serge Abiteboul is a computer science researcher at Inria and ENS Paris, a blogger at binaire.blog.lemonde.fr and a member of the French Academy of Sciences.