Storing data! The challenge of the 21st century… A few bytes written onto long chains of synthetic polymers are opening up huge perspectives.
Photo credit: Tomasz Sowinski/Getty Images
Data storage on traditional materials (silicon, hard disks) is reaching its limits.
To overcome these limits, chemists are starting to store digital data on natural polymers, such as DNA, but also on synthetic polymers.
Today, synthetic polymers can be used to store a few bytes.
In our digital society, we are producing ever more data (documents, images, videos and so on), and it is becoming increasingly difficult to preserve it all. Data centres funded by the Internet giants occupy vast spaces and require huge quantities of energy. And for good reason: traditional storage media, such as hard disks or flash memories, have more or less reached their limits in terms of miniaturisation and storage capacity. Against this backdrop, numerous scientists have spent decades looking for other ways of storing information. Among the avenues being explored, storage on a molecular scale (well below the nanometric scale) is currently in fashion. Promising new solutions have been emerging for several years, all still confined to the laboratory: storage on atoms, on magnetic molecules, on DNA or on synthetic polymers. We have been exploring the synthetic polymer avenue in our laboratory; our team has demonstrated that digital data can be efficiently written, stored, read and deleted on them.
How is digital information stored today? In binary form, i.e. as sequences of two digits, 0 and 1, known as bits, which are themselves grouped into bytes (blocks of 8 bits). The storage capacity and overall volume of a memory depend directly on bit size. In a hard disk, bits are small magnetic regions. By changing the magnetic orientation of each region using a write head, it is possible to write 0s and 1s. Over the last few years, the size of these regions has been reduced to a few tens of nanometres, notably thanks to the work of French physicist Albert Fert on giant magnetoresistance, work which was awarded the Nobel Prize in 2007. However, it now seems difficult to go any smaller using magnetic materials.
FROM DNA TO SYNTHETICS
What about molecular storage, and more specifically, how can data be written onto polymer chains? Polymers are giant molecules (or macromolecules) built from small component units, called monomers, bound together. As well as manufactured polymers, such as the polyethylene and polystyrene found in numerous everyday applications, natural polymers (or biopolymers) play an essential role in biological organisms. One such biopolymer is DNA. This macromolecule is used to store, transmit and express the genetic information of all living beings by means of a molecular alphabet composed of four monomers (represented by the letters A, T, G and C). Each gene is encoded in a specific sequence of these four monomers. This is a simple principle, optimised over billions of years of evolution. DNA therefore operates like a molecular hard disk. This type of storage is a step towards miniaturisation, as the distance between two adjacent monomers is 0.34 nanometres, about a hundred times smaller than the magnetic regions on a hard disk. The obvious analogies between IT and genetics were pointed out in the 1980s, notably by British biologist Richard Dawkins in his book The Blind Watchmaker. Only recently, however, have scientists tried to use synthetic DNA to store binary data. The principle is quite simple: two of the four DNA monomers are used to represent a 0, and the two others a 1. The method of writing on DNA is however quite different from that used on a magnetic hard disk; the bits are created by macromolecular synthesis, i.e. by attaching the monomers to each other, one by one, according to a predetermined sequence. Thanks to considerable advances in biotechnologies, it is now relatively easy to synthesise chains of DNA automatically. It is also possible to copy them using in vitro replication, and to read them using sequencing tools.
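The two-bases-per-bit principle described above can be illustrated with a minimal Python sketch. The assignment of A and C to 0 and of G and T to 1 is an assumption made for illustration, as are the function names; the text only says that two monomers stand for each bit value. Having two bases per bit gives the writer a choice, which the sketch uses to avoid placing the same base twice in a row.

```python
# Assumed mapping (illustrative only): A or C reads as 0, G or T reads as 1.
ZERO_BASES = "AC"
ONE_BASES = "GT"


def bits_to_dna(bits: str) -> str:
    """Translate a string of 0s and 1s into a sequence of DNA letters."""
    strand = []
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a bit: {bit!r}")
        options = ZERO_BASES if bit == "0" else ONE_BASES
        # Use the second candidate base when the first would repeat
        # the base just written.
        if strand and strand[-1] == options[0]:
            strand.append(options[1])
        else:
            strand.append(options[0])
    return "".join(strand)


def dna_to_bits(strand: str) -> str:
    """Read the bits back: each base is worth a single bit."""
    bits = []
    for base in strand:
        if base in ZERO_BASES:
            bits.append("0")
        elif base in ONE_BASES:
            bits.append("1")
        else:
            raise ValueError(f"not a DNA base: {base!r}")
    return "".join(bits)


print(bits_to_dna("01001101"))
print(dna_to_bits(bits_to_dna("01001101")))
```

Reading is the simpler direction: whichever of the two bases was chosen at writing time, it decodes to the same bit.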
In 2012 and 2013, the respective teams of George Church at Harvard and Nick Goldman in Cambridge managed to store several kilobytes of data on chains of synthetic DNA. Data storage on DNA even seems to be developing much faster than the speed predicted by Moore’s Law. In March 2017, two American bioinformaticians announced that they had written and read two megabytes of binary data with a fabulous density of 215 petabytes per gram of DNA (1). Once synthesised, however, a DNA strand is not easily modified: DNA storage therefore operates rather like a read-only memory, the contents of which can be read several times but cannot be edited. Using DNA would nevertheless enable the creation of archives that would last several hundreds of thousands, or even millions of years, if we are to believe the conservation times announced by paleogeneticists studying ancient DNA. With this aim of archiving and preserving cultural heritage in mind, the American company Technicolor recently encoded Georges Méliès’ film A Trip to the Moon onto DNA; in a test tube containing one gram of DNA, the engineers announced that they had recorded a million copies of the film (2).
However, DNA is perhaps not the most practical or economical polymer to use for this type of application. Its molecular structure was selected by Darwinian evolution to operate in vivo, i.e. in conditions far removed from those used in information technologies. Our idea was to extend the principle of molecular storage to synthetic polymers. Their molecular structure is usually quite simple, suited to the so-called “commodity” applications for which they were intended. Most of them are homopolymers, composed of a single type of monomer, which excludes the possibility of storing encoded information as on DNA. Over the last few decades however, a certain number of synthetic copolymers have been developed, containing several types of monomers. Their monomer sequences are mostly random or very rudimentary (repetitive or comprising blocks), and this precludes the storage of complex messages.
Nevertheless, several methods have been developed which can now be used to control the sequences of synthetic polymers. In particular, “solid support synthesis”, usually used to prepare biopolymers, has proven suitable for obtaining synthetic polymers (see the box “Synthesis on polystyrene”). This is the type of approach we chose to synthesise the first examples of polymers containing binary data. How? In general, we use two monomers with slightly different chemical structures, one of which we arbitrarily define as a 0 and the other as a 1. Then, to form a specific binary sequence, we attach these monomers one by one in a specific order to a solid support. For example, to create the byte 01001101, we successively attach the monomers 0, 1, 0, 0, 1, 1, 0 and 1. This is a step-by-step process where each added monomer requires one or more chemical reactions. When done manually, these syntheses are rather tedious and, as a result, our first digital polymer, described in 2014, only contained three bits of data. However, in 2015, we demonstrated that long chains containing 13 bytes (104 bits) could be synthesised using a robotic platform that automated and simplified the macromolecular synthesis (3). As with DNA, these polymers seem to be developing very rapidly and it is realistic to imagine a storage capacity greater than one kilobyte within a few years. One of the avenues being envisaged to increase storage capacity would be to use libraries of encoded chains where each chain would contain a fraction of the information, in the same way as we make sentences using a set of words. For example, a library of 100 different encoded polymers, where each polymer would contain a “word” of 10 bytes of data, could be used to store one kilobyte of data. Of course, binary is not the only language that could be used on a polymer; more elaborate codes (ternary or others) could easily be developed using more than two monomers.
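To make the arithmetic above concrete, here is a minimal Python sketch of how a message could be turned into an order of monomer additions, and how a longer message could be cut into a library of 10-byte “words”. The monomer labels "M0" and "M1" and all function names are hypothetical; the sketch only illustrates the bookkeeping, not the chemistry.

```python
def bytes_to_bits(data: bytes) -> str:
    """Write each byte as 8 bits, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)


def synthesis_order(bits: str) -> list[str]:
    """List the monomers to attach, one per bit, in order.

    "M0" and "M1" are placeholder names for the two monomers
    arbitrarily defined as 0 and 1.
    """
    return ["M0" if bit == "0" else "M1" for bit in bits]


def split_into_library(data: bytes, word_size: int = 10) -> list[str]:
    """Cut a message into chains that each carry word_size bytes."""
    return [
        bytes_to_bits(data[start:start + word_size])
        for start in range(0, len(data), word_size)
    ]


# The byte 01001101 from the text is the ASCII code of the letter "M".
print(bytes_to_bits(b"M"))
print(synthesis_order(bytes_to_bits(b"M")))

# 1,000 bytes cut into 10-byte words gives 100 chains of 80 bits each.
library = split_into_library(bytes(1000))
print(len(library), len(library[0]))
```

In this sketch the order of the words is simply the order of the Python list. A physical library would need some way of telling the chains apart and putting them back in order, which is left out here.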
A WIDE SPECTRUM OF POSSIBILITIES
Writing and storing data on a polymer is rather easy, as we have seen. But how can the data be read? Until very recently, sequencing techniques had only been envisaged for studying natural polymers such as DNA or proteins. In fact, most of these techniques use biological mechanisms unsuited to synthetic polymers.
Most, but not all; over the last two years, we have shown that synthetic encoded macromolecules can also be read and sequenced like DNA using tandem mass spectrometry, a more universal method that can be applied to all types of macromolecules. In this approach, the chains are ionised and subjected to collisions that break them apart. By measuring the molar mass of the various fragments, it is possible to reconstruct the sequence of monomers in the polymer analysed. In collaboration with Laurence Charles’ team at Aix-Marseille University, we showed that sequencing using tandem mass spectrometry is a highly robust method of analysing synthetic encoded polymers (4). In particular, we were the first to have read binary data from such polymers (Fig. 1).
Beyond writing and reading, are digital synthetic polymers suited to the requirements of data management in the strict sense? Obviously, they are not self-replicating like DNA and therefore cannot be copied using amplification techniques such as PCR. They do have other advantages, however. Their molecular structure can be easily modified and optimised to obtain specific properties: deletion, updating and long-term storage. In fact, modern synthetic chemistry offers an almost infinite range of possibilities, meaning that solutions can be imagined for any problem (acceleration of read/write speeds, optimisation of storage conditions, etc.). There is also a wide choice of molecular bits. The molecular encoding used in synthetic polymers is defined by the chemist and can therefore be adapted to facilitate macromolecular synthesis and sequencing.
For example, we demonstrated that chemical functions as easy to manipulate as a hydrogen atom (H) or a methyl group (CH3), added to monomers, are sufficient to create a decodable binary language; sequencing using tandem mass spectrometry measures the molar mass of each unit making up the encoded chain and easily detects a methylated monomer unit (with a methyl group), which is slightly heavier than its unmethylated variant (with a hydrogen atom instead of a methyl). A methylated monomer can then be interpreted as a 1 and an unmethylated monomer as a 0. We also discovered that the molecular structure of a synthetic polymer can be deliberately modified to facilitate sequencing. Thus, polymers containing fragile covalent bonds lead to perfectly controlled fragmentations that are easy to interpret. Furthermore, synthetic polymers can have specific physicochemical properties that are beneficial for data storage. For example, some polymers degrade or change configuration under external physical or chemical stimuli, such as light or temperature. These phenomena can be used to break down macromolecular chains and thus delete the information. We observed that thermosensitive encoded polymers can be kept for several months (and probably longer) at ambient temperature. However, they can also be entirely deleted in a few hours at a temperature of 90°C (the chains start to break down above 60°C). One could even imagine rewritable digital macromolecules, using so-called “dynamic” polymers, whose sequences can reorganise themselves. Finally, let us not forget that some synthetic polymers are stable in the very long term, sometimes for millennia. While this is a real issue for our environment, especially in the oceans where plastics accumulate, this property could be a significant advantage for the long-term archiving of documents.
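The reading principle described above (a methylated unit weighs slightly more than an unmethylated one) can be sketched in Python as follows. This is a deliberately simplified model, not the analysis used in the cited work: it assumes that fragmentation yields a clean “ladder” of fragments containing the first 1, 2, 3... units of the chain, and it ignores end groups and ion charge. The unit mass of 200.0 is an arbitrary placeholder; the only physical figure is the shift of roughly 14 mass units between H and CH3 (one CH2).

```python
CH2_SHIFT = 14.016           # approximate mass difference between CH3 and H
MASS_0 = 200.0               # hypothetical mass of an unmethylated unit (bit 0)
MASS_1 = MASS_0 + CH2_SHIFT  # the methylated unit (bit 1) is slightly heavier


def ladder_from_bits(bits: str) -> list[float]:
    """Simulate the masses of fragments holding the first 1, 2, 3... units."""
    ladder = []
    total = 0.0
    for bit in bits:
        total += MASS_1 if bit == "1" else MASS_0
        ladder.append(total)
    return ladder


def bits_from_ladder(ladder: list[float], tolerance: float = 0.5) -> str:
    """Recover the bits from the mass gaps between successive fragments."""
    bits = []
    previous = 0.0
    for mass in ladder:
        unit = mass - previous
        if abs(unit - MASS_0) <= tolerance:
            bits.append("0")
        elif abs(unit - MASS_1) <= tolerance:
            bits.append("1")
        else:
            raise ValueError(f"unexpected unit mass: {unit:.3f}")
        previous = mass
    return "".join(bits)


ladder = ladder_from_bits("01001101")
print(bits_from_ladder(ladder))
```

The tolerance parameter stands in for measurement error: as long as it is well below the 14-unit shift, the two kinds of unit cannot be confused.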
Despite its recent appearance, data storage on the macromolecular scale is highly promising. Nevertheless, the field is still at the stage of fundamental research, and memories based on synthetic DNA or other types of macromolecules will need several years of development before they are functional. In particular, the mechanisms for writing and reading these polymers will need to be optimised, and made faster and less costly. Solutions do exist, such as accelerated writing methods based on ultra-reactive chemistry. Moreover, the development of molecular memories is not only a problem of chemistry. It is also a nanotechnological challenge: solutions need to be found to organise these encoded chains and make them spatially accessible, as on a hard disk. These are issues we are already working on. The technique’s success will also depend on our ability to move towards an interdisciplinary context combining physics, mathematics and engineering with polymer chemistry. In any event, we are convinced that the field of macromolecular data storage will bring big surprises.
SYNTHESIS ON POLYSTYRENE
One of the key factors in molecular memory performance is our ability to synthesise polymers containing perfectly controlled sequences of monomers. This is not an easy task. The most efficient method is a multi-step process in which monomers are attached one by one using successive chemical reactions. However, when these reactions are carried out in liquid phase, the method is tedious and time-consuming because, after each addition of monomer, the compound formed needs to be purified. Robert Bruce Merrifield, winner of the 1984 Nobel Prize in Chemistry, considerably simplified this method in the 1960s by proposing solid support synthesis; the sequences are built on polystyrene microbeads and, after each additional monomer, the beads are filtered and rinsed with solvents. This is a much easier method as the additions of monomers and purifications can be controlled and automated. Initially developed for protein synthesis, this process was extended to the synthesis of oligonucleotides (short segments of nucleic acids), and more recently to the synthesis of encoded synthetic polymers.
(1) Y. Erlich and D. Zielinski, Science, 355, 950, 2017.
(3) A. Al Ouahabi et al., ACS Macro Lett., 4, 1077, 2015.
(4) R. K. Roy et al., Nat. Commun., 6, 7237, 2015.
A pioneer in the field of controlled-sequence synthetic polymers, Jean-François Lutz is CNRS Director of Research at Strasbourg’s Institut Charles Sadron. His work has won many awards, notably from the European Research Council (ERC). Since 2015, the Thomson Reuters agency has listed him as one of the world’s most influential scientists.