Manufacturing Bits: Aug. 25

DNA storage for TV shows; DNA error correction.


DNA storage for TV shows
DNA storage was a hot topic about two years ago, but it has been fairly quiet since then.

DNA storage is back in the news. For the first time, Twist Bioscience has stored an episode of a new Netflix show using its synthetic DNA technology. The show, BIOHACKERS, is a six-part biotech thriller.

Deoxyribonucleic acid (DNA) is a molecule that carries the genetic instructions in living things. A complete set of DNA is called the genome.

DNA is also a promising archival storage technology. Stored properly, DNA remains stable for very long periods, and it is compact. So it could be used to store a massive amount of data in a tiny space over a long period of time.

Computers store information as binary code, in sequences of 0s and 1s. DNA molecules also encode information as sequences of discrete units, but the units are four distinct nucleotide bases: adenine (A), cytosine (C), guanine (G) and thymine (T).

“To store data in DNA, first, a data file is converted from its digital sequence of 0’s and 1’s into a DNA sequence of A’s, C’s, T’s and G’s; for example, 00 = A, 01 = C, 10 = G and 11 = T,” according to Twist Bioscience. “Twist Bioscience then encodes the DNA data file into short segments of DNA (200 to 300 bases long) that can be synthesized (‘written’) and stored. In addition to storing part of the data file, each short segment contains an index to indicate its place within the overall data file. To retrieve the data, the segments are sequenced (‘read’) and then decoded back into the original file. One feature of the indexing system is it allows part of the file to be biologically recovered (‘random access’) before sequencing, so only data of interest is sequenced. And, all data is recovered error-free because error-correcting algorithms are used during the encode/decode process.”
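The mapping and indexing described in that quote can be sketched in a few lines of Python. This is only an illustration of the idea, not Twist's actual pipeline: the 4-byte index, the 50-byte payload per segment and all function names are assumptions chosen so that each segment comes out at 216 bases, inside the quoted 200-to-300-base range. No error correction is included.

```python
# 2-bit mapping from the quote: 00 = A, 01 = C, 10 = G, 11 = T
BASES = "ACGT"

INDEX_BYTES = 4     # hypothetical: 4-byte segment index (16 bases)
PAYLOAD_BYTES = 50  # hypothetical: 50 data bytes per segment (200 bases)


def bytes_to_dna(data):
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq):
    """Invert bytes_to_dna; the sequence length must be a multiple of 4."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def to_segments(data):
    """Split data into short strands, each prefixed with its own index."""
    segments = []
    for n, start in enumerate(range(0, len(data), PAYLOAD_BYTES)):
        chunk = data[start:start + PAYLOAD_BYTES]
        segments.append(bytes_to_dna(n.to_bytes(INDEX_BYTES, "big") + chunk))
    return segments


def from_segments(segments):
    """Decode strands in any order and reassemble them by index."""
    decoded = [dna_to_bytes(s) for s in segments]
    decoded.sort(key=lambda b: int.from_bytes(b[:INDEX_BYTES], "big"))
    return b"".join(b[INDEX_BYTES:] for b in decoded)


message = b"BIOHACKERS, episode one"
strands = to_segments(message)
print(strands[0])
print(from_segments(strands) == message)
```

Because each strand carries its own index, the strands can be read back in any order, which is what makes the pooled, unordered nature of DNA storage workable.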

Today, Twist manufactures more than one million small pieces of DNA on a single silicon chip using semiconductor technology. It is now working on a next-generation silicon chip that will allow the company to synthesize, or write, 10 gigabytes of data in DNA on each chip.

Meanwhile, scientists at ETH Zurich encoded the first episode of BIOHACKERS from 1s and 0s into a sequence of the four nucleotide bases. Twist then built this code, base by base, into strands of synthetic DNA to store the series for thousands of years.

“DNA is an incredible molecule that, by its very nature, provides ultra high density storage for thousands of years. In fact, the DNA contained within all cells in a human body could store all the movies created to date in the 21st century three billion times over. That, indeed, illustrates the magic of bringing biology and technology together to create synthetic (inert) DNA,” said Emily Leproust, CEO of Twist.

“Many important documents, music and videos have been encoded and stored in DNA, but this is the first time a leading entertainment provider has embraced the vast possibilities of DNA from imagination to storage,” Leproust said. “It’s exciting to ground the fictional series, which expounds beyond the boundaries of what is possible with DNA today, with the reality of preserving groundbreaking cultural media in synthetic DNA. The ability to store digital data in DNA seems futuristic, but the future is now.”

DNA error correction
The University of Texas at Austin has developed a technology that promises to solve a major issue in DNA storage—it is prone to errors.

DNA storage is promising. A one milliliter droplet of DNA could store the same amount of information as two Walmarts full of data servers, according to researchers.

The problem? “A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing,” wrote Bill Press, a professor at the University of Texas, in the Proceedings of the National Academy of Sciences. Others contributed to the work.

In response, researchers have developed HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search), an error-correcting code that repairs all three basic types of DNA errors: insertions, deletions and substitutions, according to the University of Texas.
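To see why insertions and deletions are harder to handle than substitutions, consider a toy comparison in Python. It is not part of HEDGES; the helper names and the sample strand are made up for illustration. Each error is applied at the same position, and the result is compared with the original base by base.

```python
def substitute(seq, pos, base):
    """Replace the base at pos with another base."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq, pos):
    """Drop the base at pos; everything after it shifts left."""
    return seq[:pos] + seq[pos + 1:]


def insert(seq, pos, base):
    """Add a base before pos; everything after it shifts right."""
    return seq[:pos] + base + seq[pos:]


def positional_mismatches(original, observed):
    """Count differing positions, treating any length difference as mismatches."""
    differing = sum(1 for a, b in zip(original, observed) if a != b)
    return differing + abs(len(original) - len(observed))


original = "ACGTACGTACGTACGT"  # hypothetical 16-base strand

print("substitution:", positional_mismatches(original, substitute(original, 4, "T")))
print("deletion:    ", positional_mismatches(original, delete(original, 4)))
print("insertion:   ", positional_mismatches(original, insert(original, 4, "T")))
```

A substitution damages one position. A single insertion or deletion shifts every base after it, so a decoder that reads position by position loses synchronization for the rest of the strand.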

“HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed–Solomon outer code that is interleaved across strands,” Press said. “Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine–cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.”
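The sequence constraints Press mentions can be illustrated with a small checker in Python. This sketch only tests whether a finished strand violates the constraints; it does not show how HEDGES builds them into the code itself. The function names and the thresholds (runs of at most three identical bases, GC content between 25% and 75% in every 12-base window) are assumptions for illustration, not values from the paper.

```python
def max_homopolymer_run(seq):
    """Length of the longest run of one repeated base (0 for an empty string)."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def gc_fractions(seq, window):
    """GC fraction of every full-length sliding window over seq."""
    return [
        sum(base in "GC" for base in seq[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


def satisfies_constraints(seq, max_run=3, window=12, gc_min=0.25, gc_max=0.75):
    """Check repeat and windowed GC limits (all thresholds are hypothetical).

    Strands shorter than the window have no full windows, so only the
    repeat limit applies to them.
    """
    if max_homopolymer_run(seq) > max_run:
        return False
    return all(gc_min <= gc <= gc_max for gc in gc_fractions(seq, window))


for strand in ("ACGTACGTACGTACGT", "AAAACGTACGTACGTA", "ATATATATATATATAT"):
    print(strand, satisfies_constraints(strand))
```

Long repeats and extreme GC content matter because they make strands harder to synthesize and sequence accurately, so a code that avoids them starts with fewer errors to correct.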
