Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats or windowed guanine-cytosine (GC) content that is too high or too low. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded by error rates as high as 10%. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
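The two sequence constraints named above (excess repeats and windowed GC content) can be made concrete with a small check. The following Python sketch is illustrative only and is not the HEDGES encoder or decoder: the function names and the default thresholds (`max_run=4`, `window=12`, GC fraction between 0.35 and 0.65) are assumptions chosen for the example, not values taken from the paper.

```python
def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty strand)."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def windowed_gc_fractions(seq: str, window: int) -> list[float]:
    """GC fraction of every length-`window` substring, left to right.

    Returns an empty list when the strand is shorter than the window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(seq) < window:
        return []
    gc = sum(base in "GC" for base in seq[:window])
    fractions = [gc / window]
    # Slide the window one base at a time, updating the GC count incrementally.
    for i in range(window, len(seq)):
        gc += (seq[i] in "GC") - (seq[i - window] in "GC")
        fractions.append(gc / window)
    return fractions


def satisfies_constraints(
    seq: str,
    max_run: int = 4,      # hypothetical limit on homopolymer length
    window: int = 12,      # hypothetical GC window size
    gc_min: float = 0.35,  # hypothetical lower GC bound
    gc_max: float = 0.65,  # hypothetical upper GC bound
) -> bool:
    """True if the strand has no over-long run and every GC window is in range."""
    if max_homopolymer_run(seq) > max_run:
        return False
    return all(gc_min <= f <= gc_max for f in windowed_gc_fractions(seq, window))
```

For example, `satisfies_constraints("ACGTACGTACGTACGT")` passes under these assumed thresholds, while a strand containing `AAAAAA` fails the run-length check. A constraint-aware encoder would apply such a test while choosing each base rather than after the fact; this sketch only shows the predicate itself.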
Stabilizing the prefusion SARS-CoV-2 spike
The development of therapeutic antibodies and vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is focused on the spike (S) protein that decorates the viral surface. A version of the spike ectodomain that includes two proline substitutions (S-2P) and stabilizes the prefusion conformation has been used to determine high-resolution structures. However, even S-2P is unstable and difficult to produce in mammalian cells. Hsieh et al. characterized many individual and combined structure-guided substitutions and identified a variant, named HexaPro, that retains the prefusion conformation but shows higher expression than S-2P and can also withstand heating and freezing. This version of the protein is likely to be useful in the development of vaccines and diagnostics. Science, this issue p. 1501