Stephen K. Jones

Vilnius University

Author Statistics

Papers

Citation

H-Index

i-10 index

Co-Authors

Ilya J. Finkelstein

The University of Texas at Austin

John A. Hawkins

University of California, Davis

William H. Press

The University of Texas at Austin

Constantijn van der Smagt

Vrije Universiteit Amsterdam

Jeffrey M. Schaub

Abbott (United States)

Misha Klein

Delft University of Technology

Behrouz Eslami-Mossallam

Delft University of Technology

Martin Depken

Delft University of Technology

Koen van der Sanden

Netherlands Organisation for Applied Scientific Research

Cooperative Institutions

Delft University of Technology

The University of Texas at Austin

Vrije Universiteit Amsterdam

Leiden University

Abbott (United States)

Abbott Fund

Cayman Chemical (United States)

Patara Pharma (United States)

University of Michigan–Ann Arbor

Center for Systems Biology

Research Field

HEDGES error-correcting code for DNA storage corrects indels and allows sequence constraints

Proceedings of the National Academy of Sciences (2020)

William H. Press, John A. Hawkins, Stephen K. Jones, Jeffrey M. Schaub, Ilya J. Finkelstein

Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
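
The sequence constraints named in the abstract (limits on repeats and on windowed GC content) can be illustrated with a small checker. This is a minimal sketch, not the HEDGES implementation: the function names, the window length, and the thresholds are all hypothetical values chosen for illustration.

```python
def gc_fraction(window: str) -> float:
    """Return the fraction of bases in `window` that are G or C."""
    return sum(base in "GC" for base in window) / len(window)


def longest_run(seq: str) -> int:
    """Return the length of the longest homopolymer run in `seq`."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def satisfies_constraints(seq: str, window: int = 12, gc_min: float = 0.25,
                          gc_max: float = 0.75, max_run: int = 4) -> bool:
    """Check a strand against illustrative (assumed) constraints.

    The strand passes if no homopolymer run exceeds `max_run` and every
    sliding window of length `window` has GC content in [gc_min, gc_max].
    Strands shorter than `window` are checked as a single window.
    """
    if not seq:
        return True
    if longest_run(seq) > max_run:
        return False
    last_start = max(1, len(seq) - window + 1)
    for start in range(last_start):
        if not gc_min <= gc_fraction(seq[start:start + window]) <= gc_max:
            return False
    return True


if __name__ == "__main__":
    for strand in ("ACGTACGTACGTACGT", "AAAAAACGTACG", "ATATATATATATAT"):
        print(strand, satisfies_constraints(strand))
```

A constraint-aware encoder would presumably apply a test of this kind while choosing each next base rather than after the fact; the checker above only validates a finished strand.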

Indel

Code (set theory)

DNA Computing

GC-content

10.1073/pnas.2004821117

Citations (115)

Indel-correcting DNA barcodes for high-throughput sequencing

Proceedings of the National Academy of Sciences (2018)

John A. Hawkins, Stephen K. Jones, Ilya J. Finkelstein, William H. Press

Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10^6 single-error-correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10^15 error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.
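
The point about indels changing what is read at the barcode position can be illustrated with two textbook metrics. This is a sketch under stated assumptions: it uses the standard Hamming and Levenshtein distances, not the FREE metric from the paper, and the barcode and read below are made-up examples.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Standard edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    barcode = "ACGTTGCA"   # hypothetical 8-mer barcode
    # One base (the C) is deleted; a fixed-length read of 8 bases then
    # picks up one extra base ("T") from the downstream library sequence.
    observed = "AGTTGCAT"
    print("Hamming:    ", hamming(barcode, observed))
    print("Levenshtein:", levenshtein(barcode, observed))
```

A single deletion shifts every later base, so a position-by-position comparison sees many mismatches, while an edit-distance view of the same fixed-length read charges for both the deletion and the base that slid in from downstream.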

Indel

10.1073/pnas.1802640115

Citations (68)

A kinetic model predicts SpCas9 activity, improves off-target classification, and reveals the physical basis of targeting fidelity

Nature Communications (2022)

Behrouz Eslami-Mossallam, Misha Klein, Constantijn van der Smagt, Koen van der Sanden, Stephen K. Jones

The S. pyogenes (Sp) Cas9 endonuclease is an important gene-editing tool. SpCas9 is directed to target sites based on complementarity to a complexed single-guide RNA (sgRNA). However, SpCas9-sgRNA also binds and cleaves genomic off-targets with only partial complementarity. To date, we lack the ability to predict cleavage and binding activity quantitatively, and rely on binary classification schemes to identify strong off-targets. We report a quantitative kinetic model that captures the SpCas9-mediated strand-replacement reaction in free-energy terms. The model predicts binding and cleavage activity as a function of time, target, and experimental conditions. Trained and validated on high-throughput bulk-biochemical data, our model predicts the intermediate R-loop state recently observed in single-molecule experiments, as well as the associated conversion rates. Finally, we show that our quantitative activity predictor can be reduced to a binary off-target classifier that outperforms the established state-of-the-art. Our approach is extensible, and can characterize any CRISPR-Cas nuclease – benchmarking natural and future high-fidelity variants against SpCas9; elucidating determinants of CRISPR fidelity; and revealing pathways to increased specificity and efficiency in engineered systems.

Guide RNA

Nuclease

Complementarity (molecular biology)

10.1038/s41467-022-28994-2

Citations (29)