Single-molecule studies of protein-nucleic acid interactions frequently require site-specific modification of long DNA substrates. The bacteriophage λ is a convenient source of high quality long (48.5 kb) DNA. However, introducing specific sequences, tertiary structures, and chemical modifications into λ-DNA remains technically challenging. Most current approaches rely on multi-step ligations with low yields and incomplete products. Here, we describe a molecular toolkit for rapid preparation of modified λ-DNA. A set of PCR cassettes facilitates the introduction of recombinant DNA sequences into the λ-phage genome with 90-100% yield. Extrahelical structures and chemical modifications can be inserted at user-defined sites via an improved nicking enzyme-based strategy. As a proof-of-principle, we explore the interactions of S. cerevisiae Proliferating Cell Nuclear Antigen (yPCNA) with modified DNA sequences and structures incorporated within λ-DNA. Our results demonstrate that S. cerevisiae Replication Factor C (yRFC) can load yPCNA onto 5'-ssDNA flaps, (CAG)
Synthetic DNA is rapidly emerging as a durable, high-density information storage platform. A major challenge for DNA-based information encoding strategies is the high rate of errors that arise during DNA synthesis and sequencing. Here, we describe the HEDGES (Hash Encoded, Decoded by Greedy Exhaustive Search) error-correcting code that repairs all three basic types of DNA errors: insertions, deletions, and substitutions. HEDGES also converts unresolved or compound errors into substitutions, restoring synchronization for correction via a standard Reed-Solomon outer code that is interleaved across strands. Moreover, HEDGES can incorporate a broad class of user-defined sequence constraints, such as avoiding excess repeats, or too high or too low windowed guanine-cytosine (GC) content. We test our code both via in silico simulations and with synthesized DNA. From its measured performance, we develop a statistical model applicable to much larger datasets. Predicted performance indicates the possibility of error-free recovery of petabyte- and exabyte-scale data from DNA degraded with as much as 10% errors. As the cost of DNA synthesis and sequencing continues to drop, we anticipate that HEDGES will find applications in large-scale error-free information encoding.
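The user-defined sequence constraints mentioned above (windowed GC content and repeat avoidance) can be illustrated with a small check. This is a minimal sketch, not the HEDGES implementation: the function names, the 12-base window, the 25-75% GC bounds, and the maximum run length of 4 are all assumptions chosen for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def satisfies_constraints(seq, window=12, gc_min=0.25, gc_max=0.75, max_run=4):
    """Return True if the strand passes both illustrative constraints.

    All thresholds are hypothetical defaults, not values from the source.
    """
    seq = seq.upper()
    # Constraint 1: no homopolymer run longer than max_run.
    if longest_run(seq) > max_run:
        return False
    # Constraint 2: GC fraction within bounds in every sliding window
    # (a strand shorter than the window is checked as a single window).
    for start in range(max(1, len(seq) - window + 1)):
        frac = gc_fraction(seq[start:start + window])
        if not gc_min <= frac <= gc_max:
            return False
    return True


print(satisfies_constraints("ACGTACGTACGTACGT"))  # True
print(satisfies_constraints("AAAAAACGTACG"))      # False (run of six A's)
```

An encoder that respects such constraints would only emit bases for which the growing strand still passes a check of this kind; how HEDGES does this internally is not described in the abstract.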
Stabilizing the prefusion SARS-CoV-2 spike. The development of therapeutic antibodies and vaccines against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is focused on the spike (S) protein that decorates the viral surface. A version of the spike ectodomain that includes two proline substitutions (S-2P) and stabilizes the prefusion conformation has been used to determine high-resolution structures. However, even S-2P is unstable and difficult to produce in mammalian cells. Hsieh et al. characterized many individual and combined structure-guided substitutions and identified a variant, named HexaPro, that retains the prefusion conformation but shows higher expression than S-2P and can also withstand heating and freezing. This version of the protein is likely to be useful in the development of vaccines and diagnostics. Science, this issue p. 1501
Cohesin is a chromosome-bound, multisubunit adenosine triphosphatase complex. After loading onto chromosomes, it generates loops to regulate chromosome functions. It has been suggested that cohesin organizes the genome through loop extrusion, but direct evidence is lacking. Here, we used single-molecule imaging to show that the recombinant human cohesin-NIPBL complex compacts both naked and nucleosome-bound DNA by extruding DNA loops. DNA compaction by cohesin requires adenosine triphosphate (ATP) hydrolysis and is force sensitive. This compaction is processive over tens of kilobases at an average rate of 0.5 kilobases per second. Compaction of double-tethered DNA suggests that a cohesin dimer extrudes DNA loops bidirectionally. Our results establish cohesin-NIPBL as an ATP-driven molecular machine capable of loop extrusion.
Yeast Rad1-Rad10 (XPF-ERCC1 in mammals) incises UV, oxidation, and cross-linking agent-induced DNA lesions, and contributes to multiple DNA repair pathways. To determine how Rad1-Rad10 catalyzes inter-strand crosslink repair (ICLR), we examined the ICL sensitivity of yeast deleted for SAW1 and SLX4, which encode proteins that interact physically with Rad1-Rad10 and bind stalled replication forks. Saw1, Slx1, and Slx4 are critical for replication-coupled ICLR in mus81-deficient cells. Two rad1 mutations that disrupt interactions between Rpa1 and Rad1-Rad10 selectively disable functions outside nucleotide excision repair (NER) but retain UV lesion repair. Mutations in the analogous region of XPF also compromise XPF interactions with Rpa1 and Slx4; the resulting mutants are proficient in NER but deficient in ICLR and direct repeat recombination. We propose that Rad1-Rad10 makes distinct contributions to ICLR depending on cell cycle phase: in G1, Rad1-Rad10 removes ICLs via NER, whereas in S/G2, Rad1-Rad10 facilitates NER-independent replication-coupled ICLR.
Neuroglobin (Ngb), a protein in the globin family, is found in vertebrate brains. It binds oxygen reversibly. Compared with myoglobin (Mb), the amino acid sequence has limited similarity, but key residues around the heme and the classical globin fold are conserved in Ngb. The CO adduct of Ngb displays two CO absorption bands in the IR spectrum, referred to as N3 (distal histidine in the pocket) and N0 (distal histidine swung out of the pocket), which have absorption spectra that are almost identical to those of the Mb mutants L29F and H64V, respectively. The Mb mutants mimic the heme pocket structures of the corresponding Ngb conformers. The equilibrium protein dynamics for the CO adduct of Ngb are investigated by using ultrafast 2D-IR vibrational echo spectroscopy by observing the CO vibration's spectral diffusion (2D-IR spectra time dependence) and comparing the results with those for the Mb mutants. Although the heme pocket structure and the CO FTIR peak positions of Ngb are similar to those of the mutant Mb proteins, the 2D-IR results demonstrate that the fast structural fluctuations of Ngb are significantly slower than those of the mutant Mbs. The results may also provide some insights into the nature of the energy landscape in the vicinity of the folded protein free energy minimum.
Enzyme structural dynamics play a pivotal role in substrate binding and biological function, but the influence of substrate binding on enzyme dynamics has not been examined on fast time scales. In this work, the picosecond dynamics of horseradish peroxidase (HRP) isoenzyme C in the free form and when ligated to a variety of small organic molecule substrates are studied by using 2D-IR vibrational echo spectroscopy. Carbon monoxide bound at the heme active site of HRP serves as a spectroscopic marker that is sensitive to the structural dynamics of the protein. In the free form, HRP assumes two distinct spectroscopic conformations that undergo fluctuations on a tens-of-picoseconds time scale. After substrate binding, HRP is locked into a single conformation whose structural dynamics have reduced amplitudes and slower time scales. The decrease in carbon monoxide frequency fluctuations is attributed to reduced dynamic freedom of the distal histidine and the distal arginine, which are key residues in modulating substrate binding affinity. It is suggested that dynamic quenching caused by substrate binding can cause the protein to be locked into a conformation suitable for downstream steps in the enzymatic cycle of HRP.
Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10^6 single-error-correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10^15 error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.
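The point that Hamming and Levenshtein codes mis-account for indels in barcodes can be made concrete with a toy example. The sketch below is an illustration of the problem only, not the FREE decoding algorithm; the barcode, the downstream base, and the function names are hypothetical. A single deletion in an 8-base barcode, read in a fixed 8-base window that fills from the downstream sequence, looks like eight substitutions under Hamming distance and like two edits (not one) under Levenshtein distance.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions
    needed to turn a into b (standard row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


barcode = "ACGTACGT"       # hypothetical designed barcode
downstream = "G"           # hypothetical first base after the barcode
# Deleting the first base shifts everything left, and the fixed-length
# read window is filled from the downstream sequence.
observed = barcode[1:] + downstream   # "CGTACGTG"

print(hamming(barcode, observed))      # 8
print(levenshtein(barcode, observed))  # 2
```

Because the right end of the observed window is filled (after a deletion) or truncated (after an insertion), one synthesis error costs more than one edit under a plain edit distance, which is the situation the abstract says FREE barcodes are designed to handle.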
Significance: Exonuclease 1 (Exo1) is a conserved eukaryotic nuclease that participates in DNA repair and telomere maintenance. Here we use high-throughput single-molecule imaging to examine Exo1 activity on DNA and in the presence of single-stranded DNA binding proteins. We report that both human and yeast Exo1 are processive nucleases but are rapidly turned over by replication protein A (RPA). In the presence of RPA, Exo1 retains limited DNA-processing activity, albeit via a distributive binding mechanism. This rapid turnover by RPA can appear stimulatory or inhibitory in gel-based assays, clarifying conflicting results in the existing literature. RPA-depleted human cells show elevated Exo1 loading but reduced overall DNA resection, underscoring the many roles of RPA in regulating DNA resection in vivo.