RT Journal Article
SR Electronic
T1 Predicting DNA Hybridization Kinetics from Sequence
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 149427
DO 10.1101/149427
A1 Jinny X. Zhang
A1 John Z. Fang
A1 Wei Duan
A1 Lucia R. Wu
A1 Angela W. Zhang
A1 Neil Dalchau
A1 Boyan Yordanov
A1 Rasmus Petersen
A1 Andrew Phillips
A1 David Yu Zhang
YR 2017
UL http://biorxiv.org/content/early/2017/06/13/149427.abstract
AB Hybridization is a key molecular process in biology and biotechnology, but to date there is no predictive model for accurately determining hybridization rate constants from sequence information. To approach this problem systematically, we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (subsequences of the CYCS and VEGF genes) at temperatures ranging from 28 °C to 55 °C. Next, we rationally designed 38 features computable from sequence, each individually correlated with hybridization kinetics. These features are used in our implementation of a weighted neighbor voting (WNV) algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on its similarity to reactions with known rate constants (a.k.a. labeled instances). Automated feature selection and weighting optimization resulted in a final 6-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 2 with ≈74% accuracy and within a factor of 3 with ≈92% accuracy, based on leave-one-out cross-validation. Predictive understanding of hybridization kinetics allows more efficient design of nucleic acid probes, for example by allowing sparse hybrid-capture panels to enrich desired regions from genomic DNA more quickly and economically.
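
The weighted neighbor voting idea and the "within a factor of N" accuracy measure mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance function, the voting kernel, or the features, so the weighted Euclidean distance, the exponential vote, the `sharpness` parameter, and all function names below are assumptions chosen only for illustration.

```python
import math

def weighted_distance(a, b, weights):
    # Assumed metric: Euclidean distance with one weight per feature.
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def predict_log_k(query, labeled, weights, sharpness=1.0):
    """Predict log10(rate constant) for `query`.

    `labeled` is a list of (feature_vector, log10_k) pairs. Each labeled
    instance votes with a weight that decays with its distance to the query
    (assumed exponential kernel), and the prediction is the vote-weighted mean.
    """
    num = 0.0
    den = 0.0
    for feats, log_k in labeled:
        vote = math.exp(-sharpness * weighted_distance(query, feats, weights))
        num += vote * log_k
        den += vote
    return num / den

def loo_within_factor(labeled, weights, factor):
    """Leave-one-out fraction of predictions within `factor` of the true value."""
    tol = math.log10(factor)  # a factor of N is log10(N) on the log10 scale
    hits = 0
    for i, (feats, log_k) in enumerate(labeled):
        rest = labeled[:i] + labeled[i + 1:]  # hold out instance i
        if abs(predict_log_k(feats, rest, weights) - log_k) <= tol:
            hits += 1
    return hits / len(labeled)
```

In such a sketch, the per-feature weights play the role that the abstract assigns to "feature selection and weighting optimization": a weight of zero drops a feature, and larger weights make a feature count more when judging similarity. Calling `loo_within_factor(data, weights, 2)` and `loo_within_factor(data, weights, 3)` on a labeled data set would give the two kinds of accuracy figures the abstract reports.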