PT - JOURNAL ARTICLE
AU - Jinny X. Zhang
AU - John Z. Fang
AU - Wei Duan
AU - Lucia R. Wu
AU - Angela W. Zhang
AU - Neil Dalchau
AU - Boyan Yordanov
AU - Rasmus Petersen
AU - Andrew Phillips
AU - David Yu Zhang
TI - Predicting DNA Hybridization Kinetics from Sequence
AID - 10.1101/149427
DP - 2017 Jan 01
TA - bioRxiv
PG - 149427
4099 - http://biorxiv.org/content/early/2017/06/13/149427.short
4100 - http://biorxiv.org/content/early/2017/06/13/149427.full
AB - Hybridization is a key molecular process in biology and biotechnology, but to date there is no predictive model for accurately determining hybridization rate constants from sequence information. To approach this problem systematically, we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (subsequences of the CYCS and VEGF genes) at temperatures ranging from 28 °C to 55 °C. Next, we rationally designed 38 features computable from sequence, each individually correlated with hybridization kinetics. These features are used in our implementation of a weighted neighbor voting (WNV) algorithm, in which the hybridization rate constant of an unknown sequence is predicted from similar reactions with known rate constants (a.k.a. labeled instances). Automated feature selection and weighting optimization resulted in a final 6-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 2 with ≈74% accuracy and within a factor of 3 with ≈92% accuracy, based on leave-one-out cross-validation. Predictive understanding of hybridization kinetics allows more efficient design of nucleic acid probes, for example by allowing sparse hybrid-capture panels to enrich desired regions from genomic DNA more quickly and economically.
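The weighted neighbor voting step and the leave-one-out evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The feature vectors, the per-feature weights, the number of neighbors `k`, and the inverse-distance vote weighting are all hypothetical choices, and the helper names (`weighted_distance`, `wnv_predict`, `loo_fold_accuracy`) are invented for this sketch. Rate constants are handled as log10 values, so "within a factor of 2" becomes an absolute log10 error of at most log10(2) ≈ 0.301.

```python
import math
from typing import List, Sequence, Tuple

# A labeled instance: (feature vector, log10 of the measured rate constant).
# Feature values are hypothetical placeholders, not the 38 features from the paper.
Instance = Tuple[Sequence[float], float]


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      feature_weights: Sequence[float]) -> float:
    """Feature-weighted Euclidean distance; a weight of 0 ignores that feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, feature_weights)))


def wnv_predict(query: Sequence[float], labeled: List[Instance],
                feature_weights: Sequence[float], k: int = 3,
                eps: float = 1e-9) -> float:
    """Predict log10(rate constant) as an inverse-distance-weighted vote of the
    k labeled instances closest to `query` (assumed weighting scheme)."""
    scored = sorted(
        ((weighted_distance(query, feats, feature_weights), log_k)
         for feats, log_k in labeled),
        key=lambda pair: pair[0],
    )[:k]
    # Closer neighbors get larger votes; eps avoids division by zero.
    weights = [1.0 / (dist + eps) for dist, _ in scored]
    return sum(w * log_k for w, (_, log_k) in zip(weights, scored)) / sum(weights)


def loo_fold_accuracy(labeled: List[Instance], feature_weights: Sequence[float],
                      fold: float, k: int = 3) -> float:
    """Leave-one-out fraction of instances whose predicted rate constant is
    within a factor `fold` of the measured one (compared in log10 space)."""
    hits = 0
    for i, (feats, log_k_true) in enumerate(labeled):
        # Hold out instance i and predict it from all the others.
        training = labeled[:i] + labeled[i + 1:]
        log_k_pred = wnv_predict(feats, training, feature_weights, k)
        if abs(log_k_pred - log_k_true) <= math.log10(fold):
            hits += 1
    return hits / len(labeled)


if __name__ == "__main__":
    # Synthetic toy data with two hypothetical features, for illustration only.
    toy = [([0.0, 1.0], 5.0), ([0.1, 1.1], 5.1), ([0.2, 0.9], 4.9),
           ([3.0, 2.0], 6.0), ([3.1, 2.1], 6.2), ([2.9, 1.9], 5.9)]
    weights = [1.0, 1.0]
    print("within 2x:", loo_fold_accuracy(toy, weights, fold=2.0, k=2))
    print("within 3x:", loo_fold_accuracy(toy, weights, fold=3.0, k=2))
```

Working in log10 space makes a factor-of-2 overprediction and a factor-of-2 underprediction count equally. Automated feature selection, as described in the abstract, would correspond to searching over `feature_weights` (a weight of 0 drops a feature) to maximize the leave-one-out accuracy; that search is not shown here.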