Abstract
Motivation With the increasing availability of genome-wide genetic data, methods to combine genetic variables with other sources of data in statistical models are required. This paper introduces quantitative genetic scoring (QGS), a dimensionality reduction method to create quantitative genetic variables representing arbitrary genetic regions.
Methods QGS is defined as the sum of absolute differences in the genetic sequence between a subject and a reference population. QGS properties such as distribution and sensitivity to region size were examined, and QGS was tested in six different existing genomic data sets of various sizes and various phenotypes.
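The definition above can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository under Availability for that). It assumes genotypes are coded as allele dosages (0, 1, or 2) per variant and that the score is the raw sum over all variants in the region and all reference individuals; any normalization by region size or reference sample size is left out because the abstract does not specify it. The function name `qgs_sketch` and the toy data are hypothetical.

```python
def qgs_sketch(subject, reference):
    """Sum of absolute dosage differences between one subject and
    every individual in a reference population, over one region.

    subject:   list of allele dosages (assumed 0/1/2), one per variant
    reference: list of such lists, one per reference individual
    """
    total = 0
    for ref in reference:
        if len(ref) != len(subject):
            raise ValueError("subject and reference must cover the same variants")
        # Absolute difference at each variant, summed over the region
        total += sum(abs(s - r) for s, r in zip(subject, ref))
    return total


# Hypothetical toy region with 4 variants and 3 reference individuals
reference = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [0, 2, 2, 1],
]
subject = [0, 1, 1, 0]

score = qgs_sketch(subject, reference)
print(score)  # 1 + 2 + 3 = 6
```

Under these assumptions, a whole region of variants collapses to a single number per subject, which is the dimensionality reduction the abstract describes.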
Results QGS can reduce genetic information by >98% yet explain phenotypic variance at low, medium, and high levels of granularity. Associations based on QGS are independent of both the size and the linkage disequilibrium structure of the underlying region. In combination with stability selection, QGS finds significant results where traditional genome-wide association approaches struggle. In conclusion, QGS preserves phenotypically significant genetic variance while reducing dimensionality, allowing researchers to include quantitative genetic information in any type of statistical analysis.
Availability https://github.com/machine2learn/QGS
Contact gido.schoenmacker{at}radboudumc.nl
Supplemental information Supplemental data are available online.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
HapMap3 sample size and variant correction; figures incorporated in main text.