Abstract
Genome-wide association studies (GWAS) have identified thousands of loci linked to hundreds of traits in many different species. However, for most loci, the causal genes and the cellular processes they contribute to remain unknown. This problem is especially pronounced in species where functional annotations are sparse. When little is known about a gene, its expression patterns are a powerful tool for inferring biological function. Here, we developed a computational framework called Camoco that integrates loci identified by GWAS with functional information derived from gene co-expression networks. We built co-expression networks from three distinct biological contexts and established the precision of our method with simulated GWAS data. We applied Camoco to prioritize candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize seeds, demonstrating the need to match GWAS datasets with co-expression networks derived from the appropriate biological context. Furthermore, our results show that simply taking the genes closest to significant GWAS loci will often lead to spurious results, indicating the need for proper functional modeling and a reliable null distribution when integrating these high-throughput data types. Using mutants, we functionally validated a gene identified by our approach, and we annotated other high-priority candidates with ontological enrichment and curated literature support, resulting in a targeted set of candidate genes that drive elemental accumulation in maize grain.