Abstract
Inferring genome-scale gene co-expression networks is important for understanding the genetic architecture underlying complex and diverse biological phenotypes. The recent availability of large-scale RNA-seq data provides great potential for co-expression network inference. In this study, we present, for the first time, a heterogeneous ensemble pipeline that integrates three frequently used inference methods to build a high-quality RNA-seq-based Gene Co-expression Network (GCN) in rice, an important monocot species. First, the quality of the network obtained by our proposed method was evaluated against curated datasets of positive and negative gene functional links, on which it outperformed the network produced by each single method. Second, the capability of this network to associate unknown genes with biological functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential application of our proposed method to predicting the biological roles of long non-coding RNA (lncRNA) and circular RNA (circRNA) genes. Our results provide a valuable data source for selecting candidate genes for further experimental validation in rice genetics research and breeding. To enhance the identification of novel genes regulating important biological processes and agronomic traits in rice and other crop species, we have released the source code for constructing high-quality RNA-seq-based GCNs, together with the rice RNA-seq-based GCN; both can be freely downloaded at https://github.com/czllab/NetMiner.