TY  - JOUR
T1  - Exploratory bioinformatics analysis reveals importance of “junk” DNA in early embryo development
JF  - bioRxiv
DO  - 10.1101/079921
SP  - 079921
AU  - Steven Xijin Ge
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/10/09/079921.abstract
N2  - Background: Instead of testing predefined hypotheses, the goal of exploratory data analysis (EDA) is to find what data can tell us. Following this strategy, we re-analyzed a large body of genomic data to investigate how the early mouse embryos develop from fertilized eggs through a complex, poorly understood process. Results: Starting with a single-cell RNA-seq dataset of 259 mouse embryonic cells from zygote to blastocyst stages, we reconstructed the temporal and spatial dynamics of gene expression. Our analyses revealed similarities in the expression patterns of regular genes and those of retrotransposons, and the enrichment of transposable elements in the promoters of corresponding genes. Long Terminal Repeats (LTRs) are associated with transient, strong induction of many nearby genes at the 2-4 cell stages, probably by providing binding sites for Obox and other homeobox factors. The presence of B1 and B2 SINEs (Short Interspersed Nuclear Elements) in promoters is highly correlated with broad upregulation of intracellular genes in a dosage- and distance-dependent manner. Such enhancer-like effects are also found for human Alu and bovine tRNA SINEs. Promoters for genes specifically expressed in embryonic stem cells (ESCs) are rich in B1 and B2 SINEs, but low in CpG islands. Conclusions: Our results provide evidence that transposable elements may play a significant role in establishing the expression landscape in early embryos and stem cells. This study also demonstrates that open-ended, exploratory analysis aimed at a broad understanding of a complex process can pinpoint specific mechanisms for further study. Major findings: Single-cell RNA-seq data enables estimation of retrotransposon expression during PD. Similar expression dynamics of retrotransposons and regular genes during PD. Long terminal repeats may be essential for the 1st wave of gene expression. Obox homeobox factors are possible regulators of PD, upstream of Zscan4. SINE repeats predict expression of nearby genes in murine, human and bovine embryos. Exploratory analysis of large single-cell data pinpoints developmental pathways.
ER  - 