Abstract
Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from ChIP-Seq. We exploit the idea that chromosomes encode a one-dimensional sequence of chromatin structural types. Interactions between these chromatin types determine the three-dimensional (3D) structural ensemble of chromosomes through a process similar to phase separation. First, a recurrent neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which that locus resides, as measured by DNA-DNA proximity ligation (Hi-C). Next, the chromatin types inferred by this neural network are used as input to an energy landscape model for chromatin organization (MiChroM) in order to generate an ensemble of 3D chromosome conformations. After training the model, dubbed MEGABASE (Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles), on odd-numbered chromosomes, we predict the chromatin type sequences and the subsequent 3D conformational ensembles for the even-numbered chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps as well as distances measured using 3D FISH experiments. Both sets of experiments support the hypothesis that phase separation is the process driving compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible.
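To make the validation step concrete, the following is a minimal sketch of how a simulated ensemble of 3D conformations could be reduced to a Hi-C-like contact map. It is an illustration under stated assumptions, not the procedure used in this work: the function name `contact_frequency_map`, the hard distance cutoff, and the toy coordinates are all hypothetical, and the abstract does not specify how contacts are defined.

```python
from math import dist


def contact_frequency_map(ensemble, cutoff):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    conformations in which loci i and j lie within `cutoff` of each other.

    `ensemble` is a list of conformations; each conformation is a list of
    (x, y, z) coordinates, one per locus. The hard cutoff is an assumption
    made for illustration only.
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one conformation")
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for conformation in ensemble:
        if len(conformation) != n:
            raise ValueError("all conformations must have the same number of loci")
        for i in range(n):
            for j in range(i, n):
                if dist(conformation[i], conformation[j]) <= cutoff:
                    counts[i][j] += 1
                    counts[j][i] = counts[i][j]  # keep the map symmetric
    total = len(ensemble)
    return [[c / total for c in row] for row in counts]


# Toy ensemble: two conformations of a three-locus chain (hypothetical data).
toy_ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
]
toy_map = contact_frequency_map(toy_ensemble, cutoff=1.2)
```

A map computed this way can be compared entry by entry with an experimental Hi-C matrix binned at the same resolution. A smooth, distance-dependent contact function could replace the hard cutoff without changing the overall structure of the calculation.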