ABSTRACT
Motivation Genomes of multicellular systems are compartmentalized and dynamically folded within the three-dimensional (3D) confines of the nucleus to facilitate gene regulation. Among the 3D-genome mapping technologies currently in use, droplet-based, barcode-linked sequencing (ChIA-Drop) has the unique capability to capture complex multi-way chromatin interactions at the single-molecule level. ChIA-Drop data give rise to higher-order interaction networks in which nodes represent genomic fragments and (hyper)edges capture observed physical contacts. The problem of interest is to use these data to create a “dictionary” of interaction patterns (subnetworks) that accurately describe all global chromatin structures, and to associate dictionary elements with cellular functions.
Results To construct interpretable chromatin dictionaries, we introduce a new algorithm termed online convex network dictionary learning (online cvxNDL). Unlike classical dictionary learning for image or text processing, online cvxNDL uses special subgraph sampling methods and produces interpretable subnetwork representatives corresponding to “convex mixtures” of patterns observed in real data. To demonstrate the utility of the method, we perform an in-depth study of RNAPII-enriched ChIA-Drop data from Drosophila melanogaster S2 cell lines. Our results are twofold. First, we show that online cvxNDL allows for accurate reconstruction of the original interaction network data using only a collection of roughly 25 dictionary elements and their “representatives” directly observed in the data. Second, we identify collections of interaction patterns of chromatin elements that are shared by related processes on different chromosomes, as well as patterns unique to certain chromosomes. This is accomplished through Gene Ontology (GO) enrichment analysis, which allows us to associate dictionary element representatives with functional properties of their corresponding chromatin regions and, in the process, determine what we call the “span” and “density” of chromatin interaction patterns.
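To illustrate what a “convex mixture” of observed patterns means, the following minimal Python sketch forms one dictionary element as a convex combination of small adjacency matrices that stand in for representatives sampled from the data. It is an illustration only, not the online cvxNDL implementation: the function name `convex_mixture`, the two toy 3-node patterns, and the weights are all hypothetical.

```python
import math


def convex_mixture(representatives, weights):
    """Return the entrywise convex combination of square adjacency matrices.

    Hypothetical helper for illustration; not part of the released code.
    """
    if len(representatives) != len(weights):
        raise ValueError("need exactly one weight per representative")
    # Convexity constraint: weights are nonnegative and sum to one.
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    k = len(representatives[0])
    return [
        [
            sum(w * rep[i][j] for w, rep in zip(weights, representatives))
            for j in range(k)
        ]
        for i in range(k)
    ]


# Two toy 3-node interaction patterns (assumed data): a path and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

# A dictionary element lying between the two observed patterns.
element = convex_mixture([path, triangle], [0.25, 0.75])
```

Because the weights are nonnegative and sum to one, every entry of `element` stays between the smallest and largest corresponding entries of the representatives, which is what keeps such an element interpretable as a blend of patterns actually seen in the data.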
Availability and Implementation The code and dataset are available at: https://github.com/jianhao2016/online_cvxNDL/
Contact milenkov@illinois.edu
Competing Interest Statement
This work was supported by the National Science Foundation [1956384] and partially supported by the National Science Foundation [2206296].