RT Journal Article SR Electronic T1 A Common Class of Transcripts with 5’-Intron Depletion, Distinct Early Coding Sequence Features, and N1-Methyladenosine Modification JF bioRxiv FD Cold Spring Harbor Laboratory SP 057455 DO 10.1101/057455 A1 Can Cenik A1 Hon Nian Chua A1 Guramrit Singh A1 Abdalla Akef A1 Michael P Snyder A1 Alexander F Palazzo A1 Melissa J Moore A1 Frederick P Roth YR 2016 UL http://biorxiv.org/content/early/2016/06/06/057455.abstract AB Introns are found in 5’ untranslated regions (5’UTRs) for 35% of all human transcripts. These 5’UTR introns are not randomly distributed: genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5’UTR introns tend to harbor specific RNA sequence elements in their early coding regions. Here we explored the connection between coding-region sequence and 5’UTR intron status. We developed a classifier modeling this relationship that can predict 5’UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5’ proximal-intron-minus-like-coding regions (“5IM” transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. Members of this transcript class tend to exhibit differential RNA expression across many conditions. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure preceding the start codon and near the 5’ cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the Exon Junction Complex (EJC) at non-canonical 5’ proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses reveal the existence of a distinct 5IM class comprising ∼20% of human transcripts. 
This class is defined by depletion of 5’ proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for non-canonical binding by the Exon Junction Complex.