RT Journal Article
SR Electronic
T1 Introns structure patterns of variation in nucleotide composition in Arabidopsis thaliana and rice protein-coding genes
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 010819
DO 10.1101/010819
A1 Adrienne Ressayre
A1 Sylvain Glémin
A1 Pierre Montalent
A1 Laurana Serre-Giardi
A1 Christine Dillmann
A1 Johann Joets
YR 2014
UL http://biorxiv.org/content/early/2014/10/28/010819.abstract
AB Plant genomes are large and intron-rich, and they show a wide range of variation in coding-region G + C content. In coding regions, plants display a kind of syndrome: higher G + C content is associated with both greater heterogeneity among genes within a genome and greater variation across genes. Taking advantage of the large number of genes in plant genomes and the wide range of variation in gene intron number, we performed a comprehensive survey of G + C content variation at scales ranging from the nucleotide level to the whole genome in two species, Arabidopsis thaliana and Oryza sativa, comparing patterns among genes with different numbers of introns. In both species, we observed a pervasive effect of intron number and of position along the gene on G + C content and on codon and amino acid frequencies, suggesting that introns act as barriers that structure G + C content along genes. In external gene regions (located upstream of the first intron or downstream of the last intron), species-specific factors shape G + C content, whereas in internal gene regions (flanked by introns), G + C content is constrained to remain within a range common to both species. In rice, introns appear to be a major determinant of gene G + C content, whereas in A. thaliana their effect is weaker but still significant. This structuring effect of introns in both species may explain the G + C content syndrome observed in plants.