PT - JOURNAL ARTICLE
AU - Walter Basile
AU - Oxana Sachenkova
AU - Sara Light
AU - Arne Elofsson
TI - High GC Content Causes De Novo Created Proteins to be Intrinsically Disordered
AID - 10.1101/070003
DP - 2016 Jan 01
TA - bioRxiv
PG - 070003
4099 - http://biorxiv.org/content/early/2016/08/30/070003.short
4100 - http://biorxiv.org/content/early/2016/08/30/070003.full
AB - De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. De novo created proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should, for instance, not cause aggregation. Therefore, although the creation of the short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on de novo created proteins have been elusive, and contradictory results have been reported. In Drosophila, de novo created proteins are more disordered (i.e., enriched in polar residues) than ancient proteins, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of all proteins in 187 eukaryotic species. We find that, on average, there are only small differences between proteins of different ages, with the exception that younger proteins are shorter. However, taking the GC content into account explains the opposite trends observed in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder-promoting amino acids, and inversely correlated with transmembrane-, helix- and sheet-promoting residues. We find that for the youngest proteins, i.e. the ones that are most likely to be de novo created, there is a strong correlation between GC content and structural properties. In contrast, this strong relationship is not seen for ancient proteins.
This leads us to propose that structural features are not a strong determining factor for the fixation of de novo created genes. Instead, these proteins resemble random proteins at a given GC level. The dependency on GC content then gradually weakens during evolution.

Author Summary
We show that the GC content of a genomic region is of great importance for the properties of a protein-coding de novo created gene. The GC content affects the frequency of the codons, and this in turn affects the probability of each amino acid being included in a de novo created protein. The codons encoding Ala, Pro and Gly contain about 80% GC, while the codons for Lys, Phe, Asn, Tyr and Ile contain 20% or less. Pro and Gly are disorder-promoting, while Phe, Tyr and Ile are order-promoting. Therefore, random protein sequences generated at high GC will be more disordered than those generated at low GC. The structural properties of the youngest (orphan) proteins match, to a large degree, the properties of random proteins when the GC content is taken into account. In contrast, the structural properties of ancient proteins show only a weak correlation with GC content. This suggests that, even after fixation, de novo created proteins largely resemble random proteins at a given GC content. Thereafter, during evolution, the correlation between structural properties and GC content weakens.
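
The codon arithmetic in the Author Summary can be checked with a short sketch. It assumes the standard genetic code and, for the random-sequence model, that each nucleotide is drawn independently with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2. That independence model and all function names are illustrative assumptions, not necessarily the procedure the authors used. Only the residues the summary names explicitly (Pro and Gly as disorder-promoting; Phe, Tyr and Ile as order-promoting) are grouped here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_gc(codon):
    """Fraction of G or C bases in a single codon."""
    return sum(base in "GC" for base in codon) / 3


def mean_codon_gc():
    """Mean GC fraction over the synonymous codons of each amino acid."""
    per_aa = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # skip stop codons
            per_aa.setdefault(aa, []).append(codon_gc(codon))
    return {aa: sum(vals) / len(vals) for aa, vals in per_aa.items()}


def random_aa_frequencies(gc):
    """Expected amino acid frequencies for random codons at a given GC content.

    Assumption (not from the source): bases are independent and identically
    distributed, and stop codons are discarded before renormalising.
    """
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    freq = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        weight = p[codon[0]] * p[codon[1]] * p[codon[2]]
        freq[aa] = freq.get(aa, 0.0) + weight
    total = sum(freq.values())
    return {aa: f / total for aa, f in freq.items()}


if __name__ == "__main__":
    gc_by_aa = mean_codon_gc()
    for aa in "APGKFNYI":
        print(aa, round(gc_by_aa[aa], 2))
    for gc in (0.3, 0.6):
        f = random_aa_frequencies(gc)
        disorder = f["P"] + f["G"]          # residues named disorder-promoting
        order = f["F"] + f["Y"] + f["I"]    # residues named order-promoting
        print(f"GC={gc}: Pro+Gly={disorder:.3f}, Phe+Tyr+Ile={order:.3f}")
```

Under the standard code, the Ala, Pro and Gly codon families each average 5/6 GC (the roughly 80% quoted above), whereas Lys, Phe, Asn and Tyr average 1/6 and Ile averages 1/9. Under the independence assumption, raising the GC content increases the expected Pro and Gly fraction and lowers the expected Phe, Tyr and Ile fraction, which is the direction of the effect the summary describes.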