ABSTRACT
The birth of genes that encode new protein sequences is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. To address these questions, we have obtained a large collection of mammalian-specific gene families that lack homologues in other eukaryotic groups. We have combined gene annotations and de novo transcript assemblies from 30 different mammalian species, obtaining about 6,000 gene families. In general, the proteins in mammalian-specific gene families tend to be short and depleted in aromatic and negatively charged residues. Proteins that arose early in mammalian evolution include milk and skin polypeptides, immune response components, and proteins involved in reproduction. In contrast, the functions of proteins that have a more recent origin remain largely unexplored, even though these proteins also have extensive proteomics support. We identify several previously described cases of genes that originated de novo from non-coding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of novel protein-coding genes in mammals. Interestingly, we find that both young and basal mammalian-specific gene families show similar tissue-specific gene expression biases, with a marked enrichment in testis. This, together with the observed enrichment in genes involved in spermatogenesis and sperm motility, is consistent with a predominant role of sexual selection in the emergence of new genes in mammals.