ABSTRACT
Transcription factors (TFs) are characterized by particular DNA-binding domains (DBDs), which determine their binding specificity and thus their ability to change the expression of their downstream target genes. TFs are central to organismal development and morphology; they are therefore potentially instrumental in producing phenotypic diversity. We measured the abundance of 49 major TF DBD families in 48 bird genomes and compared it with that in 5 reptile genomes, in an effort to assess the degree to which TF DBDs are connected to increased phenotypic diversity in the avian lineage. We hypothesized that TF DBD abundance would be increased in multiple TF families, in correlation with the increased phenotypic diversity found in birds; instead, we see a general loss of major TF DBD families, reflecting the general genome reductions seen between reptiles and birds, with the largest losses in TF DBD families associated with multiple developmental (feather, sex-determination, body-plan, immune, blood) and metabolic processes.
INTRODUCTION
Transcription factors (TFs) are proteins that bind to DNA in a sequence-specific manner and enhance or repress gene expression. In response to a broad range of stimuli, TFs coordinate the regulation of gene expression that is essential for defining morphology, functional capacity, and developmental fate at the cellular level. Although transcription factor DNA-binding domains are very well conserved, the other associated domains, which are largely responsible for protein-protein interactions, readily diverge among homologs. The structure and function of transcription factors are therefore inherently modular. This attribute is thought to allow gene-regulatory networks to evolve via transcription factor changes (Wray 2007; Jarvela & Hinman 2015), and could account for the seemingly large phenotypic differences between closely related groups (Liu et al. 2014; Nadimpalli et al. 2015).
A long-standing question has been whether changes in gene regulation or in protein sequence have made the larger contribution to the phenotypic diversity seen between species (Britten & Davidson 1969; King & Wilson 1975). It is now understood that changes in cis-regulatory systems more often underlie the evolution of morphological diversity than gene duplication/loss or changes in protein function (Levine & Tjian 2003; Carroll 2008; Wittkopp & Kalay 2011). These cis-regulatory elements (CREs) typically regulate gene transcription by functioning as binding sites for transcription factors. However, another avenue would be wholesale changes in the function of transcription factors (otherwise known as trans-regulatory elements) through changes in domain modularity, either in their DNA-binding domains (DBDs) and/or in other domains usually involved in protein-protein interactions (Wagner & Lynch 2008, 2010; Schmitz et al. 2016). Ultimately, phenotypic variation from individual organisms to broad groups has been attributed to a combination of changes in cis- and trans-regulatory elements (Schmitz et al. 2016).
Although transcription factor diversity has been correlated with increased “complexity” across the eukaryotic lineage (Charoensawan et al. 2010; de Mendoza et al. 2013; Lehti-Shiu et al. 2016), no study has measured such transcription factor diversity within a speciose, but closely related, animal clade. Forty-eight avian genomes representing all the major families of birds (Aves) have recently been published (O'Brien et al. 2014; Zhang et al. 2014a; Eöry et al. 2015), providing a unique opportunity to do just that. Birds represent one of the most diverse vertebrate lineages and have the distinction of being the tetrapod class with the most living species (over 10,000), half of them being passerines (Gill & Wright 2006). Birds not only live worldwide and range widely in size (5 cm - 2.75 m), but also vary widely in morphology, physiology, and behavior, and have unique features (e.g., feathers). From a genomic standpoint, these organisms have relatively low rates of gene gain/loss in gene families and have similarly sized genomes (0.91-1.3 Gb) (Zhang et al. 2014b).
Therefore, in this study, we aimed to characterize the major metazoan TF families by their major DNA-binding domains (DBDs) in the 48 avian and 5 reptile genomes. This is the first study to analyze the evolutionary history and phylogenetic distribution of transcription factors in the diverse genomes available for the avian group and the closely related reptile lineage. We hypothesized that TF DBD abundance would be increased in multiple TF families, in correlation with the increased phenotypic diversity found in birds; however, we see no such evidence, and instead see wholesale loss within major TF DBD families, potentially reflecting the general genome reductions seen between reptiles and birds. The largest losses were in TF DBD families associated with multiple developmental (feather, sex-determination, body-plan, immune, blood) and metabolic processes.
MATERIALS AND METHODS
TF DBD Identification
We obtained data on complete genomes from publicly available databases for birds (http://avian.genomics.cn/en/jsp/database.shtml) and reptiles (http://crocgenomes.org/). A PfamScan was performed on the protein models, using a custom database containing the major DBD families and selecting the gathering threshold option as a conservative approach, which can underestimate total counts for some domains but minimizes false positives (Eddy 2011). We looked for the presence of the major DBDs, which were selected based on previous studies (de Mendoza et al. 2013). For the major DBDs, in all cases, we defined a one-to-one relationship between TF class and DBD class (i.e., non-duplicates). Genes in which two or more DBDs were found were placed in a separate list from genes that showed only one DBD. DBDs that appeared only in combination with other DBDs (i.e., duplicates) were analyzed separately, to avoid overestimating TF numbers because of problems in detecting repeated domains (de Mendoza et al. 2013). We counted the number of genes/proteins containing a given DBD, and the number of different domain architectures associated with each DBD in each species, via custom macros.
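The counting step can be illustrated with a minimal Python sketch. This is not the custom macros used in the study; it is a stand-in under stated assumptions. It assumes PfamScan-style whitespace-delimited output in which the first column is the protein identifier and the seventh column is the Pfam family name, and it treats any protein with two or more DBD hits as a "duplicate". The function names, the placeholder accession PF00000.0, and the toy rows are hypothetical and are not data from this study.

```python
from collections import Counter, defaultdict


def parse_pfamscan(lines):
    """Map each protein id to the list of DBD family names found in it.

    Assumes PfamScan-style rows: column 1 = protein id, column 7 = family name.
    Comment lines (starting with '#') and blank lines are skipped.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            continue  # not a domain row
        hits[fields[0]].append(fields[6])
    return dict(hits)


def count_dbds(hits):
    """Count proteins per DBD family, split into non-duplicate and duplicate lists.

    Assumption: a protein with exactly one DBD hit is a non-duplicate;
    a protein with two or more DBD hits is a duplicate and is counted
    once for each distinct family it contains.
    """
    non_duplicates, duplicates = Counter(), Counter()
    for domains in hits.values():
        if len(domains) == 1:
            non_duplicates[domains[0]] += 1
        else:
            for family in set(domains):
                duplicates[family] += 1
    return non_duplicates, duplicates


# Toy rows for illustration only (hypothetical proteins and accessions).
toy_output = [
    "# <seq id> <aln start> <aln end> <env start> <env end> <hmm acc> <hmm name> ...",
    "protA 10 60 10 60 PF00000.0 T-box Domain 1 57 57 70.1 1e-20 1 No_clan",
    "protB 5 80 5 80 PF00000.0 Forkhead Domain 1 86 86 90.0 1e-25 1 No_clan",
    "protB 120 200 120 200 PF00000.0 HMG_box Domain 1 69 69 50.0 1e-12 1 No_clan",
    "protC 7 65 7 65 PF00000.0 T-box Domain 1 57 57 66.3 1e-18 1 No_clan",
]

non_dup, dup = count_dbds(parse_pfamscan(toy_output))
print(dict(non_dup), dict(dup))
```

Running one such count per species would yield the per-family tallies that are compared between birds and reptiles.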
Transcription Factor DBD enrichment and characterization
We tested for enrichment of TF numbers using a Mann-Whitney U test, with a significance threshold of P < 0.01, as performed by de Mendoza et al. (2013). The protein sequences were then filtered to retain those containing DBDs whose numbers differed significantly between reptiles and birds; these sequences were subjected to a BLAST search, mapped, annotated, and analyzed using the Blast2GO basic version (Conesa et al. 2005). Gene Ontology (GO) term maps were created for biological processes and for cellular and molecular functions, with a threshold of 10% (i.e., 1,500).
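As an illustration of the enrichment test, the following Python sketch computes a two-sided Mann-Whitney U test for one DBD family from per-species counts. The software used in the study is not stated, so this is an assumption-laden stand-in: it uses the normal approximation with a tie correction and no continuity correction, and the function name and the toy counts are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign midranks to tied values and accumulate the tie-correction term.
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p_value


# Toy per-species counts for a single DBD family (not data from this study).
reptile_counts = [30, 32, 31, 29, 33]
bird_counts = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

u, p = mann_whitney_u(reptile_counts, bird_counts)
print(u, p, p < 0.01)
```

With only 5 reptile genomes, an exact test may be preferable to the normal approximation; the approximation is used here only to keep the sketch short.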
RESULTS
Transcription Factor DBD Identification
Across the 49 different DNA-binding domains, a total of 34,318 non-duplicate and 19,668 duplicate proteins were identified (SUPP Tables). To ascertain which families may have experienced expansion or contraction, comparisons were made against the 5 reptile genomes (crocodile, alligator, gharial, green sea turtle, and soft-shelled turtle). In the non-duplicate group, 21 of the 49 DBDs were significantly different (MWU, P < 0.01; Table 1). A majority of these families experienced a reduction in DBD presence, with an average decrease of 51.14%, while only 2 families showed an increase (Homeobox_KN and HTH_psq), though this increase was substantial (~393%). In the duplicate group, only 11 showed a significant difference (Table 2), again with a general reduction in DBD presence. In this category, all families decreased in number (~60%), with evidence of complete loss of members with certain DBD combinations (Forkhead, HMG box, MH1, and Tub). Most of these DBD families are not significantly different in proteins with only one representative DBD (Table 1).
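The percentage changes above can be read as relative changes in per-species DBD counts between the two groups. The exact formula is not stated in the text, so the following Python sketch rests on an assumption: the change is taken as the difference between the mean bird count and the mean reptile count, expressed as a percentage of the reptile mean. The function name and the toy counts are hypothetical.

```python
from statistics import mean


def percent_change(reptile_counts, bird_counts):
    """Percent change in mean DBD count from reptiles (baseline) to birds.

    Negative values indicate a reduction in birds; -100 indicates complete loss.
    """
    baseline = mean(reptile_counts)
    if baseline == 0:
        raise ValueError("reptile baseline is zero; percent change is undefined")
    return (mean(bird_counts) - baseline) / baseline * 100.0


# Toy counts for illustration only (not data from this study).
print(percent_change([10, 10], [5, 5]))   # a halving in birds
print(percent_change([2, 2], [10, 10]))   # a fivefold increase in birds
```

Under this definition, a family that is absent from every bird genome but present in reptiles gives a change of -100%, which matches the notion of complete loss.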
Transcription Factor DBD Characterization
We tried to assess any additional commonalities (beyond DNA binding and general gene expression) among the transcription factor families that underwent a change between reptiles and birds, using the gene ontology (GO) terms associated with the protein sequences. This was performed on the 21 TF DBD families (see previous section; Table 1), and resulted in additional characterization of 21,463 proteins across the 47 birds and 5 reptiles. The GO terms associated with the Cellular and Molecular functions were variations on DNA-binding and Cell parts/Nucleus, while the Biological Processes were those associated with various metabolic (nitrogen, macromolecule, nucleic acid, protein) and developmental processes (at Level 3). Of the GO term classes pertaining to development, the most common terms were those associated with skeletal development, immune system development, circulatory system development, hematopoietic/blood development, embryonic morphogenesis, nervous system development, and animal organ development (Figure 1).
DISCUSSION
The results of this analysis suggest that there was an overall major reduction in transcription factor families across the avian lineage, and show no major instances of substantial expansion associated with any specific DBD family. Although avian and reptile genomes are generally similarly sized at present, a reduction in genome size between reptiles and birds did occur in the saurischian dinosaur lineage between 230 and 250 million years ago (Organ et al. 2007). This coincided with a major reduction in repetitive elements, intron size, and even wholesale loss of syntenic protein-coding regions, typically attributed to the general metabolic requirements for flight (Organ & Shedlock 2009; Zhang & Edwards 2012; Lovell et al. 2014; Wright et al. 2014; Zhang et al. 2014b); thus, the reduction of TF DBD families seen in these results potentially mirrors the general genome reduction seen in other studies. Even so, it is somewhat surprising not to see any particular instances of TF DBD family expansion, since increases in the number of regulatory proteins, including TFs, have frequently been connected to phenotypic innovations (Miyata & Suga 2001; Levine & Tjian 2003; Kusserow et al. 2005; Schmitz et al. 2016).
Nonetheless, such a pattern suggests that TF DBD family abundance is not correlated with avian diversification, and that this diversification can possibly be ascribed to other factors instead, such as protein-coding family duplications or cis-regulatory changes (Zhang et al. 2014b; Seki et al. 2017). Duplications of protein-coding gene families are known to play a major role in species evolution: redundancy provides a medium for novelty while maintaining the initial function (Lynch & Conery 2000; Zhang 2003). However, another possibility is that the absence of an increase in TF DBDs between reptiles and birds may instead suggest changes in TF modularity, appearing through increased interconnectivity/co-occurrence of DBDs and domains involved in protein-protein interactions. Thus, surveying domains associated with protein-protein interactions could potentially reveal an additional facet of how TF families may have evolved during the reptile-bird transition, in conjunction with a reduction in genome size. An increase or enrichment in these domains would hint at a scenario in which TF families were being reorganized, or in which family functions were being specialized, thus altering their expression profiles or binding properties and affecting the expression of many target genes, often with a major functional impact (Lespinet et al. 2002).
It is interesting to note that the TF DBD families that experienced the sharpest declines were ones associated with heart development (T-box), vocal learning (Forkhead), feather formation (Ets), wing development (HMG box), sex determination (HMG box, DM), immune function (HMG box, IRF, STAT bind, RHD), and aspects of blood (Runt, GATA) (Table 3). All of these aspects have been subject to major physiological/developmental changes between reptiles and birds (Brusatte et al. 2015; Chatterjee 2015), especially the development of feathers from scales (Chuong et al. 2000), sex determination through chromosomal differences rather than temperature (Sarre et al. 2004), and even changes in immune system functionality (Zimmerman et al. 2010). However, it is difficult to speculate how reductions in these major families may be associated with the changes seen between the avian and reptile lineages.
CONCLUSIONS
Ultimately, these results represent the first foray into TF DBD characterization between the avian and reptile genomes. In addition, the results of these analyses strengthen the notion that cis-regulatory regions and protein-coding gene families are behind much of the extant avian diversification. Still, wholesale reductions in TF DBD families in the genome likely posed a significant hurdle, unless these families comprised multiple members that were functionally redundant. Overall, the results of these analyses represent a broad characterization of TF DBD family composition in birds; the specific composition of TF families should therefore be probed further, especially in those families with the largest reductions seen in this study. In addition, whether non-major TF DBD families have also seen a general reduction requires future analysis, as does the composition of domains associated with protein-protein interactions.
Author contributions
AMG and JSP conceived and designed the study, and performed the data collection. AMG wrote the manuscript. All authors discussed the results and implications, and commented on the manuscript.
Supplemental Information
SUPP Tables 1 - 53: Individual TF DBD output tables for each of the bird and reptile species used.
Acknowledgements
We would like to acknowledge our PhD advisors (Kevin McCracken and William Browne) who allowed us to explore other topics and work on side-projects, at the same time as our thesis work.