Abstract
While attempting to align divergent homologs of a conserved developmental enhancer, we discovered a flaw in the homology concept embedded in gapped alignment (GA). To correct this flaw, we developed an approach called maximal homology alignment (MHA). The goal of MHA is to rescue the internal microparalogy of biological sequences rather than to insert a pattern of gaps (null characters) that transforms homologous sequences into strings of uniform size (1-dimensional lengths). The core operation in MHA is the “cinch”, whereby inferred tandem microparalogy is represented in multiple rows across the same span of alignment columns. MHAs thus have a second (vertical) paralogy dimension, which re-categorizes most indel mutations as replication slippage and attenuates the indel problem. Furthermore, internally cinched, inferred microparalogy in a self-MHA can later be relaxed to restore uniformity to 2-dimensional widths in a multiple sequence alignment. This de-cinching operation is used as a first resort, before artificial null characters are introduced. We implement MHA in a program called maximal, which is composed of a series of modules for cinching and cyclelizing divergent tandem repeats. We conclude that the MHA approach is more useful than GA for non-protein-coding regulatory sequences, which are unconstrained by codon-based reading frames and are enriched in dense microparalogical content.
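The cinch operation described above can be illustrated with a toy sketch. This is not the algorithm used by the maximal program; it assumes only perfect (non-divergent) tandem repeats, performs a single cinch on the leftmost repeat it finds, and uses hypothetical function names (`find_tandem_repeat`, `cinch_once`) and a hypothetical `max_unit` parameter chosen for illustration.

```python
def find_tandem_repeat(seq, max_unit=8):
    """Return (start, unit_len, copies) for the leftmost perfect tandem
    repeat with at least two copies, or None if there is none.
    Illustrative assumption: only exact repeats are considered."""
    n = len(seq)
    for start in range(n):
        for k in range(1, max_unit + 1):
            unit = seq[start:start + k]
            if len(unit) < k:  # unit would run past the end of the sequence
                break
            copies = 1
            # Count how many consecutive exact copies of the unit follow.
            while seq[start + copies * k:start + (copies + 1) * k] == unit:
                copies += 1
            if copies >= 2:
                return start, k, copies
    return None


def cinch_once(seq, max_unit=8):
    """Stack the copies of one tandem repeat in successive rows so that
    they occupy the same span of columns (a single toy 'cinch')."""
    hit = find_tandem_repeat(seq, max_unit)
    if hit is None:
        return [seq]
    start, k, copies = hit
    rows = [seq[:start + k]]  # prefix plus the first repeat unit
    for c in range(1, copies):
        unit = seq[start + c * k:start + (c + 1) * k]
        rows.append(" " * start + unit)  # align the copy under the first unit
    rows[-1] += seq[start + copies * k:]  # the suffix continues on the last row
    return rows


for row in cinch_once("TGCACACAGT"):
    print(row)
```

For the 10-base input `TGCACACAGT`, the three `CA` copies are stacked in the same two columns, so the sequence occupies three rows whose widest row is 6 columns. Reading the rows left to right and top to bottom, ignoring the padding, recovers the original sequence, which is why no null characters are needed to represent the repeat.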
Footnotes
- Abbreviations
  - 1-D: one-dimensional
  - 2-D: two-dimensional
  - CDS: protein-coding sequence
  - GA: gapped alignment
  - indels: insertions and deletions
  - MGMA: minimally-gapped MHA aligner
  - MHA: maximal homology alignment
  - MSA: multiple sequence alignment
  - MSR: micro-satellite repeats
  - NEE: neurogenic ectoderm enhancer
  - TFBS: transcription factor binding site
  - TR: tandem repeats
  - vnd: ventral nervous system defective
  - WCR: width cinch ratio