PT - JOURNAL ARTICLE
AU - G. Sampath
TI - Protein fingerprinting with a binary alphabet and a nanopore
AID - 10.1101/119313
DP - 2017 Jan 01
TA - bioRxiv
PG - 119313
4099 - http://biorxiv.org/content/early/2017/04/13/119313.short
4100 - http://biorxiv.org/content/early/2017/04/13/119313.full
AB - Proteins can be partitioned into eight mutually exclusive sets of peptides and recoded with a binary alphabet obtained by dividing the 20 amino acids into two ordered sets based on volume. By searching for these binary-coded peptides in a protein sequence database, their container proteins can be identified. Over 89.7% of 20207 curated proteins in the human proteome (http://www.uniprot.org; database id UP000005640, H. sapiens) can be so identified. This procedure can be translated into practice. Thus standard chemical procedures can be used for partitioning and a nanopore can be used to obtain binary coded sequences for partitioned peptides. In the latter case, recently published work has shown that a sub-nanometer-diameter pore can measure residue volume with a resolution of ~0.07 nm³. This can be used to distinguish between the two sets of residues defined above; a detector with two thresholds outputs a binary sequence for a partitioned peptide from the nanopore current signal. Using normal distributions of amino acid volume data from the literature, routine computations show that ~98% of the protein-identifying peptides in the curated human proteome have binary codes that are correct with a confidence level exceeding 85%. Similar results are presented for the proteomes of baker’s yeast (S. cerevisiae), the pathogen E. coli, and the gut bacterium H. pylori.
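
The recoding and lookup idea in the abstract can be illustrated with a minimal Python sketch. It is not the article's procedure. Several things here are assumptions: the volume table holds approximate, commonly cited residue volumes in cubic angstroms; the split point `THRESHOLD` is a hypothetical choice that puts ten residues in each class; the helper names and `TOY_PROTEOME` are invented for illustration; and a plain substring search stands in for whatever peptide partitioning and matching rules the article actually uses.

```python
# Approximate residue volumes in cubic angstroms (illustrative literature-style
# values; not necessarily the data set used in the article).
VOLUME = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}

# Hypothetical split point: residues above it are coded "1", the rest "0".
THRESHOLD = 142.0


def binary_code(sequence):
    """Recode an amino acid sequence as a string of 0s and 1s by residue volume."""
    return "".join("1" if VOLUME[res] > THRESHOLD else "0" for res in sequence)


def containers(code, proteome):
    """Return names of all proteins whose binary recoding contains `code`."""
    return [name for name, seq in proteome.items() if code in binary_code(seq)]


def identifies(peptide, proteome):
    """Return the single container protein if the peptide's code is unique, else None."""
    hits = containers(binary_code(peptide), proteome)
    return hits[0] if len(hits) == 1 else None


# Toy stand-in for a protein sequence database (made-up sequences).
TOY_PROTEOME = {
    "protA": "GASWKLFGAS",
    "protB": "GASGASGASG",
    "protC": "WKLFYWKLFY",
}

if __name__ == "__main__":
    print(binary_code("GASWKLFGAS"))           # 0001111000
    print(identifies("SWKLFG", TOY_PROTEOME))  # protA
    print(identifies("GAS", TOY_PROTEOME))     # None (code occurs in protA and protB)
```

In this toy setting a peptide identifies its protein only when its binary code occurs in exactly one database entry. Short or low-complexity codes such as `000` match several entries and are therefore not identifying.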