Abstract
Proteins can be identified by partitioning them into eight mutually exclusive sets of peptides, recoding the peptides with a binary alphabet, and searching for the recoded peptides in a proteome sequence database. The binary alphabet is obtained by dividing the 20 amino acids into two ordered sets based on some measurable property of amino acids (for example, residue volume or mobility). In principle, over 89.7% of the proteins in the human proteome (http://www.uniprot.org; database id UP000005640, 20207 curated sequences) can be uniquely identified with this approach. Potential implementation issues are discussed. In particular, using a nanopore to identify a residue from the level of the blockade current, which is determined in part by residue volume, becomes less difficult because only two such levels, rather than 20, need to be detected.
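
The recoding and search steps can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the approximate volume ordering of the residues, the cut point `SPLIT`, the helper names `recode` and `matching_proteins`, and the two-entry toy database. It does not reproduce the partitioning into eight peptide sets, and it is not the procedure used to obtain the reported figures.

```python
# Residues in approximately ascending order of volume (assumed ordering,
# used only for illustration).
BY_VOLUME = "GASCDPNTEVQHMILKRFYW"

# Hypothetical cut point: the 10 smallest residues map to '0', the rest to '1'.
SPLIT = 10

BINARY_CODE = {aa: ("0" if i < SPLIT else "1") for i, aa in enumerate(BY_VOLUME)}


def recode(sequence):
    """Rewrite an amino acid sequence in the two-letter alphabet."""
    return "".join(BINARY_CODE[aa] for aa in sequence)


def matching_proteins(coded_peptide, database):
    """Return names of proteins whose recoded sequence contains the coded peptide."""
    return [name for name, seq in database.items() if coded_peptide in recode(seq)]


# Toy stand-in for a proteome database (hypothetical sequences).
toy_db = {"P1": "GASKRW", "P2": "MILGAS"}

print(recode("GASKRW"))                   # binary form of one sequence
print(matching_proteins("0011", toy_db))  # proteins containing this coded peptide
```

A coded peptide identifies a protein uniquely when `matching_proteins` returns exactly one name; short coded peptides such as `"00"` tend to match several entries, which is why peptide length and the choice of partition matter.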