Abstract
Engineered biocircuits that interface with living systems as plug-and-play constructs may enable new applications for programmable therapies and diagnostics. We create biological bits (bbits) using proteases – a family of pleiotropic, promiscuous enzymes – to construct the biological equivalent of Boolean logic gates, comparators and analog-to-digital converters. We use these modules to write a cell-free bioprogram that can combine with bacteria-infected blood, quantify infection burden, and then calculate and unlock a selective drug dose. Inspired by probabilistic computing, we leverage multi- and common-target protease promiscuity as the biological analog of superposition to program three probabilistic bbits that solve all implementations of the two-bit oracle problem, Learning Parity with Noise. Treating a network of dysregulated proteases in a living animal as an oracle, we use this algorithm to resolve the probability distribution of coagulation proteases in vivo, allowing diagnosis of pulmonary embolism with high sensitivity and specificity (AUROC = 0.92) in a mouse model of thrombosis. Our results demonstrate that protease activity can be programmed in cell-free systems to carry out classical and probabilistic algorithms for programmable medicine.
Introduction
Rapid advances in engineered biological circuits are motivating the design of new treatment and detection platforms for practical applications in programmable medicine. The development of foundational components, such as molecular logic gates1 and genetic clocks2,3, has enabled the design of biocircuits with increasing complexity, including the ability to solve mathematical problems4, build autonomous robots5, and play interactive games6. Recently, programmable biocircuits have been applied for therapeutic and diagnostic applications7, including genetic circuits that sense-and-respond to dysregulated inflammation8 or blood glucose levels9. To date, the design of these biocircuits has principally focused on constructs that are implemented in cell-based platforms, which require genome or protein engineering10–13, and that carry out algorithms inspired by classical computer circuits, which operate on binary digits (bits) and Boolean logic gates (e.g., AND, OR, NAND).
While classical biocircuits are well suited to performing deterministic tasks (e.g., tasks in which the input determines the output)14, inference-based tasks, such as identifying which of several plausible input causes produced an observed output effect, are more challenging. In contrast to classical circuits, probabilistic circuits, which operate on analog bits characterized by a probability distribution of states, efficiently solve inference problems by assigning each plausible input a probability that it would produce the observed output15. Probabilistic bits have been implemented with magnets (p-bits)16–18 as well as with photons and electrons in quantum systems (qubits)19,20. In medicine, differential diagnosis is fundamentally based on inference, wherein an observed symptom could be caused by several diseases. Conversely, the decision to treat a patient is typically determined by a clear set of inputs (e.g., disease stage, biomarker level)7. For these reasons, we sought to develop a unified system of biological bits capable of executing both classical and probabilistic algorithms for therapeutic and diagnostic applications.
In living organisms, high-level functions arise from intricate networks of enzymatic activity that ultimately control complex systems ranging from immunity to blood homeostasis21–24. Among enzymes, proteases are both ubiquitous, comprising 2% of the human genome25, and promiscuous, able to cleave diverse substrate sequences (6–8 amino acids) in addition to their putative targets26–33. To leverage these features for programmable medicine, we define protease activity acting on a target substrate as a biological bit (bbit). Under a classical framework, a register of bbits comprises distinct protease-substrate pairs, each of which takes on the binary state 1 above an activity threshold (Fig. 1A, left; Fig. S1A). By contrast, probabilistic bbits are constructed using promiscuous proteases that act on two substrates simultaneously to create a state of superposition, where the probability of being measured in state 0 or 1 is based on the relative substrate cleavage velocities (v0 and v1) (Fig. 1A, right; Fig. S1B). Here we use classical bbits to design a plug-and-play therapeutic biocircuit capable of quantifying input bacterial activity and outputting a digital drug dose to clear infected human blood (Fig. 1B). Under a probabilistic framework, we construct diagnostic biocircuits using probability-based gates to first solve the oracle problem Learning Parity with Noise (LPN), and then extend this system in vivo by treating networks of dysregulated proteases as a hidden oracle to noninvasively diagnose pulmonary embolism with high accuracy in a mouse model of thrombosis (Fig. 1C).
Results
A central function of complex circuits is the ability to store and manipulate digitized information; therefore, we first set out to construct a flash analog-to-digital converter (ADC) to convert continuous biological signals into binary digits. An electronic ADC performs three major operations during signal conversion: voltage comparison, priority assignment, and digital encoding. An analog input voltage is first compared against a set of increasing reference voltages (V0–Vi) by individual comparators (d0–di), each of which allows current to pass if the input signal is greater than or equal to its reference value (Fig. S2). During priority assignment, only the activated comparator with the highest reference voltage, dn, remains on, while all other activated comparators (d0 through dn-1) are turned off. The prioritized signal is then fed into a digital encoder comprising OR gates to produce binary values. To design an ADC biocircuit using protease activity as the core signal, we constructed biological analogs of comparators using liposomes locked by an outer peptide cage34,35 (Fig. 2A; Fig. S3A, B). With increasing peptide crosslinking densities, these biocomparators (b0–bi) set the level of input protease activity (GzmB) required to fully degrade the peptide cage (IEFDSGK, Table S1) and expose the lipid core (Fig. S3C), analogous to the reference voltages stored in electronic comparators. We used lipase36 as a Buffer gate to open all biocomparators with fully degraded cages (Fig. 2B, C; Fig. S4) and release a unique combination of inhibitors and signal proteases (WNV, TEV, and WNV inhibitor) that collectively assign priority to the highest activated biocomparator (bn) by inhibiting all signal proteases released from the other biocomparators (b0 through bn-1). To encode the prioritized signal into binary values, we designed a set of OR gates using orthogonal quenched substrates (RTKR and ENLYFQG) specific for the signal proteases (WNV and TEV, respectively; Fig. 2D) to provide fluorescent 2-bit readouts (p0–pi; Fig. S5). Fully integrated, our 4-to-2-bit biological ADC converted input protease levels (GzmB) across four orders of magnitude into binary digital outputs (Fig. 2E, F).
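The electronic flash ADC logic that the biocircuit mirrors (comparison, priority assignment, OR-gate encoding) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a model of the protease chemistry: the reference thresholds are hypothetical values in arbitrary units, `flash_adc` and its helpers are illustrative names, and the encoder assumes four comparators (d0–d3) mapped to a 2-bit output read as p1p0.

```python
def comparators(signal, references):
    """Thermometer code: comparator i is on if signal >= its reference level."""
    return [signal >= ref for ref in references]


def prioritize(thermometer):
    """Keep only the highest activated comparator (one-hot output).

    Loosely mirrors inhibition of signal proteases released by lower biocomparators.
    """
    one_hot = [False] * len(thermometer)
    for i in reversed(range(len(thermometer))):
        if thermometer[i]:
            one_hot[i] = True
            break
    return one_hot


def encode(one_hot):
    """4-to-2 encoder built from OR gates: p0 = d1 OR d3, p1 = d2 OR d3."""
    _, d1, d2, d3 = one_hot  # d0 maps to output 00, so it feeds no OR gate
    p0 = d1 or d3
    p1 = d2 or d3
    return int(p1), int(p0)


def flash_adc(signal, references):
    """Return the 2-bit output (p1, p0) for an analog input signal."""
    return encode(prioritize(comparators(signal, references)))


REFERENCES = [0, 1, 10, 100]  # hypothetical thresholds for b0..b3 (arbitrary units)
for level in (0, 5, 50, 500):
    print(level, flash_adc(level, REFERENCES))
```

Each tenfold step in input crosses one more threshold, so four log-spaced references cover the outputs 00 through 11.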
To demonstrate a practical biomedical application, we next sought to interface our biological ADC with a living system as a plug-and-play therapeutic biocircuit for digital drug delivery. We rewired our ADC to autonomously quantify input bacterial activity and then output an antimicrobial drug dose to selectively clear infected red blood cells (RBCs) of bacteria (DH5α Escherichia coli) (Fig. 3A). To construct biocomparators that prioritize input levels of bacterial activity, we synthesized liposomes with peptide cages using a substrate (RRSRRVK) specific for the E. coli surface protease OmpT37,38 (Fig. 3B). We synthesized a series of 8 biocomparators with increasing peptide densities (0–10.2 nM) and validated their ability to sense input bacterial concentrations across 8 log units (0–10⁸ CFU/mL) using a fluorescent reporter (Fig. S6). To convert the release of signal proteases into a drug output, we designed protease-activatable prodrugs comprising cationic (polyarginine) antimicrobial peptides (AMP) (Fig. 3C, Table S1) in charge complexation with anionic peptide locks (polyglutamic acid)39 that block AMP activity. These drug-lock peptides were linked in tandem by the OR gate peptides p0 and p1 (RTKR and ENLYFQG, respectively), allowing signal proteases that directly cleave p0 or p1 to digitally control the output drug dose (Fig. 2). We designed one-third and two-thirds of the total drug dose to be unlocked by cleavage of p0 and p1, respectively, such that binary values 00, 01, 10, and 11 corresponded to 0/3, 1/3, 2/3, and 3/3 of the total drug dose (Fig. S7).
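The dose mapping is binary weighting: p0 unlocks one-third and p1 two-thirds of the total dose, so reading the output as p1p0, the unlocked fraction equals that 2-bit value divided by three. A minimal sketch (the bit ordering is our reading of the text, and `unlocked_dose` is a hypothetical name):

```python
from fractions import Fraction

# From the text: cleaving p0 unlocks 1/3 and cleaving p1 unlocks 2/3 of the dose.
P0_SHARE = Fraction(1, 3)
P1_SHARE = Fraction(2, 3)


def unlocked_dose(p1: int, p0: int) -> Fraction:
    """Fraction of the total AMP dose unlocked for the 2-bit ADC output p1p0."""
    return p1 * P1_SHARE + p0 * P0_SHARE


for p1, p0 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"{p1}{p0}: {unlocked_dose(p1, p0)} of total dose")
```

Exact fractions are used so that the 3/3 case compares equal to a full dose without floating-point rounding.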
To confirm the therapeutic efficacy of our prodrug design, we treated bacteria with either locked or protease-cleaved drug. Locked drug had no significant cytolytic activity compared to untreated controls, whereas protease-cleaved drug-lock complexes significantly reduced bacterial colonies (Fig. 3D). We observed similar levels of bacterial cytotoxicity when AMP was directly loaded into liposomes, showing that charge complexation was required to fully block AMP activity (Fig. S6B). In human RBCs infected with E. coli at concentrations ranging from 10⁰ to 10⁹ CFU/mL, samples containing a single biocomparator (b0) were unable to eliminate bacteria, as anticipated (output = 00). By contrast, increasing the number of biocomparators in the samples (b0–b3) allowed our program to autonomously increase the drug dose (outputs 01, 10, and 11) in response to higher bacterial loads, completely eliminating infection burdens across 9 orders of magnitude up to 10⁹ CFU/mL without significantly increasing hemolysis (Fig. 3E, Fig. S8). Our data showed that cell-free biocircuits can be constructed using protease activity as a primary digital signal to execute autonomous drug delivery programs under a broad range of conditions.
Our antimicrobial ADC used protease activity as classical binary bbits to carry out a sense-and-respond bioprogram for drug delivery. To demonstrate the use of protease activity as probabilistic bbits to solve inference problems, we designed probabilistic circuits that leverage protease promiscuity. Protease promiscuity occurs in multi-target (i.e., a single protease cutting multiple substrates) and common-target (i.e., multiple proteases cutting the same substrate) settings, and is a fundamental feature that allows proteases to carry out distinct physiological functions26 (e.g., coagulation proteases control the formation of fibrin clots as well as the expression of adhesion molecules and cytokines40) (Fig. S9E). To create two-state probabilistic bbits, we considered the superposed activity of a single protease cleaving two distinct substrates, and defined the probability of the protease being found in state 0 (cleaving substrate 1) or state 1 (cleaving substrate 2) by its relative cleavage velocity for either substrate41 (Fig. 4A; Fig. S9). This allowed state probabilities (i.e., relative cleavage velocities) to be quantitatively controlled according to Michaelis-Menten kinetics41 by changing the substrate concentration or sequence.
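For a single enzyme acting on two competing substrates that each follow Michaelis-Menten kinetics, the textbook competing-substrate rate equations give both velocities the same denominator, so the state probability reduces to p1 = (kcat,1/KM,1)[S1] / ((kcat,0/KM,0)[S0] + (kcat,1/KM,1)[S1]). The sketch below assumes this competing-substrate model and uses hypothetical, unitless kinetic parameters; it is an illustration, not the fitting procedure used in this work.

```python
def competing_velocities(e0, substrates, kcat, km):
    """Michaelis-Menten velocities for one enzyme acting on competing substrates.

    v_i = (kcat_i / KM_i) * E0 * S_i / (1 + sum_j S_j / KM_j)
    """
    denom = 1.0 + sum(s / k for s, k in zip(substrates, km))
    return [kc / k * e0 * s / denom for s, kc, k in zip(substrates, kcat, km)]


def state_probabilities(e0, substrates, kcat, km):
    """Probability of each bbit state, taken as its relative cleavage velocity."""
    v = competing_velocities(e0, substrates, kcat, km)
    total = sum(v)
    return [vi / total for vi in v]


# Hypothetical, unitless parameters; index 0 -> state 0, index 1 -> state 1.
print(state_probabilities(1.0, [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]))
print(state_probabilities(1.0, [1.0, 2.0], [1.0, 1.0], [1.0, 1.0]))
```

Equal concentrations of kinetically identical substrates give a 50/50 split (the coin-flip behavior later used for the U-gate), doubling one substrate's concentration shifts its probability to 2/3, and the enzyme concentration cancels out of the ratio.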
Under this framework for quantifying protease probabilities, we built a set of biological probabilistic gates, which we named the Uniform gate (U-gate) and the Linker gate (L-gate), to perform operations on state probabilities. These gates make use of multi- and common-target promiscuity, and we designed their operations based on previous implementations of probabilistic gates that solve a classic oracle problem, Learning Parity with Noise (LPN)42, where the goal is to deduce the value of a 2-bit string hidden by the oracle in the fewest calls (Fig. S9; Tables S1, S2). Analogous to a coin flip, we designed our U-gate to create a superposition of states by taking an input bbit, b0, in state 0 with 100% probability (i.e., single-substrate cleavage) and outputting b0 in state 0 or 1 with equal probability (i.e., by adding a second substrate to allow multi-target cleavage) (Fig. 4A, B). By contrast, analogous to a classical XOR gate, we designed the L-gate to take two input bbits, a control bbit b0 and a target bbit b1, and operate on the state 1 probability of target b1 such that it exhibits parity with, or is linked to, the state 1 probability of control b0 (i.e., by adding a second substrate to allow common-target cleavage between two proteases) (Fig. 4C; Fig. S9E). We constructed biological scores, based on probabilistic scores42, to implement all four instances of the 2-bit LPN problem by using our U- and L-gates to operate on three protease bits: two string bits (b0 and b1) to represent the possible hidden string values (00, 01, 10, and 11) and one answer bit (b2) (Fig. S9B, C; Tables S1, S2). By multiplying all permutations of the output state 0 and 1 probabilities of bbits b0–b2 (Fig. S9D), our protease solver correctly deduced the value of the hidden string by assigning it the highest probability among all possibilities in all four oracle configurations (Fig. 4E; Fig. S9D).
Collectively, our results showed that protease activity can be quantified as state probabilities and operated on by probabilistic logic gates to efficiently solve inference problems.
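To make the LPN objective concrete, the sketch below solves the classical 2-bit problem in software by maximum likelihood: a hypothetical oracle hides a string s, answers uniformly random queries x with the parity s·x mod 2 flipped with noise probability eta, and the solver assigns every candidate string a posterior probability. This is a conventional solver offered for intuition only, not the protease U-/L-gate scores; the uniform query loosely plays the role of the U-gate coin flip and the parity bit the role of the L-gate link. All names, the noise level, and the sample count are assumptions.

```python
import math
import random
from itertools import product


def parity(s, x):
    """Inner product of two bit tuples modulo 2."""
    return sum(si * xi for si, xi in zip(s, x)) % 2


def noisy_oracle(s, eta, rng):
    """Hypothetical LPN oracle: a random query x and a noisy parity bit y."""
    x = tuple(rng.randint(0, 1) for _ in s)  # uniform query (coin flip per bit)
    noise = 1 if rng.random() < eta else 0
    return x, parity(s, x) ^ noise


def posterior(samples, n_bits, eta):
    """Posterior over every candidate string under a uniform prior."""
    log_like = {}
    for c in product((0, 1), repeat=n_bits):
        log_like[c] = sum(
            math.log(1 - eta) if parity(c, x) == y else math.log(eta)
            for x, y in samples
        )
    peak = max(log_like.values())  # subtract the max for numerical stability
    weights = {c: math.exp(ll - peak) for c, ll in log_like.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def solve_lpn(s, eta=0.1, n_samples=200, seed=0):
    """Query the oracle, then return the most probable string and the posterior."""
    rng = random.Random(seed)
    samples = [noisy_oracle(s, eta, rng) for _ in range(n_samples)]
    post = posterior(samples, len(s), eta)
    return max(post, key=post.get), post


for hidden in product((0, 1), repeat=2):
    best, _ = solve_lpn(hidden)
    print(hidden, "->", best)
```

As in the protease solver, the answer is read out as the candidate string that receives the highest probability; queries with x = 00 carry no information, which is one reason several oracle calls are needed.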
We next sought to demonstrate a practical application using probabilistic bbits as a diagnostic platform for disease detection in living animals. The ability to detect dysregulated protease activity has important diagnostic applications across a broad range of diseases; for example, the partial thromboplastin time (PTT) assay is used to diagnose thrombosis43. Here we considered dysregulated protease networks, such as those in thrombosis, to be represented as an oracle string of protease activities with a probability distribution distinct from that of the healthy state (Fig. 5A). Analogous to the Central Limit Theorem (CLT), we postulated that using promiscuous, common-target substrates to detect dysregulated protease networks could be modeled as sampling a probability distribution, where sample means converge to a normal distribution even if the underlying protease probability distribution is not itself normally distributed. We therefore sought to design and adapt a new set of U-gates to sample differences in uniformity between dysregulated and healthy protease networks, and to use the resulting normalized variances (σ²) to discriminate disease (Fig. 5A).
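The CLT property invoked here can be checked numerically. The sketch below draws from a strongly right-skewed exponential distribution (an arbitrary stand-in chosen for illustration, not protease data) and shows that means of n = 50 draws cluster around the true mean of 1 with variance close to σ²/n = 1/50.

```python
import random
import statistics


def draw_skewed(rng):
    """Exponential(rate=1): strongly right-skewed, mean 1, variance 1."""
    return rng.expovariate(1.0)


def sample_means(n, trials, rng):
    """Mean of n independent draws, repeated `trials` times."""
    return [statistics.fmean(draw_skewed(rng) for _ in range(n)) for _ in range(trials)]


rng = random.Random(1)
means = sample_means(n=50, trials=2000, rng=rng)
# Should be close to 1.0 and to 1/50 = 0.02, respectively.
print(round(statistics.fmean(means), 3), round(statistics.variance(means), 4))
```

A histogram of `means` would look approximately bell-shaped even though individual draws are far from normal.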
To test this approach computationally, we randomly generated baseline activity scores between zero and one for all 550 proteases encoded in the human genome26. Random strings of 0, 20, 100, or 550 proteases were upregulated or downregulated in equal proportion by scaling their activity by a factor of five, reflecting an average of literature-reported values44–46. To simulate promiscuous sampling by a set of U-gates, we modeled a substrate library of size M (ranging from 2 to 550) randomly sampling n proteases (ranging from 1 to 550), summing the corresponding activity scores and computing the probability distribution and normalized variance across all U-gates. Our model revealed that the ability to classify diseased and healthy networks increases with the number of dysregulated proteases (red and green traces) or U-gates (e.g., greater than 90% classification accuracy can be achieved with >10 U-gates and >20 dysregulated proteases) (Fig. 5B). This dependence of classification accuracy on feature size is consistent with computational results based on multidimensional datasets47. To validate our model prediction, we designed seven substrates (U0–U6) to sense the complement (e.g., C1r, MASP2, Factor D, Factor I) and coagulation (e.g., thrombin, plasmin, factor XIIa, factor Xa, protein C) protease networks (Fig. S10). Using the measured U-gate outputs after in vitro incubation with either group of proteases (Fig. 5C), the normalized variances of the U-gate outputs classified mixtures as either complement or coagulation with perfect accuracy (n = 10, AUROC = 1.00; Fig. 5D). These results confirmed that a set of promiscuous U-gates can be used to sample and discriminate differences in the underlying probability distributions of protease networks.
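A minimal sketch of this kind of simulation is shown below. Several details are assumptions because the text does not specify them: downregulation divides activity by the fold change, each U-gate sums the activities of n distinct proteases, a fresh substrate library is drawn per trial and shared by the healthy and diseased networks, and "normalized variance" is taken to mean the variance of the U-gate outputs after normalizing them to sum to one. All function names are hypothetical.

```python
import random
import statistics

N_PROTEASES = 550  # proteases encoded in the human genome, per the text


def baseline(rng):
    """Random baseline activity score in [0, 1) for every protease."""
    return [rng.random() for _ in range(N_PROTEASES)]


def dysregulate(activity, k, fold, rng):
    """Scale k randomly chosen proteases: half multiplied, half divided by `fold`."""
    out = list(activity)
    for j, i in enumerate(rng.sample(range(len(out)), k)):
        out[i] = out[i] * fold if j % 2 == 0 else out[i] / fold
    return out


def make_gates(m, n, rng):
    """Substrate library: m U-gates, each sampling n distinct proteases."""
    return [rng.sample(range(N_PROTEASES), n) for _ in range(m)]


def u_gate_outputs(activity, gates):
    """Each U-gate reports the summed activity of the proteases it samples."""
    return [sum(activity[i] for i in gate) for gate in gates]


def normalized_variance(outputs):
    """Variance of the U-gate outputs after normalizing them to sum to one."""
    total = sum(outputs)
    return statistics.pvariance([o / total for o in outputs])


def simulate(k, m, n, trials, fold=5.0, seed=0):
    """Normalized variances for healthy and dysregulated networks over many trials."""
    rng = random.Random(seed)
    healthy, disease = [], []
    for _ in range(trials):
        gates = make_gates(m, n, rng)
        healthy.append(normalized_variance(u_gate_outputs(baseline(rng), gates)))
        sick = dysregulate(baseline(rng), k, fold, rng)
        disease.append(normalized_variance(u_gate_outputs(sick, gates)))
    return healthy, disease


healthy, disease = simulate(k=550, m=10, n=5, trials=200)
print(statistics.median(healthy), statistics.median(disease))
```

Sweeping k, m, and n and scoring the separation of the two lists (for example with an AUROC) is one way to explore how classification depends on the number of dysregulated proteases and U-gates.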
To apply this approach to in vivo diagnostics, we used a thromboplastin-induced mouse model of pulmonary embolism (PE)48 to test whether our library of U-gates could discriminate mice with blood clots from healthy controls. We recently developed a class of protease activity sensors that deliver mass-barcoded peptide substrates to quantify protease activity in vivo28,49,50. Mass-barcoded substrate libraries are conjugated to a nanoparticle carrier and delivered intravenously; upon protease cleavage, they release substrate fragments that are cleared into urine for quantification by mass spectrometry according to their mass barcodes. Using this platform, we administered a single cocktail of our seven mass-barcoded U-gates to quantify protease activity in healthy mice (Fig. 5E; Fig. S11)28 as well as in mice induced with PE (Fig. 5F). The measured variance across our seven U-gates noninvasively diagnosed PE with high sensitivity and specificity (AUROC = 0.92) (Fig. 5F, G; Fig. S12), and, consistent with our mathematical predictions, overall classification accuracy increased from 0.5 to 0.92 as the number of U-gates used in the classifier increased from zero to seven (Fig. 5G). Collectively, our data showed that by treating strings of protease activity as a probability distribution, the underlying sample variance can be used to infer and diagnose pulmonary embolism with high accuracy.
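An AUROC can be computed directly from classifier scores as the Mann-Whitney probability that a randomly chosen diseased sample scores higher than a randomly chosen healthy one, with ties counted as one half; 0.5 corresponds to chance, which matches the zero-U-gate baseline above. The sketch assumes that higher variance indicates disease, and the scores in the example are made-up toy values, not data from this study.

```python
def auroc(disease_scores, healthy_scores):
    """Probability that a random diseased score exceeds a random healthy score.

    Ties count as one half (Mann-Whitney formulation of the ROC area).
    """
    wins = 0.0
    for d in disease_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(disease_scores) * len(healthy_scores))


# Toy variance scores (hypothetical): one diseased sample overlaps the healthy range.
print(auroc([0.9, 0.8, 0.4], [0.1, 0.5, 0.3]))
```

The pairwise loop is quadratic in sample size, which is fine for cohorts of this scale; rank-based formulas give the same value more efficiently for large datasets.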
Discussion
By interpreting protease activity as carrying binary or probabilistic information, we demonstrated the use of proteases as biological bits in cell-free biocircuits for therapeutic and diagnostic applications. We used the classical interpretation of protease bbits to construct a 2-bit analog-to-digital converter (ADC) as an autonomous drug delivery biocircuit that cleared infected blood of bacteria across 9 orders of magnitude in concentration. To construct our biological ADC, we designed biocomparators using peptide-caged liposomes because these materials are well tolerated and biologically compatible51,52. Our cell-free approach is distinct from cell-based genetic circuits7,53, which require significant protein or organismal engineering to control signaling; for proteases, even achieving a reliable OFF state is non-trivial and has required insertion strategies11–13, artificial autoinhibitors31, or dimerizing leucine zippers53. Cell-free liposomes have also been used in past studies, such as in synthetic minimal cells, to control the expression of genetic circuits by liposome fusion54,55. Our approach may be amenable to integration with these genetic approaches if, for example, these circuits were redesigned to input or output proteases.
In contrast to classical binary bits, we also explored the use of protease activity as probabilistic bits. By leveraging both multi-target and common-target protease promiscuity, we designed logic gates that operate on the probability states of protease bbits, enabling us to solve inference-based oracle problems such as LPN by assigning the highest probability to the correct value of the hidden string. We further considered dysregulated protease networks within a living animal as a biological oracle with a distinct probability distribution of states, which enabled us to noninvasively diagnose thrombosis with high classification accuracy. This approach is similar to the Central Limit Theorem (CLT), under which repeated sampling from an unknown probability distribution, regardless of whether that distribution is itself normal, yields sample means that converge to a normal distribution with variance proportional to the variance of the unknown distribution56. We therefore designed promiscuous U-gates to sample the underlying probability distribution of dysregulated protease networks, such as the coagulation cascade, and, using as few as seven substrates, achieved a disease classification AUROC of 0.92 in vivo. As more than 15 proteases are involved in these cascades, we envision that larger (>50) substrate libraries may allow the development of pan-diagnostics capable of monitoring whole-organism protease activities (>250 extracellular proteases).
Under both classical and probabilistic frameworks, our biological circuits were designed to sense extracellular proteases, which we do not envision will limit potential in vitro or in vivo applications. Of the more than 550 proteases encoded by the human genome, over half are secreted or membrane-bound and are involved in a host of different diseases26,57. An even greater diversity of secreted and membrane-bound proteases is represented by bacterial and viral species58–62. This rich diversity has provided the biological foundation for biomedical applications that rely on extracellular protease activation of prodrugs or prodiagnostics in living animals, including patients57,58. In our work, we provided examples of multiple types of biocircuits (ADCs, logic gates, comparators, etc.) that are modular and can be engineered to input or output bacterial (OmpT), viral (TEV, WNV), murine (coagulation cascade), or mammalian (GzmB) proteases in both in vitro and in vivo settings. Looking forward, by integrating the full richness of protease biology and promiscuity, harnessing proteases as binary or probabilistic bits may provide a unique biological advantage for programmable control of future therapeutics and diagnostics.
Author contributions
B.A.H. and G.A.K. designed research; B.A.H. performed research; B.A.H. and G.A.K. analyzed data; and B.A.H. and G.A.K. wrote the paper.
Competing interests
No competing interests.
Data and materials availability
All data are available in the main text or the supplementary materials.
Acknowledgments
This work was funded by an NIH Director’s New Innovator Award (Award No. DP2HD091793). B.A.H. is supported by the NSF GRFP, National Institutes of Health GT BioMAT Training Grant under Award Number 5T32EB006343 and the Georgia Tech President’s Fellowship. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1650044 (B.A.H.). G.A.K. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. We acknowledge Dr. D. Myers (Georgia Tech and Emory) and S. N. Dahotre (Georgia Tech and Emory) for their helpful discussions during the preparation of the manuscript. The authors thank Quoc Mac (Georgia Tech) for assistance with the mouse model. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We acknowledge use of the IBM Q for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Q team.