Abstract
Large complexes of classical particles play central roles in biology, in polymer physics, and in other disciplines. However, physics currently lacks mathematical methods for describing such complexes in terms of component particles, interaction energies, and assembly rules. Here we describe a Fock space structure that addresses this need, as well as diagrammatic methods that facilitate the use of this formalism. These methods can dramatically simplify the equations governing both equilibrium and non-equilibrium stochastic chemical systems. A mathematical relationship between the set of all complexes and a list of rules for complex assembly is also identified.
Fock spaces – the vector spaces in which quantum field theories are built – provide a natural way to represent physical systems that have variable particle composition. Multiple Fock space formalisms have been described for modeling stochastic chemical systems (e.g., [1–3]), and these have proven useful in a variety of contexts. In particular, the Fock space methods described by Grassberger and Scheunert [2], for which Peliti introduced a path integral formulation [4], have been widely adopted [5, 6], especially in studies of diffusion-limited processes [7].
When modeling chemical systems that generate large multi-particle complexes, however, these formalisms become problematic. For instance, many interesting chemical systems in biology and in polymer physics are capable of generating vast (or even infinite) numbers of distinct complexes based on a relatively small number of components and interaction rules. Existing formalisms treat each distinct multi-particle complex as its own species of particle. It is often impractical to manually enumerate these complexes, to specify each one’s free energy, formation and decay rates, and so on.
The proliferation of complexes and the difficulties it can lead to are well-recognized in the context of molecular systems biology [8–10]. To address this issue computationally, formal grammars [11–15] and accompanying software [16–20] have been developed that enable “rule-based” simulations of biochemical systems. However, a “rule-based mathematics” that allows one to work with such systems analytically has yet to be described.
Here we introduce a mathematical formalism that allows rule-based definitions of both equilibrium and nonequilibrium stochastic chemical systems. The Fock space we describe is most similar to that of Doi [1] and of Park and Park [3], but with a number of key differences. Every particle in this formalism is modeled as occupying one of a large number of internal states. These internal states uniquely identify each particle and are essential for representing multi-particle complexes in terms of their components. Fock space excitations are used to represent not just particles, but also the interactions between particles, the conformation states of particles, and occupied sites on the surfaces of particles. The concept of occlusion, which is essential for any rule-based description of multi-particle complexes, is realized by using a Fock space constructed from hard core bosons [3].
Following [1], we equip our Fock space with an orthonormal basis S of “pure” physical states. The vector |ψ〉 that describes a system of interest is given by
$$|\psi\rangle = \sum_{s \in S} p_s\, |s\rangle, \tag{1}$$
where p_s is the probability that the system, when observed, will be found in state s. Every measurable quantity Q is represented by a corresponding operator ℚ that is diagonal in S, and the expectation value for this quantity is given in terms of |ψ〉 by 〈Q〉 = 〈sum|ℚ|ψ〉, where
$$|\mathrm{sum}\rangle = \sum_{s \in S} |s\rangle \tag{2}$$
is referred to as the “sum vector.” In thermal equilibrium, |ψ〉 is uniquely determined by |sum〉 and a “Hamiltonian” operator ℍ that assigns a free energy to each pure state:
$$|\psi\rangle = \frac{e^{-\mathbb{H}/kT}}{Z}\, |\mathrm{sum}\rangle, \tag{3}$$
where the partition function Z is given by
$$Z = \langle \mathrm{sum} |\, e^{-\mathbb{H}/kT}\, | \mathrm{sum} \rangle. \tag{4}$$
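Outside of thermal equilibrium, time evolution of the system is governed by a master equation
$$\frac{d}{dt} |\psi\rangle = \left( \mathbb{R} - \tilde{\mathbb{R}} \right) |\psi\rangle, \tag{5}$$
where ℝ is a “rate matrix” and ℝ̃ is a closely related “depletion matrix.” In terms of the transition rates R_{s→t} from any pure state s to any other pure state t, these operators are given by
$$\mathbb{R} = \sum_{s \neq t} R_{s \to t}\, |t\rangle\langle s|, \qquad \tilde{\mathbb{R}} = \sum_{s \neq t} R_{s \to t}\, |s\rangle\langle s|. \tag{6}$$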
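As a concrete illustration of the gain and loss structure of such a master equation, the following minimal sketch builds a rate matrix and a depletion matrix for a hypothetical three-state system and integrates the equation with forward Euler steps. The state labels, rate values, and helper names are assumptions made purely for illustration. The convention assumed is that the rate matrix moves probability from s into t, while the depletion matrix removes the same amount from s.

```python
# Illustrative sketch: hypothetical 3-state system with made-up transition rates.
# rates[(s, t)] is the rate R_{s->t} from pure state s to pure state t (s != t).
rates = {(0, 1): 2.0, (1, 0): 1.0, (1, 2): 0.5, (2, 0): 0.3}
n_states = 3


def build_matrices(rates, n):
    """Return (rate, depletion) matrices as nested lists.

    rate[t][s]      accumulates R_{s->t} (gain of t from s, i.e. |t><s|)
    depletion[s][s] accumulates R_{s->t} (loss from s, i.e. |s><s|)
    """
    rate = [[0.0] * n for _ in range(n)]
    depletion = [[0.0] * n for _ in range(n)]
    for (s, t), r in rates.items():
        rate[t][s] += r
        depletion[s][s] += r
    return rate, depletion


def matvec(a, v):
    """Multiply a square nested-list matrix by a vector."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


def euler_step(psi, rate, depletion, dt):
    """One forward Euler step of d(psi)/dt = (rate - depletion) psi."""
    gain = matvec(rate, psi)
    loss = matvec(depletion, psi)
    return [p + dt * (g - l) for p, g, l in zip(psi, gain, loss)]


rate, depletion = build_matrices(rates, n_states)

# Column sums of (rate - depletion) vanish, so total probability is conserved.
column_sums = [
    sum(rate[t][s] - depletion[t][s] for t in range(n_states))
    for s in range(n_states)
]

psi = [1.0, 0.0, 0.0]  # start with certainty in state 0
for _ in range(5000):
    psi = euler_step(psi, rate, depletion, dt=0.001)

print("column sums:", column_sums)
print("probabilities:", psi, "total:", sum(psi))
```

The vanishing column sums are the matrix form of the statement that the sum vector is annihilated from the left by the difference of the two matrices, which is what keeps the probabilities normalized.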
To represent a 0-dimensional gas of monomeric particles (called M), we use a Fock space containing a field that has N modes M_1, …, M_N. Each mode M_i represents a single internal state of the particle M and behaves as a hard core boson [3]. Specifically, there is a raising operator M̂i and a lowering operator M̌i that are nilpotent and that satisfy the anticommutation relation {M̌i, M̂i} = 1. For each mode we further define a “presence” operator M̄i ≡ M̂iM̌i and an “absence” operator M̃i ≡ M̌iM̂i. All operators for different modes commute, and the vacuum state |0〉 is defined to be annihilated by every lowering operator M̌i. All calculations are performed in the N → ∞ limit.
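The single-mode algebra can be checked with 2×2 matrices. The sketch below assumes the basis ordering (empty, occupied) and uses hypothetical names (`RAISE`, `LOWER`, `presence`, `absence`); it is an illustration of the stated relations, not part of the formalism itself.

```python
# Single-mode hard-core boson in the basis (|empty>, |occupied>).
# Matrices are plain nested lists so the check needs no external libraries.


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

RAISE = [[0, 0], [1, 0]]  # maps |empty> to |occupied>
LOWER = [[0, 1], [0, 0]]  # maps |occupied> to |empty>

presence = matmul(RAISE, LOWER)  # projector onto |occupied>
absence = matmul(LOWER, RAISE)   # projector onto |empty>

checks = {
    "raising operator is nilpotent": matmul(RAISE, RAISE) == ZERO,
    "lowering operator is nilpotent": matmul(LOWER, LOWER) == ZERO,
    "anticommutator equals identity":
        add(matmul(LOWER, RAISE), matmul(RAISE, LOWER)) == IDENTITY,
    "presence + absence equals identity": add(presence, absence) == IDENTITY,
    "[presence, raise] equals raise":
        sub(matmul(presence, RAISE), matmul(RAISE, presence)) == RAISE,
    # The first column of LOWER is its action on |empty>.
    "lowering operator annihilates the vacuum":
        [row[0] for row in LOWER] == [0, 0],
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Because operators on different modes commute, a multi-mode operator can be obtained by tensoring these matrices with identities on the other modes.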
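The sum vector is given by the sum of the vacuum state, all 1-particle states, all 2-particle states, etc. Written in terms of |0〉 and the raising operators M̂i, this is
$$|\mathrm{sum}\rangle = |0\rangle + \sum_{i} \hat{M}_i |0\rangle + \frac{1}{2!} \sum_{i,j} \hat{M}_i \hat{M}_j |0\rangle + \frac{1}{3!} \sum_{i,j,k} \hat{M}_i \hat{M}_j \hat{M}_k |0\rangle + \cdots \tag{7}$$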
Note that each term in this series is multiplied by an inverse factorial coefficient that compensates for overcounting due to permutation symmetry of the summands. We can therefore write
$$|\mathrm{sum}\rangle = e^{G} |0\rangle, \tag{8}$$
where G ≡ ∑_i M̂i. Eq. (8) in fact holds for any chemical system so long as G equals the sum of creation operators for all internal states of all possible complexes. This fact was noted by Doi [1], but the operator G does not yet have an accepted name. Here we call G the “gallery”.
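The step from the series to the exponential can be spelled out as follows. This is a sketch that assumes only the properties stated earlier, namely raising operators that are nilpotent and that commute across modes.

```latex
\begin{align}
e^{G}|0\rangle
  &= \sum_{n=0}^{\infty} \frac{1}{n!} \Big( \sum_{i} \hat{M}_i \Big)^{n} |0\rangle \\
  &= \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{i_1, \dots, i_n}
     \hat{M}_{i_1} \cdots \hat{M}_{i_n} |0\rangle \\
  &= \sum_{n=0}^{\infty} \; \sum_{i_1 < \cdots < i_n}
     \hat{M}_{i_1} \cdots \hat{M}_{i_n} |0\rangle .
\end{align}
```

Terms with a repeated index vanish because each raising operator squares to zero, and every set of n distinct indices appears in n! orderings that all produce the same state, which exactly cancels the factor 1/n!. Each pure state therefore appears once with unit coefficient.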
In thermal equilibrium, |ψ〉 is fully specified by a chemical potential μ. The Hamiltonian corresponding to this chemical potential is
$$\mathbb{H} = -\mu' \sum_{i} \bar{M}_i, \tag{9}$$
where μ′ = μ − kT ln N is the chemical potential appropriately adjusted for the number of internal states of the monomer. Expanding Eq. (3) and Eq. (4) term-by-term, one readily verifies that this gallery G and Hamiltonian ℍ properly define a zero-dimensional gas of monomeric particles in the grand canonical ensemble.
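The large-N limit can be checked numerically. The sketch below assumes that the Hamiltonian assigns a free energy of −μ′ to each occupied mode, so that the partition function factorizes into one factor (1 + e^{μ′/kT}) per mode. The function names and parameter values are illustrative assumptions.

```python
import math


def partition_function(mu, kT, n_modes):
    """Z for n_modes independent hard-core modes.

    Each occupied mode carries weight exp(mu'/kT) with mu' = mu - kT*ln(n_modes),
    so Z = (1 + exp(mu/kT)/n_modes) ** n_modes.
    """
    x = math.exp(mu / kT) / n_modes
    return math.exp(n_modes * math.log1p(x))


def mean_particle_number(mu, kT, n_modes):
    """Expected number of occupied modes, n_modes * x / (1 + x)."""
    x = math.exp(mu / kT) / n_modes
    return n_modes * x / (1.0 + x)


mu, kT = 0.5, 1.0  # arbitrary illustration values
fugacity = math.exp(mu / kT)

for n_modes in (10, 1000, 100000):
    print(n_modes,
          partition_function(mu, kT, n_modes),
          mean_particle_number(mu, kT, n_modes))

print("limits:", math.exp(fugacity), fugacity)
```

As N grows, Z approaches exp(e^{μ/kT}) and the mean particle number approaches e^{μ/kT}, which are the grand canonical results for an ideal gas with Poisson-distributed particle number.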
Now suppose the particles M are capable of forming directed homopolymer chains, as depicted in Fig. 1. The gallery for this system is given by
$$G = \sum_{m} \left( G_m^{l} + G_m^{c} \right), \tag{10}$$
where G_m^l and G_m^c respectively denote the sum of creation operators for linear and circular polymer chains having m subunits. If each polymer chain were represented by a separate field, the corresponding Hamiltonian would, by analogy to Eq. (9), require an infinite number of terms (one for each complex), each multiplied by its own chemical potential. This proliferation of terms and parameters is inconsistent with our expectation that the Hamiltonian should describe the essential energetic contributions to a system, since the polymer system in question is fully specified by just two parameters: the chemical potential μ of monomeric particles and the interaction energy ε between pairs of bound particles.
This problem is remedied by representing each multi-particle complex in terms of its components and interactions. To do this we introduce the “site fields” a and b, which have N modes each (a_i and b_i, with i = 1, …, N), as well as an “interaction field” I that has N² modes (I_ij). Like M, the fields a, b, and I behave as hard core bosons. Table I shows each term of the gallery in Eq. (10) expressed in terms of these fields. The creation operator for the linear dimer, for example, is given by
$$\hat{D}_{ij} = \hat{M}_i \hat{M}_j \hat{I}_{ij} \hat{a}_i \hat{b}_j. \tag{11}$$
Here, M̂i and M̂j create the two component particles of the dimer, Îij registers an interaction between these particles, and âi and b̂j mark the resulting occupied sites. Eq. (11) defines a “composite field” D which has N² modes that also behave as hard core bosons.
We now describe diagrammatic methods that aid the use of this composite Fock space formalism. Internal indices are represented by dots, pairs of indices by edges connecting two dots, and sets of three or more indices by dots contained within bubbles. Symbols written next to dots, lines, and bubbles indicate operators that have those corresponding internal indices. Conversely, a dot written next to any symbol indicates whatever internal indices that symbol might possess. All operators depicted in a diagram must commute with each other. For example, Eq. (11) can be written as
Sums over internal indices are represented by filling in the appropriate dots (e.g., see Table I). Within such sums, each distinguishable state is counted exactly once. As with Feynman diagrams, this often leads to symmetry factors in the corresponding formulas.
Using this composite Fock space dramatically simplifies the Hamiltonian of the 0D polymer system. Instead of expressing ℍ as a sum of an infinite number of terms, each multiplied by its own chemical potential, ℍ can be expressed as a sum of only two terms, one for μ and one for ε:
This Hamiltonian evaluates the energy of each complex in a simple and intuitive way: the first term contributes a free energy of –μ for each component particle, while the second term contributes a free energy of ε for every two-particle interaction. Note the use of a single dot next to I; since the internal state of I is indexed by the pair ij, this dot indicates summation over both indices.
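Written without diagrams, the two-term Hamiltonian described above can be transcribed algebraically. The following is a hedged reading that assumes the filled dots denote unrestricted sums over internal indices, and it uses the symbol μ exactly as in the verbal description, writing the interaction energy as ε.

```latex
\begin{align}
\mathbb{H} &= -\mu \sum_{i} \bar{M}_i \;+\; \epsilon \sum_{i,j} \bar{I}_{ij}, \\
E_{\text{linear}}(m) &= -m\mu + (m-1)\,\epsilon, \\
E_{\text{circular}}(m) &= -m\mu + m\,\epsilon .
\end{align}
```

The last two lines apply the counting rule to a chain of m subunits: a linear chain contains m particles and m − 1 interactions, while a circular chain contains m particles and m interactions.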
This composite Fock space can also dramatically simplify master equations. Suppose that each potential a:b interaction forms at a rate r+, while each realized interaction decays at a rate r-. The corresponding rate matrix is given by
The first term links two M particles together and registers the appropriate a and b sites as occupied. The second term destroys preexisting interactions, in the process freeing up sites a and b. Note that the two diagrams in Eq. (15) are Hermitian conjugates of one another. The corresponding depletion matrix follows from ℝ by replacing creation operators with absence operators and annihilation operators with presence operators:
This method for transforming rate matrices into depletion matrices is fully general. The ease with which this master equation is defined stands in stark contrast to the difficulty of manually specifying correct transition rates between every one of the infinite linear and circular polymer species illustrated in Fig. 1C.
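For readers who prefer algebra to diagrams, one plausible transcription of the rate and depletion matrices described above is sketched here. It assumes that the binding term requires both particles to be present (presence operators on M), that index sums are unrestricted, and that the depletion matrix is written with a tilde; all three are assumptions of this sketch rather than statements taken from the diagrams.

```latex
\begin{align}
\mathbb{R} &= r_{+} \sum_{i,j} \bar{M}_i \bar{M}_j\, \hat{I}_{ij}\, \hat{a}_i\, \hat{b}_j
            \;+\; r_{-} \sum_{i,j} \bar{M}_i \bar{M}_j\, \check{I}_{ij}\, \check{a}_i\, \check{b}_j, \\
\tilde{\mathbb{R}} &= r_{+} \sum_{i,j} \bar{M}_i \bar{M}_j\, \tilde{I}_{ij}\, \tilde{a}_i\, \tilde{b}_j
            \;+\; r_{-} \sum_{i,j} \bar{M}_i \bar{M}_j\, \bar{I}_{ij}\, \bar{a}_i\, \bar{b}_j .
\end{align}
```

The second line follows the stated recipe: each creation operator becomes an absence operator and each annihilation operator becomes a presence operator, while the presence operators on M, which are diagonal, are left unchanged. The first depletion term is nonzero only on states where binding is possible, and the second only on states where the bond exists.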
We now address the problem of specifying |sum〉. Eq. (8) greatly simplifies this task, but manually defining the gallery G by listing every possible complex can still be cumbersome. The composite Fock space enables an alternative approach: |sum〉 can be defined using a “factory” – an ordered list of operators that specify rules for assembling complexes. Writing the factory as F = (F_1, …, F_K), the sum vector is given by
$$|\mathrm{sum}\rangle = e^{F_K} \cdots e^{F_2}\, e^{F_1} |0\rangle. \tag{17}$$
The order of operators in the factory is important because they do not commute. Indeed, the fact that they do not commute is what generates nontrivial complexes.
The 0D polymer system is readily defined by a two-operator factory F = (F_1, F_2), where F_1 (same as G_1^l) creates isolated particles and F_2 (which appears in ℝ) binds two preexisting particles together:
To verify e^{F_2} e^{F_1}|0〉 = e^{G}|0〉, one can Taylor expand the left-hand side and examine it term by term. For instance, applying the fifth-order term of e^{F_1} and the second-order term of e^{F_2} to |0〉 yields all pure states constructed from five particles and two interactions:
As in Table I, field names are hidden on the right-hand side of Eq. (19) for clarity. It is straightforward, if tedious, to verify Eq. (19) using the commutation relation [M̄i, M̂j] = δijM̂j. In doing so one sees that Eq. (17) successfully accounts for subtle combinatorial effects, such as the factor of 1/2 in G_2^c that arises due to cyclic permutation symmetry. One also sees how the hard core boson nature of site fields a and b prevents unphysical complexes from forming. We note that using diagrams greatly eases this computation, allowing the right-hand side of Eq. (19) to be computed by inspection.
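The claim that a factory counts every pure state exactly once can be checked by brute force for a small number of internal states. The sketch below is an illustration under stated assumptions, not the authors' procedure: pure states are represented as sets of occupied modes, F1 adds a particle, F2 binds site a of one existing particle to site b of a different existing particle, and self-binding (i = j) is excluded by assumption. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def apply_f1(vec, n):
    """F1: add one particle in any unoccupied internal state."""
    out = defaultdict(Fraction)
    for state, coeff in vec.items():
        for i in range(n):
            if ("M", i) not in state:  # hard core: no double occupancy
                out[state | {("M", i)}] += coeff
    return dict(out)


def apply_f2(vec, n):
    """F2: bind site a of particle i to site b of particle j (i != j assumed)."""
    out = defaultdict(Fraction)
    for state, coeff in vec.items():
        for i, j in product(range(n), repeat=2):
            if i == j:
                continue
            if ("M", i) not in state or ("M", j) not in state:
                continue  # both particles must already exist
            new_modes = {("I", i, j), ("a", i), ("b", j)}
            if new_modes & state:
                continue  # site or interaction already occupied (occlusion)
            out[state | new_modes] += coeff
    return dict(out)


def exp_apply(op, vec, n):
    """Apply exp(op) via its Taylor series, which terminates on this finite space."""
    result = defaultdict(Fraction, vec)
    term, order = vec, 0
    while term:
        order += 1
        term = {s: c / order for s, c in op(term, n).items()}
        for s, c in term.items():
            result[s] += c
    return dict(result)


def sum_vector(n):
    """exp(F2) exp(F1) |0> as a map from pure state to coefficient."""
    vacuum = {frozenset(): Fraction(1)}
    return exp_apply(apply_f2, exp_apply(apply_f1, vacuum, n), n)


for n in (2, 3):
    vec = sum_vector(n)
    print(n, "internal states:", len(vec), "pure states;",
          "distinct coefficients:", set(vec.values()))
```

With two internal states this produces seven pure states, each with coefficient 1: the vacuum, two monomers, the unbound pair, two linear dimers, and one circular dimer. The circular dimer is reached by two binding orders, each weighted by 1/2!, so it is counted once; a naive double sum over both indices of a ring would count it twice, which is the origin of the symmetry factor for circular chains.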
To illustrate the utility of these composite Fock spaces for describing more complex biochemical systems, we turn to the classic MWC model [21] for the cooperative binding of oxygen by hemoglobin. In this model, hemoglobin proteins are allowed to be in two conformations, relaxed or tense. These proteins exist only as tetramers in solution, however, and all proteins in the same tetramer must be in the same conformation. The cooperative binding of oxygen results from tense proteins being energetically favored in the absence of oxygen, whereas proteins in the relaxed conformation bind oxygen more tightly.
The molecular complexes of the MWC model can be defined by a three-term factory F = (F1, F2, F3), the operators of which are shown in Table II. F1 creates four hemoglobin monomers, represented by the field H, which are in the relaxed state by default. It also links them together into a tetramer, represented by the field H4. F2 transforms a hemoglobin tetramer and its individual subunits from the relaxed to the tense conformation; tense conformations are represented by the fields T and T4 for the individual particles and for the tetramer, respectively. F3 binds oxygen to a preexisting hemoglobin monomer. Oxygen is not modeled explicitly, but rather its occupancy is indicated only by the site field o. The corresponding Hamiltonian is
Here, is the adjusted chemical potential of hemoglobin tetramers, while α, L, and c are the original parameters described in [21]: α governs the probability of a relaxed monomer binding oxygen, L governs the relative proportion of tense versus relaxed tetramers in the absence of oxygen, and c quantifies the change in affinity for oxygen of tense versus relaxed monomers. Cooperativity obtains when L > 1 and c < 1.
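For orientation, the equilibrium oxygen saturation of the MWC model can be evaluated directly from its three parameters. The sketch below uses the standard textbook MWC saturation formula for a tetramer rather than the Fock space machinery; the function name and the parameter values are illustrative assumptions.

```python
def mwc_saturation(alpha, L, c, n_sites=4):
    """Fractional oxygen saturation in the standard MWC model.

    alpha: oxygen concentration scaled by the relaxed-state dissociation constant
    L:     ratio of tense to relaxed tetramers in the absence of oxygen
    c:     ratio of relaxed-state to tense-state dissociation constants
    """
    relaxed = (1 + alpha) ** n_sites
    tense = L * (1 + c * alpha) ** n_sites
    bound_relaxed = alpha * (1 + alpha) ** (n_sites - 1)
    bound_tense = L * c * alpha * (1 + c * alpha) ** (n_sites - 1)
    return (bound_relaxed + bound_tense) / (relaxed + tense)


# Hypothetical parameter values chosen only to display cooperative behavior.
L_demo, c_demo = 1000.0, 0.01
for alpha in (0.1, 1.0, 10.0, 100.0):
    cooperative = mwc_saturation(alpha, L_demo, c_demo)
    independent = mwc_saturation(alpha, 0.0, c_demo)  # L = 0: relaxed state only
    print(alpha, cooperative, independent)
```

Setting L = 0 removes the tense state and recovers independent binding, α/(1 + α). With L > 1 and c < 1 the saturation is suppressed at low α and rises more steeply, which is the cooperative regime noted above.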
This depiction of the MWC model is far more concise and rigorous than the standard representation (e.g., [22–25]). Typically, an illustration similar to Fig. 2 is supplemented with text explaining how to interpret it mathematically. By contrast, the factory in Table II and the Hamiltonian in Eq. (20) provide a fully rigorous mathematical specification of the MWC model; additional text is needed only to provide a biochemical interpretation.
The formalism described here provides a powerful way to concisely and rigorously define mathematical models of stochastic chemical systems that generate large multi-particle complexes. This approach bridges the gap between the mathematical methods used to describe simple stochastic chemical systems and the rule-based approaches that have been developed for computationally simulating more complex systems. One result of this formalism is the identification of a relationship between the set of possible complexes and a list of assembly rules (Eq. (17)). The concepts of space and orientation, which have been ignored thus far, are readily incorporated in the same way as in other formalisms (e.g., [1, 3]), i.e., by introducing additional indices. We anticipate that the mathematical and diagrammatic methods described here will prove particularly useful for studying complex biochemical systems, both analytically and computationally.
We thank Rob Phillips for inspiring our work on this problem, as well as Jane Kondev, Ilya Nemenman, and Bruce Stillman for helpful discussions. The work of MJM was supported by NSF Graduate Fellowship DGE-1144469. JBK acknowledges support from the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory.
Footnotes
jkinney{at}cshl.edu