Abstract
Machine learning algorithms are sensitive to meaningless (or “adversarial”) perturbations. This is reminiscent of cellular decision-making where ligands (called “antagonists”) prevent correct signalling, like in early immune recognition. We draw a formal analogy between neural networks used in machine learning and models of cellular decision-making (adaptive proofreading). We apply attacks from machine learning to simple decision-making models, and show explicitly the correspondence to antagonism by weakly bound ligands. Such antagonism is absent in more nonlinear models, which inspired us to implement a biomimetic defence in neural networks filtering out adversarial perturbations. We then apply a gradient-descent approach from machine learning to different cellular decision-making models, and we reveal the existence of two regimes characterized by the presence or absence of a critical point. The critical point causes the strongest antagonists to lie close to the threshold. This is validated in the loss landscapes of robust neural networks and cellular decision-making models, and observed experimentally for immune cells. For both regimes, we explain how associated defence mechanisms shape the geometry of the loss landscape, and why different adversarial attacks are effective in different regimes. Our work connects evolved cellular decision-making to machine learning, and motivates the design of a general theory of adversarial perturbations, both for in vivo and in silico systems.
Introduction
Machine learning is becoming increasingly popular with major advances coming from deep neural networks [31]. Deep learning has improved the state-of-the-art in automated tasks like image processing [24], speech recognition [19] and machine translation [44], and has already seen a wide range of applications in research and industry. Despite their success, neural networks suffer from blind spots: small perturbations added to unambiguous samples may lead to misclassification [45]. Such adversarial examples are most obvious in image recognition, for example, a panda is misclassified as a gibbon or a handwritten 3 as a 7 [16]. Real world scenarios exist, like adversarial road signs fooling computer vision algorithms (Fig. 1 A) [39]. Worse, adversarial examples are often transferable across algorithms (see [1] for a recent review), and certain “universal” perturbations fool any algorithm.
Categorization and inference are also tasks found in cellular decision-making. For instance, T cells have to discriminate between foreign and self ligands which is challenging since foreign ligands might not be very different biochemically from self ligands [14, 10]. Decision-making in an immune context is equally prone to detrimental perturbations in a phenomenon called ligand antagonism [11]. Antagonism appears to be a general feature of cellular decision-makers: it has been observed in T cells [2], mast cells [47] and other recognition processes like olfactory sensing [41].
There is a natural analogy to draw between decision-making in machine learning and in biology. In machine learning terms, cellular decision-making is similar to a classifier. Furthermore, in both artificial and cellular decision-making, targeted perturbations lead to faulty decisions even in the presence of a clear “ground truth” signal. As a consequence, arms races are observed in both systems. Mutating agents might systematically explore ways to fool the immune cells via antagonism, as has been proposed in the HIV case [23, 35, 21]. This is reminiscent of how adversaries could generate “black box attacks” aiming at fooling neural networks [39]. Strategies for provable defenses or robust detection of adversarial examples [18, 51] are currently developed in machine learning, but we are still far from a general solution.
Adaptive proofreading for cellular decision-making
Cellular decision-making in our context refers to classification of biological ligands in two categories, e.g. “self vs non self” in immunology, or “agonist vs non agonist” in endocrinology. For most of those cases, there actually is a continuum of properties between two different categories, so that it is convenient to rank different ligands based on a continuous variable (notation τ) that we will call “quality”. Mathematically, a cell needs to decide if it is exposed to ligands with quality τ > τc, where τc is the quality at the decision threshold. Such ligands triggering response are called “agonists”. A general problem then is to consider cellular decision-making based on ligand quality irrespective of ligand quantity (notation L). An example can be found in immune recognition with the “lifetime dogma” [10], where it is assumed that a T cell discriminates ligands based on their characteristic binding time τ to T cell receptors (this is of course an approximation and other parameters might also play a role in defining quality, see [17, 6, 33]). Ligand discrimination is a nontrivial problem for the cell, which does not measure single-binding events but only has access to global quantities such as the total number of bound receptors (Fig. 1 B). The challenge is to ignore many subthreshold ligands (τ < τc) while responding to few agonist ligands with τ > τc [2, 10, 13]. In particular, it is known experimentally in many different contexts that addition of “antagonistic” subthreshold ligands can impair proper decision-making (Fig. 1 C) [2, 47, 41].
To model cellular decision-making, we will use the general class of “adaptive sorting” or “adaptive proofreading”, models, which account for many aspects of immune recognition [29, 11], and can be shown to capture all relevant features of such cellular decision-making close to a decision threshold [12]. Mathematical details on the models are given in Appendix 1. Adaptive proofreading models rely on an incoherent feedforward loop, where an output is at the same time activated and repressed by bound ligands via two different branches in a biochemical network (Fig. 1 D). Thanks to the tug-of-war between those different branches, the network can perform a complex computation. The activation and repression branch are assumed to be activated linearly as a function of the ligand quantity L. We further assume that the t dependency of the activation branch is stronger (parameter N, generally assumed to be an integer representing number of phosphorylation sites in immune recognition models) than the repression branch (parameter m, we assume m < N). The output TN,m (taken to be the ratio between the activation and the repression branch) ensures that the ligand dependency L disappears while a τ dependency survives, allowing for ligand classification based on quality τ.
Fig. 1 E shows theoretical and experimental curves of an adaptive proofreading model with (N,m) = (4, 2). Adaptive proofreading models give dose response curves plateauing at different values as a function of parameter τ, allowing to perform sensitive and specific measurement of this parameter (Fig. 1 E left). For small τ (e.g. τ = 3s), one never reaches the detection threshold (dotted line on Fig. 1 E) even for many ligands. For slightly bigger τ = 10 s > τc, the curve is shifted up so that detection is made even for a small concentration of agonists.
If we now consider mixtures of ligands with different qualities, the respective computation made by the activation and repression branch of the network depends in different ways on τs. Antagonistic effects occur when the repression branch is activated more strongly than the activation branch, thus killing the response. This corresponds to the “dog in the manger” effect described in [47] for decision-making by mast cells. Fig. 1 E middle and right panels illustrate this. In presence of many ligands below the threshold of detection, the dose response curve for “agonist alone” are simultaneously moved to the right but with a higher starting point, as observed experimentally (data redrawn from [13]). Different models have different antagonistic properties, based on the strength of the activation branch (N) relative to the repression branch (m). More mathematical details on such models can be found in [29, 11, 12].
Neural networks for artificial decision-making
We will compare cellular decision-making to decision-making in machine learning algorithms. We will constrain our analysis to binary decision-making, here classifying images from two types of digits. These images are taken from MNIST [32], a standard database with 70000 pictures of handwritten digits. Even for such a simple task, designing a good classifier is not trivial, since it should be able to classify irrespective of subtle changes in shapes, intensity and writing style (i.e. with or without a central bar for a 7).
A simple machine learning algorithm is logistic regression. Here, the inner product of the input and an optimized weight matrix determines the class of the input. Another class of machine learning algorithms are feedforward neural networks: interconnected groups of nodes processing information layer-wise. We chose to work with neural networks for several reasons. First, logistic regression is a limiting case of a neural network without hidden layers. Second, a neural network with at least one hidden layer more closely imitates information processing in cellular networks. Third, such an architecture reproduces classical results on adversarial perturbations such as the ones described in [16]. Fig. 1 E introduces the iterative matrix multiplication inside a neural network. Each neuron i computes wi · x, i ∈ [0, 3], adds bias bi, and transforms the result with an activation function f(x). We chose to use a Rectified Linear Unit (ReLU), which returns 0 when its input is negative, and the input itself otherwise. The resulting f(wi · x + bi) is multiplied by another weight matrix with elements ai, summed up with a bias, defining a scalar quantity . Finally, the logistic function σ(x) assigns a class (0 or 1) probability to the input.
We use the scikitlearn implementation [40] of the multilayer perceptron to train neural networks. We have chosen our hyperparameters as follows: one hidden layer with four neurons feeding into an output neuron, a random 80/20 training/test split with a 10 percent validation split. The cross-entropy loss function is minimized via stochastic gradient descent in maximal 300 iterations with a batch size of 200 and an adaptive learning rate, initiated at 0.001. The tolerance is 10−4 and the regularization rate is 0.1. Most of these parameters are set to their default value, but we found that the training procedure is largely insensitive to the specific choice of hyperparameters. As an example, in Fig. 1 E, a 7 is correctly classified by the neural network (J(x) > 0.5), while the adversarial 7 is classified as a three (J(xadv) < 0.5).
Results
We first summarize the general approach followed to draw the parallel between machine learning and cellular decision-making. We will limit ourselves to simple classifications where a single decision is made, such as “agonist present vs no agonist present” in biology, or “3 vs 7” in digit recognition. Decision-making on a sample (a picture in machine learning, or a ligand distribution in biology) is then done via a scoring function (or score). This score is computed either directly by the machine learning algorithm (score J) or by the biochemical network, via the concentration of a given species (TN,m where (N, m) depend on the model considered, see Appendix 1). For simple classifications, the decision is then based on the relative value of the score above or below some threshold (typically 0.5 for neural networks where decision is based on sigmoidal functions, or some fixed value related to critical quality τc for biochemical networks).
The overall performance of a given classifier depends on the behavior of the score in the “space” of possible samples (i.e. the space of all possible pictures, or the space of all possible ligand distributions). Both spaces have extremely high dimension: for instance dimension in the MNIST picture correspond to number of pixels 28 × 28 = 784, while in immunology there are roughly 30000 receptors potentially bound to different ligands [2]. The score can thus be thought of as a non-linear projection in one dimension of samples in those huge spaces. We will study how the score behaves in relevant directions in the sample space, and how to change the corresponding geometry and position of decision boundaries (defined as the samples where the score is equal to the classification threshold). We will show that similar properties are observed, both close to initial samples and to the decision boundary. It is important to notice at this stage that the above considerations are completely generic on the biology side and are not necessary limited to, say immune recognition; however we will show that adaptive proofreading presents many features reminiscent of what is observed in machine learning.
Fast Gradient Sign Method recovers antagonism by weakly binding ligands
In this framework, from a given sample, an adversarial perturbation is a “small” perturbation in sample space giving a change in score reaching (or crossing) the decision boundary. We start by mathematically connecting the simplest class of adversarial examples in machine learning to antagonism in adaptive proofreading models. We follow the original Fast Gradient Sign Method (FGSM) proposed by [16]. The FGSM computes the local maximum adversarial perturbation η = ϵsgn(∇xJ) with ║η║∞ ≤ ϵ. ∇xJ represents the gradient of the scoring function, categorizing images in two different categories (such as 3 and 7 in [16]). Its sign defines an image, that is added to the initial batch of images with small weight ϵ. Examples of such perturbations are shown in Fig. 1 E and Appendix 2 for the 3 vs 7 digit classification problem. While to the human observer, the perturbation is weak and only changes the background, “naive” machine learning algorithms are completely fooled by the perturbation and systematically misclassify the digit.
Coming back to adaptive proofreading models, we apply FGSM for the computation of a maximally antagonistic perturbation. To do so, we need to specify the equivalent of pixels in adaptive proofreading models. A natural choice is to consider parameters associated to each pair (index i) of receptor/ligands, namely ki (corresponding to an on-rate) and τi (corresponding to quality), and to compute gradients with respect to those parameters. If a receptor i is unoccupied, we make the choice to consider that this ki and τi are 0 1.
As a simple example, we start with the case (N,m) = (1, 0), which also corresponds to a recently proposed model for antagonism in olfaction [41], with the role of kon played by inverse affinity κ-1, the role of τ played by efficiency η, and the spiking rate of olfactory receptor neurons a function J(TN,m), that can be interpreted as a scoring function in the machine learning sense. In this case, T1,0 simply computes the average quality τavg of ligands presented weighted by (models with N > m > 0 give less intuitive results as will be shown in the following). It should be noted that while this computation is formally simple, biochemically it requires elaborated internal interactions, because a cell can not easily disentangle influence of individual receptors, see [11, 41] for explicit examples.
Starting from the computation of ∇xJ with respect to parameters and τi, the FGSM perturbation is: where . From the above expression, we find that an equivalent maximum adversarial perturbation is given by three simple rules (Fig. 2 A).
Decrease all τi by ϵ
Decrease by ϵ for ligands with τi > T
Increase by ϵ for ligands with τi < T
The key relation to adversarial examples from [16] comes from considering what happens to the unbound receptors for which both and τi are initially 0. Let us consider a situation with L identical bound ligands with {kon = 1, τ} giving response where τ itself is of order 1 (in proper units). The three rules above imply that we are to decrease binding time by ∊, and that all R previously unbound receptors are now to be bound by ligands with hon = ϵ at small binding time ϵ. We compute the new response to be
If there are many receptors compared to initial ligands, and assuming ϵ ≪ τ, the relative change is of order 1 when ϵR ~ L giving a decrease comparable to the original response instead of being of order e as we would naturally expect from small perturbations to all parameters. Thus, if a detection process is based on thresholding variable T1,0, a significant decrease can happen with such perturbation, potentially shutting down response. Biologically, the limit where ϵR is big corresponds to a strong antagonistic effect of many weakly bound ligands, which yields the same effect as “competitive antagonism” in olfaction [41]2.
Behaviour across boundaries in sample space and adversarial perturbations
To further illustrate the correspondence, we compare the behaviour of a trained neural network classifying 3s and 7s with the adaptive proofreading model (N, m) = (1, 0) for more general samples. We build linear interpolations between two samples on either side of the decision boundary for both cases (Fig. 2 C-F, dotted blue boxes, linear interpolation factor f varying between 0 and 1). This interpolation is the most direct way in sample spaces to connect objects in two different categories. The neural network classifies linearly interpolated digits, while the adaptive proofreading model classifies gradually changing ligand distributions.
We plot the output of the neural network x just before taking the final logistic function σ defined in Fig. 1 F and similarly, we plot TN,m – 1 (in units rescaled by τc) for adaptive proofreading models. In both cases decision is thus based on the sign of the considered quantity. In the absence of adversarial/antagonistic perturbations, for both cases, we see that the score of the system almost linearly interpolates between values on either side of the classification boundary (top panel of Fig. 2 D, F, blue curves). However, in the presence of adversarial/antagonistic perturbations, the entire response is shifted way below the decision boundary (top panel of Fig. 2 D, F, red curves), so that in particular the initial samples at f = 0 (image of 7 or ligand distribution above threshold) are strongly misclassified.
Goodfellow et al. [16] proposed the linearity hypothesis as an explanation for this adversarial effect: adding η = ϵ sgn(∇xJ) to the image leads to a significant perturbation on the scoring function J of order ϵd, with d the usually high dimensionality of the input space. Thus many weakly lit up “background” pixels in the initial image can conspire to fool the classifier, explaining the significant shift in the scoring function in Fig. 2 D top panel. This is consistent with the linearity we observe on the interpolation line even without adversarial perturbations. A more quantitative explanation based on averaging is given in [48] on a toy-model: after defining a label y ∈ { – 1, +1}, a fixed probability p and a constant η, one can create a (d + 1)-dimensional feature vector x
From this, we can built a 100% accurate classifier in the limit of d → ∞ by averaging out the weakly correlated features x2,… xd, which gives the score . Taking the sign of favg will coincide with the label y with 99% confidence for . But such classification can be easily fooled by adding a small perturbation ϵ = –2ηy to every component of the features, since it will shift the average by the same quantity –2ηy, which can still be small if we take .
We observe a very similar effect in the simplest adaptive proofreading model. The strong shift of the average T1,0 in Eq. 2 is due to weakly bound receptors ϵR, which play the same role as the weak features (components x2,…, xd+1 above), hiding the ground truth given by ligands of binding time τ (equivalent to x1 above) to fool the classifier. We also see a similar linearity on the interpolation in Fig. 2 F top panel. There is thus a direct intuitive correspondence between adversarial examples in machine learning and the high number of available receptors R. In both cases, the change of scoring function (and corresponding misclassification) can be large despite the small amplitude ϵ of the perturbation. Once this perturbation is added, the system in Fig. 2 still interpolates between the two scores in a linear way, but with a strong shift due to the added perturbation.
Biomimetic defence for digit classification inspired by adaptive sorting
Kinetic proofreading [20, 38, 34] is a well-studied mechanism encoding different τ dependencies in the activation/repression branches of adaptive proofreading models [29]. The primary effect of kinetic proofreading is to non-linearly decrease the relative weight of weakly bound ligands with small binding times, thus ensuring defence against antagonism by weakly bound ligands. Inspired by this idea, we implement a simple defense for digit classification. Before feeding a picture to the neural network, we transform individual pixel values xi of image x by
Similarly to the defence of adaptive proofreading where ligands with small τ are filtered out, this transformation squashes greyish pixels with values below threshold θ to black pixels, see Fig. 2 C bottom panels.
In Fig. 2 D, bottom panel, we show the improved robustness of the neural network armed with this defence. Here, the adversarial perturbation is filtered out efficiently. Strikingly, with or without adversarial perturbation, the score now behaves non-linearly along the interpolation line in sample space: it stays flatter over a broad range of f until suddenly crossing the boundary when the digit switches identity (even for a human observer) at f = 0.5. Similarly, for adaptive sorting with (N, m) = (2,1), antagonism is removed, and the score exhibits the same behaviour of flatness followed by a sudden decrease on the interpolation line. Thus, similar defence displays similar robust behaviour of the score in sample space.
Gradient dynamics identify two different regimes
The dynamics of the score along a trajectory in sample space can thus vary a lot as a function of the model considered. This motivates a more general study of a “worst case” scenario, i.e. gradient descent towards the decision boundary for different models. Krotov and Hopfield studied a similar problem for an MNIST digit classifier, encoded with generalized Rectified polynomials of variable degrees n [25] (reminiscent of the iterative FGSM introduced in [27]). The general idea is to find out how to most efficiently reach the decision boundary, and how this depends on the architecture of the decision algorithm. Krotov and Hopfield identified a qualitative change with increasing n, accompanied by a better resistance to adversarial perturbations [26, 25].
We consider the same problem for adaptive proofreading models, and study the dynamics of binding times for a ligand mixture when following the gradient of TN,m (akin to a potential in physics). The goal is to see how to most efficiently fool the decision-maker (or in biological terms, how to best antagonize it). We iteratively change the binding time of non agonist ligands τ < τc to while keeping the distribution of agonist ligands with τ > τc constant. In the immune context, these dynamics can be thought of as a foreign agent trying to antagonize the immune system by rapidly mutating and generating antagonists ligands to mask its non-self part. Such antagonistic phenomena have been proposed as a mechanism for HIV escape [23, 35] and associated vaccine failure [21].
From a given ligand mixture with few ligands above threshold and many ligands below thresholds, we follow the dynamics of Eq. 6, and display the ligand distribution at the decision boundary for different values of N, m as well as the number of steps to reach the decision boundary in the descent defined by Eq. 6 (Fig. 3, see also Appendix 3 Figure 1 for another example with a visual interpretation). We observe two qualitatively different dynamics. For small m, we observe strong adversarial effects, as the boundary is almost immediately reached and the ligand distribution barely changes. As m increases, in Fig. 3 A the ligands in the distribution concentrate around one peak. For m = 2, a qualitative change occurs: the ligands suddenly spread over a broad range of binding times and the number of iterations in the gradient dynamics to reach the boundary drastically increases. For m > 2, the ligand distribution becomes bimodal, and the ligands close to τ = 0 barely change, while a subpopulation of ligands peaks closer to the boundary. Consistent with this, the number of e-sized steps to reach the boundary is 3 to 4 orders of magnitude higher for m > 2 as for m < 2.
Qualitative change in dynamics is due to a critical point
The qualitative change of behaviour observed at m = 2 can be understood by studying the contribution to the potential TN,m of ligands with very small binding times τϵ ~ 0. Assuming without loss of generality that only two types of ligands are present (agonists τag > τc and spurious τspurious = τϵ), an expansion in τϵ gives, up to a constant, for small τϵ (see Fig. 3 D for a representation of this potential and Appendix 3 for this calculation). In particular, for 0 < m < 1, diverges as τϵ → 0. This corresponds to a steep gradient of TN,m so that the system quickly reaches the boundary in this direction. The ligands close to τϵ ~ 0 then quickly localize close to the minimum of this potential (unimodal distribution of ligand for small m on Fig. 3 A, B).
The potential close to τϵ ~ 0 flattens for 1 < m < 2, but it is only at m = 2 that a critical point appears at τϵ = 0, qualitatively modifying the dynamics defined by Eq. 6. For m ≥ 2, due to the new local flatness of this gradient, ligands at τ = 0, the critical point of Eq. 6, are pinned by the dynamics. By continuity, dynamics of the ligands slightly above τϵ = 0 are critically slowed down, making it much more difficult for them to reach the boundary. This explains both the sudden broadening of the ligand distribution, and the associated increase in the number of steps to reach the decision boundary. Conversely, an inflexion point (square) appears in between the minimum (circle) and τϵ = 0 (Fig 3 B). Ligands close to the inflexion point separate and move more quickly towards the minimum of potential, explaining the bimodality at the boundary. For larger (N, m) we obtain flatter potentials, and a larger number of iterations. In Appendix 2, we further describe the consequence of adding proofreading steps on the position of the boundary itself, using another concept of machine learning called “boundary tilting” [46] (Appendix 2, Figure 1 and Table 1).
Categorization of attacks
The transition at m = 2 is strongly reminiscent of the transition observed by Krotov and Hopfield in their study of gradient dynamics similar to Eq. 6 [25]. In both our works, we see that there are (at least) two kinds of attacks that can bring samples to the decision boundary. The FGSM corresponds to small perturbations to the input in terms of L∞ norm leading to modifications of many background pixels in [25] or many weakly bound ligands for the adaptive proofreading case, also similar to the meaningless changes in meta-features described above in Eq. 4 [48].
Defence against the FGSM perturbation is implemented through a higher degree n of the rectified polynomials in [25], while in adaptive proofreading, this is done through critical slowing down of the dynamics of Eq. 6 for m > 2. The latter models are nevertheless sensitive to another kind of attack with many fewer perturbations of the inputs but with bigger magnitude. This corresponds to digits at the boundary where few well-chosen pixels are turned on in [25]. For adaptive proofreading models this leads to the ligand distribution becoming bimodal at the decision boundary. Two important features are noteworthy. First, the latter perturbations are difficult to find through gradient descent (as illustrated by the many steps to reach the boundary in Fig. 3 A). Second, the perturbations appear to be “meaningful” and it is difficult or even impossible to recover the “ground truth” by inspecting the sample at the decision boundary. Digits at the boundary for [25] appear indeed ambiguous to a human observer, and ligand distribution peaking just below threshold are potentially misinterpreted biologically due to inherent noise. This has actually been observed experimentally in T cells, where strong antagonists are also weak agonists [2, 13], meaning that T cells do not take reliable decisions in this regime.
Biomimetic defenses against few-pixel attacks
It is then worth testing the sensitivity to localized stronger attacks of digit classifiers, helped again with biomimetic defences. The natural analogy is to implement attacks based on strong modification of few pixels [43]. For this problem, we choose to implement a two-tier biomimetic defence: we implement first the transformation defined in Eq. 5, that will remove influence of the FGSM types of perturbations by flattening the local landscape as in Fig. 2 D. In addition, we choose to add a second layer of defence where we simply average out locally pixel values. This can be interpreted biologically as a process of receptor clustering or time-averaging. Time-averaging has been shown to be necessary in a stochastic version of adaptive proofreading [13, 29], where temporal intrinsic noise would otherwise make the system cross the boundary back and forth endlessly. In the machine learning context, local averaging has been recently proposed as a way to defend against few pixel attacks [52], which thus can be considered as the analogous of biochemical noise.
We then train multiple classifiers between different pairs of handwritten digits. Following the approach of “one pixel” attack [43], we consider digits classified in presence of this two-tier defence, then sequentially fully turn pixels on or off ranked by their impact on the scoring function, until we reach the decision boundary.
Representative results of such few-pixels attacks with biomimetic defences are illustrated in Fig 3 C (“pre-filter” column, local average is shown for further comparison in column “mean-filter”), with other examples shown in Appendix 4 Figure 1 and details on the behaviour of scoring functions in Appendix 4 Figure 2. Clearly the attacked samples at the boundaries hide the “ground truth” of the initial digit, and as such can not be considered are meaningless adversarial perturbations. Samples at the boundary superficially look like printed Japanese “kanas” or Greek characters (e.g. attacks from 0 to 1 typically look like a ϕ, see Appendix 4), making them impossible to classify as Arabic digits even for a human observer. This is consistent with the ambiguous digits observed for big n by Krotov and Hopfield [25]. In other cases, samples at the boundary between two digits actually look like another digit: for instance, we see that the sample at the boundary between a 6 and an 9 look like a 5 (or a Japanese ). This observation is consistent with previous work attempting to interpolate in latent space between digits [4], where at the boundary a third digit corresponding to another category may appear. We also compare in Fig 3 C the sample seen by the classifier at the boundary after the biomimetic defences with a “control” corresponding to the average between the initial digit and the target of the attack (corresponding to the interpolation factor f = 0.5 in Fig. 2 C-D). It is then quite clear that the sample generated by the attack is rather close to this control boundary image. This, combined with the fact samples at the boundary still look like printed characters without clear ground truth indicate that the few pixel attacks implemented here actually select for “meaningful” features. The existence of meaningful features in the direction of the gradient have been identified as a characteristic of networks robust to adversarial perturbation [48] similar to results of [25] and our observation for adaptive proofreading models above.
Discussion
Complex systems (in vivo or in silico) integrate sophisticated decision making processes. Our work illustrates common features between neural networks and a general class of adaptive proofreading models, especially with regards to mechanisms of defence against targeted attacks. Parallels can be drawn between these past approaches, since the models of adaptive proofreading presented here were first generated with in silico evolution aiming at designing immune classifiers [29]. Strong antagonism naturally appeared in the simplest simulations, and required modification of objective functions very similar to adversarial training [16].
Through our analogy with adaptive proofreading, we are able to identify the presence of a critical point (due to kinetic proofreading) as the crucial mediator of robust adversarial defense, essentially squashing the spurious adversarial directions. Another layer of defence can be added with local averaging. This is in line with current research on adversarial robustness, showing that robust networks exhibit a flat loss landscape near each training sample [36].
An interesting by-product of local flatness is the appearance of an inflexion point in the gradient descent dynamics. If the scoring function is flat close to a sample far from the boundary, nonzero at the boundary, then flat close to another sample, an inflexion point is expected via Rolle’s theorem. This is visible in Fig. 2 D-F: while the score of non-robust classifiers is linear when moving towards the decision boundary, the scoring function of classifiers resistant to adversarial perturbations is flat at f = 0 and only significantly changes when the input becomes ambiguous near the inflexion point. The presence of this inflexion point is bound to strongly influence the gradient descent dynamics. For instance, for adaptive proofreading models, the ligand distribution following the dynamics of Eq. 6 changes from unimodal to bimodal at the boundary, creating ambiguous samples. For a robust classifier, such samples are thus expected to appear close to the decision boundary since they coincide with large gradients. As such they could correspond to meaningful features (contrasting the meaningless adversarial perturbations), as we show in Fig. 3 with our digit classifier with biomimetic defence. Examples in image classification might include the perturbed animal pictures fooling humans [8] with chimeric images that combine different animal parts (such as spider and snake) or the meaningful adversarial transformations between samples found in [48]. Similar properties have been observed experimentally for ambiguous samples in immune recognition: maximally antagonizing ligands have a binding time just below the decision threshold [2]. We interpret this property as a consequence of the flat landscape far from the decision threshold leading to a steep gradient close to it [13, 12].
A possible caveat of our analogy is that a clear decision axis in the n direction can be accounted for explicitly in adaptive proofreading models (even though it is still convolved with ligand quantity, thus requiring an adaptive proofreading architecture). In machine learning the space of inputs is much more complex, and there are generally more than two categories. Here, the algorithm effectively has to learn representations, such as pixel statistics and spatial correlations in images [24]. However, underlying, low manifold descriptions could locally still combine higher level information in ways similar to parameter n, so that the theory presented here could still apply once those directions are discovered. Coming back to biology, it was shown mathematically that for the classification problem of discriminating ligand quality irrespective of their quantity, one always gets antagonism close to the boundary [12]. This is precisely due to the necessary presence of a significant gradient in the direction of the decision-making. However, one can change the binding time of maximally antagonizing ligands via the nonlinearities in kinetic proofreading. Similar inevitability theorems might be generalizable to machine learning, for instance via a robustness-accuracy trade-off [48]. Adversarial examples are potentially impossible to fully remove, yet the effective adversarial perturbation may shift from a pile of meaningless features to a combination of meaningful features, giving ambiguous patterns at the decision boundary.
From the biology standpoint, new insights may come from the general study of computational systems built via machine learning. Our study of Fig. 3, inspired by gradient descent in machine learning, suggests that cellular decision-makers exist in two regimes. The difference between these regimes are geometric by nature, with or without critical points. The case m < 2 with a steep gradient could be more relevant in signalling contexts to separate mixtures of inputs, so that every weak perturbation should be detected. For olfaction it has been suggested that strong antagonism allows for a “rescaling” of the distribution of typical odor molecules, ensuring a broad range of detection irrespective of the quantity of molecules presented [41]. The case m ≥ 2 is much more resistant to adversarial perturbations, and could be most relevant in an immune context where T cells filter out antagonistic perturbations. This might be relevant for the pathology of HIV infections [23, 35, 21] or, more generally, could provide explanations on the diversity of altered peptide ligands [49]. We also expect similar classification problems to occur at the population-level, e.g. when T cells interact with each other to refine individual immune decision-making [5, 50].
We have connected machine learning algorithms to models of cellular decision-making, and in particular their defence strategies against “adversarial” attacks. More defences against adversarial examples might be found in the real world, for instance in biofilm-forming in bacteria [53], in size estimation of animals [28], or might be needed for proper detection of physical 3D objects [3] and road signs [9]. Understanding the whole range of possible antagonistic perturbations may also prove crucial for describing immune defects, including immune escape of cancer cells. It is thus important to further clarify possible scenarios for fooling the classifier in both artificial and living systems.
Acknowledgements
We thank Joelle Pineau and members of the Francois group for useful discussions. P.F. is supported by a Simons Investigator in Mathematical Modelling of Living Systems award, an Integrated Quantitative Biology Initiative award and a Natural Sciences and Engineering Research Council award (Discovery Grant). T.J.R. receives funding from the Centre for Applied Mathematics in Bioscience and Medicine (graduate award) and McGill Physics (Schulich award). E.B. acknowledges support from the Samsung Advanced Institute of Technology and the Fonds de Recherche du Quebec – Nature et Technologies (graduate award).
Appendix 1 Mathematical description of the adaptive proofreading models
We assume an idealized situation where a given receptor i, upon ligand binding (on-rate ) can exist in N biochemical states (corresponding to phosphorylation stages of the receptor tails in the immune context [34, 22]). Those states allow the receptor to effectively compute different quantities, such as , which can be done with kinetic proofreading [20, 38, 34]. In particular, ligands with larger τ give a relatively larger value of due to the geometric amplification associated with proofreading steps. We assume receptors to be identical, so that any downstream receptor processing by the cell must be done on the sum . We also consider a quenched situation in which only one ligand is locally available for binding to every receptor. In reality, there is a constant motion of ligands, such that and τi are functions of time and stochastic treatments are required [42, 30, 37], but on the time-scale of primary decision-making it is reasonable to assume that the ligand distribution does not change much [2]. Probability of decision-making in this context is a monotonically increasing function of the quantity
If L ligands with identical τ and kon are presented to the T cell, we have . In this situation, a threshold can be easily defined for decision, irrespective of the quantity L of ligands presented.
If we now add La antagonists with lower binding time τa < τ and equal on-rate kon, we have , which is smaller than the response τN–m for a single type of ligands, corresponding to ligand antagonism (Fig. 1 D, main text) [15, 7, 2, 11]
Appendix 2 Boundary tilting
To further draw the connection between machine learning and adaptive proofreading models, we will study a framework to interpret adversarial examples called boundary tilting [46]. We will first illustrate this effect on the discrimination of the original MNIST 3 vs 7 problem MNIST from [16]), after which we will interpret boundary tilting via proofreading.
Digit classification
A typical 3 and 7 (i), the averages and (ii), and the corresponding adversarial examples (iii, iv) are shown in Fig. 1 A. Tanay and Griffin [46] pointed out that the adversarial perturbation generated with the Fast Gradient Sign Method (FGSM) proposed in [16] can also be found via , Fig. 1 A (v). Note the similarity to the adversarial perturbation from the FGSM sign(w) = sign (∇x J) (Fig. 1 A (vi)).
To reveal the linearity of binary digit discrimination, we computed the principal components (PCs) of the traditional training set of 3s and 7s, and projected all digits in the test set on PC1 and PC2 (Fig. 1 B). With a linear Support Vector Classifier (ordinary linear regression) trained on the transformed coordinates PC1 and PC2 of the training set, we achieve over 95% accuracy in the test set. While such accuracy is far from the state-of-the-art in digit recognition, it is much higher than typical detection accuracy for single cells (e.g. T cells present false negative rates of 10 % for strong antagonists [2]). The red and blue star in Fig. 1 denote the average digit .
Next, we transformed the test set as 3 → 3′ = 3 – ϵtest D, 7 → 7′ = 7 + ϵtest D, where ϵtest = 0.4 is the strength of the adversarial perturbation (Fig. 1 A (iii)). and moved closer in Fig. 1 B, orthogonal to the decision boundary and along the line between the initial averages. This adversarial perturbation moves the digits in what we call an adversarial direction perpendicular to the decision boundary, and reduces the accuracy of the linear regression model to a mere 69%.
Goodfellow et al. proposed adversarial training as a method to mitigate adversarial effects by FGSM. We implemented adversarial training by adding the adversarial perturbation to the images in the training set, computing the new PCs and training the linear regression model. This effectively “tilts” the decision boundary, while preserving 95% accuracy. In the presence of the original adversarial perturbations, we see the effect of the tilted boundary: the perturbation moves digits parallel along the decision boundary, which results in good robust accuracy. This is a illustration example of the more general phenomenon studied in [46].
Boundary tilting and categorizing perturbations
We consider the change in TN,m for arbitrary N, m upon addition of many spurious ligands. Generalizing Eq. 2 in the main text gives
From this expression, we note that TN,m is changing significantly with respect to its initial value upon addition of many weakly bound ligands as soon as ϵm+1R is of order L. Thus, the effect described in the main text for weighted averages where (N, m) = (1, 0) also holds for nonlinear computations as long as m is small. It appears that the general strategy to defend against this adversarial perturbation is by increasing m, as previously observed in [29]. Biochemically, this is done with kinetic proofreading [34, 2, 13], i.e. we take an output TN,m with N > m ≥ 1. Here, the output is no longer sensitive to the addition of many weakly bound self ligands, yielding an inversion of the antagonistic hierarchy where the strongest antagonizing ligands exist closer to threshold [12]. An extreme case has been proposed for immune recognition where the strongest antagonists are found just below the threshold of activation [2].
We numerically compute how the decision boundary changes when L2 ligands at τ2 are added to the initial L1 ligands at τ1, i.e. we compute the manifold so that is equal to . We represent this boundary for fixed τ2 and variable L1, L2, τ1 in Fig. 1 C. Boundary tilting is studied with respect to the reference L2 = 0 plane corresponding to the situation of pure L1 ligands at τ1, where the boundary is the line τ1 = τc. The case (N, m) = (1, 0) (Fig. 1 C, left panel), corresponds to a very tilted boundary, close to the plane L2 = 0, and a strong antagonistic case. In this situation, assuming τ1 ~ τc, each new ligand added with τ2 close to 0 gives a reduction of T1,0 proportional to in the limit of small L2 (see next section, [11]), which is again of the order of the response T1,0 = τ1 ~ τc in the plane L2 = 0. This is clearly not infinitesimal, corresponding to a steep gradient of T1,0 in the L2 direction. We call the perturbation in this case “adversarial”. This should be contrasted to the case for higher m (Fig. 1 C, middle left) where the boundary is vertical, independent of L2, such that decision-making is based only on the initially present L1 ligands at τ1. Here, the change of response induced by the addition of each ligand with small binding time τ2 is , due to proofreading a very small number when τ2 ≃ 0 [11]. Contrary to the previous case, the gradient of TN,m with respect to this vertical direction is almost flat and very small compared to the response in the L2 = 0 plane. We call the perturbation in this case “non-adversarial”.
Tilting of the boundary only occurs when τ2 gets sufficiently close to the threshold binding time τc (Fig. 1 C, right panels). In this regime, each new ligand added with quality τ2 = τc – ϵ contributes an infinitesimal change of TN,m proportional to , which gives a weak gradient in the direction L2. But even with such small perturbations one can easily cross the boundary because of the proximity of τ2 to τc, which explains the tilting. The cases where the boundary is tilted and the gradient is weak are of a different nature compared to the adversarial case of Fig. 1 C, left panel. Here the boundary is tilted as well, but the gradient is steep, not weak. For this reason we term the cases on the right panels “ambiguous”. Similar ambiguity is observed experimentally: it is well known that antagonists (ligands close to thresholds) also weakly agonize an immune response [2]. Our categorization of perturbations is presented in Table 1. Scripts for boundary tilting in ligand discrimination and digit discrimination are available at https://github.com/tjrademaker/advxs-antagonism-figs/.
Gradient in the L2 direction
We recall results from [12] to show how the addition of subthreshold ligands one at a time changes the output. We first consider {L1,τc} threshold ligands with output
The main result of [12] is the linear response of TN,m(L1, τc) to the addition of {L2, τc – ϵ} subthreshold ligands. where we used the definition for the coefficient in a mean-field description. As the derivative , and ϵ = τ2 – τc, each additional subthreshold ligand at τ2 decreases the output with a value proportional to
In the case (N, m) = (1, 0), the mean-field approximation is exact, i.e. the first derivative of is the only nonzero derivative, given by
With the addition of a single subthreshold ligand τ2 ~ 0, so that ϵ ~ τc, the output is maximally reduced by , a finite quantity, as described in the main text. For higher m, the linear approximation holds only for ligands at τ2 close to threshold.
Appendix 3 Ligand distribution at the decision boundary
Adaptive proofreading is well-suited to characterize the decision boundary between two classes, because we can work with an analytical description. We want to know how to most efficiently change the binding time of the spurious binding ligand (with small τ) to cause the model to reach the decision boundary. We have taken inspiration from [25] and adapted our approach from the iterative FGSM [27]. At first, we sample Ls self ligands from a normal distribution folded around τ = 0 and LAg agonist ligands from a narrowly peaked normal distribution above τc. The agonist ligand distribution, the “signal” in the immune picture, remains constant. Next, we bin ligands in M equally spaced bins τb, b ∈ [1, M], and we compute the gradient for those bins for which τb < τc where Lb is the number of ligands in the bth bin. We subtract this value multiplied by a small number ϵ from the exact binding times, as in Eq. 6 in the main text, and we compute a new output TN,m. We repeat this procedure until TN,m dips just below the response threshold . We then display the ligand distributions. We bin ligands and compute the gradient in batches to prevent the gradient from becoming negligibly small. If we would compute the gradient for each ligand with an individual binding time, there would be exactly one ligand with that specific binding time, and because the gradient scales with L, we would need to go through many more iterations. Decreasing the binsize and step size e may enhance the resolution, but is not required. We found good results by considering bins with a binsize of 0.2s and ϵ = 0.2.
MTL pictures
We can visually recast immune recognition as an image recognition problem by placing pixels on a grid and coloring them based on their binding time with a given scale. We chose to let white pixels correspond to not self (τ > τc), gray pixels to antagonist ligands (τa < τ < τc) and black pixels to self ligands τ ≪ τa. We are free to introduce any kind of spatial correlation to create “immune pictures” from a ligand distribution. This results in what we term “MTL-pictures”, Fig. 1. The initial ligand distribution, MTL picture and scale are given on the left. We perform iterative gradient descent like in the main text, and plot the ligand distribution and the corresponding immune pictures at the boundary for various (N, m) (Fig. 1). The results are striking. For a T cell operating in the adversarial regime, the “signal” MTL is unaltered at the decision boundary. At the transition m = 2, we see a slight change of color, while in the ambiguous regime, the signal actually changes from MTL to ML. As we desire for a robust decision-maker, the response should switch when the signal becomes significantly different. From this we conclude, only in the robust regime can Montreal turn fully into the city of Machine Learning.
For the ligand distribution in the main text in Fig. 3 A, we have drawn Lself = 7000 from and Lag = 3000 from . For the MTL pictures in Fig. 1, we have distributed the pixels in the 179×431 frame – equal to R, the number of receptors – as Lself = 0.60R, La = 0.12R, Lag = 0.28R. We sampled self ligands from , antagonists from and agonists from , and set τc = 3. The picture is engineered such that the agonist ligands fill the M and the L, the antagonists fill the T (which is why the T is slightly darker than the M and L). The self ligands fill the area around the letters M, T and L, such that the self with highest binding time surround the T. We have chosen this example to make the effect of proofreading explicit (and of course because we are based in Montreal and study Machine Learning). This result is generic, and the ambiguity of instances at the decision boundary of a robust model can be visualized with any well-designed image. Scripts to reproduce Fig. 3 A and Fig. 1 are available at https://github.com/tjrademaker/advxs-antagonism-figs/.
Behaviour for small binding times
Consider a mixture with L1 ligands at τ1 > τc and L2 ligands with small binding time τ2 → τϵ = ϵτ1 ≪ τ1. To understand the behaviour of TN,m as a function of τϵ we expand TN,m in small variable as which confirms that up to a constant for m large and , as well as that
Appendix 4 Few-pixel attack
The few-pixel attack connects to ligand antagonism in the sense that few-pixels are needed to cause misclassification, corresponding to the addition of few maximally antagonizing ligands to a mixture fooling robust adaptive proofreading models. It is not the most efficient attack against a classifier without biomimetic defence, but it is the most efficient attack against classifiers with biomimetic defence, equivalent to adaptive proofreading models with m > 1. For these adaptive proofreading models, there exists a unique maximally antagonistic binding time, defined as the binding time that maximally reduces TN,m.
With this in mind, we decided to make pixels black or white in a controlled manner, until the neural network classifies the perturbed, initial digit as the target class. In the following, we will refer to several stages of the few-pixel attack using Fig. 1. We first computed what we term pixelmaps. Pixelmaps contain the change of score when making a pixel white or black. In Fig. 1, blue colors correspond to pixels that will lower the score when turned white or black, while red colors are for pixels that will increase the score for the same operation. A grey color means the score is unchanged when whitening or blacking the pixel. The pixelmaps are scaled to the maximum change in score. We proceed in merging and sorting the pixelmaps from maximum to minimum change in score towards the target class, iteratively following the sorted list to decide which pixels in our digit to turn white or black. We do this until we reach the decision boundary (first iteration in which the digit is misclassified). The pre-filtered digits (Fig. 1, red rectangle) are the resulting boundary digits. They already contain perturbations corresponding to real features, but have an air of artificiality to them which allows us to fairly easily distill the ground truth. We remove this with a mean filtering [52], which is a 3×3 convolutional block that computes mean pixel values as
Biologically, this is pure receptor clustering, where a perturbation to a single receptor locally affects other ligands. Such digits are truly ambiguous digits that are tough to classify even as humans. These are the type of digits we expect to find on the decision boundary. Finally, we compare the mean-filtered digit at the decision boundary to the control: the sum of the initial digit and the hill function (N = 3; θ = 0.5) on the average of all digits in the target class, then mean-filtered (Fig. 1 for a step-by-step composition). We apply the mean-filter to the control to again remove the artificiality of a digit plus an average, and make the comparison between boundary digit and control digit fairer. The similarity between mean-filtered boundary digit and control digit confirms our intuition that we are actually operating in the space between both classes when misclassification occurs.
We can also apply the mean-filter to the initial digit before generating the pixelmaps, and during the procedure, check the score on the mean-filtered perturbed image. This gives similar results, as we see by following the trajectory of the score for boundary-null and boundary-mean. We have shown the score explicitly in Fig. 2 for the digits in Fig. 1. The behavior of the score is remarkably similar to the interpolation between ligand mixtures (Fig. 2F, bottom panel in the main text). A nonlinear filtering method proposed in [52] is the median-filter, but this one works less well for black-and-white pixels.
We have shown examples that are generated when we select for instances where the number of iterations is large enough (20 suffices, we still consider this to be a few-pixel attack, keeping in mind that digits have 784 individual pixels). The authors of [43] specifically searched for single pixel attacks. Examples of single-pixel misclassification exist in our neural networks trained on two types of digits in MNIST too, but these we found non-informative. In cellular decision-making, this case corresponds to adding a single antagonist ligand to a ligand mixture to cause misclassification. This is only possible if the ligand mixture is already very close to the boundary. For such samples, we do not expect ambiguity to appear. Remember that near the boundary, the score landscape is steep, and small additions have a large effect.
Footnotes
↵1 an alternative choice without loss of generality is to consider a situation where for unoccupied receptors, ki; is 0 but τi is arbitrary, corresponding to a ligand available for binding
↵2 One difference with olfaction is that for competitive antagonism, the concentration C is of order 1 while the affinity κ−1 is big, conversely, here the concentration R is big while kon is low. Since we consider the product of both terms, both situations lead to similar effects, but our focus on a small change of kon makes the comparison with machine learning more direct.
References
- [1].↵
- [2].↵
- [3].↵
- [4].↵
- [5].↵
- [6].↵
- [7].↵
- [8].↵
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].↵
- [21].↵
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].↵
- [31].↵
- [32].↵
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵