Abstract
Transportation of livestock carries the risk of spreading foreign animal diseases throughout a susceptible population, leading to costly public and private sector expenditures on disease containment and eradication. Individual animal tracing systems that exist in countries other than the US have allowed epidemiologists and veterinarians in those countries to model the risks engendered by livestock movement and prepare responses designed to protect the livestock industry. Within the US, data on livestock movement are not sufficient for direct parameterization of disease models, but network models that assimilate limited data provide a path forward in model development to inform preparedness for disease outbreaks in the US. Here, we report on a novel data stream, the information publicly reported by US livestock markets on the origin of cattle consigned at live auctions, and demonstrate its potential. By aggregating weekly auction reports from markets in several states, some spanning multiple years, we obtain an ego-centric sample of edges from the dynamic cattle transportation network in the US. We first demonstrate how the sample might be used to infer shipments to unobserved livestock markets in the US, although we find the assumptions of edge prediction by generalized linear models too restrictive. The sample itself, however, can still be used to parameterize simplified disease models, which we use to demonstrate that the temporal resolution of the data is sufficient to reveal seasonal trends in the risk of disease outbreaks. We conclude that future work on statistical models for dependence between edges will improve the inference of a complete cattle movement network model from market data, one able to address the capacity of markets to spread or control livestock disease.
Author Summary
We have “crowd-sourced” the collection of previously unavailable cattle movement data, benefiting from buyers’ interest in the origins of cattle sold at live-auction markets, to implement a minimum level of movement surveillance. Using our novel dataset, we demonstrate the potential to infer a complete dynamic transportation network and model national-scale livestock epidemics.
Introduction
Livestock operations within the United States (US) must be vigilant against transboundary animal diseases, including the critical threat to cattle producers posed by a re-introduction of foot-and-mouth disease [FMD; 1]. The 2001 FMD outbreak in the United Kingdom (UK) cost its agricultural sector £3 billion, and 5% of the nation’s 11 million cattle were culled to control the disease [2]. A study on FMD risk to California’s 5 million beef and dairy cattle predicts economic losses in the tens of billions of dollars, even for an outbreak artificially terminated at California’s border [3]. The potential impact of a full-blown epidemic in the US, putting at risk a 90 million strong cattle herd [4], compels us to study the likely patterns of disease spread from an initially infected cattle operation [5]. Mechanistic models that incorporate livestock transportation are needed to help guide prevention and control of FMD-like diseases, which are known to cause massive burdens on livestock industries, require costly public interventions, raise public health concerns and impact food security [6].
Studies of past livestock epidemics and disease simulations reveal that network models provide a useful abstraction of data on animal shipments between livestock operations [7]. Network models typically emphasize heterogeneity in the number of disease transmitting contacts attributed to infectious nodes, a pattern that emerged strongly during the initial spread of FMD in the UK’s 2001 epidemic [8] and one usually absent from simulations where transmission depends on distance alone. Network representations of cattle trade exist for several European livestock industries [9–14], where data for model development are generated from animal tracking systems mandated by the European Parliament [15]. Availability of these data has allowed for several advances in surveillance and control strategies: for example, (1) identification of “sentinel” livestock premises projected to become infected early during an outbreak in Italy [16], (2) validation of risk reduction from the standstill rules implemented in the UK after 2001 [17], and (3) evaluation of targeted movement bans that selectively eliminate network edges based on node centrality in transportation networks [14]. Network models for the UK cattle production system have additionally provided a foundation for livestock transportation strategies that promise efficient control of endemic diseases [18].
The US has opted against individual animal tracking in favor of animal disease traceability, which only requires that a paper trail on individual movements can be unearthed subsequent to disease detection. With respect to the development of models for disease prevention and preparation, the traceability principle promotes inadequate and belated data collection, and is significantly limited compared to the point-to-point data on individual animal movements that drive models of European systems. In addition, the findings gleaned from epidemiological models based on European livestock transportation patterns are not transferable to the US due to the nature and scale of the US industry: the US industry has developed an unparalleled feedlot system that relies on widely distributed calving and back-grounding farms to supply the 25 million head sold annually by feeding operations [4], and the US Department of Agriculture (USDA) census shows that 22% of the US herd are transported across state lines in a year [19].
The best data currently available is maintained by individual state agricultural agencies, which collect shipment origin and destination locations on interstate certificates of veterinary inspection (ICVIs) for cattle entering the state. One cattle transportation network model has been estimated from a 10% sample of year 2009 ICVIs acquired from 48 state agencies [20, 21]. The epidemiological network model constructed from these data [22] represents the state of the art for analysis of a nation-wide epidemic in the US, but limitations in the underlying data are substantial. Given that on average only 19.2 (SEM 3.1) percent of shipments onto US beef operations travel over 100 miles [23], shipping within states likely occurs at much higher rates than shipments documented on ICVIs. Interannual variability in the transportation network cannot be observed without repeating a major collection effort, nor can the type of origin or destination facility be determined from ICVIs alone. Finally, and of most relevance to the present study, ICVIs are not required for shipments to exempt facilities, including certain federally approved livestock markets [24].
Cattle in the US are commonly sold at live auction markets between stages of beef or dairy production (Box 1). In year 2007–08 surveys by the USDA, over half of beef producers sold non-breeding stock at auction markets in the US (60.7% steers, 58.3% cows), with internet auctions and private treaties being the main alternatives [25]. Dairy contributes fewer US cattle shipments, but cows removed from dairy operations are also predominantly sent to auction markets or stockyards [21, 26]. Local epidemiological studies have also found direct contacts with livestock markets prevalent in Colorado and Kansas [27] and California [28]. The economic reality now, and for the foreseeable future, is that cattle owners regularly buy and sell cattle at particular stages of production and rely on live-auction markets to obtain the best price [29, 30].
In the UK, and quite possibly in other countries with recent outbreaks of nonendemic FMD [31], livestock markets played a central role in early, rapid expansion of the 2001 FMD epidemic [8]. The epidemiological importance of livestock markets arises from their potentially high degree in both contact and epidemic networks, the same phenomenon that airports create as hubs for transmission and spread of human influenza [32]. When a livestock operation ships infected animals to market, two processes spread the disease: splitting of the original group of infected animals among multiple buyers, and transmission to susceptible animals passing through the same market [33]. Both processes act to give livestock markets high out-degree in an epidemic network for FMD [8, 34], while the spread of a less contagious disease would primarily be affected by splitting up infected animals arriving from one premises. Livestock markets can also have high in-degree within the contact network, that is, a large number of operations from which cattle are sourced [35]. High in-degree markets are a natural point of surveillance for disease detection; indeed, markets seeking USDA approval are required to provide veterinary inspection of cattle at auction [24].
On the premise that transportation of cattle to and from markets is an important class of potential disease-transmitting contacts between farms and other longer-term animal holdings, we studied the potential for data-driven modeling of a market-based contact network for infectious livestock disease in the US. To this end, we collected data on the locations of origin for individual cattle sold at livestock markets that publicly share this information as a form of advertising. Because the data represent an opportunistic sample of livestock transportation, we first report basic trends in the data alongside some potential sources of bias in the sample. We then demonstrate two methods for inference of a network model of contacts between livestock operations from these data: (1) using the sample to estimate degree distributions of an otherwise random contact network, and (2) fitting coefficients to covariates that may predict the presence and weight of unobserved network edges. The results firmly establish that market-bound cattle shipments are dominated by intra-state movements, and are consistent with the possibility that transportation to markets also drives interstate flows. The daily resolution of this data source allows detection of sub-annual variation in trade volume and network degree distributions, and as a continuously updated data stream creates potential for both inter-annual trend and recent-event detection. We demonstrate high potential for inferring the properties of unobserved edges connected to non-reporting livestock markets, and conclude with a discussion of the critical gaps for building a complete epidemic network model that includes markets acting as hubs for the spread of livestock disease.
Methods
Data Collection
As an integral part of livestock production, stockyards are distributed across all parts of the US with beef or dairy operations, i.e. throughout the US [Box 1; 36]. Prices obtained at live auction are rapidly publicized to help consignors and buyers decide when and where to trade cattle. In some cases, professional market reporters attend sales and distribute volume and price information through the USDA Market News Service. Where market reporters are unavailable, or to provide additional information, sale reports might be generated by the market itself and publicized on its own website. A subset of markets list specific lots of cattle sold in their sale reports, including a location of origin, number of cattle, and other attributes. These data, sometimes labeled “representative sales” as we refer to them here, indicate cattle were transported from the origin to market on, or very near, the sale date.
We aggregated representative sales from several livestock markets and georeferenced each location of origin to a US county or county-equivalent (hereafter “county”). Overlapping and incomplete directories of US livestock markets are maintained by multiple regulatory agencies or business associations: we compiled four such directories to identify target markets [36]. The directory released by the Livestock Marketing Association [37] uniquely provides websites of livestock markets, when available. We manually searched the 322 listed websites for representative sales, and wrote software to parse data from sites that regularly (usually weekly) publish market reports and that permit crawling under the Robots Exclusion Protocol. For each lot provided as a representative sale, the software attempts to parse the consignor’s location along with cattle type (e.g. steer or heifer), number, average weight and price (either per head or per hundredweight). A single animal per lot was assumed whenever the number of cattle could not be parsed. We tuned the parser for each website until two researchers found no data extraction errors in independent spot checks of representative sales. Websites were subsequently checked twice weekly for new reports, and parsers returning data of the wrong type (i.e. string or numeric) were promptly corrected. This study addresses sale reports obtained between June 2014 and June 28, 2015, including some archived reports on sales dating from the first week of 2012.
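The collection step described above can be sketched in a few lines. This is a minimal illustration only, not the software used in the study: the robots.txt body, the `research-bot` agent name, the line layout assumed by `LOT_PATTERN`, and the example place names are all hypothetical. It shows a Robots Exclusion Protocol check with the standard library and the rule that a lot with no parsable head count is treated as a single animal.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a live crawler would download it from the market's site.
ROBOTS_TXT = """User-agent: *
Disallow: /admin/
"""

# Hypothetical line layout: "<place>[, ST] [head] <type> <avg weight> <price>".
LOT_PATTERN = re.compile(
    r"^(?P<origin>[A-Za-z .']+?)(?:,\s*(?P<state>[A-Z]{2}))?\s+"
    r"(?:(?P<head>\d+)\s+)?(?P<type>steers?|heifers?|cows?|bulls?)\s+"
    r"(?P<weight>\d+(?:\.\d+)?)\s+(?P<price>\d+(?:\.\d+)?)$",
    re.IGNORECASE,
)


def crawl_allowed(robots_txt, url, agent="research-bot"):
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def parse_lot(line):
    """Parse one representative-sale line; return None if it does not match."""
    match = LOT_PATTERN.match(line.strip())
    if match is None:
        return None
    head = match.group("head")
    return {
        "origin": match.group("origin").strip(),
        "state": match.group("state"),
        # A single animal is assumed when the head count cannot be parsed.
        "head": int(head) if head else 1,
        "type": match.group("type").lower(),
        "weight": float(match.group("weight")),
        "price": float(match.group("price")),
    }


if __name__ == "__main__":
    print(crawl_allowed(ROBOTS_TXT, "https://example.com/reports/latest"))
    print(parse_lot("Valentine, NE 12 steers 655 212.50"))
    print(parse_lot("Wood Lake steer 710 198.00"))
```

In practice each website would need its own pattern, which is consistent with the per-site tuning described above.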
Acceptable locations of origin are given as the name of a city or other populated place, with or without a state, and can be ambiguous. We matched each location, substituting common abbreviations for full words as needed to obtain a match, to names of populated places in North America using the GeoNames web service [38] in order to identify the encompassing county. Where a name matched places in more than one county, the county closest to the reporting market, as determined by the great-circle distance (GCD) between county centroids [39], was recorded as the location of origin.
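The nearest-county rule can be illustrated with a haversine calculation. The sketch below assumes a spherical Earth of radius 6371 km and uses made-up county identifiers and centroid coordinates; the function names are illustrative and the study may have computed the GCD differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Guard against rounding slightly above 1 before taking the arcsine.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_county(market_centroid, candidates):
    """Pick the candidate county whose centroid is closest to the market's county.

    `candidates` is a list of (county_id, (lat, lon)) pairs, one for each county
    containing a populated place that matched the reported name.
    """
    lat, lon = market_centroid
    best = min(candidates, key=lambda c: great_circle_km(lat, lon, c[1][0], c[1][1]))
    return best[0]


if __name__ == "__main__":
    # Hypothetical example: one place name matching two counties.
    market = (42.9, -100.5)
    matches = [("county_A", (42.5, -100.0)), ("county_B", (39.8, -89.6))]
    print(nearest_county(market, matches))
```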
Comparison with Interstate Shipments
Interstate cattle shipments are present among representative sales, allowing a comparison to cattle shipping data obtained from state ICVI records. For each state with at least one market in our study, we correlated the number of cattle in representative sales originating in every other state with the analogous interstate flows reported by Shields and Mathews [5]. The ICVI data sampled shipments occurring in the 2001 calendar year, which pre-dates all sale reports collected for this study. To match the time scale of the certificate-derived data, we aggregated representative sales over the year preceding June 28, 2015 before calculating correlations. However, the comparison necessarily reflects over a decade of change in the livestock system on top of any differences between market shipments and shipments accompanied by ICVIs.
Analysis of Sampling Rate
We analyzed variation in the sample size for each sale report by fitting a generalized linear mixed model (GLMM) to the number of head in representative sales, given the total head of cattle sold (“receipts”), using covariates from the agricultural census [4]. The analysis is intended to address two issues: estimation of the sampling rate for reports where total receipts are unknown, and detection of potential bias among representative sales. Representative sales are not randomly sampled with equal weight from all cattle shipped to reporting markets, but are effectively stratified by sale report. Because markets may vary the proportion of sales listed as representative, each sale report should have an associated sample weight for shipments found in that report. Estimation that involves aggregation across sale reports should take this weight into account, but receipts are unknown for roughly one third of sale reports. By taking covariates into account while estimating unknown receipts with a fitted GLMM, the resulting estimates will address certain biases that show up as statistical associations between known sampling rates and covariates for a given sale.
Receipts for a given sale are taken from USDA Market News Service reports [40] or, when available, from the market’s own report. Fixed-effect covariates are the inventory and sales of cattle, as well as the number of cattle operations, in the county where the market is located [4], the number of representative sales, the sale year and the sale week. Numeric covariates were first log transformed and standardized. Each market is additionally allowed a random intercept, representing unexplained variation attributed to average behavior of individual markets. To avoid possible parameter bias, we added an observation level random intercept to account for overdispersion [41]. We fit a binomial family GLMM with logit link function using the ’lme4’ package [42], and performed Wald χ2 tests for significance of fixed effects using the ’car’ package [43], in R version 3.1.3 [44].
Network Inference: Edge Prediction
The definition of nodes and the meaning of edges in contact networks for disease models are flexible, facilitating data-driven modeling approaches. Models for disease spread among livestock incorporate transportation data as edges in the network of contacts between susceptible and infected individuals [e.g. 10]. Representative sale data are compatible with a model having two types of nodes: one representing all farms, ranches and other long-term animal holdings located within a given county, and one representing a single market. In order to study the contribution of markets to the livestock transportation network, and its consequence for disease spread, we ignore edges between counties that arise from fence-line contact, private sales, transportation for grazing, and other mechanisms that might transmit disease directly between counties. Observing no indication of market-to-market transportation in the representative sales, we assumed its absence as well. As a result, the only edges in a network derived from representative sales are between nodes of different types, yielding a bi-partite contact network.
Edge prediction is any process for inferring the properties of unobserved edges, which must be made explicit for disease models that use contact networks to drive infectious interactions between nodes [e.g. 22]. A primary goal of edge prediction is to build a model that reflects clustering within the transportation network, or the propensity of livestock operations within different counties to trade cattle at the same two markets, without directly observing these second-degree interactions. A model that represents higher-order structural attributes of the network, including clustering, may yield different predictions for the spread of disease, but direct estimation of these attributes requires particular sampling methods [45]. For example, a random sample of nodes is not appropriate for estimating clustering coefficients; a first-wave link tracing approach [sensu 46] is needed to avoid underestimating the number of triangles touching each focal node.
We applied a regression approach to the problem of edge prediction, using observed edges to estimate whether county and market covariates predict their connectivity. The response variable is the number of cattle shipped from a given county to a market, which we assume to arise from a zero-inflated negative binomial (ZINB) distribution. This GLM includes the sales, inventory and number of farms for cattle (including calves) for each county of origin, the GCD between the centroids of each origin county and the county of each market, the square of this distance, the number of livestock markets giving an address in each origin county, a boolean factor indicating whether the market is in the county, and a boolean factor indicating whether both are in the same state. All numeric predictors are normalized to unit variance with zero mean. Finally, the model includes a fixed effect of market: in the simpler case of a Poisson GLM, this term would not alter the resulting multinomial probability of cattle originating from each county, given the total number of cattle by market. The more complicated ZINB, necessary to obtain a good fit to the observed edges, comes at the cost of incorporating a meaningful market effect, which will interfere with edge prediction for non-reporting markets. We fit the ZINB GLM using the ’pscl’ package in R [47].
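To make the two parts of the ZINB response concrete, the following sketch evaluates a zero-inflated negative binomial probability mass function under the common mean and dispersion parameterization, with a log link for the count mean and a logit link for the zero-inflation probability. It is an illustration of the distribution only, not the fitted model: the coefficient values are invented, chosen merely so that distance lowers the count mean and raises zero-inflation.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion (size) theta > 0."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi_zero):
    """ZINB pmf: a structural zero with probability pi_zero, otherwise an NB count."""
    count_part = (1.0 - pi_zero) * nb_pmf(y, mu, theta)
    return pi_zero + count_part if y == 0 else count_part


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def edge_distribution(distance_z, theta=0.8):
    """Return (mu, pi_zero, theta) for a standardized distance.

    All coefficients below are hypothetical placeholders, not fitted values.
    """
    mu = math.exp(2.0 - 1.5 * distance_z)        # count component, log link
    pi_zero = inv_logit(-1.0 + 7.0 * distance_z)  # zero-inflation component, logit link
    return mu, pi_zero, theta


if __name__ == "__main__":
    for d in (-0.5, 0.0, 0.5):
        mu, pi_zero, theta = edge_distribution(d)
        print(d, zinb_pmf(0, mu, theta, pi_zero))
```

The probability of no edge combines the structural zeros with zeros from the count component, so a single covariate such as distance can influence edge presence through both parts.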
Disease Consequence of Degree Distribution
A key insight from network epidemiology is that the degree distribution for contacts among individuals, or nodes, is of primary importance for disease spread [48, 49]. Edge prediction is not needed to infer the degree distribution among livestock markets; we may assume the number of counties appearing in the representative sales data for each market are independent samples from this distribution, and then specify the remaining network properties parametrically. We define ki as the sampled degree for market i of m markets. Empirical estimates for the market degree distribution generating function, GM,0(x), and the generating function for market “excess degree” [e.g. 50 eq. 12], GM,1(x), are
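Assuming the usual empirical forms for a sample of degrees k_1, ..., k_m (a sketch from the definitions given in the text, with the excess-degree function taken as the normalized derivative of the degree generating function), these estimates would read:

```latex
G_{M,0}(x) = \frac{1}{m} \sum_{i=1}^{m} x^{k_i},
\qquad
G_{M,1}(x) = \frac{G_{M,0}'(x)}{G_{M,0}'(1)}
           = \frac{\sum_{i=1}^{m} k_i \, x^{k_i - 1}}{\sum_{i=1}^{m} k_i}.
```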
We specify properties of the full network by choice of the algorithm for connectivity and the probability distribution for county degree: we assume a bi-partite configuration model for edges and a Poisson distribution on county degree. In a static network with these properties, the expected size of a disease outbreak in terms of the number, or proportion, of counties affected can be calculated exactly [51]. With τC and τM the disease transmission probabilities from counties and markets, respectively, and the mean county degree equal to λC, the epidemic threshold is a fixed value of ϕ = τCτMλC determined by the market degree distribution. The threshold occurs at
For parameterizations below this threshold, the expected number of counties affected by an outbreak is
For parameterizations above this threshold, the proportion of counties in the epidemic is where u is the smallest root of
See S1 Text for an explanation of these equations.
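As a hedged illustration of how such quantities can be computed from sampled market degrees, the sketch below uses the textbook branching-process results for a bi-partite configuration model with Poisson county degree; S1 Text may parameterize them differently. The assumptions are: the threshold sits where ϕ times the mean market excess degree equals one; below it, the expected number of affected counties (counting the index county) is a geometric sum; above it, the proportion of counties affected is 1 − exp(λC(u − 1)), with u the smallest root of u = 1 − τM + τM GM,1(1 − τC + τC exp(λC(u − 1))). All function names and the example degrees are hypothetical.

```python
import math


def excess_pgf(degrees, x):
    """Empirical excess-degree generating function: sum k x^(k-1) / sum k."""
    return sum(k * x ** (k - 1) for k in degrees) / sum(degrees)


def mean_excess_degree(degrees):
    """Derivative of the excess-degree generating function at x = 1."""
    return sum(k * (k - 1) for k in degrees) / sum(degrees)


def threshold_phi(degrees):
    """Value of phi = tau_C * tau_M * lambda_C at the assumed epidemic threshold."""
    return 1.0 / mean_excess_degree(degrees)


def expected_outbreak_counties(degrees, phi):
    """Expected counties affected below threshold, counting the index county."""
    r = phi * mean_excess_degree(degrees)
    if r >= 1.0:
        raise ValueError("phi is at or above the epidemic threshold")
    return 1.0 / (1.0 - r)


def epidemic_proportion(degrees, tau_c, tau_m, lambda_c, tol=1e-12, max_iter=100000):
    """Proportion of counties affected, via fixed-point iteration started at u = 0.

    Starting from 0 and iterating a monotone map converges to the smallest root.
    """
    u = 0.0
    for _ in range(max_iter):
        w = 1.0 - tau_c + tau_c * math.exp(lambda_c * (u - 1.0))
        new_u = 1.0 - tau_m + tau_m * excess_pgf(degrees, w)
        if abs(new_u - u) < tol:
            u = new_u
            break
        u = new_u
    return 1.0 - math.exp(lambda_c * (u - 1.0))


if __name__ == "__main__":
    sampled_degrees = [8, 12, 15, 20, 30]  # hypothetical market degrees
    print(threshold_phi(sampled_degrees))
    print(epidemic_proportion(sampled_degrees, 1.0, 1.0, 1.0))
```

Under these assumptions, markets of uniformly high degree with certain transmission and a mean county degree of one give a proportion close to 1 − 1/e, in line with the "nearly two-thirds" figure reported in the Results.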
Two features of representative sales data conflict with applying this ’random graph’ approach to network modeling of livestock disease spread. First, the degree of each market potentially varies from sale to sale, admitting possible temporal variation in the observed degree distribution. We examine the extent of variation in degree by visualizing temporal variation and calculating seasonal estimates of epidemic size. Second, the representative sales may not include all the counties of origin, potentially biasing the observed distribution toward smaller degrees. Using the iNEXT R-package developed for analysis of species accumulation curves [52,53], we calculate complete degree estimates for each sale using extrapolation of the county accumulation curve as a function of the number of cattle in representative sales.
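The idea behind extrapolating a county accumulation curve can be illustrated with the Chao1 estimator, a classical asymptotic richness estimator of the kind used for abundance data in species-accumulation analyses. The sketch below is not the iNEXT implementation and the head counts are invented; it only shows how counties seen once or twice in a sale's representative lots drive the estimate of unseen counties.

```python
def chao1(head_by_county):
    """Chao1 estimate of the total number of origin counties for one sale.

    `head_by_county` maps a county identifier to the number of head observed
    from that county in the representative sales.
    """
    counts = [n for n in head_by_county.values() if n > 0]
    s_obs = len(counts)
    n = sum(counts)
    if n == 0:
        return 0.0
    f1 = sum(1 for c in counts if c == 1)  # counties seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # counties seen exactly twice
    scale = (n - 1) / n
    if f2 > 0:
        return s_obs + scale * f1 * f1 / (2 * f2)
    # Bias-corrected form used when no county is seen exactly twice.
    return s_obs + scale * f1 * (f1 - 1) / 2


if __name__ == "__main__":
    sale = {"A": 40, "B": 12, "C": 1, "D": 1, "E": 2}  # hypothetical head counts
    print(chao1(sale))
```

When a sale has no singleton counties the estimate equals the observed degree, which is why rare singletons imply a nearly complete sample of origin counties.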
Results
Cattle transported to 55 markets located in 53 counties in 16 states are represented in this analysis (Box 1). The first section below describes seasonal trends observed in the representative sales and their fair to strong correlation with published records of interstate shipments, and quantifies the unexplained variation in the proportion of sales that different markets report as representative of a live auction. In the next section, we relate edge presence and weight to distance and to county covariates from the agricultural census, as well as to unexplained differences between markets. The last section provides a demonstration of network epidemiological inferences that incorporate degree distributions from representative sales data. Data on each movement, summarized in tables suitable for reproducing our analyses, are freely available online (S2 Text).
Characteristics of Representative Sales
The average number of cattle movements reported in representative sales for a live auction increases to a peak of 1000–1500 head in late fall and decreases to a few hundred during summer months (Fig 1A). This seasonal trend persists between the period for which representative sales come from a handful of markets with accessible archives and the period since mid-2014, when we began capturing representative sales posted weekly (Fig 1B). At a given sale, livestock markets report receiving cattle from 11.2 (SD 1.5) counties on average, with weekly average market degree showing seasonal variation peaking in late autumn (Fig 1C). While the timing of peak degree closely corresponds to the time of year when the number of representative sales is also greatest, the troughs in degree are flatter, broader and less pronounced than the summertime lows in representative sales. The seasonal fluctuation in market degree is weakest during the most recent year, for which the sample size is larger as well as geographically more expansive. Aggregating across all markets, the proportion of sales that originate in-state shows no trend in deviations from an average of 0.84 (SD 0.07) (Fig 1C). Markets selling the majority of out-of-state cattle appear to be clustered in Oklahoma and South Dakota, where it is not uncommon for less than half of representative sale cattle to originate in-state (Fig 1E).
The states of origin for interstate shipments show fair to strong correlation with certificate-derived data from 2001 (Table 1). Among the states we could compare to this published summary of interstate transportation, and aside from Idaho, the proportion of cattle shipped within state is above 71%, so the number of cattle shipments used for each correlation is relatively small. For example, a nearly exact correlation results for New Mexico, but the number of head in representative sales available for comparison is only a few thousand. South Dakota has the largest number of cross-border representative sales that we observed, however, and also shows a strong correlation of 0.79. Montana and Colorado have similarly large sample sizes, the first showing a strong correlation of 0.88 while the latter is among the weakest at 0.46. Shipments into Texas and California are insufficient for a meaningful comparison, while the lowest correlation (0.24 for Idaho) is driven by one strong connection to Nye County in Nevada. Variation in the strength of correlation across sample sizes suggests the presence of real variation in the kind of interstate shipments sampled by different data streams. Differences here could be due to the relative proportions of shipments that are market bound versus non-market bound as well as heterogeneity in state requirements for health certificates.
Uncertainty about the total number of animals shipped to market for a given sale is amplified by unknown sources at the market level and to a lesser extent for each individual report. For roughly one third of market reports, total receipts are not available for use in weighting the sale’s effect on estimates aggregating across sales or markets. In other reports, the proportion of sales given as representative is associated with the number of representative sales, with a positive regression coefficient (approx. 95% CI 0.88 to 0.95). Both year and week of year are associated with variation in the sampling probability, but none of the covariates taken from agricultural census data are significant. Overall, the fixed effects contribute the majority of variation in the fitted GLMM, leaving the random effect of market (SD = 0.82) and the observation level random effect (SD = 0.20) to explain the remainder of the variance between sampling probabilities. Average differences between markets account for 20% of the variance; however, the observed proportions remain overdispersed with respect to the model fitted without observation level random intercepts. In other words, variation in the binomial sampling probability predicted by the fitted model for each market underestimates actual variation observed in the representative proportion (Fig 2). Observation level random intercepts are included to account for the remaining 12% of variance in the proportion of cattle reported in the representative sales, but the random intercept for each market is the greater source of uncertainty.
Edge Prediction
The regression model returns a negative coefficient for the impact of distance (approx. 95% CI -2.07 to -1.42) on the average number of cattle shipped from a county to market, as well as a positive coefficient (approx. 95% CI 6.88 to 7.61) that associates greater distances with zero-inflation, or the absence of an edge between county and market. This confirms the intuition that cattle are preferably shipped to nearby markets, and quantifies the effect of distance to use when extrapolating edges for non-reporting markets. In addition, both distance-related factors for edges linking counties to markets within that county or within the same state are significant predictors. The covariates extracted from the agricultural census have inconsistent results, possibly due to their strong inter-correlations. The effects of sales (approx. 95% CI -0.19 to -0.05) and inventory (approx. 95% CI 0.93 to 1.78) on the average head of cattle shipped are of opposite sign, while the number of farms is not significant. Because the covariates are standardized, we can interpret the result to mean that the size of cattle operations measured by head is of greatest importance and is consistent with the hypothesis that counties with larger inventory contribute more heavily weighted edges. Among the census covariates, only the number of farms has a non-zero (approx. 95% CI -0.62 to -0.40) effect on zero-inflation.
Judged by simulated response variables generated by the fitted ZINB model, the model shows a good fit to the observed transportation network (Fig 3). Market degree distributions obtained with simulated response variables are uniformly similar to the observed distribution for market in-degree aggregated over the full study period (Fig 3A). While a single realization of simulated edge data cannot reveal the model’s degree of uncertainty, mapping the edges provides visual confirmation of the role of distance and distance-related factors on the weight of network edges (Fig 3C&D). The most striking difference between the observed and simulated response is the weight of long-distance edges, suggesting that observed shipments are either more clustered on fewer edges (including long-distance edges) or are even more commonly from nearby counties than simulated shipments.
Inclusion of the fixed effect of market in the ZINB model greatly improves the fit (ΔAIC = −3340 on 55 degrees of freedom), but eliminates direct application of the model in predicting cattle shipments to non-reporting livestock markets. The fitted coefficients for market effects could instead be modeled as random effects, and extrapolation to a full network carried out under the assumption that reporting markets are an unbiased sample with respect to network attributes. Based on the fitted intercepts for each market, however, the usual assumption of normality for random intercepts may not be justified (Fig 3B).
Epidemic Size on Random Graphs
A model for disease spread that does not include a full contact network is possible under the assumption that epidemics develop as a tree-like graph, and is related to the representative sales data through estimates of the market degree distribution. The majority of seasonal variation in the distribution of market degree exists between a peak season (from the 39th week, the last week of September, through year’s end) and the remaining off-peak portion of the year, which exhibit distinct empirical cumulative distribution functions (ECDFs; Fig 4A). The former is indistinguishable from a negative binomial distribution by Pearson’s goodness-of-fit test while the latter, although similar in shape, is not.
For a disease spreading on a bi-partite random graph with these market degree distributions, seasonal variation affects the location of the epidemic threshold with respect to the unknown parameters. Roughly 20% lower values of ϕ, the product of market and county transmissibilities and mean county degree, prompt an epidemic for the peak time of year relative to off-peak (Fig 4C). Above the epidemic threshold, the difference between seasons becomes negligible as transmissibility increases; it is overwhelmed by the overall high degree of livestock markets. Even with the average excess degree of counties equal to one, nearly two-thirds of counties are affected in the extreme case that every contact between susceptible and infective cattle leads to successful disease transmission (Fig 4D).
The number of counties included in representative sales is, on average, 8.6% of the estimated number of counties extrapolated from county accumulation curves. In over half of sale reports, the estimate is less than 1% greater than the observed number of counties, and among the rest the most common increase in degree is just 10% (Fig 4B). The rarity of singleton counties (i.e. counties with only one individual in representative sales) and the sufficiently high sampling rates (Fig 2) are responsible for the completeness of the sample for counties of origin. Using the extrapolated values in calculating disease spread on a random graph has the same qualitative effect as the shift from off-peak to peak season market degree distributions. Quantitatively, however, the difference between observed and extrapolated market degree distributions has less impact than season on the estimates of disease spread (Fig 4C&D).
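To illustrate why rare singleton counties imply a nearly complete sample, the sketch below extrapolates the number of origin counties for one sale report. The estimator behind the county accumulation curves is not given in this section, so the bias-corrected Chao1 richness estimator is used here purely as a stand-in; the function name and the example counts are hypothetical.

```python
from collections import Counter


def chao1_counties(head_by_county):
    """Bias-corrected Chao1 estimate of the total number of origin counties.

    head_by_county maps a county identifier to the head of cattle from that
    county among the representative sales of a single report.
    """
    counts = [h for h in head_by_county.values() if h > 0]
    observed = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # counties seen exactly once / exactly twice
    # With no singletons (f1 == 0) the estimate equals the observed count.
    return observed + f1 * (f1 - 1) / (2.0 * (f2 + 1))


# Hypothetical report: two singleton counties, one doubleton, one larger lot.
report = {"A": 5, "B": 1, "C": 1, "D": 2}
estimate = chao1_counties(report)
```

In this stand-in, the correction term grows with the number of singletons, so reports with few single-head counties yield estimates close to the observed market degree.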
Discussion
The epidemiological contact network is a fundamental component of models for the spread of diseases, and sale reports publicized by livestock auction markets contribute urgently needed data to support inference of such networks within the US livestock system. For this study, we initiated an ongoing process to archive representative sales as an opportunistic sample of cattle transported from counties with beef or dairy operations to livestock markets distributed across the US. The study complements previous efforts to summarize transportation of cattle within the US using data derived from certificates of veterinary inspection [5, 21], but extends our ability to model within-state shipping patterns. We demonstrate how inference of a bi-partite contact network, between nodes representing either cattle holding operations aggregated within a US county or a particular auction market, can allow for new models for the spread of economically disruptive livestock diseases.
Representative sales extracted from livestock market reports provide a reliable sample of cattle shipments and the corresponding potential for disease-transmitting contacts. Seasonal variation in the volume of representative sales is consistent with beef cattle production systems, where calves are produced in spring and weaned cattle or yearlings sold to pasturing or feedlot operations in the fall and subsequent spring [54]. The proportion of receipts at a given sale whose origin can be identified is anywhere from a negligible fraction to around three quarters, and understanding this variability is important for scaling up assessments of transportation networks. The dominant source of uncertainty is variation between markets, but this can be quantified for future modeling efforts despite having no identified deterministic source. Covariates taken from the agricultural census on the county where markets are located do not influence the proportion of sales reported, which reduces concern about biasing population estimates from the representative sales.
Interstate shipments among representative sales correlate fairly well with ICVI data, while the remaining majority of representative sales provide unmatched data on cattle shipments that remain within states. Intrastate shipment data were previously unavailable and dominate market-directed shipments, typically accounting for over 80% in any given week. Although transportation of infectious cattle within a state would not immediately spark a regional epidemic, cattle movements at this scale could spread disease beyond the 10 km control radius to be established around infected premises in response to FMD detection within the US [1]. Sale at livestock markets is not the only impetus for cattle transportation, but the correlation between representative sale origins and ICVI origins for transportation between states demonstrates its importance. Indeed, if the certificate-derived data do sample all movements without bias, then the magnitude of correlation with representative sales supports the hypothesis that most cattle (excepting slaughter animals) shipped between states are bound for a livestock market. Shipments leaving livestock markets would have to primarily remain in-state, and therefore be absent from certificate-derived data, for this hypothesis to hold: for example, it implies the testable conclusion that feedlots and backgrounders obtain most cattle born out-of-state from in-state livestock markets.
The market-derived data allow estimates for contact networks ranging in complexity from random graphs, which have many analytically tractable properties, to networks with non-trivial clustering, modularity, assortativity and other non-random features. A collection of edge data, resulting from sampling random nodes without tracing their edges to sample additional nodes, is an ego-centric sample that allows straightforward estimation of node, but not edge, attributes [45]. From this sample, we find market degree distributions that fit a negative binomial distribution with variance roughly twice the mean, which has more dispersion than a Poisson distribution but less than a power law. We also find seasonal shifts in the degree distribution that lower the epidemic threshold of a random graph during the peak cattle trading season. However, the overlapping marketsheds apparent in this sample suggest high potential for network clustering, which tends to dampen the spread of disease but is not easily assessed from an ego-centric sample [55]. Estimating this kind of structural attribute requires inference about unobserved edges, and a first analysis shows potential for incorporating linear effects of county and market attributes in exponential family likelihoods for edge weight. Extensions to this likelihood that include multiple response variables, particularly market degree and squares in the bi-partite network, may achieve a reliable fit to the representative sales that provides a data-driven, non-random graph for livestock disease simulations [46].
Despite the increased availability of data on livestock transportation in the US that our study provides, disease models here lag behind the relatively data-rich European livestock systems. Research in these systems on the optimal spatial and temporal resolution at which to model contact networks is critical for efficient use of limited information available in the US and targeted development of new data streams. In a spatially embedded contact network, each node represents a geographically constrained subpopulation within an interacting metapopulation [56]. The constraint should reflect where mixing of susceptible and infected individuals occurs in proportion to their frequencies, but no theory exists for transferring constraints developed in one region (e.g. the UK) to any other (e.g. Pennsylvania). The abstraction of temporally discrete livestock shipments into static edges, representing the potential for disease transmission over time, is better understood [7,57]. An additional challenge, for a contact network distinguishing livestock markets from longer-term animal holdings, is the synchronicity of shipments of cattle between two counties arriving at the same market. Extreme cases of complete segregation of cattle from different origins versus within-market mixing should bracket the range of disease outcomes [33].
The greater purpose of collecting data on livestock transportation is to improve surveillance for disease outbreaks and to guide prevention or control of epidemics. The sources of nation-wide data on US livestock movements contributing to these goals have previously included health certificate records accompanying interstate movements [5, 21] and owner/operator surveys on animal health and management practices for representative animal holdings [58]. Future research should aim to combine these sources with representative sales data to jointly infer contact networks, because each data source addresses network attributes absent from the others. The primary deficiency of representative sales data is the absence of out-going shipment information, or the destination of cattle purchased at auction. Surveys of livestock operations presently include information on the in-shipment degree, source type and distance, which may provide evidence about the missing outgoing edges for livestock markets. Representative sales also only include market-directed shipments, while health certificate data provides information on network edges that may not have a livestock market at either end. Especially in combination, which we recognize to be a difficult task both conceptually and statistically, inference from multiple data sources will dramatically improve awareness of the network of potentially disease spreading contacts between livestock.
Supporting Information
S1 Text
Disease Percolation on a Directed Bi-Partite Random Graph.
S2 Text
Description of Data Release.
S1 Text: Disease Percolation on Directed, Bi-Partite Random Graphs
Equations 4 through 6 in the main text summarize results for the spread of disease transmitted with constant probability through a bi-partite random graph. The theory leading to these results is summarized by Newman [1] and Meyers et al. [2]. For convenience, we re-present an extension of the basic theory to a directed, bi-partite random graph; in doing so, we also clarify how the number of infected nodes of just one type, “counties” in our case, may be followed through the derivation.
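In a directed, bi-partite graph, where nodes have type $M$ or $C$ for market or county, respectively, edges are either $M \to C$ or $C \to M$. Let $G_{C,0}(x)$ be the generating function for the probability distribution on the number of $C \to M$ edges leaving a $C$ node, marginalizing its in-degree. The county “excess degree” distribution is the probability distribution on the number of $C \to M$ edges departing from the county at the end of a randomly chosen $C \to M$ edge. Its generating function is
$$G_{C,1}(x) = \frac{G_{C,0}'(x)}{G_{C,0}'(1)}, \tag{1}$$
as usual. The market “excess degree” distribution is the same, with $M$ instead of $C$.
The key random variable of interest is $S_{C \to M}$, the number of infected counties in the cluster of nodes reached by tracing the out-going edge of a particular infected county. We denote the generating function for $S_{C \to M}$ by $H_{C,1}(x)$, and highlight that we are counting neither the number of markets mixed up in this cluster nor the original county. The function $H_{M,1}(x)$ will generate the distinct distribution on the number of infected counties reached by tracing a $M \to C$ edge. We determine these functions by deriving self-consistency equations from the following two observations. First, using superscript $(i)$ to indicate $i$ of $N$ independent samples of the random variable, $N$ as the given out-degree of a market, and $T$ as the given boolean variable for successful disease transmission:
$$S_{C \to M} \mid N, T = \begin{cases} \sum_{i=1}^{N} S_{M \to C}^{(i)} & \text{if } T \text{ is true}, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$
This bookkeeping equation results from following the instructions, “choose among all $C \to M$ edges going to markets with out-degree $N$ and add up the number of counties infected, assuming the market either is or is not infected.” The second observation is
$$S_{M \to C} \mid N, T = \begin{cases} 1 + \sum_{i=1}^{N} S_{C \to M}^{(i)} & \text{if } T \text{ is true}, \\ 0 & \text{otherwise,} \end{cases} \tag{3}$$
where $N$ is now the out-degree of the county at the end of the edge and the leading 1 counts that county. In the next step, we achieve the desired generating functions on the right hand side:
$$\mathrm{E}\left[x^{S_{C \to M}} \mid N\right] = 1 - \tau_C + \tau_C \left[H_{M,1}(x)\right]^{N} \tag{4}$$
and
$$\mathrm{E}\left[x^{S_{M \to C}} \mid N\right] = 1 - \tau_M + \tau_M \, x \left[H_{C,1}(x)\right]^{N}, \tag{5}$$
where $\tau_C$ and $\tau_M$ are the probabilities that $T$ is true in Eq. 2 and Eq. 3, respectively. Averaging each equation over the appropriate “excess degree” distribution for $N$ gives the coupled system:
$$\begin{aligned} H_{C,1}(x) &= 1 - \tau_C + \tau_C \, G_{M,1}\left(H_{M,1}(x)\right), \\ H_{M,1}(x) &= 1 - \tau_M + \tau_M \, x \, G_{C,1}\left(H_{C,1}(x)\right). \end{aligned} \tag{6}$$
These must be solved in order to obtain the distribution on the outbreak size starting in a randomly infected county, $S_C$, which is generated by $H_{C,0}(x)$ and derived starting from the observation that
$$S_{C} \mid N = 1 + \sum_{i=1}^{N} S_{C \to M}^{(i)}. \tag{7}$$
The derivation is completed by averaging over the county out-degree distribution (not the county “excess degree” distribution) to obtain
$$H_{C,0}(x) = x \, G_{C,0}\left(H_{C,1}(x)\right). \tag{8}$$
The outbreak size and epidemic proportion calculations follow in the usual way. Let $u = H_{C,1}(1)$ and $v = H_{M,1}(1)$. Using the $u = v = 1$ solution to Eq. 6, the expected outbreak size reduces to
$$H_{C,0}'(1) = 1 + \frac{\tau_C \tau_M \, G_{C,0}'(1) \, G_{M,1}'(1)}{1 - \tau_C \tau_M \, G_{C,1}'(1) \, G_{M,1}'(1)}. \tag{9}$$
Numerically finding other solutions for $u$ and $v$, with $0 < u < 1$ and $0 < v < 1$, leads to the epidemic size proportion as
$$1 - H_{C,0}(1) = 1 - G_{C,0}(u). \tag{10}$$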
To obtain Eqs. 4 through 6 in the main text, we assume that county out-degree is Poisson distributed with mean λC and that market in- and out-degree are perfectly correlated. Imperfect correlations could be included with an additional parameter.
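A numerical solution for u and v can be sketched as below. The sketch rests on several assumptions: that the self-consistency conditions take the standard form u = 1 − τC + τC·G_M,1(v) and v = 1 − τM + τM·G_C,1(u); that county out-degree is Poisson with mean λC, as stated above; and that market degree is negative binomial with a given mean and size. All function and parameter names, and the numeric values, are illustrative rather than taken from the released scripts.

```python
import math


def poisson_pgf(x, lam):
    """Generating function of a Poisson(lam) degree; its excess-degree version is identical."""
    return math.exp(lam * (x - 1.0))


def nb_excess_pgf(x, mean, size):
    """Excess-degree generating function, G'(x)/G'(1), of a negative binomial degree."""
    p = size / (size + mean)
    return (p / (1.0 - (1.0 - p) * x)) ** (size + 1.0)


def critical_phi(mkt_mean, mkt_size):
    """Value of phi = tau_c * tau_m * lam_c at which the epidemic threshold is crossed."""
    return mkt_size / (mkt_mean * (mkt_size + 1.0))


def epidemic_proportion(tau_c, tau_m, lam_c, mkt_mean, mkt_size,
                        tol=1e-12, max_iter=100000):
    """Fraction of counties infected in an epidemic, found by fixed-point iteration."""
    u = v = 0.0  # starting at zero approaches the smallest fixed point from below
    for _ in range(max_iter):
        u_new = 1.0 - tau_c + tau_c * nb_excess_pgf(v, mkt_mean, mkt_size)
        v_new = 1.0 - tau_m + tau_m * poisson_pgf(u_new, lam_c)
        converged = abs(u_new - u) < tol and abs(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    return 1.0 - poisson_pgf(u, lam_c)


# Hypothetical market degree: mean 20 and size 20, so the variance is twice the mean.
below = epidemic_proportion(0.1, 0.1, 1.0, 20.0, 20.0)  # phi = 0.01, under the threshold
above = epidemic_proportion(1.0, 1.0, 1.0, 20.0, 20.0)  # certain transmission: about 0.63
```

With these illustrative values the threshold sits at ϕ = 1/21, and the certain-transmission case with mean county degree of one gives a proportion close to 1 − 1/e. Convergence of the iteration slows near the threshold, so a larger `max_iter` may be needed there.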
S2 Text: Description of Data Release
Note: The data release will coincide with publication of the report — the DOI given below will remain inactive until release.
Representative sales data collected from several livestock market websites, as aggregated for the analyses in this report, are available for download from the Bansal Lab Dataverse [1]. Accompanying the data are scripts for R [2] that reproduce the results presented in the main text.
volume.csv
date (char as YYYY-MM-DD) date of cattle auction given on each sale report
market (int) unique market identifier
orig_location (char) FIPS code for county at market street address
dest_location (char) FIPS code for nearest county containing the origin city[, state]
head (int) number of cattle in all lots (each defaults to 1 for missing data)
A disaggregated version of the representative sales data, sufficient to re-create the panels of Fig. 1 in the main text. Note that the first two characters of a FIPS code correspond to the state, allowing for in-state proportion calculations.
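As a minimal sketch of the in-state proportion calculation mentioned above, the following reads rows with the volume.csv column names and compares the two-character state prefixes of the FIPS codes. The sample rows are made up for illustration, and the function name is hypothetical; FIPS codes are kept as strings so that leading zeros survive.

```python
import csv
import io
from collections import defaultdict
from datetime import date


def instate_proportion_by_week(rows):
    """Share of head per (ISO year, ISO week) whose origin county is in the market's state."""
    totals = defaultdict(int)
    instate = defaultdict(int)
    for row in rows:
        iso_year, iso_week, _ = date.fromisoformat(row["date"]).isocalendar()
        key = (iso_year, iso_week)
        head = int(row["head"])
        totals[key] += head
        # The first two characters of a county FIPS code identify the state.
        if row["orig_location"][:2] == row["dest_location"][:2]:
            instate[key] += head
    return {key: instate[key] / totals[key] for key in totals}


# Made-up rows in the layout of volume.csv (not taken from the data release).
sample = io.StringIO(
    "date,market,orig_location,dest_location,head\n"
    "2014-10-06,1,08069,08123,30\n"
    "2014-10-06,1,08069,56021,10\n"
)
proportions = instate_proportion_by_week(csv.DictReader(sample))
```

To run this on the released file, `sample` would be replaced by an open handle to volume.csv.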
proportion.csv
year (char) year of cattle auction
week (char) ISO week of year
market (int) unique market identifier
receipts (int) total head sold at auction from sale report or [3] (if unreported)
head (int) head given as representative sales (lot size defaults to 1 for missing data)
sales (int) head in county-wide sales*
inventory (int) head in county-wide inventory*
farms (int) farms in county-wide inventory*
The script proportion.R reads this file and fits a binomial family GLMM, associating the sampling probability for representative sales in each report with the covariates provided.
certificate.csv
orig (char) State abbreviation for cattle origin
dest (char) State abbreviation for cattle destination
rep_sales (int) Head from representative sales, (see Methods for time-span)
flows (int) Head from certificate-derived data†
Note that the data from [5] are available in electronic form at http://webarchives.cdlib.org/sw12j6951w/http://www.ers.usda.gov/Data/InterstateLivestockMovements/View.asp. No script is provided to calculate the correlations between interstate rep_sales and flows for each dest.
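Since no script accompanies this file, one way the per-destination correlations might be computed is sketched below. The choice of Pearson's correlation coefficient, the exclusion of rows where orig equals dest, and the minimum of three origins per destination are all assumptions; the function names are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_dest(rows, min_origins=3):
    """Correlate rep_sales with flows across origin states, separately for each dest."""
    pairs = defaultdict(list)
    for row in rows:
        if row["orig"] != row["dest"]:  # keep interstate movements only
            pairs[row["dest"]].append((int(row["rep_sales"]), int(row["flows"])))
    return {
        dest: pearson([p[0] for p in values], [p[1] for p in values])
        for dest, values in pairs.items()
        if len(values) >= min_origins
    }


# Made-up rows in the layout of certificate.csv.
rows = [
    {"orig": "WY", "dest": "CO", "rep_sales": "10", "flows": "100"},
    {"orig": "NE", "dest": "CO", "rep_sales": "20", "flows": "200"},
    {"orig": "KS", "dest": "CO", "rep_sales": "30", "flows": "300"},
]
result = correlation_by_dest(rows)
```

A rank-based coefficient could be substituted for `pearson` if the head counts are strongly skewed.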
edge.csv
market (char) unique market identifier
dest_location (char) FIPS code for nearest county containing the origin city[, state]
sales (int) head in county-wide sales
inventory (int) head in county-wide inventory
farms (int) farms in county-wide inventory
distance (real) great-circle distance between orig_location and dest_location county centroids‡
head (int) head given as representative sales (lot size defaults to 1 for missing data)
instate (bool) zero if and only if dest_location and market are in different states
The script edge.R reads this file and fits a zero-inflated negative-binomial family GLM, associating a zero-inflation probability and mean head of cattle for each county-market pair with the covariates provided. The script additionally simulates counts for each pair, with the same random seed used in this report, and writes the counts to a new file.
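For readers unfamiliar with the zero-inflated negative binomial (ZINB) family, its probability mass function can be written out directly. The sketch below is not the code in edge.R; it assumes the common parameterization by a zero-inflation probability, a mean and a dispersion (size) parameter, and the names and numeric values are illustrative.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial probability of count k, with mean mu > 0 and dispersion theta > 0."""
    log_p = (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
             + theta * math.log(theta / (theta + mu))
             + k * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def zinb_pmf(k, pi_zero, mu, theta):
    """ZINB probability of count k: a structural zero with probability pi_zero,
    otherwise a draw from the negative binomial."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, theta)
    return pi_zero + base if k == 0 else base


# Illustrative county-market pair: 80% structural zeros, mean 25 head when shipping.
probs = [zinb_pmf(k, 0.8, 25.0, 0.5) for k in range(2000)]
total = sum(probs)                                    # close to 1
mean_head = sum(k * p for k, p in enumerate(probs))   # close to (1 - 0.8) * 25
```

In a regression setting, `pi_zero` and `mu` would each be linked to the county and market covariates (for example, distance and in-state status), which is the role the GLM plays.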
Acknowledgments
The authors want to acknowledge several undergraduate or graduate assistants who contributed to software development: Daniel Anderson, Adam Graves, Ching-Hao Hu, and Xinyang Jiang. We thank Nancy Robinson at the Livestock Marketing Association for sharing a member directory, and Centennial Livestock Auction of Fort Collins, CO for answering questions on cattle market practices. We also thank Jason E. Lombard (USDA-APHIS-VS) for several helpful conversations about livestock marketing practices. Funding for contributions by ITC and SB was provided through DHS Contract #HSHQDC-12-C-0014, with additional support from the RAPIDD Program of the Science & Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health. The analyses, views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the regulatory opinions or official policies, either expressed or implied, of the USDA-APHIS-Veterinary Services or the U.S. Department of Homeland Security.
References
- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].