Abstract
Lyme disease is the most commonly reported vector-borne disease in the United States, and 95% of human cases are reported from the Northeast and upper Midwest. Human cases typically occur in the spring and summer months, when an infected nymphal ixodid tick takes a blood meal. Current federal surveillance reports data annually, which introduces a lag of nearly one year in national data. This lag makes it difficult for public health agencies to assess and plan for the current burden of Lyme disease. A nowcasting model, which uses historical data to estimate current trends, gives public health agencies a way to evaluate the current Lyme disease burden and make timely, priority-based budgeting decisions. The objective of this study was to develop nowcasting models for Lyme disease using freely available data from Google Trends and Centers for Disease Control and Prevention (CDC) surveillance reports, and to compare their performance. We developed two sets of elastic net models for five regions of the United States: the first used monthly proportional hit data for 21 disease-symptom and tick-related terms, and the second used monthly proportional hit data for all terms identified via Google Correlate plus the same 21 disease-symptom and vector terms. Elastic net models using the larger term list were highly accurate (root mean square error [RMSE]: 0.74; mean absolute error [MAE]: 0.52; R²: 0.97) for four of the five regions. Including the additional Google Correlate terms, which were more environmental in nature, improved accuracy 1.33-fold and reduced error 0.5-fold compared with predictions from models using disease-symptom and vector terms alone. Models built on Google data in this way could help local and state public health agencies monitor Lyme disease burden accurately while federal surveillance data are delayed.
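The abstract names elastic net regression and three error metrics (RMSE, MAE, R²) without showing how they fit together. The sketch below is illustrative only. It assumes the scikit-learn-style elastic net objective, fit by plain coordinate descent with an unpenalized intercept, and it uses synthetic monthly "search proportion" columns in place of Google Trends data. The function names (`fit_elastic_net`, `predict`, `rmse`, `mae`, `r_squared`) and all data values are hypothetical. This is not the authors' pipeline, their term lists, or their data.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(X, y, alpha=0.01, l1_ratio=0.5, n_iter=500):
    """Hypothetical coordinate-descent elastic net (assumed objective):
    (1/(2n))*||y - b - Xw||^2 + alpha*l1_ratio*||w||_1
      + 0.5*alpha*(1 - l1_ratio)*||w||^2, intercept b unpenalized.
    Returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    # Center predictors and outcome so the intercept drops out of the updates.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * p
    residual = yc[:]  # residual = yc - Xc @ w; w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # a constant column carries no signal after centering
            # Correlation of column j with the partial residual (w_j added back).
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_wj
    intercept = y_mean - sum(m * wj for m, wj in zip(x_means, w))
    return intercept, w

def predict(intercept, w, X):
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def r_squared(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Synthetic example: 24 months, three made-up "search term" columns
# (two seasonal, one sawtooth) and a case count driven by two of them.
months = range(24)
X = [[math.sin(2 * math.pi * m / 12), math.cos(2 * math.pi * m / 12), (m % 5) / 5.0]
     for m in months]
cases = [3.0 + 2.0 * row[0] + 0.5 * row[2] for row in X]

b, w = fit_elastic_net(X, cases, alpha=0.001, l1_ratio=0.5)
pred = predict(b, w, X)
print(round(rmse(cases, pred), 3), round(mae(cases, pred), 3), round(r_squared(cases, pred), 3))

# With a very large penalty every weight shrinks to zero and the model
# predicts the mean, so R^2 falls to 0.
b0, w0 = fit_elastic_net(X, cases, alpha=100.0, l1_ratio=1.0)
pred0 = predict(b0, w0, X)
```

The L1 part of the penalty can set the weights of uninformative search terms exactly to zero, and the L2 part stabilizes the weights of correlated terms. That combination is the usual reason to choose an elastic net when many candidate terms, such as a long Google Correlate list, are available. In practice `alpha` and `l1_ratio` would be tuned by cross-validation on held-out months rather than fixed as they are here.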