
Title: Infection prediction in swine populations with machine learning

The pork industry is an essential part of the global food system, providing a significant source of protein for people around the world. A major factor restraining productivity and compromising animal wellbeing in the pork industry is disease outbreaks in pigs throughout the production process: widespread outbreaks can lead to losses as high as 10% of the U.S. pig population in extreme years. In this study, we present a machine learning model that predicts, on a daily basis, the emergence of infection in swine production systems throughout the production process; infection is a potential precursor to outbreaks, and its detection is vital for disease prevention and mitigation. We determine features that provide the most value in predicting infection, which include nearby farm density, historical test rates, piglet inventory, feed consumption during the gestation period, and wind speed and direction. We use these features to produce a generalizable machine learning model and evaluate its ability to predict outbreaks both seven and 30 days in advance, allowing for early warning of disease infection. We evaluate the model on two swine production systems with different volumes of data and analyze the effects of data availability and data granularity. Our results demonstrate good ability to predict infection in both systems, with a balanced accuracy (average prediction accuracy on positive and negative samples) of 85.3% on any disease in the first system and balanced accuracies of 58.5%, 58.7%, 72.8% and 74.8% on porcine reproductive and respiratory syndrome, porcine epidemic diarrhea virus, influenza A virus, and Mycoplasma hyopneumoniae in the second system, respectively, using the six most important predictors in all cases. These models provide daily infection probabilities that veterinarians and other stakeholders can use as a benchmark to support more timely preventive and control strategies on farms.
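The abstract defines balanced accuracy as the average prediction accuracy on positive and negative samples. A minimal sketch of that definition follows; the function name and the example labels are hypothetical and are not taken from the study's data or code.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of the accuracy on positive samples and on negative samples.

    Labels are assumed to be 1 (infected) and 0 (not infected).
    """
    # Predictions for the truly positive and truly negative samples.
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    acc_pos = sum(1 for p in pos if p == 1) / len(pos)  # accuracy on positives
    acc_neg = sum(1 for p in neg if p == 0) / len(neg)  # accuracy on negatives
    return (acc_pos + acc_neg) / 2


# Hypothetical, imbalanced example: 2 infected days and 8 uninfected days.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # fraction of all samples predicted correctly
print(balanced_accuracy(y_true, y_pred))  # mean of the two per-class accuracies
```

When infections are rare, plain accuracy can look high even if most positive samples are missed, which is one reason to report the balanced figure. scikit-learn provides `balanced_accuracy_score` for the same quantity; the abstract does not say which implementation the authors used.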

Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Reports
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Analyzing the impact of the adaptive immune response during acute hepatitis B virus (HBV) infection is essential for understanding disease progression and control. Here we developed mathematical models of HBV infection which either lack terms for adaptive immune responses, or assume adaptive immune responses in the form of cytolytic immune killing, non-cytolytic immune cure, or non-cytolytic-mediated block of viral production. We validated the model that does not include immune responses against temporal serum hepatitis B DNA (sHBV) and temporal serum hepatitis B surface-antigen (HBsAg) experimental data from mice engrafted with human hepatocytes (HEP). Moreover, we validated the immune models against sHBV and HBsAg experimental data from mice engrafted with HEP and human immune system (HEP/HIS). As expected, the model that does not include adaptive immune responses matches the observed high sHBV and HBsAg concentrations in all HEP mice. By contrast, while all immune response models predict reduction in sHBV and HBsAg concentrations in HEP/HIS mice, the Akaike Information Criterion cannot discriminate between non-cytolytic cure (resulting in a class of cells refractory to reinfection) and antiviral block functions (of up to 99% viral production 1–3 weeks following peak viral load). We can, however, reject cytolytic killing, as it can only match the sHBV and HBsAg data when we predict unrealistic levels of hepatocyte loss.

  2. Abstract

    We present a proof of concept for a spectrally selective thermal mid-IR source based on nanopatterned graphene (NPG) with a typical mobility of CVD-grown graphene (up to 3000 $\mathrm{cm}^2\,\mathrm{V}^{-1}\,\mathrm{s}^{-1}$), ensuring scalability to large areas. For that, we solve the electrostatic problem of a conducting hyperboloid with an elliptical wormhole in the presence of an in-plane electric field. The localized surface plasmons (LSPs) on the NPG sheet, partially hybridized with graphene phonons and surface phonons of the neighboring materials, allow for the control and tuning of the thermal emission spectrum in the wavelength regime from $\lambda = 3$ to 12 $\mu$m by adjusting the size of and distance between the circular holes in a hexagonal or square lattice structure. Most importantly, the LSPs along with an optical cavity increase the emittance of graphene from about 2.3% for pristine graphene to 80% for NPG, thereby outperforming state-of-the-art pristine graphene light sources operating in the near-infrared by at least a factor of 100. According to our COMSOL calculations, a maximum emission power per area of $11\times 10^3$ W/$\mathrm{m}^2$ at $T = 2000$ K for a bias voltage of $V = 23$ V is achieved by controlling the temperature of the hot electrons through the Joule heating. By generalizing Planck’s theory to any grey body and deriving the completely general nonlocal fluctuation-dissipation theorem with nonlocal response of surface plasmons in the random phase approximation, we show that the coherence length of the graphene plasmons and the thermally emitted photons can be as large as 13 $\mu$m and 150 $\mu$m, respectively, providing the opportunity to create phased arrays made of nanoantennas represented by the holes in NPG. The spatial phase variation of the coherence allows for beamsteering of the thermal emission in the range between $12^\circ$ and $80^\circ$ by tuning the Fermi energy between $E_F = 1.0$ eV and $E_F = 0.25$ eV through the gate voltage. Our analysis of the nonlocal hydrodynamic response leads to the conjecture that the diffusion length and viscosity in graphene are frequency-dependent. Using finite-difference time domain calculations, coupled mode theory, and RPA, we develop the model of a mid-IR light source based on NPG, which will pave the way to graphene-based optical mid-IR communication, mid-IR color displays, mid-IR spectroscopy, and virus detection.

  3. Abstract Background

    Protein–protein interaction (PPI) is vital for life processes, disease treatment, and drug discovery. The computational prediction of PPI is relatively inexpensive and efficient when compared to traditional wet-lab experiments. Given a new protein, one may wish to find whether the protein has any PPI relationship with other existing proteins. Current computational PPI prediction methods usually compare the new protein to existing proteins one by one in a pairwise manner. This is time consuming.


    In this work, we propose a more efficient model, called deep hash learning protein-and-protein interaction (DHL-PPI), to predict all-against-all PPI relationships in a database of proteins. First, DHL-PPI encodes a protein sequence into a binary hash code based on deep features extracted from the protein sequences using deep learning techniques. This encoding scheme enables us to turn the PPI discrimination problem into a much simpler searching problem. The binary hash code for a protein sequence can be regarded as a number. Thus, in the pre-screening stage of DHL-PPI, the string matching problem of comparing a protein sequence against a database with $M$ proteins can be transformed into a much simpler problem: to find a number inside a sorted array of length $M$. This pre-screening process narrows down the search to a much smaller set of candidate proteins for further confirmation. As a final step, DHL-PPI uses the Hamming distance to verify the final PPI relationship.


    The experimental results confirmed that DHL-PPI is feasible and effective. Using a dataset with strictly negative PPI examples of four species, DHL-PPI is shown to be superior or competitive when compared to the other state-of-the-art methods in terms of precision, recall or F1 score. Furthermore, in the prediction stage, the proposed DHL-PPI reduced the time complexity from $O(M^2)$ to $O(M\log M)$ for performing an all-against-all PPI prediction for a database with $M$ proteins. With the proposed approach, a protein database can be preprocessed and stored for later search using the proposed encoding scheme. This can provide a more efficient way to cope with the rapidly increasing volume of protein datasets.

  4. Abstract

    A method for modelling the prompt production of molecular states using the hadronic rescattering framework of the general-purpose Pythia event generator is introduced. Production cross sections of possible exotic hadronic molecules via hadronic rescattering at the LHC are calculated for the $\chi_{c1}(3872)$ resonance, a possible tetraquark state, as well as three possible pentaquark states, $P_c^+(4312)$, $P_c^+(4440)$, and $P_c^+(4457)$. For the $P_c^+$ states, the expected cross section from $\Lambda_b$ decays is compared to the hadronic-rescattering production. The $\chi_{c1}(3872)$ cross section is compared to the fiducial $\chi_{c1}(3872)$ cross-section measurement by LHCb and found to contribute at a level of $\mathcal{O}(1\%)$. Finally, the expected yields of $P_c^+$ production from hadronic rescattering during Run 3 of LHCb are estimated. The prompt background is found to be significantly larger than the prompt $P_c^+$ signal from hadronic rescattering.

  5. Abstract

    This paper presents the observation of four-top-quark ($t\bar{t}t\bar{t}$) production in proton-proton collisions at the LHC. The analysis is performed using an integrated luminosity of 140 $\mathrm{fb}^{-1}$ at a centre-of-mass energy of 13 TeV collected using the ATLAS detector. Events containing two leptons with the same electric charge or at least three leptons (electrons or muons) are selected. Event kinematics are used to separate signal from background through a multivariate discriminant, and dedicated control regions are used to constrain the dominant backgrounds. The observed (expected) significance of the measured $t\bar{t}t\bar{t}$ signal with respect to the standard model (SM) background-only hypothesis is 6.1 (4.3) standard deviations. The $t\bar{t}t\bar{t}$ production cross section is measured to be $22.5^{+6.6}_{-5.5}$ fb, consistent with the SM prediction of $12.0 \pm 2.4$ fb within 1.8 standard deviations. Data are also used to set limits on the three-top-quark production cross section, being an irreducible background not measured previously, and to constrain the top-Higgs Yukawa coupling and effective field theory operator coefficients that affect $t\bar{t}t\bar{t}$ production.

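Item 3 above describes treating each binary hash code as a number, locating it in a sorted array, and then confirming candidates with the Hamming distance. The sketch below illustrates that general idea only. The 4-bit codes, the protein names, and the `window` and `max_dist` parameters are hypothetical, and scanning a fixed window of numerically adjacent codes is an assumption made for illustration, not the pre-screening rule described by the DHL-PPI authors.

```python
from bisect import bisect_left


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integer hash codes differ."""
    return bin(a ^ b).count("1")


def build_index(codes):
    """Sort the hash codes once so that each later lookup is a binary search."""
    pairs = sorted((code, pid) for pid, code in codes.items())
    keys = [code for code, _ in pairs]
    ids = [pid for _, pid in pairs]
    return keys, ids


def query(keys, ids, code, window=2, max_dist=1):
    """Pre-screen by position in the sorted array, then verify by Hamming distance.

    `window` (how many neighbouring entries to inspect) and `max_dist`
    (the acceptance threshold) are illustrative assumptions.
    """
    pos = bisect_left(keys, code)  # binary search, O(log M)
    lo, hi = max(0, pos - window), min(len(keys), pos + window)
    return [ids[i] for i in range(lo, hi) if hamming(keys[i], code) <= max_dist]


# Hypothetical 4-bit hash codes for four proteins.
codes = {"A": 0b1010, "B": 0b1011, "C": 0b0001, "D": 0b1110}
keys, ids = build_index(codes)
print(query(keys, ids, 0b1010))
```

Sorting costs on the order of $M\log M$ operations and each binary search costs $\log M$, which is in line with the $O(M\log M)$ all-against-all figure quoted in the abstract. With the default window, protein "D" is never inspected even though its code differs from the query in a single bit, so a position-based pre-screen of this kind trades some recall for speed.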