


Creators/Authors contains: "Kim, Jeonghoon"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    Anomaly detection methods have great potential to assist the detection of diseases in animal production systems. We used sequence data of Porcine Reproductive and Respiratory Syndrome (PRRS) to define the emergence of new strains at the farm level. We evaluated the performance of 24 anomaly detection methods based on machine learning, regression, time series techniques and control charts to identify outbreaks in time series of new strains, and compared the best methods across different time series: PCR positives, PCR requests and laboratory requests. We introduced synthetic outbreaks of different sizes and calculated the probability of detection of outbreaks (POD), sensitivity (Se), probability of detection of outbreaks in the first week of appearance (POD1w) and background alarm rate (BAR). Time series of new strains derived from sequence data outperformed the other types of data, but POD, Se and POD1w were high only when outbreaks were large. The methods based on Long Short-Term Memory (LSTM) and Bayesian approaches performed best. Using anomaly detection methods with sequence data may help to identify the emergence of cases in multiple farms, but more work is required to improve detection in time series with high variability. Our results suggest a promising application of sequence data for early detection of diseases at the production-system level. This may provide a simple way to extract additional value from routine laboratory analysis. Next steps should include validation of this approach in different settings and with different diseases.

    Free, publicly-accessible full text available December 1, 2024
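The abstract above names four detection metrics without defining them. The sketch below computes them from a weekly alarm series under assumed definitions (plausible ones, not taken from the paper): POD as the share of synthetic outbreaks with at least one alarm inside the outbreak window, Se as the share of outbreak weeks that raised an alarm, POD1w as the share of outbreaks flagged in their first week, and BAR as the share of outbreak-free weeks that raised an alarm. The function name `detection_metrics`, the weekly resolution and the `[start, end)` window format are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical format: [start, end) week indices of one synthetic outbreak.
Window = Tuple[int, int]


def detection_metrics(alarms: Sequence[bool], outbreaks: List[Window]) -> dict:
    """Compute POD, Se, POD1w and BAR.

    Assumed definitions, not taken from the paper. Expects at least one
    outbreak and at least one outbreak-free week.
    """
    # Mark every week that falls inside some synthetic outbreak window.
    in_outbreak = [False] * len(alarms)
    for start, end in outbreaks:
        for t in range(start, end):
            in_outbreak[t] = True

    # POD: share of outbreaks with at least one alarm inside their window.
    detected = sum(any(alarms[start:end]) for start, end in outbreaks)
    # POD1w: share of outbreaks flagged in their first week.
    detected_1w = sum(alarms[start] for start, _ in outbreaks)

    outbreak_weeks = [t for t, flag in enumerate(in_outbreak) if flag]
    baseline_weeks = [t for t, flag in enumerate(in_outbreak) if not flag]
    # Se: share of outbreak weeks that raised an alarm.
    se = sum(alarms[t] for t in outbreak_weeks) / len(outbreak_weeks)
    # BAR: share of outbreak-free weeks that raised an alarm.
    bar = sum(alarms[t] for t in baseline_weeks) / len(baseline_weeks)

    return {
        "POD": detected / len(outbreaks),
        "Se": se,
        "POD1w": detected_1w / len(outbreaks),
        "BAR": bar,
    }
```

For example, `detection_metrics([False, False, True, False, False, False, True, False, False, True], [(2, 4), (5, 8)])` gives POD 1.0, POD1w 0.5, Se 0.4 and BAR 0.2: both outbreaks are eventually flagged, but only the first is caught in its first week, and one of the five outbreak-free weeks raises a false alarm.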
  2. Abstract

    The pork industry is an essential part of the global food system, providing a significant source of protein for people around the world. A major factor restraining productivity and compromising animal wellbeing in the pork industry is disease outbreaks in pigs throughout the production process: widespread outbreaks can lead to losses as high as 10% of the U.S. pig population in extreme years. In this study, we present a machine learning model that predicts, on a daily basis, the emergence of infection in swine production systems throughout the production process; infection is a potential precursor to outbreaks, and its detection is vital for disease prevention and mitigation. We determine the features that provide the most value in predicting infection, which include nearby farm density, historical test rates, piglet inventory, feed consumption during the gestation period, and wind speed and direction. We use these features to produce a generalizable machine learning model and evaluate its ability to predict outbreaks both seven and 30 days in advance, allowing early warning of disease infection. We evaluate the model on two swine production systems and analyze the effects of data availability and data granularity, given that the two systems have different volumes of data. Our results demonstrate a good ability to predict infection in both systems, using the six most important predictors in all cases: a balanced accuracy (the average of prediction accuracy on positive and negative samples) of 85.3% on any disease in the first system, and balanced accuracies of 58.5%, 58.7%, 72.8% and 74.8% on porcine reproductive and respiratory syndrome, porcine epidemic diarrhea virus, influenza A virus, and Mycoplasma hyopneumoniae, respectively, in the second system. These models provide daily infection probabilities that veterinarians and other stakeholders can use as a benchmark to support preventive and control strategies on farms in a more timely manner.

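Balanced accuracy, as defined in the abstract above, is the mean of the accuracy on positive samples (sensitivity) and the accuracy on negative samples (specificity). A minimal sketch for binary labels, assuming 1 encodes "infected" and 0 "not infected"; `balanced_accuracy` is a hypothetical helper, and for binary problems it should agree with scikit-learn's `balanced_accuracy_score`.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of accuracy on positive samples and accuracy on negative samples.

    Assumes binary labels (1 = positive, 0 = negative) and at least one
    sample of each class in y_true.
    """
    # Predictions split by the true class of each sample.
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    sensitivity = sum(p == 1 for p in pos) / len(pos)  # accuracy on positives
    specificity = sum(p == 0 for p in neg) / len(neg)  # accuracy on negatives
    return (sensitivity + specificity) / 2
```

A model that always predicts "no infection" on 1 positive and 9 negative samples reaches 90% plain accuracy but only 50% balanced accuracy (`balanced_accuracy([1] + [0] * 9, [0] * 10)` returns 0.5), which is why balanced accuracy is the more informative figure when infections are rare.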
  3. Antimicrobial resistance (AMR) is arguably one of the major health and economic challenges facing our society. A key aspect of tackling AMR is rapid and accurate detection of the emergence and spread of AMR in food animal production, which requires routine AMR surveillance. However, AMR detection can be expensive and time-consuming given the growth rate of the bacteria and the most commonly used analytical procedures, such as Minimum Inhibitory Concentration (MIC) testing. To mitigate this issue, we used machine learning to predict the future AMR burden of bacterial pathogens. We collected pathogen and antimicrobial data from >600 farms in the United States from 2010 to 2021 to generate AMR time series data. Our prediction focused on five bacterial pathogens (Escherichia coli, Streptococcus suis, Salmonella sp., Pasteurella multocida, and Bordetella bronchiseptica). We found that Seasonal Auto-Regressive Integrated Moving Average (SARIMA) outperformed five baselines, including Auto-Regressive Moving Average (ARMA) and Auto-Regressive Integrated Moving Average (ARIMA). We hope this study provides valuable tools to predict the AMR burden not only of the pathogens assessed here but also of other bacterial pathogens.

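The models compared in the abstract above differ only in which terms they include. In standard backshift notation ($B y_t = y_{t-1}$), a SARIMA$(p, d, q)(P, D, Q)_s$ model for a series $y_t$ with seasonal period $s$ and white-noise errors $\varepsilon_t$ can be written as below (using the plus-sign convention for the moving-average polynomials). The abstract does not state the orders or the period used, so a choice such as $s = 12$ for monthly data would be an illustrative assumption only.

```latex
\begin{aligned}
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\, y_t &= \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t, \\
\phi_p(B) = 1 - \sum_{i=1}^{p} \phi_i B^i, \qquad
\Phi_P(B^s) &= 1 - \sum_{i=1}^{P} \Phi_i B^{is}, \\
\theta_q(B) = 1 + \sum_{j=1}^{q} \theta_j B^j, \qquad
\Theta_Q(B^s) &= 1 + \sum_{j=1}^{Q} \Theta_j B^{js}.
\end{aligned}
```

Setting $P = D = Q = 0$ recovers ARIMA$(p, d, q)$, and additionally setting $d = 0$ gives ARMA$(p, q)$; the seasonal factors $\Phi_P(B^s)$, $(1 - B^s)^D$ and $\Theta_Q(B^s)$ are therefore the only structural difference between SARIMA and the two named baselines.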
  4. Porcine reproductive and respiratory syndrome (PRRS) is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but has shown inconsistent accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands substantial computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. We used amino acid sequences of the ORF5 gene from 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label each field PRRSV strain as one of four clades: Lineage 5 or one of three clades in Lineage 1. We measured the accuracy and time consumption of classification with the four ML approaches for different sizes of gene sequences. All four ML algorithms classified a large number of field strains in a very short time (<2.5 s) with very high accuracy (area under the receiver operating characteristic curve >0.99). Furthermore, the random forest approach identified four key amino acid positions for classifying field PRRSV strains into the four clades. These findings may inform the development of rapid and accurate classification models using genetic information, which would also allow large genome datasets to be handled in real time or near real time for data-driven decision-making and more timely surveillance.
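The abstract above does not specify how the "amino acid scores" are built. As a minimal sketch of one of the four algorithms, the k-nearest-neighbor classifier below assumes aligned ORF5 amino acid sequences of equal length and uses the number of mismatched positions (Hamming distance) as the distance. `hamming` and `knn_predict` are hypothetical names, and the sequences and clade labels in the example are toy placeholders, not real PRRSV data.

```python
from collections import Counter
from typing import Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: Sequence[Tuple[str, str]], query: str, k: int = 3) -> str:
    """Label a query sequence by majority vote among its k nearest training sequences.

    train holds (aligned_sequence, clade_label) pairs. sorted() is stable, so
    distance ties keep training order; vote ties go to the first label counted.
    """
    neighbours = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With the toy training set `[("MKWLLA", "clade_A"), ("MKWLLS", "clade_A"), ("MKWVLA", "clade_A"), ("MRFLIT", "clade_B"), ("MRFLIS", "clade_B"), ("MRYLIT", "clade_B")]`, the query `"MKWLIA"` is assigned `clade_A` and `"MRFLLT"` is assigned `clade_B`. Because each prediction compares the query against every training sequence, the cost grows linearly with the size of the training set.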