Title: Drosophila genotypes can be predicted from their exploration locomotive trajectories using supervised machine learning
This study employs supervised machine learning algorithms to test whether locomotive features during exploratory activity in open-field arenas can serve as predictors of the genotype of fruit flies. Because locomotive trajectories are nonlinear, the traditional statistical methods used to compare exploratory activity between fruit fly genotypes may miss some of the differences. Ten-minute trajectories of fruit flies from four different genotypes were captured in an open-field arena. Turn-angle and step-size features extracted from the trajectories were used to train supervised learning models to predict each fly's genotype. Using the first five minutes of each trajectory, an accuracy of 83% was achieved in differentiating wild-type flies from three mutant genotypes. Using the final five minutes or the entire ten-minute recording decreased performance, indicating that most of the variation in exploratory activity between genotypes is exhibited in the first few minutes. Feature importance analysis revealed that turn angle is a better predictor of genotype than step size. Overall, this study demonstrates that trajectory features can be used to predict the genotype of fruit flies through supervised machine learning.
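The abstract does not say exactly how turn angles and step sizes were computed or summarized. The sketch below makes several assumptions. Each trajectory is taken to be a list of (x, y) positions sampled at a constant, hypothetical frame rate `fps`. Step size is the Euclidean distance between consecutive positions. Turn angle is the change in heading between consecutive moving steps, wrapped to [-π, π). The helper names (`window`, `trajectory_features`) and the particular summary statistics are illustrative and are not the authors' feature set.

```python
import math
from statistics import mean, pstdev


def window(points, fps, start_s, end_s):
    """Return the samples recorded between start_s and end_s seconds (inclusive).

    Assumes a constant frame rate `fps` (hypothetical; not given in the abstract).
    """
    first = int(round(start_s * fps))
    last = int(round(end_s * fps))
    return points[first:last + 1]


def step_sizes(points):
    """Euclidean distance between each pair of consecutive (x, y) positions."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]


def turn_angles(points, min_step=0.0):
    """Change in heading (radians) between consecutive moving steps.

    Steps no longer than `min_step` are skipped, because a stationary fly
    has no defined heading. Angles are wrapped to [-pi, pi).
    """
    headings = [
        math.atan2(q[1] - p[1], q[0] - p[0])
        for p, q in zip(points, points[1:])
        if math.dist(p, q) > min_step
    ]
    return [
        (h1 - h0 + math.pi) % (2 * math.pi) - math.pi
        for h0, h1 in zip(headings, headings[1:])
    ]


def trajectory_features(points):
    """Summarize one trajectory as a small, illustrative feature dictionary."""
    steps = step_sizes(points)
    turns = turn_angles(points)
    return {
        "step_mean": mean(steps),
        "step_sd": pstdev(steps),
        "turn_abs_mean": mean(abs(a) for a in turns),
        "turn_sd": pstdev(turns),
    }
```

Take a ten-minute recording at an assumed 30 fps. `trajectory_features(window(track, 30, 0, 300))` would then summarize the first five minutes, and `window(track, 30, 300, 600)` would select the final five. One feature row per fly, labeled with its genotype, would form the training set for a classifier.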
Award ID(s): 2135305, 2135306
PAR ID: 10495516
Publisher / Repository: Elsevier
Journal Name: Behavioural Processes
Volume: 212
Issue: C
ISSN: 0376-6357
Page Range / eLocation ID: 104944
Sponsoring Org: National Science Foundation
More like this
  1. Learning is central to our understanding of how behaviour is shaped by the environment. A key open question is whether learning across contexts evolves as an integrated process, or whether learning in each context is free to evolve separately. Here, we measured learning in two sensory contexts in multiple genotypes and both sexes of two closely related, but ecologically divergent, species of fruit flies, Drosophila simulans and Drosophila sechellia. These species are morphologically very similar but differ dramatically in ecology and population biology. We tested how flies from each genotype, sex and species responded to and learned about different gustatory and visual cues. This approach allowed us to test whether species differences in learning were independent or correlated across contexts. Surprisingly, we found no evidence that D. simulans learned in any of our treatments. In contrast, D. sechellia learned to avoid gustatory stimuli that were paired with an aversive stimulus, as predicted, but unexpectedly learned to approach visual stimuli that were paired with the aversive stimulus. Genotypes, but not species, differed in their naïve responses to stimuli; however, genotypes did not differ in learning in either species. Our results demonstrate that D. sechellia indeed differs from D. simulans in both learning contexts, but in a stimulus-dependent way. We suggest that studies of additional species or population pairs that employ this framework will be critical for evaluating the dimensionality of learning and its evolution. 
  2. Supervised machine learning approaches have been increasingly used to accelerate electronic structure prediction as surrogates of first-principles computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, accurate and efficient prediction of the Hamiltonian matrix is highly desirable, as it is the most important and fundamental physical quantity that determines the quantum states and chemical properties of physical systems. In this work, we generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 2,399 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. 
  3. Multiple-particle tracking (MPT) is a microscopy technique capable of simultaneously tracking hundreds to thousands of nanoparticles in a biological sample and has been used extensively to characterize biological microenvironments, including the brain extracellular space (ECS). Machine learning techniques have been applied to MPT data sets to predict the diffusion mode of nanoparticle trajectories as well as more complex biological variables, such as biological age. In this study, we develop a machine learning pipeline to predict and investigate changes to the brain ECS due to injury using supervised classification and feature importance calculations. We first validate the pipeline on three related but distinct MPT data sets from the living brain ECS: age differences, region differences, and enzymatic degradation of ECS structure. We predict three ages with 86% accuracy, three regions with 90% accuracy, and healthy versus enzyme-treated tissue with 69% accuracy. Because injury is normally compared across groups with traditional statistical approaches, we next used linear mixed-effects models to compare features between healthy control conditions and injury induced by two different oxygen glucose deprivation (OGD) exposure times. We then used machine learning to predict injury state from MPT features. The pipeline distinguishes among healthy control, 0.5 h OGD, and 1.5 h OGD conditions with 59% accuracy in the cortex and 66% in the striatum, and it identifies nonlinear relationships between trajectory features that were not evident from traditional linear models. Our work demonstrates that machine learning applied to MPT data is effective across multiple experimental conditions and can find unique, biologically relevant features of nanoparticle diffusion. 
  4. Supervised machine learning approaches have been increasingly used to accelerate electronic structure prediction as surrogates of first-principles computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, accurate and efficient prediction of the Hamiltonian matrix is highly desirable, as it is the most important and fundamental physical quantity that determines the quantum states and chemical properties of physical systems. In this work, we generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 2,399 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench. 
  5. Supervised machine learning approaches have been increasingly used to accelerate electronic structure prediction as surrogates of first-principles computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, accurate and efficient prediction of the Hamiltonian matrix is highly desirable, as it is the most important and fundamental physical quantity that determines the quantum states and chemical properties of physical systems. In this work, we generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 999 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench. 
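Item 3 pairs supervised classification with feature-importance calculations. The main study similarly ranks turn angle above step size by feature importance. Neither abstract names the classifier or the importance method, so the sketch below substitutes two simple, standard choices. The first is a nearest-centroid classifier. The second is permutation importance: the mean drop in accuracy when one feature column is shuffled, which breaks that feature's link to the label. All function names are hypothetical. In practice, importance should be measured on held-out data rather than on the training set.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Return the mean feature vector for each class label."""
    rows_by_label = {}
    for row, label in zip(X, y):
        rows_by_label.setdefault(label, []).append(row)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }


def predict(centroids, row):
    """Label of the centroid nearest to `row` (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def accuracy(centroids, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return mean(
        1.0 if predict(centroids, row) == label else 0.0 for row, label in zip(X, y)
    )


def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_shuffled = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value  # break the link between feature j and the label
                X_shuffled.append(new_row)
            drops.append(baseline - accuracy(centroids, X_shuffled, y))
        importances.append(mean(drops))
    return importances
```

Suppose each fly contributes a row such as `[turn_abs_mean, step_mean]`. A larger importance for the first column would then correspond to turn angle carrying more genotype information than step size. Nearest-centroid is used only because it fits in a few lines; any classifier with a `predict` step could be substituted without changing `permutation_importance`.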