Title: Artificial Intelligence for Biology
Abstract: Despite efforts to integrate research across different subdisciplines of biology, the scale of integration remains limited. We hypothesize that future generations of Artificial Intelligence (AI) technologies specifically adapted for biological sciences will help enable the reintegration of biology. AI technologies will allow us not only to collect, connect and analyze data at unprecedented scales, but also to build comprehensive predictive models that span various subdisciplines. They will make possible both targeted (testing specific hypotheses) and untargeted discoveries. AI for biology will be the cross-cutting technology that will enhance our ability to do biological research at every scale. We expect AI to revolutionize biology in the 21st century much like statistics transformed biology in the 20th century. The difficulties, however, are many, including data curation and assembly, development of new science in the form of theories that connect the subdisciplines, and new predictive and interpretable AI models that are more suited to biology than existing machine learning and AI techniques. Development efforts will require strong collaborations between biological and computational scientists. This white paper provides a vision for AI for Biology and highlights some challenges.
Award ID(s):
1939739 1900572 1951345
NSF-PAR ID:
10293333
Journal Name:
Integrative and Comparative Biology
ISSN:
1540-7063
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Intellectual and Developmental Disabilities (IDDs), such as Down syndrome, Fragile X syndrome, Rett syndrome, and autism spectrum disorder, usually manifest at birth or in early childhood. IDDs are characterized by significant impairment in intellectual and adaptive functioning, and both genetic and environmental factors underpin IDD biology. Molecular and genetic stratification of IDDs remains challenging, mainly due to overlapping factors and comorbidity. Advances in high-throughput sequencing, imaging, and tools to record behavioral data at scale have greatly enhanced our understanding of the molecular, cellular, structural, and environmental basis of some IDDs. Fueled by the “big data” revolution, artificial intelligence (AI) and machine learning (ML) technologies have brought a paradigm shift to computational biology. The ML-driven approach to clinical diagnosis has the potential to augment classical methods that rely on symptoms and external observations, and thereby to advance personalized treatment planning. Integrative analyses and applications of ML technology therefore have a direct bearing on discoveries in IDDs. The application of ML to IDDs can potentially improve screening and early diagnosis, advance our understanding of the complexity of comorbidity, and accelerate the identification of biomarkers for clinical research and drug development. For more than five decades, the IDDRC network has supported a nexus of investigators at centers across the USA, all striving to understand the interplay between various factors underlying IDDs. In this review, we introduced fast-increasing multi-modal data types and highlighted example studies that employed ML technologies to illuminate factors and biological mechanisms underlying IDDs, as well as recent advances in ML technologies and their applications to IDDs and other neurological diseases. We discussed various molecular, clinical, and environmental data collection modes, including genetic, imaging, phenotypical, and behavioral data types, along with multiple repositories that store and share such data. Furthermore, we outlined some fundamental concepts of machine learning algorithms and presented our opinion on specific gaps that will need to be filled to accomplish, for example, reliable implementation of ML-based diagnosis technology in IDD clinics. We anticipate that this review will guide researchers to formulate AI and ML-based approaches to investigate IDDs and related conditions.

     
  2. Summary

    Cell therapies are powerful technologies in which human cells are reprogrammed for therapeutic applications such as killing cancer cells or replacing defective cells. The technologies underlying cell therapies are increasing in effectiveness and complexity, making rational engineering of cell therapies more difficult. Creating the next generation of cell therapies will require improved experimental approaches and predictive models. Artificial intelligence (AI) and machine learning (ML) methods have revolutionized several fields in biology including genome annotation, protein structure prediction, and enzyme design. In this review, we discuss the potential of combining experimental library screens and AI to build predictive models for the development of modular cell therapy technologies. Advances in DNA synthesis and high‐throughput screening techniques enable the construction and screening of libraries of modular cell therapy constructs. AI and ML models trained on this screening data can accelerate the development of cell therapies by generating predictive models, design rules, and improved designs.

     
  3. Abstract Practitioner notes

    What is already known about this topic

    Scholarly attention has turned to examining Artificial Intelligence (AI) literacy in K‐12 to help students understand the working mechanism of AI technologies and critically evaluate automated decisions made by computer models.

    While efforts have been made to engage students in understanding AI through building machine learning models with data, few of them go in‐depth into teaching and learning of feature engineering, a critical concept in modelling data.

    There is a need for research to examine students' data modelling processes, particularly in the little‐researched realm of unstructured data.

    What this paper adds

    Results show that students developed nuanced understandings of models learning patterns in data for automated decision making.

    Results demonstrate that students drew on prior experience and knowledge in creating features from unstructured data in the learning task of building text classification models.

    Students needed support in performing feature engineering practices, reasoning about noisy features and exploring features in rich social contexts that the data set is situated in.

    Implications for practice and/or policy

    It is important for schools to provide hands‐on model building experiences for students to understand and evaluate automated decisions from AI technologies.

    Students should be empowered to draw on their cultural and social backgrounds as they create models and evaluate data sources.

    To extend this work, educators should consider opportunities to integrate AI learning in other disciplinary subjects (i.e., outside of computer science classes).

     
  4. The proposed Biology Integration Institute will bring together two major research institutions in the Upper Midwest, the University of Minnesota (UMN) and the University of Wisconsin-Madison (UW), to investigate the causes and consequences of plant biodiversity across scales in a rapidly changing world, from genes and molecules within cells and tissues to communities, ecosystems, landscapes and the biosphere. The Institute focuses on plant biodiversity, defined broadly to encompass the heterogeneity within life that occurs from the smallest to the largest biological scales. A premise of the Institute is that life is envisioned as occurring at different scales nested within several contrasting conceptions of biological hierarchies, defined by the separate but related fields of physiology, evolutionary biology and ecology. The Institute will emphasize the use of ‘spectral biology’ (detection of biological properties based on the interaction of light energy with matter) and process-oriented predictive models to investigate the processes by which biological components at one scale give rise to emergent properties at higher scales. Through an iterative process that harnesses cutting-edge technologies to observe a suite of carefully designed empirical systems, including the National Ecological Observatory Network (NEON) and some of the world’s longest-running and state-of-the-art global change experiments, the Institute will advance biological understanding and theory of the causes and consequences of changes in biodiversity at the interface of plant physiology, ecology and evolution.

    INTELLECTUAL MERIT

    The Institute brings together a diverse, gender-balanced and highly productive team with significant leadership experience that spans biological disciplines and career stages and is poised to integrate biology in new ways. Together, the team will harness the potential of spectral biology, experiments, observations and synthetic modeling in a manner never before possible to transform understanding of how variation within and among biological scales drives plant and ecosystem responses to global change over diurnal, seasonal and millennial time scales. In doing so, it will use and advance state-of-the-art theory. The Institute team posits that the designed projects will unearth transformative understanding and biological rules at each of the various scales that will enable an unprecedented capacity to discern the linkages between physiological, ecological and evolutionary processes in relation to the multi-dimensional nature of biodiversity in this time of massive planetary change. A strength of the proposed Institute is that it leverages prior federal investments in research and formalizes partnerships with foreign institutions heavily invested in related biodiversity research. Most of the planned projects leverage existing research initiatives, infrastructure, working groups, experiments, training programs, and public outreach infrastructure, all of which are already highly synergistic and collaborative, and will bring together members of the overall research and training team.

    BROADER IMPACTS

    A central goal of the proposed Institute is to train the next generation of diverse integrative biologists. Post-doctoral, graduate student and undergraduate trainees, recruited from non-traditional and underrepresented groups, including through formal engagement with Native American communities, will receive a range of mentoring and training opportunities. Annual summer training workshops will be offered at UMN and UW, as well as training experiences with the Global Change and Biodiversity Research Priority Program (URPP-GCB) at the University of Zurich (UZH) and through the Canadian Airborne Biodiversity Observatory (CABO). The Institute will engage diverse K-12 audiences, the general public and Native American communities through Market Science modules, Minute Earth videos, a museum exhibit and public engagement and educational activities through the Bell Museum of Natural History, the Cedar Creek Ecosystem Science Reserve (CCESR) and the Wisconsin Tribal Conservation Association.
  5. It is critical to foster artificial intelligence (AI) literacy among high school students, the first generation to grow up surrounded by AI, so that they understand the working mechanisms of data-driven AI technologies and can critically evaluate automated decisions from predictive models. While efforts have been made to engage youth in understanding AI through developing machine learning models, few have provided in-depth insights into the nuanced learning processes. In this study, we examined high school students’ data modeling practices and processes. Twenty-eight students developed machine learning models with text data for classifying negative and positive reviews of ice cream stores. We identified nine data modeling practices that describe students’ processes of model exploration, development, and testing and two themes about evaluating automated decisions from data technologies. The results provide implications for designing accessible data modeling experiences for students to understand data justice as well as the role and responsibility of data modelers in creating AI technologies.