Image-based machine learning tools are an ascendant 'big data' research avenue. Citizen science platforms such as iNaturalist and museum-led digitization initiatives provide researchers with an abundance of image data from which to extract knowledge, including metadata, species identifications and phenomic measurements. Ecological and evolutionary biologists increasingly apply complex, multi-step computational processes to these data. Such processes often include machine learning components, frequently built by others, that are difficult for collaborators to reuse. We present a conceptual workflow model for machine learning applications that use image data to extract biological knowledge in the emerging field of imageomics. We derive an implementation of this conceptual workflow for a specific imageomics application that adheres to FAIR principles as a formal workflow definition, allowing fully automated and reproducible execution and consisting of reusable workflow components. We outline technologies and best practices for creating an automated, reusable and modular workflow, and we show how they promote the reuse of machine learning models and their adaptation to new research questions. This conceptual workflow can be adapted: it can be semi-automated, contain components other than those presented here, or include parallel components for comparative studies. We encourage researchers, both computer scientists and biologists, to build upon this conceptual workflow that combines machine learning tools on image data to answer novel scientific questions in their respective fields.
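To make the modular-workflow idea concrete, here is a minimal, hypothetical Python sketch, not the paper's implementation: the step names, functions and values are invented for illustration. It shows how image-analysis components sharing one input/output contract can be chained into a fully automated pipeline, so any one component (for example, the species classifier) can be swapped or reused without touching the others.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical, illustrative pipeline: each step is a self-contained,
# reusable component with the same Record -> Record contract.

@dataclass
class Record:
    image_path: str
    metadata: dict = field(default_factory=dict)

def extract_metadata(rec: Record) -> Record:
    # e.g. read EXIF or collection-event fields from the image's sidecar file
    rec.metadata["source"] = "iNaturalist"  # placeholder value
    return rec

def identify_species(rec: Record) -> Record:
    # e.g. run a pretrained classifier; the model is itself a reusable component
    rec.metadata["species"] = "Danaus plexippus"  # placeholder prediction
    return rec

def extract_traits(rec: Record) -> Record:
    # e.g. segment the specimen and measure phenomic traits
    rec.metadata["wing_area_mm2"] = 412.0  # placeholder measurement
    return rec

def run_pipeline(rec: Record, steps: list[Callable[[Record], Record]]) -> Record:
    for step in steps:  # fully automated: no manual hand-offs between steps
        rec = step(rec)
    return rec

result = run_pipeline(Record("img_0001.jpg"),
                      [extract_metadata, identify_species, extract_traits])
print(result.metadata)
```

Swapping `identify_species` for a different model, or adding a parallel component for a comparative study, changes the step list rather than the pipeline itself.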
- PAR ID: 10502004
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: Methods in Ecology and Evolution
- ISSN: 2041-210X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Methods for inferring geographic origin from the stable isotope composition of animal tissues are widely used in movement ecology, but few computational tools and standards for data interpretation are available. We introduce the assignR r package, which provides a structured, flexible toolkit for isotope-based migration data analysis and interpretation using a widely adopted semi-parametric Bayesian inversion method. assignR bundles data resources and functions that support data interpretation, hypothesis testing and quality assessment, allowing end-to-end data analysis with only a few lines of code. Tools for post hoc analysis offer robust, standardized methods for aggregating information from multiple individuals, assigning individuals to a sub-region of the study area and comparing potential regions of origin using odds ratios. Assessment tools quantify the quality and power of the isotopic assignments and can be used to test prototype study designs. The assignR package should increase the accessibility of isotopic geolocation methods. assignR supports flexible data sources and analysis decisions, making it suitable for a wide range of applications, but also promotes standardization that will help foster increased consistency and comparability among studies and a more holistic understanding of animal migration. Lastly, assignR can help make isotope-based geolocation research more efficient by helping researchers plan projects that are optimally aligned with their research questions.
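assignR itself is used from R, but the inversion step it automates is compact enough to sketch. The following Python snippet is an illustrative re-implementation under simplifying assumptions (flat spatial prior, known per-cell prediction error), not assignR's code or API: one individual's tissue isotope value is evaluated against a calibrated isoscape and normalized into a posterior probability-of-origin surface, and two hypothetical regions are then compared with an odds ratio.

```python
import numpy as np

# Illustrative isotope-based assignment (not assignR's code). Assume a
# calibrated tissue isoscape: per-cell predicted mean and a combined
# standard deviation (isoscape + calibration residual uncertainty).
rng = np.random.default_rng(0)
pred_mean = rng.uniform(-120.0, -40.0, size=(50, 80))  # hypothetical d2H predictions
pred_sd = np.full((50, 80), 8.0)                       # hypothetical combined sd

def posterior_surface(tissue_value, mean, sd):
    """Posterior probability of origin per cell, with a flat spatial prior."""
    like = np.exp(-0.5 * ((tissue_value - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return like / like.sum()  # normalize so the surface sums to 1

post = posterior_surface(-85.0, pred_mean, pred_sd)    # one individual, d2H = -85

# Odds-ratio style comparison of two candidate regions (boolean masks):
region_north = np.zeros_like(post, dtype=bool)
region_north[:25, :] = True
p_n, p_s = post[region_north].sum(), post[~region_north].sum()
print(f"P(north) = {p_n:.3f}, P(south) = {p_s:.3f}, odds ratio = {p_n / p_s:.2f}")
```

-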
Abstract Recent advances in generative artificial intelligence (AI) and multimodal learning analytics (MMLA) have allowed for new and creative ways of leveraging AI to support K12 students' collaborative learning in STEM+C domains. To date, there is little evidence of AI methods supporting students' collaboration in complex, open-ended environments. AI systems are known to underperform humans in (1) interpreting students' emotions in learning contexts, (2) grasping the nuances of social interactions and (3) understanding domain-specific information that was not well represented in the training data. As such, combined human and AI (i.e., hybrid) approaches are needed to overcome the current limitations of AI systems. In this paper, we take a first step towards investigating how a human-AI collaboration between teachers and researchers using an AI-generated multimodal timeline can guide and support teachers' feedback while addressing students' STEM+C difficulties as they work collaboratively to build computational models and solve problems. In doing so, we present a framework characterizing the human component of our human-AI partnership as a collaboration between teachers and researchers. To evaluate our approach, we present our timeline to a high school teacher and discuss the key insights gleaned from our discussions. Our case study analysis reveals the effectiveness of an iterative approach to using human-AI collaboration to address students' STEM+C challenges: the teacher can use the AI-generated timeline to guide formative feedback for students, and the researchers can leverage the teacher's feedback to help improve the multimodal timeline. Additionally, we characterize our findings with respect to two events of interest to the teacher: (1) when the students cross a difficulty threshold, and (2) the point of intervention, that is, when the teacher (or system) should intervene to provide effective feedback. It is important to note that the teacher explained that there should be a lag between (1) and (2) to give students a chance to resolve their own difficulties. Typically, such a lag is not implemented in computer-based learning environments that provide feedback.
Practitioner notes
What is already known about this topic
Collaborative, open‐ended learning environments enhance students' STEM+C conceptual understanding and practice, but they introduce additional complexities when students learn concepts spanning multiple domains.
Recent advances in generative AI and MMLA allow for integrating multiple datastreams to derive holistic views of students' states, which can support more informed feedback mechanisms to address students' difficulties in complex STEM+C environments.
Hybrid human‐AI approaches can help address collaborating students' STEM+C difficulties by combining the domain knowledge, emotional intelligence and social awareness of human experts with the general knowledge and efficiency of AI.
What this paper adds
We extend a previous human‐AI collaboration framework using a hybrid intelligence approach to characterize the human component of the partnership as a researcher‐teacher partnership and present our approach as a teacher‐researcher‐AI collaboration.
We adapt an AI-generated multimodal timeline to actualize our human-AI collaboration: we pair the timeline with videos of students encountering difficulties and engage in active discussions with a high school teacher while watching the videos to discern the timeline's utility in the classroom.
From our discussions with the teacher, we define two types of inflection points for addressing students' STEM+C difficulties, the difficulty threshold and the intervention point, and discuss how the feedback latency interval separating them can inform educator interventions.
We discuss two ways in which our teacher-researcher-AI collaboration can help teachers support students encountering STEM+C difficulties: (1) teachers using the multimodal timeline to guide feedback for students, and (2) researchers using teachers' input to iteratively refine the multimodal timeline.
Implications for practice and/or policy
Our case study suggests that timeline gaps (i.e., disengaged behaviour indicated by off-screen students, pauses in discourse and lulls in environment actions) are particularly important for identifying inflection points and formulating formative feedback (a toy sketch of such gap detection follows these notes).
Human‐AI collaboration exists on a dynamic spectrum and requires varying degrees of human control and AI automation depending on the context of the learning task and students' work in the environment.
Our analysis of this human‐AI collaboration using a multimodal timeline can be extended in the future to support students and teachers in additional ways, for example, designing pedagogical agents that interact directly with students, developing intervention and reflection tools for teachers, helping teachers craft daily lesson plans and aiding teachers and administrators in designing curricula.
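The multimodal timeline itself is not specified in this abstract, so the following Python sketch is purely hypothetical: the modalities, field names and the 30-second threshold are invented for illustration. It shows one way synchronized event streams could be merged by timestamp and scanned for the timeline gaps discussed above.

```python
import pandas as pd

# Hypothetical multimodal event streams (timestamps in seconds).
discourse = pd.DataFrame({"t": [3, 10, 14, 62, 70], "modality": "speech"})
actions = pd.DataFrame({"t": [5, 12, 58, 66], "modality": "environment"})

# Merge the streams into a single timeline ordered by time.
timeline = (pd.concat([discourse, actions])
              .sort_values("t")
              .reset_index(drop=True))

# Flag "timeline gaps": long stretches with no activity in any stream,
# a candidate signal that students have crossed a difficulty threshold
# (the 30 s cutoff is an arbitrary illustrative value).
timeline["gap_before"] = timeline["t"].diff()
print(timeline[timeline["gap_before"] > 30])
```

In the paper's framing, a flagged gap would mark a candidate difficulty threshold, and the feedback latency interval would delay the intervention point to give students a chance to resolve the difficulty themselves.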
-
Abstract Biodiversity studies rely heavily on estimates of species' distributions, often obtained through ecological niche modelling. Numerous software packages allow users to model ecological niches using machine learning and statistical methods. However, no existing package with a graphical user interface allows users to perform model calibration and selection based on convex forms such as ellipsoids, which may better match fundamental ecological niche shapes, while also providing tools for exploring, modelling and evaluating niches and distributions that are intuitive for both novice and proficient users.
Here we describe an r package, NicheToolBox (ntbox), that allows users to conduct all processing steps involved in ecological niche modelling via a graphical user interface: downloading and curating occurrence data, obtaining and transforming environmental data layers, selecting environmental variables, exploring relationships between geographic and environmental spaces, calibrating and selecting ellipsoid models, evaluating models using binomial and partial ROC tests, assessing extrapolation risk, and performing geographic information system operations. A summary of the entire workflow is produced for use as a stand-alone algorithm or as part of research reports.
The method is explained in detail and tested by modelling the threatened feline species Leopardus wiedii. Georeferenced occurrence data for this species are queried to display both point occurrences and the IUCN extent-of-occurrence polygon (IUCN, 2007). This information is used to illustrate tools available for accessing, processing and exploring biodiversity data (e.g. number of occurrences and chronology of collecting) and transforming environmental data (e.g. a summary PCA for 19 bioclimatic layers). Visualizations of three-dimensional ecological niches modelled as minimum-volume ellipsoids are developed with ancillary statistics. This niche model is then projected to geographic space to represent a corresponding potential suitability map.
Using ntbox provides a fast and straightforward means by which to retrieve and manipulate occurrence and environmental data, which can then be used in model calibration, projection and evaluation for assessing distributions of species in geographic space and their corresponding environmental combinations.
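ntbox is driven from its R graphical interface, but the ellipsoid model at its core is easy to illustrate. The Python sketch below is an approximation for exposition only (a covariance-based ellipsoid rather than ntbox's exact minimum-volume fit, with invented environmental values): occurrences are summarized by a centroid and covariance in environmental space, and suitability declines with Mahalanobis distance from the centroid.

```python
import numpy as np

# Illustrative ellipsoid niche model (not ntbox's implementation).
rng = np.random.default_rng(1)
occ_env = rng.normal([24.0, 1200.0], [2.0, 150.0], size=(100, 2))  # temp, precip

center = occ_env.mean(axis=0)                        # ellipsoid centroid
cov_inv = np.linalg.inv(np.cov(occ_env, rowvar=False))

def suitability(env_points):
    """Exponential decay in squared Mahalanobis distance from the centroid."""
    d = env_points - center
    md2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)    # per-point d^T C^-1 d
    return np.exp(-0.5 * md2)                        # 1 at the centroid, declining outward

grid = np.array([[24.0, 1200.0], [30.0, 400.0]])     # two hypothetical map cells
print(suitability(grid))                             # high near centroid, low far away
```

Scoring every cell of an environmental raster this way yields the potential suitability map described above.

-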
Abstract Conceptual models are necessary to synthesize what is known about a topic, identify gaps in knowledge and improve understanding. Developing conceptual models that summarize the literature through ad hoc approaches is likely to produce incomplete models because of the challenge of tracking information and hypotheses across the literature.
We present a novel, systematic approach to conceptual model development through qualitative synthesis and graphical analysis of hypotheses already present in the scientific literature. Our approach has five stages: researchers explicitly define the scope of the question, conduct a systematic review, extract hypotheses from prior studies, assemble hypotheses into a single network model and analyse trends in the model through network analysis.
The resulting network can be analysed to identify shifts in thinking over time, variation in the application of ideas over different axes of investigation (e.g. geography, taxonomy, ecosystem type) and the most important hypotheses based on the network structure. To illustrate the approach, we present examples from a case study that applied the method to synthesize decades of research on the effects of forest fragmentation on birds.
This approach can be used to synthesize scientific thinking across any field of research, guide future research to fill knowledge gaps efficiently and help researchers systematically build conceptual models representing alternative hypotheses.
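The assembly and analysis stages lend themselves to a simple graph representation. The sketch below is a minimal, hypothetical Python example using networkx; the concepts, edges and study counts are invented for illustration, not drawn from the forest-fragmentation case study.

```python
import networkx as nx

# Hypothetical hypothesis network: nodes are concepts, directed edges are
# hypothesized effects extracted from the literature, weighted by the
# number of studies proposing them (all values invented).
G = nx.DiGraph()
G.add_edge("fragment area", "species richness", studies=14)
G.add_edge("edge effects", "nest predation", studies=9)
G.add_edge("matrix quality", "dispersal", studies=6)
G.add_edge("dispersal", "species richness", studies=4)

# Rank concepts by study-weighted degree to surface the hypotheses that
# are most central to the network structure.
for concept, w in sorted(G.degree(weight="studies"), key=lambda kv: kv[1], reverse=True):
    print(f"{concept}: study-weighted links = {w}")
```

Attaching publication year, geography or taxon to each edge would support the trend analyses described above, for example filtering the network by decade to track shifts in thinking.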
-
Abstract The Molecular Sciences Software Institute's (MolSSI) Quantum Chemistry Archive (QCArchive) project is an umbrella name that covers both a central server hosted by MolSSI for community data and the Python-based software infrastructure that powers automated computation and storage of quantum chemistry (QC) results. The MolSSI-hosted central server provides the computational molecular sciences community a location to freely access tens of millions of QC computations for machine learning, methodology assessment, force-field fitting, and more through a Python interface. Facile, user-friendly mining of the centrally archived quantum chemical data also can be achieved through web applications found at https://qcarchive.molssi.org. The software infrastructure can be used as a standalone platform to compute, structure, and distribute hundreds of millions of QC computations for individuals or groups of researchers at any scale. The QCArchive Infrastructure is open-source (BSD-3C); code repositories can be found at https://github.com/MolSSI, and releases can be downloaded via PyPI and Conda.
This article is categorized under:
Electronic Structure Theory > Ab Initio Electronic Structure Methods
Software > Quantum Chemistry
Data Science > Computer Algorithms and Programming
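As context for the Python interface mentioned above, here is a minimal sketch based on the qcportal client contemporary with this article; the class and method names (FractalClient, get_collection, get_values) reflect that era and should be checked against the current QCArchive documentation, since later releases reorganized the client API.

```python
import qcportal as ptl  # legacy qcportal client of this article's era

# Connect to the public MolSSI-hosted QCArchive server (the default target).
client = ptl.FractalClient()

# Retrieve a community dataset and pull precomputed values for one
# method/basis combination as a pandas DataFrame.
ds = client.get_collection("ReactionDataset", "S22")
energies = ds.get_values(method="b3lyp", basis="def2-svp")
print(energies.head())
```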