Search for: All records

Award ID contains: 2410668

  1. In an era of information overload, research writing, particularly literature review composition, has become increasingly burdensome due to the sheer volume of scholarly publications released each year. This paper introduces WriteAssist, a novel standalone authoring system that helps researchers efficiently generate literature review sections. Given the title and abstract of a work-in-progress manuscript, WriteAssist automatically retrieves relevant and recent peer-reviewed articles, highlighting portions that offer supporting or contrasting perspectives. A key innovation lies in its personalized recommendation engine, which tailors results based on the user's prior publications and research profile, enabling context-aware synthesis. We position WriteAssist within the landscape of intelligent writing assistants, academic search platforms, and personalized recommender systems, and we detail its architecture, which integrates natural language processing and user modeling to streamline academic writing. The system represents a significant step toward alleviating cognitive overload in scholarly composition and offers a blueprint for smarter, adaptive tools in academic research support.
    Free, publicly-accessible full text available September 15, 2026
  2. Discovering novel molecules with targeted properties remains a formidable challenge in materials science, often likened to finding a needle in a haystack. Traditional experimental approaches are slow, costly, and inefficient. In this study, we present an inverse design framework based on a molecular graph conditional variational autoencoder (CVAE) that enables the generation of new molecules with user-specified optical properties, particularly the molar extinction coefficient (ε). Our model encodes molecular graphs, derived from SMILES strings, into a structured latent space, and then decodes them into valid molecular structures conditioned on a target ε value. Trained on a curated dataset of known molecules with corresponding extinction coefficients, the CVAE learns to generate chemically valid structures, as verified by RDKit. Subsequent Density Functional Theory (DFT) simulations confirm that many of the generated molecules exhibit electronic structures similar to those of molecules with the desired ε values. We have also verified the ε values of the generated molecules using a graph neural network (GNN) and the synthesizability of those molecules using an open-source module named ASKCOS. This approach demonstrates the potential of CVAEs to accelerate molecular discovery by enabling user-guided, property-driven molecule generation, offering a scalable, data-driven alternative to traditional trial-and-error synthesis.
    Free, publicly-accessible full text available September 15, 2026
  3. A vast proportion of scientific data remains locked behind dynamic web interfaces, often called the deep web—inaccessible to conventional search engines and standard crawlers. This gap between data availability and machine usability hampers the goals of open science and automation. While registries like FAIRsharing offer structured metadata describing data standards, repositories, and policies aligned with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, they do not enable seamless, programmatic access to the underlying datasets. We present FAIRFind, a system designed to bridge this accessibility gap. FAIRFind autonomously discovers, interprets, and operationalizes access paths to biological databases on the deep web, regardless of their FAIR compliance. Central to our approach is the Deep Web Communication Protocol (DWCP), a resource description language that represents web forms, HyperText Markup Language (HTML) tables, and file-based data interfaces in a machine-actionable format. Leveraging large language models (LLMs), FAIRFind combines a specialized deep web crawler and web-form comprehension engine to transform passive web metadata into executable workflows. By indexing and embedding these workflows, FAIRFind enables natural language querying over diverse biological data sources and returns structured, source-resolved results. Evaluation across multiple open-source LLMs and database types demonstrates over 90% success in structured data extraction and high semantic retrieval accuracy. FAIRFind advances existing registries by turning linked resources from static references into actionable endpoints, laying a foundation for intelligent, autonomous data discovery across scientific domains. 
    Free, publicly-accessible full text available July 26, 2026
  4. While scientific workflows have been established and used in a number of disciplines for specifying and executing experiments and data analysis, early and recent studies have demonstrated that an important proportion of workflows suffer from decay. This phenomenon is exacerbated by legacy scientific workflow systems, notably Taverna, which was popular in e-science for orchestrating complex analyses. As a step towards addressing this issue, we report in this paper on a feasibility study of using generative AI to revive decayed workflows, combining large language models with modern workflow technologies. Our approach automates critical revival tasks including parsing of legacy Taverna workflows, failure point identification, repair suggestion, and conversion to contemporary formats, viz. Snakemake. The methodology integrates AI-driven workflow summarization, pseudocode abstraction, graph-based visualization, automated service substitution, and code generation. We demonstrate and evaluate this approach through a real-world decayed workflow case study. We conclude the paper with a discussion of the key lessons that we learned and that will guide development of a systematic workflow revival framework as part of our future work.
    Free, publicly-accessible full text available July 19, 2026
  5. The rise of Large Language Models (LLMs) as powerful knowledge-processing tools has sparked a wave of innovation in tutoring and assessment systems. Despite their well-documented limitations, LLMs offer unique capabilities that have been effectively harnessed for automated feedback generation and grading in intelligent learning environments. In this paper, we introduce Project 360, an experimental intelligent tutoring system designed for teaching SQL. Project 360 leverages the concept of query equivalence to assess the accuracy of student queries, using ChatGPT’s advanced natural language analysis to measure their semantic distance from a reference query. By integrating LLM-driven evaluation, Project 360 significantly outperforms traditional SQL tutoring and grading systems, offering more precise assessments and context-aware feedback. This study explores the feasibility and limitations of using ChatGPT as the analytical backbone of Project 360, evaluating its reliability for autonomous tutoring and assessment in database education. Our findings provide valuable insights into the evolving role of LLMs in education, highlighting their potential to revolutionize SQL learning while identifying areas for further refinement and improvement.
    Free, publicly-accessible full text available July 14, 2026
  6. The discovery of functional dye materials with superior optical properties is crucial for advancing technologies in biomedical imaging, organic photovoltaics, and quantum information systems. Recent advancements highlight the need to accelerate this discovery process by integrating computational strategies with experimental methods. In this regard, we have employed a computational approach to explore the latent space of dye materials, utilizing swarm optimization techniques to efficiently navigate complex chemical spaces and identify optimal values of molecular properties using machine learning methods based on target properties, such as high extinction coefficients (ε). The latent-space-based evaluation outperformed all available features of a domain. This approach, implemented with variational autoencoders (VAEs), enhances inverse material design by systematically correlating molecular parameters with desired optical characteristics. In this process, by defining target properties as inputs, the model effectively determines the key molecular features necessary for engineering high-performance dye compounds.
    Free, publicly-accessible full text available June 23, 2026
  7. Scientific workflows are pivotal for managing complex computational tasks, including data analysis, processing, simulation, and visualization. However, their design and administration typically demand substantial programming expertise, limiting access for domain scientists. Many such workflow systems also lack real-time execution tracking and streamlined data integration capabilities, hindering efficiency and repeatability in scientific experimentation. In response, we introduce VisFlow 2.0, a next-generation platform derived from the original VisFlow. We compare VisFlow 2.0 to traditional alternatives through a well-studied computational pipeline, highlighting its usability, flexibility, and effectiveness, especially for non-expert users.
    Free, publicly-accessible full text available June 23, 2026
  8. Machine learning now drives the digital economy, yet most toolkits still demand low-level statistical and algorithmic expertise that excludes non-specialists. To remove this barrier, we present the Machine-learning Query Language (MQL), a fully declarative interface that lets users express analytic intent as succinctly as SQL expresses data retrieval. An MQL compiler faithfully translates each statement into an executable pipeline on mainstream frameworks such as Scikit-Learn, PyCaret, TPOT, TensorFlow or PyTorch, hiding all procedural detail. Experiments underscore its impact. Compared with hand-coded scripts, MQL cut development effort by 70–85 times for classification, 100–140 times for regression, and 65–80 times for clustering. In 95% of trials the auto-generated pipelines matched or outperformed the most accurate manually tuned models, and MQL’s framework-selection logic chose the best backend 90% of the time. By coupling SQL-style abstraction with robust code generation, MQL delivers a decisive leap toward true mass-market, self-service machine learning.
    Free, publicly-accessible full text available May 3, 2026
  9. A brain tumor is an abnormal growth in the brain that disrupts its functionality and poses a significant threat to human life by damaging neurons. Early detection and classification of brain tumors are crucial to prevent complications and maintain good health. Recent advancements in deep learning techniques have shown immense potential in image classification and segmentation for tumor identification and classification. In this study, we present a platform, BrainView, for detection and segmentation of brain tumors from Magnetic Resonance Images (MRI) using deep learning. We used the pre-trained EfficientNetB7 model to design our proposed DeepBrainNet classification model, which analyzes brain MRI images to classify the tumor type. We also propose an EfficientNetB7-based image segmentation model, called EffB7-UNet, for tumor localization. Experimental results show high classification (99.96%) and segmentation (92.734%) accuracies for our proposed models. Finally, we discuss the contours of a cloud application for BrainView using Flask and Flutter to help researchers and clinicians use our machine learning models online for research purposes.
    Free, publicly-accessible full text available May 1, 2026
  10. The rising popularity of data science and machine learning (ML) across diverse domains, often driven by users with limited computational expertise, reflects the growing commoditization of ML tools. However, the advanced technical and mathematical knowledge demanded by current ML frameworks poses a formidable barrier for non-experts, preventing them from fully exploiting these powerful platforms. In response, we introduce MQL, a novel declarative query language for ML application design, alongside its corresponding query processing engine. We demonstrate that abstracting ML concepts, similarly to SQL, can preserve both processing efficiency and analytical fidelity. Our implementation defines MQL semantics through a semantics-preserving mapping to widely understood ML code fragments. By leveraging task-specific meta-features, heuristic knowledge, and standard assessment methods, our system ranks candidate ML libraries, selects optimal algorithms, and frees users from these choices. We introduce mapping algorithms to ensure that each MQL program retains its intended semantics and present experimental evaluations demonstrating that MQL’s algorithmic selections not only match but surpass human-engineered solutions in terms of performance and model accuracy. By offering declarative queries as a high-level alternative to traditional coding, MQL significantly reduces the complexity of data analysis pipeline construction, thereby democratizing machine learning application design. To foster shared community development, this work is maintained as an open-source project at https://github.com/hmjamil/mql.
    Free, publicly-accessible full text available April 8, 2026
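
Illustrative note on item 5: the abstract says Project 360 judges student SQL by query equivalence, measured with ChatGPT. The sketch below does not reproduce that method. It swaps in a much simpler, purely execution-based check that could serve as a baseline: run the student query and the reference query on a small sample database and compare the returned rows as multisets. The table `emp`, its contents, the three queries, and the helper name `same_result` are all hypothetical and chosen only for illustration. Agreement on one database is a necessary condition for equivalence, not a proof of it.

```python
import sqlite3
from collections import Counter


def same_result(conn, student_sql, reference_sql):
    """Return True when both queries yield the same multiset of rows
    on this particular database (row order is ignored)."""
    student_rows = Counter(conn.execute(student_sql).fetchall())
    reference_rows = Counter(conn.execute(reference_sql).fetchall())
    return student_rows == reference_rows


# Hypothetical sample database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES (1, 'Ann', 'CS', 120), (2, 'Bob', 'CS', 90), (3, 'Cy', 'EE', 100);
    """
)

reference = "SELECT name FROM emp WHERE dept = 'CS' AND salary > 100"

# Written differently, but selects the same rows as the reference here.
student_ok = (
    "SELECT name FROM emp WHERE salary > 100 "
    "INTERSECT SELECT name FROM emp WHERE dept = 'CS'"
)

# Uses OR instead of AND, so it returns an extra row on this data.
student_bad = "SELECT name FROM emp WHERE dept = 'CS' OR salary > 100"

print(same_result(conn, student_ok, reference))
print(same_result(conn, student_bad, reference))
```

A check of this kind can only flag queries that disagree on the chosen test data; two non-equivalent queries may still agree on a small sample, which is one reason a semantic comparison such as the LLM-based one described in the abstract may be attractive.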