Title: Foundation Models using Self-Improving Data Augmentation
Optical multilayer thin-film structures are widely used in many photonic applications, including filters, absorbers, photovoltaics, and display devices. A key step in enabling these applications is inverse design, which seeks to identify a structure that satisfies the desired optical responses. Recently, the foundation-model-based OptoGPT was proposed and has shown great potential for solving a wide range of inverse design problems. However, OptoGPT fails to design certain types of optical responses that are important to practical applications. The major reason is that its training data are randomly sampled, so it is highly probable that these design targets were never selected during training, leading to an out-of-distribution issue. In this work, we propose a self-improving data augmentation technique that leverages neural networks’ extrapolation ability. Using this method, we show significant improvement on various application design tasks with minimal fine-tuning. The approach can potentially be generalized to other scientific inverse-design foundation models.
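A minimal sketch of the self-improving loop described above, assuming hypothetical `forward_simulator`, `inverse_model`, and dataset interfaces; the toy physics and all names are illustrative, not the paper's implementation.

```python
# Sketch of a self-improving data augmentation loop for inverse design:
# the inverse model extrapolates candidates for an out-of-distribution
# target, an exact forward solver verifies them, and verified pairs are
# added back to the training set before fine-tuning.
import numpy as np

rng = np.random.default_rng(0)

def forward_simulator(structure):
    # Stand-in for a transfer-matrix solver: layer thicknesses -> spectrum.
    wavelengths = np.linspace(400.0, 700.0, 31)
    return 0.5 + 0.5 * np.cos(structure.sum() * wavelengths / 1e4)

def inverse_model(target, n_candidates=64):
    # Stand-in for the pretrained inverse model: propose candidate
    # layer-thickness vectors for a target spectrum.
    return rng.uniform(10.0, 300.0, size=(n_candidates, 6))

def self_improve_step(target, dataset, tol=0.05):
    candidates = inverse_model(target)                       # extrapolate
    errors = np.array([np.mean((forward_simulator(c) - target) ** 2)
                       for c in candidates])                 # verify
    best = candidates[errors.argmin()]
    if errors.min() < tol:
        # A verified pair shifts the training distribution toward the
        # previously out-of-distribution region.
        dataset.append((best, forward_simulator(best)))
    return best, float(errors.min()), dataset

# Usage: target a spectrum the original training set never covered.
dataset = []
target = forward_simulator(np.full(6, 120.0))
best, err, dataset = self_improve_step(target, dataset)
```

In practice, the candidate structures would come from sampling the pretrained foundation model, and the augmented dataset would drive the brief fine-tuning step mentioned in the abstract.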
Award ID(s):
2309403
PAR ID:
10644985
Author(s) / Creator(s):
; ;
Publisher / Repository:
Foundation Models for Science Workshop, 38th Conference on Neural Information Processing Systems (NeurIPS 2024).
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Jovanovic, Jelena; Chounta, Irene-Angelica; Uhomoibhi, James; McLaren, Bruce (Ed.)
    Computer-supported education studies can perform two important roles. They can allow researchers to gather important data about student learning processes, and they can help students learn more efficiently and effectively by providing automatic immediate feedback on what the students have done so far. The evaluation of student work required for both of these roles can be relatively easy in domains like math, where there are clear right answers. When text is involved, however, automated evaluations become more difficult. Natural Language Processing (NLP) can provide quick evaluations of student texts. However, traditional neural network approaches require a large amount of data to train models with enough accuracy to be useful in analyzing student responses. Typically, educational studies collect data but often only in small amounts and with a narrow focus on a particular topic. BERT-based neural network models have revolutionized NLP because they are pre-trained on very large corpora, developing a robust, contextualized understanding of the language. Then they can be “fine-tuned” on a much smaller set of data for a particular task. However, these models still need a certain base level of training data to be reasonably accurate, and that base level can exceed that provided by educational applications, which might contain only a few dozen examples. In other areas of artificial intelligence, such as computer vision, model performance on small data sets has been improved by “data augmentation” — adding scaled and rotated versions of the original images to the training set. This has been attempted on textual data; however, augmenting text is much more difficult than simply scaling or rotating images. The newly generated sentences may not be semantically similar to the original sentence, resulting in an improperly trained model. In this paper, we examine a self-augmentation method that is straightforward and shows great improvements in performance with different BERT-based models in two different languages and on two different tasks that have small data sets. We also identify the limitations of the self-augmentation procedure. 
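    As a hedged illustration of the general idea (not the paper's exact procedure), one simple, label-preserving form of text self-augmentation is random word dropout, sketched below; all names and data are illustrative.

```python
# Sketch: expand a small labeled set with word-dropout variants before
# fine-tuning a BERT-based classifier. This shows the general shape of
# text self-augmentation; the paper's actual method may differ.
import random

def augment(sentence, n_copies=3, p_drop=0.1, seed=0):
    rng = random.Random(seed)
    words = sentence.split()
    variants = []
    for _ in range(n_copies):
        kept = [w for w in words if rng.random() > p_drop]
        variants.append(" ".join(kept) if kept else sentence)
    return variants

train = [("the response ignores friction entirely", 0),
         ("correctly applies conservation of energy", 1)]
augmented = train + [(v, label) for text, label in train
                     for v in augment(text)]
# `augmented` can now be fed to a standard BERT fine-tuning pipeline.
```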
  2. We apply foundation models to data discovery and exploration tasks. Foundation models are large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When carefully used, they have superior capability on three representative tasks: table-class detection, column-type annotation, and join-column prediction. On all three tasks, we show that a foundation-model-based approach outperforms the task-specific models and thus the state of the art. Further, our approach often surpasses human-expert task performance. We investigate the fundamental characteristics of this approach, including generalizability to several foundation models and the impact of non-determinism on the outputs. All in all, this suggests a future direction in which disparate data management tasks can be unified under foundation models.
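    A minimal sketch of how one of these tasks, column-type annotation, can be framed as a prompt; `call_llm` is a placeholder for any LLM client, and the template is illustrative rather than the paper's.

```python
# Sketch: column-type annotation posed as a single LLM prompt.
def column_type_prompt(column_name, sample_values, candidate_types):
    values = ", ".join(map(str, sample_values[:10]))  # cap the sample size
    types = ", ".join(candidate_types)
    return (f"Column name: {column_name}\n"
            f"Sample values: {values}\n"
            f"Pick the single best semantic type from: {types}\n"
            "Answer with the type only.")

prompt = column_type_prompt("dob", ["1984-02-11", "1990-07-30"],
                            ["date", "person name", "country", "price"])
# response = call_llm(prompt)  # placeholder client; majority-voting over
# several samples can damp the non-determinism noted above.
```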
  3. We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In contrast, foundation models pretrained on internet-scale data appear to have superior generalization capabilities, and in some instances display an emergent ability to find zero-shot solutions to problems that are not present in the training data. Foundation models may hold the potential to enhance various components of the robot autonomy stack, from perception to decision-making and control. For example, large language models can generate code or provide common sense reasoning, while vision-language models enable open-vocabulary visual recognition. However, significant open research challenges remain, particularly around the scarcity of robot-relevant training data, safety guarantees and uncertainty quantification, and real-time execution. In this survey, we study recent papers that have used or built foundation models to solve robotics problems. We explore how foundation models contribute to improving robot capabilities in the domains of perception, decision-making, and control. We discuss the challenges hindering the adoption of foundation models in robot autonomy and provide opportunities and potential pathways for future advancements. The GitHub project corresponding to this paper can be found at https://github.com/robotics-survey/Awesome-Robotics-Foundation-Models.
  4. Self-supervised learning (SSL) is at the core of training modern large machine learning models, providing a scheme for learning powerful representations that can be used in a variety of downstream tasks. However, SSL strategies must be adapted to the type of training data and downstream tasks required. We propose resimulation-based self-supervised representation learning (RS3L), a novel simulation-based SSL strategy that employs re-simulation to drive data augmentation for contrastive learning in the physical sciences, particularly in fields that rely on stochastic simulators. By intervening in the middle of the simulation process and rerunning simulation components downstream of the intervention, we generate multiple realizations of an event, thus producing a set of augmentations covering all physics-driven variations available in the simulator. Using experiments from high-energy physics, we explore how this strategy may enable the development of a foundation model; we show how RS3L pretraining enables powerful performance in downstream tasks such as discrimination of a variety of objects and uncertainty mitigation. In addition to our results, we make the RS3L dataset publicly available for further studies on how to improve SSL strategies. Published by the American Physical Society, 2025.
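    A toy sketch of the resimulation idea, with stand-ins for the simulation stages (nothing here reproduces the RS3L code): freeze the upstream physics of an event, rerun the stochastic downstream stage, and treat the resulting views as positive pairs.

```python
# Sketch: resimulation-style augmentation for contrastive learning.
import numpy as np

rng = np.random.default_rng(1)

def upstream(event_seed):
    # Deterministic stage (e.g., the hard process), fixed per event.
    return np.full(8, float(event_seed))

def downstream(state):
    # Stochastic stage (e.g., showering / detector response); rerunning
    # it yields a physics-driven augmentation of the same event.
    return state + rng.normal(scale=0.3, size=state.shape)

def resimulate(event_seed, n_views=4):
    state = upstream(event_seed)
    return [downstream(state) for _ in range(n_views)]

views = resimulate(event_seed=7)
# Any two entries of `views` form a positive pair for a contrastive
# objective such as NT-Xent; other events supply the negatives.
```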
  5. Machine learning provides a promising platform for both forward modeling and the inverse design of photonic structures. Relying on a data-driven approach, machine learning is especially appealing in situations where it is not feasible to derive an analytical solution for a complex problem. There has been a great amount of recent interest in constructing machine learning models suitable for different electromagnetic problems. In this work, we adapt a region-specified design approach for the inverse design of multilayered nanoparticles. Given the high computational cost of dataset generation for electromagnetic problems, we specifically investigate the case of a small training dataset, enhanced via random region specification in an inverse convolutional neural network. The trained model is used to design nanoparticles with high absorption levels and different ratios of absorption over scattering. The central design wavelength is shifted across 350–700 nm without re-training. We discuss the implications of wavelength, particle size, and training dataset size on the performance of the model. Our approach may find interesting applications in the design of multilayer nanoparticles for biological, chemical, and optical applications, as well as in the design of low-scattering absorbers and antennas.