While artificial intelligence and machine learning (AI/ML) frameworks gain prominence in science and engineering, most researchers face significant challenges in adapting complex AI/ML workflows to campus and national cyberinfrastructure (CI) environments. Data from the Texas A&M High Performance Research Computing (HPRC) researcher training program indicate that researchers increasingly want to learn how to migrate and run their pre-existing AI/ML frameworks on large-scale computing environments. Building on the continuing success of our work developing innovative pedagogical approaches for CI training, we expand our CI-infused pedagogy to teach technology-based AI and data sciences. We revisit the pedagogical approaches used in the decades-old tradition of laboratories in the physical sciences, which taught concepts via experiential learning. Here, we structure a series of exercises in interactive computing environments that give researchers immediate hands-on experience with the AI/ML and data science technologies they will use on larger CI resources. These exercises, called "tech-labs," assume that participating researchers are familiar with AI/ML approaches and focus on hands-on exercises that teach them how to use these approaches on large-scale CI. The tech-labs offer four consecutive sessions, each introducing a learner to specific technologies offered in CI environments for AI/ML and data workflows. We report on our tech-lab for Python-based AI/ML approaches, during which learners are introduced to Jupyter Notebooks followed by exercises using Pandas, Matplotlib, Scikit-learn, and Keras. The program includes a series of enhancements such as container support and easy launch of virtual environments in our web-based computing interface. The approach also scales to programs using a command-line interface (CLI).
In all, the program offers a shift in focus from teaching AI/ML toward increasing adoption of AI/ML in large-scale CI.
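A typical first tech-lab exercise pairs Pandas with Scikit-learn inside a Jupyter Notebook: load data into a DataFrame, split it, and train a model. The sketch below is a minimal illustration of that kind of exercise; the dataset and model choices are ours, not taken from the program materials.

```python
# Minimal sketch of a tech-lab style exercise: a Pandas -> Scikit-learn
# pipeline such as a learner might run in a Jupyter Notebook on an HPC
# portal. Dataset and model are illustrative choices only.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = load_iris(as_frame=True)
df = iris.frame  # Pandas DataFrame with features and target column

X_train, X_test, y_train, y_test = train_test_split(
    df[iris.feature_names], df["target"], test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The same notebook would then swap the estimator for a Keras model and plot results with Matplotlib, exercising each technology in turn.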
Creating intelligent cyberinfrastructure for democratizing AI
Abstract Artificial intelligence (AI) has the potential for vast societal and economic gain; yet applications are developed in a largely ad hoc manner, lacking coherent, standardized, modular, and reusable infrastructures. The NSF-funded Intelligent CyberInfrastructure with Computational Learning in the Environment AI Institute ("ICICLE") aims to fundamentally advance edge-to-center AI-as-a-Service, achieved through intelligent cyberinfrastructure (CI) that spans the edge-cloud-HPC computing continuum, plug-and-play next-generation AI and intelligent CI services, and a commitment to design for broad accessibility and widespread benefit. This design is foundational to the institute's commitment to democratizing AI. The institute's CI activities are informed by three high-impact domains: animal ecology, digital agriculture, and smart foodsheds. The institute's workforce development and broadening participation in computing efforts reinforce the institute's commitment to democratizing AI. ICICLE seeks to serve as the national nexus for AI and intelligent CI, and welcomes engagement across its wide set of programs.
- Award ID(s):
- 2112606
- PAR ID:
- 10505606
- Publisher / Repository:
- Wiley Online Library
- Date Published:
- Journal Name:
- AI Magazine
- Volume:
- 45
- Issue:
- 1
- ISSN:
- 0738-4602
- Page Range / eLocation ID:
- 22 to 28
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- The needs of cyberinfrastructure (CI) Users are different from those of CI Contributors. Typically, much of the training in advanced CI addresses developer topics such as MPI, OpenMP, CUDA, and application profiling, leaving a gap in training for these users. To remedy this situation, we developed a new program: COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure (COMPLECS). COMPLECS focuses exclusively on helping CI Users acquire the skills and knowledge they need to efficiently accomplish their compute- and data-intensive research, covering topics such as parallel computing concepts, data management, batch computing, cybersecurity, an HPC hardware overview, and high-throughput computing.
- With the increase in data-driven analytics, the demand for high-performing computing resources has risen. Many high-performance computing centers provide cyberinfrastructure (CI) for academic research; however, access barriers prevent these resources from reaching a broad range of users, and users who are new to the data analytics field are not yet equipped to take advantage of the tools offered by CI. In this paper, we propose a framework to lower the access barriers in bringing high-performance computing resources to users who do not have the training to utilize the capability of CI. The framework uses the divide-and-conquer (DC) paradigm for data-intensive computing tasks. It consists of three major components: a user interface (UI), a parallel scripts generator (PSG), and the underlying cyberinfrastructure (CI). The goal of the framework is to provide a user-friendly method for parallelizing data-intensive computing tasks with minimal user intervention. Key design goals include usability, scalability, and reproducibility. Users can focus on their problem and leave the parallelization details to the framework.
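The divide-and-conquer pattern such a framework automates can be sketched as follows. This is a hypothetical illustration, not the framework's actual generated scripts: the real PSG would emit parallel batch jobs for the CI, and a thread pool stands in for that parallelism here.

```python
# Hypothetical sketch of the divide-and-conquer (DC) pattern: divide the
# input into chunks, process each chunk independently in parallel, then
# merge the partial results. A thread pool stands in for the parallel
# batch jobs a real CI framework would generate.
from concurrent.futures import ThreadPoolExecutor

def divide(data, n_chunks):
    # "Divide" step: partition the input into roughly equal contiguous chunks.
    size = (len(data) + n_chunks - 1) // n_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_chunk(chunk):
    # Per-chunk "conquer" step: any data-intensive computation; here, a sum.
    return sum(chunk)

def run_dc(data, n_workers=4):
    chunks = divide(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)  # "merge" step: combine partial results

total = run_dc(list(range(1000)))
print(total)  # 499500
```

A user supplies only the per-chunk function; the divide and merge plumbing stays in the framework, which is what makes the approach reproducible and keeps user intervention minimal.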
- Abstract Significant investments to upgrade and construct large-scale scientific facilities demand commensurate investments in R&D to design algorithms and computing approaches that enable scientific and engineering breakthroughs in the big data era. Innovative artificial intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology that now drive a multi-billion-dollar industry and play an ever-increasing role in shaping human social patterns. As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient for the computational grand challenges brought about by scientific facilities that produce data at a rate and volume outstripping the computing capabilities of available cyberinfrastructure platforms. This realization has been driving the confluence of AI and high-performance computing (HPC) to reduce time-to-insight and to enable systematic study of domain-inspired AI architectures and optimization schemes for data-driven discovery. In this article we summarize recent developments in this field and describe specific advances that the authors are spearheading to accelerate and streamline the use of HPC platforms to design and apply accelerated AI algorithms in academia and industry.
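The data-parallel idea behind scaling training past a single GPU can be illustrated in plain NumPy: each worker computes gradients on its own shard of the data, and the gradients are averaged (an "all-reduce" in HPC terms) before each update. This is an illustrative sketch of the general technique, not the authors' code.

```python
# Illustrative sketch of data-parallel training: workers compute
# gradients on their own data shards; the averaged gradient (with equal
# shard sizes) equals the full-batch gradient, so training matches the
# single-worker result while the work is spread across workers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w  # synthetic linear-regression data

def shard_gradient(Xs, ys, w):
    # Gradient of mean-squared error on one worker's shard.
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

n_workers = 4
shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))

w = np.zeros(4)
for _ in range(200):
    grads = [shard_gradient(Xs, ys, w) for Xs, ys in shards]
    w -= 0.05 * np.mean(grads, axis=0)  # average gradients, then update

print(np.round(w, 2))
```

In a real multi-GPU setting the gradient averaging is done by a collective communication step rather than a Python list, but the arithmetic is the same.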
- Abstract Recent advances in AI culminate a shift in science and engineering away from strong reliance on algorithmic and symbolic knowledge towards new data-driven approaches. How does the emerging intelligent data-centric world impact research on real-time and embedded computing? We argue for two effects: (1) new challenges in embedded system contexts, and (2) new opportunities for community expansion beyond the embedded domain. First, on the embedded system side, the shifting nature of computing towards data-centricity affects the types of bottlenecks that arise. At training time, the bottlenecks are generally data-related. Embedded computing relies on scarce sensor data modalities, unlike those commonly addressed in mainstream AI, necessitating solutions for efficient learning from scarce sensor data. At inference time, the bottlenecks are resource-related, calling for improved resource economy and novel scheduling policies. Further ahead, the convergence of AI around large language models (LLMs) introduces additional model-related challenges in embedded contexts. Second, on the domain expansion side, we argue that community expertise in handling resource bottlenecks is becoming increasingly relevant to a new domain: the cloud environment, driven by AI needs. The paper discusses the novel research directions that arise in the data-centric world of AI, covering data-, resource-, and model-related challenges in embedded systems as well as new opportunities in the cloud domain.