Title: mlr3pipelines - Flexible Machine Learning Pipelines in R
Recent years have seen a proliferation of ML frameworks. Such systems make ML accessible to non-experts, especially when combined with powerful parameter tuning and AutoML techniques. Modern, applied ML extends beyond direct learning on clean data, however, and needs an expressive language for the construction of complex ML workflows beyond simple pre- and post-processing. We present mlr3pipelines, an R framework which can be used to define linear and complex non-linear ML workflows as directed acyclic graphs. The framework is part of the mlr3 ecosystem, leveraging convenient resampling, benchmarking, and tuning components.
Award ID(s):
1813537
NSF-PAR ID:
10289803
Journal Name:
Journal of machine learning research
Volume:
22
Issue:
184
ISSN:
1532-4435
Page Range / eLocation ID:
1-7
Sponsoring Org:
National Science Foundation
More Like this
  1. Machine learning (ML) is being applied in a number of everyday contexts from image recognition, to natural language processing, to autonomous vehicles, to product recommendation. In the science realm, ML is being used for medical diagnosis, new materials development, smart agriculture, DNA classification, and many others. In this article, we describe the opportunities of using ML in the area of scientific workflow management. Scientific workflows are key to today’s computational science, enabling the definition and execution of complex applications in heterogeneous and often distributed environments. We describe the challenges of composing and executing scientific workflows and identify opportunities for applying ML techniques to meet these challenges by enhancing the current workflow management system capabilities. We foresee that as the ML field progresses, the automation provided by workflow management systems will greatly increase and result in significant improvements in scientific productivity. 
  2. Storage systems and their OS components are designed to accommodate a wide variety of applications and dynamic workloads. Storage components inside the OS contain various heuristic algorithms to provide high performance and adaptability for different workloads. These heuristics may be tunable via parameters, and some system calls allow users to optimize their system performance. These parameters are often predetermined based on experiments with limited applications and hardware. Thus, storage systems often run with these predetermined and possibly suboptimal values. Tuning these parameters manually is impractical: one needs an adaptive, intelligent system to handle dynamic and complex workloads. Machine learning (ML) techniques are capable of recognizing patterns, abstracting them, and making predictions on new data. ML can be a key component to optimize and adapt storage systems. In this position paper, we propose KML, an ML framework for storage systems. We implemented a prototype and demonstrated its capabilities on the well-known problem of tuning optimal readahead values. Our results show that KML has a small memory footprint, introduces negligible overhead, and yet enhances throughput by as much as 2.3x.
  3. Abstract

    We examined neural mechanisms associated with the learning of novel morphologically derived words in native-Hebrew speakers within the Complementary Learning Systems (CLS) framework. Across four sessions, 28 participants were trained on an artificial language, which included two types of morphologically complex words: linear (root + suffix) with a salient structure, and non-linear (root interleaved with template), with a prominent derivational structure in participants' first language (L1). A third simple monomorphemic condition, which served as baseline, was also included. On the first and fourth sessions, training was followed by testing in an fMRI scanner. Our behavioural results showed decomposition of both types of complex words, with the linear structure more easily learned than the non-linear structure. Our fMRI results showed involvement of frontal areas, associated with decomposition, only for the non-linear condition, after just the first session. We also observed training-related increases in activation in temporal areas specifically for the non-linear condition which was correlated with participants' L1 morphological awareness. These results demonstrate that morphological decomposition of derived words occurs in the very early stages of word learning, is influenced by L1 experience, and can facilitate word learning. However, in contrast to the CLS framework, we found no support for a shift from reliance on hippocampus to reliance on cortical areas in any of our conditions. Instead, our findings align more closely with recent theories showing a positive correlation between changes in hippocampus and cortical areas, suggesting that these representations co-exist and continue to interact with one another beyond initial learning.

     
    more » « less
  4. Machine settings and tuning are critical for digital fabrication outcomes. However, exploring these parameters is non-trivial. We seek to enable exploration of the full design space of digital fabrication. To identify where we might intervene, we studied how practitioners approach 3D printing. We found that beyond using CAD/CAM, they create bespoke routines and workflows to explore interdependent material and machine settings. We seek to provide a system that supports this workflow development. We identified design goals around material exploration, fine-tuned control, and iteration. Based on these, we present p5.fab, a system for controlling digital fabrication machines from the creative coding environment p5.js. We demonstrate p5.fab with examples of 3D prints that cannot be made with traditional 3D printing software. We evaluate p5.fab in workshops and find that it encourages novel printing workflows and artifacts. Finally, we discuss implications for future digital fabrication systems. 
  5. Abstract

    Computational workflows are widely used in data analysis, enabling automated tracking of steps and storage of provenance information, leading to innovation and decision-making in the scientific community. However, the growing popularity of workflows has raised concerns about reproducibility and reusability which can hinder collaboration between institutions and users. In order to address these concerns, it is important to standardize workflows or provide tools that offer a framework for describing workflows and enabling computational reusability. One such set of standards that has recently emerged is the Common Workflow Language (CWL), which offers a robust and flexible framework for data analysis tools and workflows. To promote portability, reproducibility, and interoperability of AI/ML workflows, we developed geoweaver_cwl, a Python package that automatically describes AI/ML workflows from a workflow management system (WfMS) named Geoweaver into CWL. In this paper, we test our Python package on multiple use cases from different domains. Our objective is to demonstrate and verify the utility of this package. We make all the code and dataset open online and briefly describe the experimental implementation of the package in this paper, confirming that geoweaver_cwl can lead to a well-versed AI process while disclosing opportunities for further extensions. The geoweaver_cwl package is publicly released online at https://pypi.org/project/geoweaver-cwl/0.0.1/ and exemplar results are accessible at: https://github.com/amrutakale08/geoweaver_cwl-usecases.

     
    more » « less