skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: mlr3pipelines - Flexible Machine Learning Pipelines in R
Recent years have seen a proliferation of ML frameworks. Such systems make ML accessible to non-experts, especially when combined with powerful parameter tuning and AutoML techniques. Modern, applied ML extends beyond direct learning on clean data, however, and needs an expressive language for the construction of complex ML workflows beyond simple pre- and post-processing. We present mlr3pipelines, an R framework which can be used to define linear and complex non-linear ML workflows as directed acyclic graphs. The framework is part of the mlr3 ecosystem, leveraging convenient resampling, benchmarking, and tuning components.  more » « less
Award ID(s):
1813537
PAR ID:
10289803
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Journal of machine learning research
Volume:
22
Issue:
184
ISSN:
1532-4435
Page Range / eLocation ID:
1-7
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present POPPER, a dataflow system for building Machine Learning (ML) workflows. A novel aspect of POPPER is its built-in support for in-flight error handling, which is crucial in developing effective ML workflows. POPPER provides a convenient API that allows users to create and execute complex workflows comprising traditional data processing operations (such as map, filter, and join) and user-defined error handlers. The latter enables inflight detection and correction of errors introduced by ML models in the workflows. Inside POPPER, we model the workflow as a reactive dataflow, a directed cyclic graph, to achieve efficient execution through pipeline parallelization. We demonstrate the in-flight error-handling capabilities of POPPER, for which we have built a graphical interface, allowing users to specify workflows, visualize and interact with its reactive dataflow, and delve into the internals of POPPER. 
    more » « less
  2. Machine settings and tuning are critical for digital fabrication outcomes. However, exploring these parameters is non-trivial. We seek to enable exploration of the full design space of digital fabrication. To identify where we might intervene, we studied how practitioners approach 3D printing. We found that beyond using CAD/CAM, they create bespoke routines and workflows to explore interdependent material and machine settings. We seek to provide a system that supports this workflow development. We identified design goals around material exploration, fine-tuned control, and iteration. Based on these, we present p5.fab, a system for controlling digital fabrication machines from the creative coding environment p5.js. We demonstrate p5.fab with examples of 3D prints that cannot be made with traditional 3D printing software. We evaluate p5.fab in workshops and find that it encourages novel printing workflows and artifacts. Finally, we discuss implications for future digital fabrication systems. 
    more » « less
  3. Machine learning (ML) is being applied in a number of everyday contexts from image recognition, to natural language processing, to autonomous vehicles, to product recommendation. In the science realm, ML is being used for medical diagnosis, new materials development, smart agriculture, DNA classification, and many others. In this article, we describe the opportunities of using ML in the area of scientific workflow management. Scientific workflows are key to today’s computational science, enabling the definition and execution of complex applications in heterogeneous and often distributed environments. We describe the challenges of composing and executing scientific workflows and identify opportunities for applying ML techniques to meet these challenges by enhancing the current workflow management system capabilities. We foresee that as the ML field progresses, the automation provided by workflow management systems will greatly increase and result in significant improvements in scientific productivity. 
    more » « less
  4. null (Ed.)
    Storage systems and their OS components are designed to accommodate a wide variety of applications and dynamic workloads. Storage components inside the OS contain various heuristic algorithms to provide high performance and adaptability for different workloads. These heuristics may be tunable via parameters, and some system calls allow users to optimize their system performance. These parameters are often predetermined based on experiments with limited applications and hardware. Thus, storage systems often run with these predetermined and possibly suboptimal values. Tuning these parameters manually is impractical: one needs an adaptive, intelligent system to handle dynamic and complex workloads. Machine learning (ML) techniques are capable of recognizing patterns, abstracting them, and making predictions on new data. ML can be a key component to optimize and adapt storage systems. In this position paper, we propose KML, an ML framework for storage systems. We implemented a prototype and demonstrated its capabilities on the well-known problem of tuning optimal readahead values. Our results show that KML has a small memory footprint, introduces negligible overhead, and yet enhances throughput by as much as 2.3x. 
    more » « less
  5. Abstract Insect pests cause significant damage to food production, so early detection and efficient mitigation strategies are crucial. There is a continual shift toward machine learning (ML)‐based approaches for automating agricultural pest detection. Although supervised learning has achieved remarkable progress in this regard, it is impeded by the need for significant expert involvement in labeling the data used for model training. This makes real‐world applications tedious and oftentimes infeasible. Recently, self‐supervised learning (SSL) approaches have provided a viable alternative to training ML models with minimal annotations. Here, we present an SSL approach to classify 22 insect pests. The framework was assessed on raw and segmented field‐captured images using three different SSL methods, Nearest Neighbor Contrastive Learning of Visual Representations (NNCLR), Bootstrap Your Own Latent, and Barlow Twins. SSL pre‐training was done on ResNet‐18 and ResNet‐50 models using all three SSL methods on the original RGB images and foreground segmented images. The performance of SSL pre‐training methods was evaluated using linear probing of SSL representations and end‐to‐end fine‐tuning approaches. The SSL‐pre‐trained convolutional neural network models were able to perform annotation‐efficient classification. NNCLR was the best performing SSL method for both linear and full model fine‐tuning. With just 5% annotated images, transfer learning with ImageNet initialization obtained 74% accuracy, whereas NNCLR achieved an improved classification accuracy of 79% for end‐to‐end fine‐tuning. Models created using SSL pre‐training consistently performed better, especially under very low annotation, and were robust to object class imbalances. These approaches help overcome annotation bottlenecks and are resource efficient. 
    more » « less