This content will become publicly available on June 1, 2026

Title: The bottlenecks of AI: challenges for embedded and real-time research in a data-centric age
Abstract: Recent advances in AI mark the culmination of a shift in science and engineering away from strong reliance on algorithmic and symbolic knowledge toward new data-driven approaches. How does the emerging intelligent data-centric world impact research on real-time and embedded computing? We argue for two effects: (1) new challenges in embedded system contexts, and (2) new opportunities for community expansion beyond the embedded domain. First, on the embedded system side, the shifting nature of computing toward data-centricity affects the types of bottlenecks that arise. At training time, the bottlenecks are generally data-related. Embedded computing relies on scarce sensor data modalities, unlike those commonly addressed in mainstream AI, necessitating solutions for efficient learning from scarce sensor data. At inference time, the bottlenecks are resource-related, calling for improved resource economy and novel scheduling policies. Further ahead, the convergence of AI around large language models (LLMs) introduces additional model-related challenges in embedded contexts. Second, on the domain expansion side, we argue that community expertise in handling resource bottlenecks is becoming increasingly relevant to a new domain: the cloud environment, driven by AI needs. The paper discusses the novel research directions that arise in the data-centric world of AI, covering data-, resource-, and model-related challenges in embedded systems as well as new opportunities in the cloud domain.
Award ID(s):
2038817
PAR ID:
10642809
Publisher / Repository:
Journal of Real-time Systems, Springer
Journal Name:
Real-Time Systems
Volume:
61
Issue:
2
ISSN:
0922-6443
Page Range / eLocation ID:
185 to 236
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper articulates our vision for a learning-based untrustworthy distributed database. We focus on permissioned blockchain systems as an emerging instance of untrustworthy distributed databases and argue that as novel smart contracts, modern hardware, and new cloud platforms arise, future-proof permissioned blockchain systems need to be designed with full-stack adaptivity in mind. At the application level, a future-proof system must adaptively learn the best-performing transaction processing paradigm and quickly adapt to new hardware and unanticipated workload changes on the fly. Likewise, the Byzantine consensus layer must dynamically adjust itself to the workloads, faulty conditions, and network configuration while maintaining compatibility with the transaction processing paradigm. At the infrastructure level, cloud providers must enable cross-layer adaptation, which identifies performance bottlenecks and possible attacks, and determines at runtime the degree of resource disaggregation that best meets application requirements. Within this vision of the future, our paper outlines several research challenges together with some preliminary approaches.
  2. With each passing year, state-of-the-art deep learning neural networks grow larger, requiring ever greater computing and power resources. The high compute requirements of these large networks exclude the majority of the world population, which lives in low-resource settings and lacks the infrastructure to benefit from these advancements in medical AI. Current state-of-the-art medical AI, even with cloud resources, is difficult to deploy in remote areas that lack reliable internet connectivity. We demonstrate a cost-effective approach to deploying medical AI in limited-resource settings using an Edge Tensor Processing Unit (TPU). We trained and optimized a classification model on the Chest X-ray 14 dataset and a segmentation model on the Nerve ultrasound dataset using INT8 Quantization Aware Training. Thereafter, we compiled the optimized models for Edge TPU execution. We find that inference on Edge TPUs is 10x faster compared to other embedded devices. The optimized models are 3x and 12x smaller for classification and segmentation, respectively, compared to the full-precision models. In summary, we show the potential of Edge TPUs for two medical AI tasks with faster inference times, which could potentially be used in low-resource settings for medical AI-based diagnostics. We finally discuss some potential challenges and limitations of our approach for real-world deployments.
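The size reductions reported above follow from representing weights in 8 bits rather than 32. A minimal, self-contained sketch of affine INT8 quantization (illustrative only; the names and structure are ours, not the paper's actual TensorFlow/Edge TPU pipeline):

```python
def quantize_params(xs, num_bits=8):
    """Compute an affine quantization scale and zero-point for a tensor,
    mapping the float range [min(xs), max(xs)] onto signed INT8 [-128, 127].
    The representable range is forced to include 0.0 so that zero maps exactly."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(xs, scale, zero_point):
    # Round to the nearest integer grid point, clamping to the INT8 range.
    return [max(-128, min(127, round(x / scale) + zero_point)) for x in xs]

def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]

weights = [-1.5, -0.2, 0.0, 0.7, 2.1]
scale, zp = quantize_params(weights)
q = quantize(weights, scale, zp)
recovered = dequantize(q, scale, zp)  # close to the originals, at 1/4 the storage
```

Each recovered value differs from the original by at most one quantization step (`scale`), which is why carefully quantized models can shrink several-fold with little accuracy loss.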
  3. Abstract: Modern science's ability to produce, store, and analyze big datasets is changing the way scientific research is practiced. Philosophers have only begun to comprehend the changed nature of scientific reasoning in this age of "big data." We analyze data-focused practices in biology and climate modeling, identifying distinct species of data-centric science: phenomena-laden in biology and phenomena-agnostic in climate modeling, each better suited to its own domain of application, though each entails trade-offs. We argue that data-centric practices in science are not monolithic because the opportunities and challenges presented by big data vary across scientific domains.
  4. Due to the amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become a bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and near-memory processing (NMP) paradigms, aims to accelerate these types of applications by moving the computation closer to the data. Over the past few years, researchers have proposed various memory architectures that enable DCC systems, such as logic layers in 3D-stacked memories or charge-sharing-based bitwise operations in dynamic random-access memory (DRAM). However, application-specific memory access patterns, power and thermal concerns, memory technology limitations, and inconsistent performance gains complicate the offloading of computation in DCC systems. Therefore, designing intelligent resource management techniques for computation offloading is vital for leveraging the potential offered by this new paradigm. In this article, we survey the major trends in managing PIM- and NMP-based DCC systems and provide a review of the landscape of resource management techniques employed by system designers for such systems. Additionally, we discuss the future challenges and opportunities in DCC management.
  5. Serverless computing platforms have gained popularity because they allow easy deployment of services in a highly scalable and cost-effective manner. By enabling just-in-time startup of container-based services, these platforms achieve good multiplexing and automatically respond to traffic growth, making them particularly desirable for edge cloud data centers where resources are scarce. Edge cloud data centers are also gaining attention because of their promise to provide responsive, low-latency shared computing and storage resources. Bringing serverless capabilities to edge cloud data centers must preserve these goals of low latency and reliability. The reliability guarantees provided by serverless computing, however, are weak, with node failures causing requests to be dropped or executed multiple times. Thus, serverless computing provides only a best-effort infrastructure, leaving application developers responsible for implementing stronger reliability guarantees at a higher level. Current approaches for providing stronger semantics, such as "exactly once" guarantees, could be integrated into serverless platforms, but they come at high cost in terms of both latency and resource consumption. As edge cloud services move toward applications such as autonomous vehicle control that require strong guarantees for both reliability and performance, these approaches may no longer be sufficient. In this paper we evaluate the latency, throughput, and resource costs of providing different reliability guarantees, with a focus on these emerging edge cloud platforms and applications.
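The resource cost of stronger semantics is visible even in a toy scheme. A hypothetical sketch (names and structure are ours, not the paper's) of deduplicating retried requests with idempotency keys — one common path toward "exactly once" behavior:

```python
class IdempotentService:
    """Toy exactly-once layer over an at-least-once delivery model:
    retried requests carrying the same idempotency key are served from
    a result cache instead of re-executing the handler.

    The cache itself is the cost: every completed request's result must
    be retained (and, in a real system, replicated across nodes) for as
    long as retries can arrive."""

    def __init__(self, handler):
        self.handler = handler
        self.results = {}    # idempotency key -> cached result
        self.executions = 0  # how many times the handler actually ran

    def invoke(self, key, payload):
        if key in self.results:           # duplicate delivery: reuse result
            return self.results[key]
        result = self.handler(payload)    # first delivery: execute once
        self.executions += 1
        self.results[key] = result
        return result

svc = IdempotentService(lambda n: n * 2)
svc.invoke("req-1", 21)  # executes the handler
svc.invoke("req-1", 21)  # retry: served from cache, no re-execution
```

Even this single-node sketch shows the trade-off the abstract describes: deduplication state grows with request volume, and making that state survive node failures is where the latency and resource costs of real exactly-once schemes arise.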