Title: Batched Low-Rank Adaptation of Foundation Models
Low-Rank Adaptation (LoRA) has recently gained attention for fine-tuning foundation models by incorporating trainable low-rank matrices, thereby reducing the number of trainable parameters. While LoRA offers numerous advantages, its applicability for real-time serving to a diverse and global user base is constrained by its inability to handle multiple task-specific adapters efficiently. This imposes a performance bottleneck in scenarios requiring personalized, task-specific adaptations for each incoming request. To mitigate this constraint, we introduce Fast LoRA (FLoRA), a framework in which each input example in a minibatch can be associated with its unique low-rank adaptation weights, allowing for efficient batching of heterogeneous requests. We empirically demonstrate that FLoRA retains the performance merits of LoRA, showcasing competitive results on the MultiPL-E code generation benchmark spanning eight languages and on a multilingual speech recognition task across six languages.
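To make the batching idea concrete, here is a minimal PyTorch sketch of a linear layer in which every example in the minibatch carries its own low-rank adapter, so heterogeneous requests are served in one pass. The function name, shapes, and rank are illustrative assumptions; the published FLoRA kernel uses a more efficient fused formulation than the plain batched matmuls shown here.

```python
import torch

def flora_linear(x, W, A, B):
    """One linear layer with a distinct low-rank adapter per example.

    x: (batch, d_in)      one request per row
    W: (d_in, d_out)      shared frozen base weight
    A: (batch, d_in, r)   per-example low-rank factors
    B: (batch, r, d_out)
    """
    base = x @ W                                          # shared path
    # per-example adapter path, computed with batched matmuls so the
    # heterogeneous adapters are applied in a single pass:
    delta = torch.bmm(torch.bmm(x.unsqueeze(1), A), B).squeeze(1)
    return base + delta

# toy usage: 4 requests, each with its own rank-8 adapter
x = torch.randn(4, 64)
W = torch.randn(64, 128)
A = torch.randn(4, 64, 8)
B = torch.randn(4, 8, 128)
print(flora_linear(x, W, A, B).shape)  # torch.Size([4, 128])
```

In standard LoRA, by contrast, A and B are shared across the whole batch, so serving requests that need different adapters forces separate forward passes.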
Award ID(s): 1918839
NSF-PAR ID: 10498688
Publisher / Repository: ICLR 2024
Journal Name: ICLR 2024
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    In the U.S. inpatient payment system, the Diagnosis-Related Group (DRG) is pivotal, but its assignment process is inefficient. This study introduces DRG-LLaMA, an advanced large language model (LLM) fine-tuned on clinical notes to enhance DRG assignment. Utilizing LLaMA as the foundational model and optimizing it through Low-Rank Adaptation (LoRA) on 236,192 MIMIC-IV discharge summaries, our DRG-LLaMA-7B model exhibited a noteworthy macro-averaged F1 score of 0.327, a top-1 prediction accuracy of 52.0%, and a macro-averaged Area Under the Curve (AUC) of 0.986, with a maximum input token length of 512. This model surpassed the performance of prior leading models in DRG prediction, showing relative improvements of 40.3% and 35.7% in macro-averaged F1 score compared to ClinicalBERT and CAML, respectively. Applied to base DRG and complication or comorbidity (CC)/major complication or comorbidity (MCC) prediction, DRG-LLaMA achieved top-1 prediction accuracies of 67.8% and 67.5%, respectively. Additionally, our findings indicate that DRG-LLaMA's performance improves with increased model parameters and input context lengths.
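    As an illustration of the fine-tuning recipe described above, here is a minimal sketch using the Hugging Face PEFT library to wrap a LLaMA-style checkpoint with LoRA adapters for sequence classification. The checkpoint id, label count, rank, and target modules are assumptions for illustration, not the paper's exact configuration.

    ```python
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base_id = "huggyllama/llama-7b"  # assumption: any LLaMA-style checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_id,
        num_labels=738,  # hypothetical label count: one class per DRG code
    )

    lora = LoraConfig(
        task_type="SEQ_CLS",                     # classification head on top
        r=16, lora_alpha=32, lora_dropout=0.05,  # illustrative hyperparameters
        target_modules=["q_proj", "v_proj"],     # attention projections
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # only the low-rank matrices train
    ```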

     
  2. Low-Power Wide-Area Networks (LPWANs) are an emerging Internet-of-Things (IoT) paradigm that caters to large-scale, long-term sensory data collection. Among the commercialized LPWAN technologies, LoRa (Long Range) attracts much interest from academia and industry thanks to its open-source physical (PHY) layer and standardized networking stack. In the flourishing LoRa community, many observations and countermeasures have been proposed to understand and improve the performance of LoRa networking in practice. From the perspective of the LoRa networking stack, however, we still lack a complete picture of what has and has not been done, and of where the field is heading. This survey proposes a two-dimensional taxonomy (networking layers and performance metrics) to categorize and compare cutting-edge LoRa networking techniques. The first dimension is the layered structure of the LoRa networking stack: from bottom to top, the PHY layer, Link layer, Media-Access Control (MAC) layer, and Application (App) layer. Within each layer, we focus on the three most representative layer-specific research issues for fine-grained categorization. The second dimension covers LoRa networking performance metrics: range, throughput, energy, and security. We compare different techniques in terms of these metrics, survey the open issues and challenges, and outline the future trends we observe. Through this taxonomy, we aim to clarify how to build a more effective LoRa networking stack and to identify more scenarios where LoRa applies, as a step toward large-scale, long-term IoT.
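    As an organizational aid only, the survey's two-dimensional taxonomy can be pictured as a lookup structure keyed by layer and metric; the entries in the sketch below are placeholders, not the survey's actual catalogue of techniques.

    ```python
    # The two axes of the survey's taxonomy as a lookup structure.
    LAYERS = ["PHY", "Link", "MAC", "App"]  # bottom to top
    METRICS = ["range", "throughput", "energy", "security"]

    # taxonomy[layer][metric] would hold the techniques surveyed at that
    # cell; entries are left empty here as placeholders.
    taxonomy = {layer: {metric: [] for metric in METRICS} for layer in LAYERS}

    taxonomy["PHY"]["range"].append("hypothetical-technique")
    print(taxonomy["PHY"]["range"])
    ```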
  3. On any modern computer architecture today, parallelism comes with a modest cost arising from the creation and management of threads or tasks. Programmers battle this cost by manually optimizing and tuning their code to minimize the cost of parallelism without sacrificing its benefit: performance. This is a difficult battle: programmers must reason about architectural constant factors hidden behind layers of software abstractions, including thread schedulers and memory managers, and about their impact on performance, including at scale. In languages that support higher-order functions, the battle intensifies: higher-order functions can make it difficult, if not impossible, to reason about the costs and benefits of parallelism.

    Motivated by these challenges and the numerous advantages of high-level languages, we believe that it has become essential to manage parallelism automatically so as to minimize its cost and maximize its benefit. This is a challenging problem, even when considered on a case-by-case, application-specific basis. But if a solution were possible, it could combine the many correctness benefits of high-level languages with performance, by managing parallelism without the programmer effort needed to ensure it. This paper proposes such automatic parallelism management by combining static (compile-time) and run-time techniques. Specifically, we consider the Parallel ML language with task parallelism and describe a compiler pipeline that embeds potential parallelism directly into the call stack, avoiding the cost of task creation by default. We then pair this compilation pipeline with a run-time system that dynamically converts potential parallelism into actual parallel tasks. Together, the compiler and run-time system guarantee that the cost of parallelism remains low without losing its benefit. We prove that our techniques have no asymptotic impact on the work and span of parallel programs and thus preserve their asymptotic properties. We implement the proposed techniques by extending the MPL compiler for Parallel ML and show that they can eliminate the burden of manual optimization while delivering good practical performance.
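    To make the idea of deferred task creation concrete, here is a small sketch in plain Python (not Parallel ML, and not the paper's actual mechanism, which lives inside the MPL compiler and run-time system): a `par` combinator runs its two branches sequentially by default and creates a real task only when a toy policy decides the parallelism is worth its cost. The depth cutoff is a crude stand-in for the paper's dynamic, scheduler-driven promotion; all names are illustrative.

    ```python
    import threading

    SPAWN_DEPTH = 1  # toy promotion policy: only shallow calls pay for a task

    def par(f, g, depth=0):
        """Evaluate f() and g(), promoting g to a real task only when
        the policy says the parallelism is worth its creation cost."""
        if depth < SPAWN_DEPTH:  # promote: create an actual parallel task
            result = {}
            t = threading.Thread(target=lambda: result.update(g=g()))
            t.start()
            a = f()
            t.join()
            return a, result["g"]
        return f(), g()          # default: stay sequential, pay nothing

    # toy usage: recursive fib where only shallow calls spawn tasks
    def fib(n, depth=0):
        if n < 2:
            return n
        a, b = par(lambda: fib(n - 1, depth + 1),
                   lambda: fib(n - 2, depth + 1), depth)
        return a + b

    print(fib(10))  # 55
    ```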

     
  4. Dynamic adaptation is an error-driven process of adjusting planned motor actions to changes in task dynamics (Shadmehr, 2017). Adapted motor plans are consolidated into memories that contribute to better performance on re-exposure. Consolidation begins within 15 min of training (Criscimagna-Hemminger and Shadmehr, 2008) and can be measured via changes in resting state functional connectivity (rsFC). For dynamic adaptation, rsFC has not been quantified on this timescale, nor has its relationship to adaptive behavior been established. We used a functional magnetic resonance imaging (fMRI)-compatible robot, the MR-SoftWrist (Erwin et al., 2017), to quantify rsFC specific to dynamic adaptation of wrist movements and subsequent memory formation in a mixed-sex cohort of human participants. We acquired fMRI during a motor execution task and a dynamic adaptation task to localize brain networks of interest, and quantified rsFC within these networks in three 10-min windows occurring immediately before and after each task. The next day, we assessed behavioral retention. We used a mixed model of rsFC measured in each time window to identify changes in rsFC with task performance, and linear regression to identify the relationship between rsFC and behavior. Following the dynamic adaptation task, rsFC increased within the cortico-cerebellar network and decreased interhemispherically within the cortical sensorimotor network. The increases within the cortico-cerebellar network were specific to dynamic adaptation, as they were associated with behavioral measures of adaptation and retention, indicating that this network has a functional role in consolidation. In contrast, the decreases in rsFC within the cortical sensorimotor network were associated with motor control processes independent of adaptation and retention.

    SIGNIFICANCE STATEMENT Motor memory consolidation processes have been studied via functional magnetic resonance imaging (fMRI) by analyzing changes in resting state functional connectivity (rsFC) occurring more than 30 min after adaptation. However, it is unknown whether consolidation processes are detectable immediately (<15 min) following dynamic adaptation. We used an fMRI-compatible wrist robot to localize brain regions involved in dynamic adaptation in the cortico-thalamic-cerebellar (CTC) and cortical sensorimotor networks and quantified changes in rsFC within each network immediately after adaptation. Different patterns of change in rsFC were observed compared with studies conducted at longer latencies. Increases in rsFC in the cortico-cerebellar network were specific to adaptation and retention, while interhemispheric decreases in the cortical sensorimotor network were associated with alternate motor control processes but not with memory formation.
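    For readers unfamiliar with the measure, rsFC between two regions is commonly quantified as the Pearson correlation of their BOLD time series. The sketch below uses synthetic signals purely for illustration; a real pipeline would add preprocessing (motion regression, temporal filtering) that is omitted here.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 300  # synthetic time points standing in for a 10-min rsFC window

    roi_a = rng.standard_normal(n)                # mean BOLD signal, region A
    roi_b = 0.5 * roi_a + rng.standard_normal(n)  # region B, correlated with A

    rsfc = np.corrcoef(roi_a, roi_b)[0, 1]        # Pearson correlation
    print(f"rsFC(A, B) = {rsfc:.2f}")
    ```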

     
  5. Africa has over 2,000 indigenous languages, but they are under-represented in NLP research due to a lack of datasets. In recent years, there has been progress in developing labelled corpora for African languages; however, these corpora are often available only in a single domain and may not generalize to other domains. In this paper, we focus on the task of sentiment classification for cross-domain adaptation. We create a new dataset, NollySenti, based on Nollywood movie reviews, for five languages widely spoken in Nigeria (English, Hausa, Igbo, Nigerian-Pidgin, and Yorùbá). We provide an extensive empirical evaluation using classical machine learning methods and pre-trained language models. Leveraging transfer learning, we compare the performance of cross-domain adaptation from the Twitter domain with cross-lingual adaptation from the English language. Our evaluation shows that transfer from English in the same target domain leads to more than 5% improvement in accuracy compared to transfer from Twitter in the same language. To further mitigate the domain difference, we leverage machine translation (MT) from English to the other Nigerian languages, which leads to a further improvement of 7% over cross-lingual evaluation. While MT to low-resource languages is often of low quality, we show through human evaluation that most of the translated sentences preserve the sentiment of the original English reviews.
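    As a concrete picture of the cross-domain setup, the sketch below trains a simple classical baseline on a source domain (standing in for Twitter) and evaluates it on movie reviews. The in-memory toy data and the TF-IDF plus logistic-regression pipeline are illustrative assumptions, not the paper's actual datasets or models.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # hypothetical in-memory stand-ins for the source (Twitter) and
    # target (movie review) domains
    src_texts = ["great phone!", "terrible service", "love it", "so bad"]
    src_labels = [1, 0, 1, 0]
    tgt_texts = ["the movie was wonderful", "a dreadful film"]
    tgt_labels = [1, 0]

    # fit on the source domain only, then test on the target domain
    vec = TfidfVectorizer(ngram_range=(1, 2))
    clf = LogisticRegression().fit(vec.fit_transform(src_texts), src_labels)

    preds = clf.predict(vec.transform(tgt_texts))
    print("cross-domain accuracy:", accuracy_score(tgt_labels, preds))
    ```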