Search for: All records

Creators/Authors contains: "Chen, B."


  1. Current PEFT methods for LLMs can achieve high quality, efficient training, or scalable serving, but not all three simultaneously. To address this limitation, we investigate sparse fine-tuning and observe a remarkable improvement in generalization ability. Building on this key insight, we propose a family of Structured Sparse Fine-Tuning (S2FT) methods for LLMs, which concurrently achieve state-of-the-art fine-tuning performance, training efficiency, and inference scalability. S2FT accomplishes this by "selecting sparsely and computing densely": it selects a few heads and channels in the MHA and FFN modules of each Transformer block, respectively. Next, it co-permutes the weight matrices on both sides of the coupled structures in LLMs to connect the selected components in each layer into a dense submatrix. Finally, S2FT performs in-place gradient updates on all submatrices. Through theoretical analysis and empirical results, we show that our method prevents overfitting and forgetting, delivers SOTA performance on both commonsense and arithmetic reasoning with 4.6% and 1.3% average improvements over LoRA, and surpasses full fine-tuning by 11.5% when generalizing to various domains after instruction tuning. Using our partial back-propagation algorithm, S2FT reduces training memory by up to 3x and improves latency by 1.5-2.7x compared to full fine-tuning, while delivering an average 10% improvement over LoRA on both metrics. We further demonstrate that the weight updates in S2FT can be decoupled into adapters, enabling effective fusion, fast switching, and efficient parallelism when serving multiple fine-tuned models.
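The "select sparsely, compute densely" idea in the abstract above can be sketched for an FFN block, whose intermediate channels couple the columns of the up-projection to the rows of the down-projection: applying the same permutation to both sides leaves the layer's function unchanged while gathering the selected channels into a contiguous dense submatrix that can be updated in place. A minimal NumPy sketch (the dimensions, channel choices, and update rule here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 8, 32, 4                       # hidden dim, FFN dim, channels selected

W_up = rng.normal(size=(d, m))           # up-projection
W_down = rng.normal(size=(m, d))         # down-projection
x = rng.normal(size=(1, d))

def ffn(x, W_up, W_down):
    # simple ReLU feed-forward block
    return np.maximum(x @ W_up, 0.0) @ W_down

y_ref = ffn(x, W_up, W_down)

# "Select sparsely": pick k intermediate channels to fine-tune
# (chosen arbitrarily here).
selected = np.array([3, 10, 21, 30])
rest = np.setdiff1d(np.arange(m), selected)
perm = np.concatenate([selected, rest])

# Co-permute the coupled dimension: columns of W_up and rows of W_down
# move together, so the layer's output is unchanged.
W_up_p = W_up[:, perm]
W_down_p = W_down[perm, :]
assert np.allclose(ffn(x, W_up_p, W_down_p), y_ref)

# "Compute densely": the selected channels now form contiguous dense
# blocks W_up_p[:, :k] and W_down_p[:k, :], which a trainer could
# update in place while the rest of each matrix stays frozen.
W_up_p[:, :k] -= 0.01 * rng.normal(size=(d, k))
W_down_p[:k, :] -= 0.01 * rng.normal(size=(k, d))
```

Because the permutation is function-preserving, only the bookkeeping (the `perm` array) needs to be stored to map the dense blocks back to their original positions.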
  2. Data literacy is increasingly relevant to everyday life and is a priority for educators across disciplinary boundaries. This study introduces a framework for characterizing data literacy instruction along five key dimensions. It then applies this framework to examine instances of data literacy instruction, such as explanations of data-related concepts and tasks/questions that invite learners to actively engage in data-related practices, in a sample of lessons from a science and a social studies high school textbook. By juxtaposing findings from science and social studies contexts, it examines how these disciplinary approaches compare with each other and identifies areas where they could expand and build on each other to support more effective and holistic data literacy development.
  3. Data science is increasingly relevant to daily life and has garnered significant attention in education. While data science education has traditionally focused on technical training, justice considerations are increasingly raised amid growing concerns over fairness in data science. This paper introduces a framework for justice-oriented data science education that comprises five areas grounded in a broad range of literature. To explore and refine the framework in authentic contexts, we applied it to discourse data from a participatory design workshop with teachers. Analysis demonstrated the presence of the framework's areas, and rich connections among them, in teachers' thinking. The framework offers educators a tool for integrating data science, justice issues, and disciplinary content in K-12 classrooms.