NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping

Li, Ryan; Zhang, Yanzhe; Yang, Diyi (April 2025, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers))

Free, publicly-accessible full text available April 1, 2026
Design2Code: Benchmarking Multimodal Code Generation for Automated Front-End Engineering

Si, Chenglei; Zhang, Yanzhe; Li, Ryan; Yang, Zhengyuan; Liu, Ruibo; Yang, Diyi (April 2025, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers))

Free, publicly-accessible full text available April 1, 2026
Aligning Language Models with Demonstrated Feedback

Shaikh, Omar; Lam, Michelle S; Hejna, Joey; Shao, Yijia; Cho, Hyundong; Bernstein, Michael S; Yang, Diyi (April 2025, International Conference on Learning Representations (ICLR 2025))

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (< 10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. Concretely, DITTO operates by having an LLM generate examples that are presumed to be inferior to expert demonstrations. The method iteratively constructs pairwise preference relationships between these LLM-generated samples and expert demonstrations, potentially including comparisons between different training checkpoints. These constructed preference pairs are then used to train the model using a preference optimization algorithm (e.g. DPO). We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N = 16). Across our benchmarks and user study, we find that win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an avg. of 19% points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs.
more » « less
Free, publicly-accessible full text available April 25, 2026
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

Zhang, Zhehao; Chen, Jiaao; Yang, Diyi (December 2024, Annual Conference on Neural Information Processing Systems)

Full Text Available
Benchmarking Machine Translation with Cultural Awareness

https://doi.org/10.18653/v1/2024.findings-emnlp.765

Yao, Binwei; Jiang, Ming; Bobinac, Tara; Yang, Diyi; Hu, Junjie (November 2024, Association for Computational Linguistics)

Translating culture-related content is vital for effective cross-cultural communication. However, many culture-specific items (CSIs) often lack viable translations across languages, making it challenging to collect high-quality, diverse parallel corpora with CSI annotations. This difficulty hinders the analysis of cultural awareness of machine translation (MT) systems, including traditional neural MT and the emerging MT paradigm using large language models (LLM). To address this gap, we introduce a novel parallel corpus, enriched with CSI annotations in 6 language pairs for investigating Culturally-Aware Machine Translation---CAMT. Furthermore, we design two evaluation metrics to assess CSI translations, focusing on their pragmatic translation quality. Our findings show the superior ability of LLMs over neural MTs in leveraging external cultural knowledge for translating CSIs, especially those lacking translations in the target culture.
more » « less
Full Text Available
Culturebank: An online community-driven knowledge base towards culturally aware language technologies

Shi, Weiyan; Li, Ryan; Zhang, Yutong; Ziems, Caleb; Horesh, Raya; Abreu_de_Paula, Rogério; Yang, Diyi (November 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing)

Full Text Available
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

Yang, John; Jimenez, Carlos E; Zhang, Alex L; Lieret, Kilian; Yang, Joyce; Wu, Xindi; Press, Ori; Muennighoff, Niklas; Synnaeve, Gabriel; Narasimhan, Karthik R; et al (January 2025, International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available January 22, 2026
Simulated Misinformation Susceptibility (SMISTS): Enhancing Misinformation Research with Large Language Model Simulations

https://doi.org/10.18653/v1/2024.findings-acl.162

Ma, Weicheng; Deng, Chunyuan; Moossavi, Aram; Wang, Lili; Vosoughi, Soroush; Yang, Diyi (August 2024, Association for Computational Linguistics)

Full Text Available
Social Intelligence Data Infrastructure: Structuring the Present and Navigating the Future

https://doi.org/10.18653/v1/2024.findings-acl.163

Li, Minzhi; Shi, Weiyan; Ziems, Caleb; Yang, Diyi (January 2024, Association for Computational Linguistics)

Full Text Available
What Makes Digital Support Effective? How Therapeutic Skills Affect Clinical Well-Being

https://doi.org/10.1145/3641029

Yang, Wenjie; Fang, Anna; Shah, Raj Sanjay; Mathur, Yash; Yang, Diyi; Zhu, Haiyi; Kraut, Robert E (April 2024, Proceedings of the ACM on Human-Computer Interaction)

Online mental health support communities, in which volunteer counselors provide accessible mental and emotional health support, have grown in recent years. Despite millions of people using these platforms, the clinical effectiveness of these communities on mental health symptoms remains unknown. Although volunteers receive some training on the therapeutic skills proven effective in face-to-face environments, such as active listening and motivational interviewing, it is unclear how the usage of these skills in an online context affects people's mental health. In our work, we collaborate with one of the largest online peer support platforms and use both natural language processing and machine learning techniques to examine how one-on-one support chats on the platform affect clients' depression and anxiety symptoms. We measure how characteristics of support-providers, such as their experience on the platform and use of therapeutic skills (e.g. affirmation, showing empathy), affect support-seekers' mental health changes. Based on a propensity-score matching analysis to approximate a random-assignment experiment, results shows that online peer support chats improve both depression and anxiety symptoms with a statistically significant but relatively small effect size. Additionally, support providers' techniques such as emphasizing the autonomy of the client lead to better mental health outcomes. However, we also found that the use of some behaviors, such as persuading and providing information, are associated with worsening of mental health symptoms. Our work provides key understanding for mental health care in the online setting and designing training systems for online support providers.
more » « less
Full Text Available

« Prev Next »

Search for: All records