NSF PAR Search | NSF Public Access Repository

Digital Documentation for Diasporic Data: challenges, opportunities, and solutions for working with Diaspora Communities

Taguchi, Chihiro; Liebl, J Elizabeth; Anastasopoulos, Antonios; Chiang, David; Walther, Géraldine (May 2025, 9th International Conference on Language Documentation & Conservation (ICLDC))

Language documentation involving diaspora communities presents a distinctive mix of challenges and opportunities for approaches that rely on large-scale data collection and machine-assisted transcription and annotation. Drawing on our projects on Kichwa and Mapudungun, we will present a suite of computational tools designed to support language documentation work with diaspora communities.
Free, publicly-accessible full text available May 3, 2026
Improving Rare Word Translation With Dictionaries and Attention Masking

Sible, Kenneth J; Chiang, David (September 2024, Association for Machine Translation in the Americas)

Full Text Available
Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t

Taguchi, Chihiro; Chiang, David (August 2024, ACL Anthology)

Full Text Available
Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information

Taguchi, Chihiro; Saransig, Jefferson; Velásquez, Dayana; Chiang, David (May 2024, ACL Anthology)

Full Text Available
DialectBench: An NLP Benchmark for Dialects, Varieties, and Closely-Related Languages

Faisal, Fahim; Ahia, Orevaoghene; Srivastava, Aarohi; Ahuja, Kabir; Chiang, David; Tsvetkov, Yulia; Anastasopoulos, Antonios (July 2024, ACL)

Full Text Available
DIALECTBENCH: An NLP Benchmark for Dialects, Varieties, and Closely-Related Languages

https://doi.org/10.18653/v1/2024.acl-long.777

Faisal, Fahim; Ahia, Orevaoghene; Srivastava, Aarohi; Ahuja, Kabir; Chiang, David; Tsvetkov, Yulia; Anastasopoulos, Antonios (January 2024, Association for Computational Linguistics)

Full Text Available
Exact Recursive Probabilistic Programming

https://doi.org/10.1145/3586050

Chiang, David; McDonald, Colin; Shan, Chung-chieh (April 2023, Proceedings of the ACM on Programming Languages)

Recursive calls over recursive data are useful for generating probability distributions, and probabilistic programming allows computations over these distributions to be expressed in a modular and intuitive way. Exact inference is also useful, but unfortunately, existing probabilistic programming languages do not perform exact inference on recursive calls over recursive data, forcing programmers to code many applications manually. We introduce a probabilistic language in which a wide variety of recursion can be expressed naturally, and inference carried out exactly. For instance, probabilistic pushdown automata and their generalizations are easy to express, and polynomial-time parsing algorithms for them are derived automatically. We eliminate recursive data types using program transformations related to defunctionalization and refunctionalization. These transformations are assured correct by a linear type system, and a successful choice of transformations, if there is one, is guaranteed to be found by a greedy algorithm.
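For intuition about what "exact inference over recursive calls" and "polynomial-time parsing algorithms" mean here, the sketch below hand-codes the classic inside algorithm for a tiny probabilistic context-free grammar. This is not the paper's language or its automatic derivation; it is an illustration, assuming a hypothetical two-rule grammar in Chomsky normal form (S → S S with probability 0.4, S → a with probability 0.6), of the kind of exact, cubic-time computation the abstract says is derived automatically. The names `BINARY`, `LEXICAL`, and `inside_probability` are invented for this example.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (not taken from the paper).
# Binary rules: (parent, left child, right child) -> probability
BINARY = {("S", "S", "S"): 0.4}
# Lexical rules: (parent, terminal word) -> probability
LEXICAL = {("S", "a"): 0.6}


def inside_probability(words, start="S"):
    """Exact probability that `start` generates `words`, summed over all parses."""
    n = len(words)
    # chart[i][j][A] = total probability that nonterminal A derives words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Width-1 spans come from lexical rules.
    for i, w in enumerate(words):
        for (parent, word), p in LEXICAL.items():
            if word == w:
                chart[i][i + 1][parent] += p

    # Wider spans: sum over every split point k and every binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    chart[i][j][parent] += (
                        p * chart[i][k].get(left, 0.0) * chart[k][j].get(right, 0.0)
                    )
    return chart[0][n].get(start, 0.0)


if __name__ == "__main__":
    print(inside_probability(["a", "a", "a"]))  # sums over both bracketings
```

Because every derivation is summed rather than sampled, the result is exact. For "a a a" it equals 2 × 0.4² × 0.6³, one term per bracketing. The triple loop over spans and split points is what makes the computation polynomial (cubic in sentence length) instead of enumerating exponentially many parse trees.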
Full Text Available
BERTwich: Extending BERT’s Capabilities to Model Dialectal and Noisy Text

https://doi.org/10.18653/v1/2023.findings-emnlp.1037

Srivastava, Aarohi; Chiang, David (January 2023, Association for Computational Linguistics)

Full Text Available
Learning Hyperedge Replacement Grammars for Graph Generation

https://doi.org/10.1109/TPAMI.2018.2810877

Aguinaga, Salvador; Chiang, David; Weninger, Tim (March 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence)

Full Text Available
Neural Machine Translation of Text from Non-Native Speakers

Anastasopoulos, Antonios; Lui, Alison; Nguyen, Toan Q.; Chiang, David (June 2019, Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics)

Neural machine translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammatical error correction system, we can recover 1.0 BLEU of the 2.4 BLEU lost to grammatical errors. We also present a set of Spanish translations of the JFLEG grammatical error correction corpus, which allows NMT robustness to be tested on real grammatical errors.
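The augmentation idea in this abstract (pairing error-injected source sentences with their original clean targets) can be sketched as follows. This is a minimal illustration, not the authors' error-generation procedure: it assumes whitespace-tokenized English source sentences and a hypothetical error inventory of just two error types (article deletion and preposition substitution). `inject_errors`, `augment`, `ARTICLES`, and `PREPOSITIONS` are invented names.

```python
import random

# Hypothetical error inventory for English source text (illustrative only;
# not the error types or rates used in the paper).
ARTICLES = {"a", "an", "the"}
PREPOSITIONS = ["in", "on", "at", "for", "to", "of", "with"]


def inject_errors(tokens, rng, p=0.5):
    """Return a copy of `tokens` with synthetic grammatical errors.

    Each article is dropped with probability p, and each preposition is
    replaced by a different preposition with probability p.
    """
    noisy = []
    for tok in tokens:
        low = tok.lower()
        if low in ARTICLES and rng.random() < p:
            continue  # article deletion, e.g. "the mat" -> "mat"
        if low in PREPOSITIONS and rng.random() < p:
            alternatives = [q for q in PREPOSITIONS if q != low]
            noisy.append(rng.choice(alternatives))  # preposition substitution
            continue
        noisy.append(tok)
    return noisy


def augment(pairs, seed=0, copies=1):
    """Keep all clean (source, target) pairs and add `copies` noisy variants
    of each source sentence, each paired with the unchanged clean target."""
    rng = random.Random(seed)  # seeded so the augmented corpus is reproducible
    out = list(pairs)
    for src, tgt in pairs:
        for _ in range(copies):
            out.append((" ".join(inject_errors(src.split(), rng)), tgt))
    return out
```

Keeping the target side untouched is the point of the design: the model sees an ungrammatical source mapped to a fluent translation, which is what pushes it toward robustness rather than toward reproducing the errors.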
Full Text Available