NSF PAR Search | NSF Public Access Repository

Large Language Models as a Normalizer for Transliteration and Dialectal Translation

Alam, Md Mahfuz Ibn; Anastasopoulos, Antonios (January 2025, Association for Computational Linguistics)

Full Text Available

Testing the Boundaries of LLMs: Dialectal and Language-Variety Tasks

Faisal, Fahim; Anastasopoulos, Antonios (January 2025, Association for Computational Linguistics)

Full Text Available

Dialect Normalization using Large Language Models and Morphological Rules

https://doi.org/10.18653/v1/2025.findings-acl.1215

Dimakis, Antonios; Pavlopoulos, John; Anastasopoulos, Antonios (January 2025, Association for Computational Linguistics)

Full Text Available

An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models

https://doi.org/10.18653/v1/2024.mrl-1.4

Faisal, Fahim; Anastasopoulos, Antonios (November 2024, Association for Computational Linguistics)

Full Text Available

Speech Recognition for Greek Dialects: A Challenging Benchmark

https://doi.org/10.21437/Interspeech.2024-2443

Vakirtzian, Socrates; Tsoukala, Chara; Bompolas, Stavros; Mouzou, Katerina; Stamou, Vivian; Paraskevopoulos, Georgios; Dimakis, Antonios; Markantonatou, Stella; Ralli, Angela; Anastasopoulos, Antonios (September 2024, ISCA)

Full Text Available

FINDINGS OF THE IWSLT 2024 EVALUATION CAMPAIGN

https://doi.org/10.18653/v1/2024.iwslt-1.1

Ahmad, Ibrahim Said; Anastasopoulos, Antonios; Bojar, Ondřej; Borg, Claudia; Carpuat, Marine; Cattoni, Roldano; Cettolo, Mauro; Chen, William; Dong, Qianqian; Federico, Marcello; et al. (August 2024, Association for Computational Linguistics)

Full Text Available

Extracting Lexical Features from Dialects via Interpretable Dialect Classifiers

https://doi.org/10.18653/v1/2024.naacl-short.5

Xie, Roy; Ahia, Orevaoghene; Tsvetkov, Yulia; Anastasopoulos, Antonios (June 2024, Association for Computational Linguistics)

Full Text Available

Language and Speech Technology for Central Kurdish Varieties

Ahmadi, Sina; Jaff, Daban; Ibn Alam, Md Mahfuz; Anastasopoulos, Antonios (May 2024, ELRA and ICCL)

Kurdish, an Indo-European language spoken by over 30 million speakers, is considered a dialect continuum and known for its diversity in language varieties. Previous studies addressing language and speech technology for Kurdish handle it in a monolithic way as a macro-language, resulting in disparities for dialects and varieties for which there are few resources and tools available. In this paper, we take a step towards developing resources for language and speech technology for varieties of Central Kurdish, creating a corpus by transcribing movies and TV series as an alternative to fieldwork. Additionally, we report the performance of machine translation, automatic speech recognition, and language identification as downstream tasks evaluated on Central Kurdish subdialects. Data and models are publicly available under an open license at https://github.com/sinaahmadi/CORDI.
Full Text Available

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization

https://doi.org/10.1109/ICASSP48485.2024.10446102

Hussein, Amir; Yan, Brian; Anastasopoulos, Antonios; Watanabe, Shinji; Khudanpur, Sanjeev (April 2024, IEEE)

Full Text Available

CODET: A Benchmark for Contrastive Dialectal Evaluation of Machine Translation

Ibn Alam, Md Mahfuz; Ahmadi, Sina; Anastasopoulos, Antonios (March 2024, Association for Computational Linguistics)

Neural machine translation (NMT) systems exhibit limited robustness in handling source-side linguistic variations. Their performance tends to degrade when faced with even slight deviations in language usage, such as different domains or variations introduced by second-language speakers. It is intuitive to extend this observation to encompass dialectal variations as well, but the work allowing the community to evaluate MT systems on this dimension is limited. To alleviate this issue, we compile and release CODET, a contrastive dialectal benchmark encompassing 891 different variations from twelve different languages. We also quantitatively demonstrate the challenges large MT models face in effectively translating dialectal variants. All the data and code have been released.
Full Text Available
