CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
- Award ID(s): 1747798
- PAR ID: 10581108
- Publisher / Repository: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Date Published:
- Page Range / eLocation ID: 4226-4237
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation

