

This content will become publicly available on March 6, 2026

Title: Kuene: A Web Platform for Facilitating Hawaiian Word Neologism
This paper presents Kuene, a web-based collaborative dictionary editing platform designed to facilitate the creation and publication of Hawaiian neologisms by the Hawaiian Lexicon Committee. Through Kuene, the Committee can create, edit, and refine new dictionary entries with a multi-round approval process, ensuring accuracy and consistency. The platform's technical features enable flexible access control, fine-grained approval states, and support for multimedia content and AI-assisted orthography modernization. In the past two months alone, Kuene has enabled the publication of over 400 new Hawaiian words. By streamlining the dictionary editing process, Kuene aims to alleviate the scarcity of modern Hawaiian words and facilitate the revitalization efforts of the Hawaiian language.
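The abstract does not publish Kuene's actual data model, but a multi-round approval process with fine-grained states can be pictured as a small state machine. The state names and transition table below are hypothetical illustrations, not Kuene's real schema:

```python
from enum import Enum, auto

class EntryState(Enum):
    # Hypothetical states for a dictionary entry under committee review.
    DRAFT = auto()
    UNDER_REVIEW = auto()
    APPROVED = auto()
    PUBLISHED = auto()
    REJECTED = auto()

# Allowed transitions for a multi-round workflow: a rejected entry
# returns to DRAFT so it can be revised and resubmitted.
TRANSITIONS = {
    EntryState.DRAFT: {EntryState.UNDER_REVIEW},
    EntryState.UNDER_REVIEW: {EntryState.APPROVED, EntryState.REJECTED},
    EntryState.APPROVED: {EntryState.PUBLISHED},
    EntryState.REJECTED: {EntryState.DRAFT},
    EntryState.PUBLISHED: set(),
}

def advance(state: EntryState, target: EntryState) -> EntryState:
    """Move an entry to `target`, enforcing the allowed transitions."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state.name} -> {target.name}")
    return target
```

Encoding the review rounds as explicit transitions is one way a platform can guarantee that nothing reaches publication without passing every approval stage.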
Award ID(s):
2422413
PAR ID:
10633433
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Format(s):
Medium: X
Location:
https://aclanthology.org/2025.computel-main.21/
Sponsoring Org:
National Science Foundation
More Like this
  1. We describe a bundle for UCSF ChimeraX called SEQCROW that provides advanced structure editing capabilities and quantum chemistry utilities designed for complex organic and organometallic compounds. SEQCROW includes graphical presets and bond editing tools that facilitate the generation of publication‐quality molecular structure figures while also allowing users to build molecular structures quickly and efficiently by mapping new ligands onto existing organometallic complexes as well as adding rings and substituents. Other capabilities include the ability to visualize vibrational modes and simulated IR spectra, to compute and visualize molecular descriptors including percent buried volume, ligand cone angles, and Sterimol parameters, to process thermochemical corrections from quantum mechanical computations, to generate input files for ORCA, Psi4, and Gaussian, and to run and manage computational jobs.
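Percent buried volume, one of the descriptors SEQCROW computes, is the fraction of a sphere centered on the metal that is occupied by ligand atoms. A generic Monte Carlo estimate (not SEQCROW's implementation; refinements such as scaled van der Waals radii are omitted) can be sketched as:

```python
import math
import random

def percent_buried_volume(atoms, sphere_radius=3.5, n_samples=50_000, seed=0):
    """Estimate %Vbur: the percentage of a sphere centered at the metal
    (taken as the origin) occupied by ligand atoms, via Monte Carlo.
    `atoms` is a list of (x, y, z, vdw_radius) tuples."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        # Draw a uniform point in the sphere by rejection sampling.
        while True:
            p = [rng.uniform(-sphere_radius, sphere_radius) for _ in range(3)]
            if math.dist(p, (0.0, 0.0, 0.0)) <= sphere_radius:
                break
        # Count the point if it falls inside any atom's sphere.
        if any(math.dist(p, (x, y, z)) <= r for x, y, z, r in atoms):
            inside += 1
    return 100.0 * inside / n_samples
```

A single atom at the origin with half the sphere radius should bury about (1/2)³ = 12.5% of the volume, which makes a convenient sanity check.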
  2. We provide the first large-scale data collection of real-world approval-based committee elections. These elections have been conducted on the Polkadot blockchain as part of their Nominated Proof-of-Stake mechanism and contain around one thousand candidates and tens of thousands of (weighted) voters each. We conduct an in-depth study of application-relevant questions, including a quantitative and qualitative analysis of the outcomes returned by different voting rules. Besides considering proportionality measures that are standard in the multiwinner voting literature, we pay particular attention to less-studied measures of overrepresentation, as these are closely related to the security of the Polkadot network. We also analyze how different design decisions such as the committee size affect the examined measures. 
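In an approval-based committee election, each voter submits a set of approved candidates, and with weighted voters a candidate's score is the total weight approving it. The baseline sketch below implements plain weighted multiwinner approval voting; Polkadot's actual mechanism uses a proportional rule, so this is only an illustration of the ballot format:

```python
from collections import defaultdict

def approval_scores(ballots):
    """Weighted approval scores.
    `ballots` is a list of (weight, approved_candidate_set) pairs."""
    scores = defaultdict(float)
    for weight, approved in ballots:
        for candidate in approved:
            scores[candidate] += weight
    return dict(scores)

def elect_committee(ballots, k):
    """Pick the k candidates with the highest weighted approval score,
    breaking ties by candidate name."""
    scores = approval_scores(ballots)
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]
```

Overrepresentation concerns arise exactly because such score-maximizing rules can let a heavy coalition fill every seat, which is why the study compares proportional rules as well.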
  3. Pool, Robert (Ed.)
    On October 10-11, 2023, the National Academies of Sciences, Engineering, and Medicine hosted the U.S. Research Data Summit at the National Academy of Sciences Building in Washington, DC. The summit was undertaken by a planning committee organized under the U.S. National Committee for CODATA. The summit was informed by input from 29 organizations, including leaders from federal government agencies, the private sector, public and nonprofit organizations, and research institutions. This publication summarizes the presentations and discussion of the summit. 
  4. Task-oriented dialogue research has mainly focused on a few popular languages like English and Chinese, due to the high dataset creation cost for a new language. To reduce the cost, we apply manual editing to automatically translated data. We create a new multilingual benchmark, X-RiSAWOZ, by translating the Chinese RiSAWOZ to 4 languages: English, French, Hindi, Korean; and a code-mixed English-Hindi language. X-RiSAWOZ has more than 18,000 human-verified dialogue utterances for each language, and unlike most multilingual prior work, is an end-to-end dataset for building fully-functioning agents. The many difficulties we encountered in creating X-RiSAWOZ led us to develop a toolset to accelerate the post-editing of a new language dataset after translation. This toolset improves machine translation with a hybrid entity alignment technique that combines neural with dictionary-based methods, along with many automated and semi-automated validation checks. We establish strong baselines for X-RiSAWOZ by training dialogue agents in the zero- and few-shot settings where limited gold data is available in the target language. Our results suggest that our translation and post-editing methodology and toolset can be used to create new high-quality multilingual dialogue agents cost-effectively.
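The dictionary-based half of such a hybrid entity alignment can be pictured as looking up each source entity in a bilingual dictionary and locating its translation in the machine-translated text; entities the dictionary misses are left for the neural aligner. This is a simplified sketch, not the paper's toolset:

```python
def align_entities(source_entities, target_text, bilingual_dict):
    """Map each source-language entity to its span in the translated text
    using a bilingual dictionary. Returns (aligned, unaligned), where
    `aligned` maps entity -> (translation, start, end) and `unaligned`
    lists entities a neural aligner would still need to handle."""
    aligned, unaligned = {}, []
    for entity in source_entities:
        translation = bilingual_dict.get(entity)
        if translation and translation in target_text:
            start = target_text.index(translation)
            aligned[entity] = (translation, start, start + len(translation))
        else:
            unaligned.append(entity)
    return aligned, unaligned
```

Pinning down entity spans this way is what lets automated validation checks confirm that every slot value in the source dialogue survives translation intact.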
  5. Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora. Since PLMs capture word semantics in different contexts, the quality of word representations highly depends on word frequency, which usually follows a heavy-tailed distribution in the pre-training corpus. Therefore, the embeddings of rare words on the tail are usually poorly optimized. In this work, we focus on enhancing language model pre-training by leveraging definitions of the rare words in dictionaries (e.g., Wiktionary). To incorporate a rare word definition as a part of input, we fetch its definition from the dictionary and append it to the end of the input text sequence. In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word- and sentence-level alignment between the input text sequence and rare word definitions to enhance language modeling representations with dictionary knowledge. We evaluate the proposed Dict-BERT model on the language understanding benchmark GLUE and eight specialized domain benchmark datasets. Extensive experiments demonstrate that Dict-BERT can significantly improve the understanding of rare words and boost model performance on various NLP downstream tasks.
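The input-construction step described above, appending dictionary definitions of rare words to the input sequence, can be sketched at the token level. This is a simplified illustration (the frequency threshold and separator convention are assumptions, and the real model operates on subword tokens with masking objectives):

```python
def append_rare_definitions(tokens, freq, definitions, threshold=10):
    """Append the dictionary definition of each rare word (corpus
    frequency below `threshold`) to the end of the token sequence,
    separated by a [SEP]-style marker."""
    rare = [t for t in tokens
            if freq.get(t, 0) < threshold and t in definitions]
    out = list(tokens)
    for word in rare:
        out += ["[SEP]", word, ":"] + definitions[word].split()
    return out
```

Because the definition is simply concatenated to the input, the model can attend from the rare word's occurrence to its gloss, which is what the word- and sentence-level alignment objectives then exploit.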