This paper discusses the need for including morphological features in Japanese Universal Dependencies (UD). In the current version (v2.11) of the Japanese UD treebanks, sentences are tokenized at the morpheme level, and almost no morphological feature annotation is used. However, Japanese is not an isolating language that lacks morphological inflection but is an agglutinative language. Given this situation, we introduce a tentative scheme for retokenization and morphological feature annotation for Japanese UD. Then, we measure and compare the morphological complexity of Japanese with other languages to demonstrate that the proposed tokenizations show similarities to synthetic languages reflecting the linguistic typology.
Using Universal Dependencies in cross-linguistic complexity research
We evaluate corpus-based measures of linguistic complexity obtained using Universal Dependencies (UD) treebanks. We propose a method of estimating robustness of the complexity values obtained using a given measure and a given treebank. The results indicate that measures of syntactic complexity might be on average less robust than those of morphological complexity. We also estimate the validity of complexity measures by comparing the results for very similar languages and checking for unexpected differences. We show that some of those differences that arise can be diminished by using parallel treebanks and, more importantly from the practical point of view, by harmonizing the language-specific solutions in the UD annotation.
- Award ID(s): 1734260
- PAR ID: 10119528
- Date Published:
- Journal Name: Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
- Page Range / eLocation ID: 8-17
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Uniform Meaning Representation (UMR) is a semantic annotation framework designed to be applicable across typologically diverse languages. However, UMR annotation is a labor-intensive task, requiring significant effort and time especially when no prior annotations are available. In this paper, we present a method for bootstrapping UMR graphs by leveraging Universal Dependencies (UD), one of the most comprehensive multilingual resources, encompassing languages across a wide range of language families. Given UMR’s strong typological and cross-linguistic orientation, UD serves as a particularly suitable starting point for the conversion. We describe and evaluate an approach that automatically derives partial UMR graphs from UD trees, providing annotators with an initial representation to build upon. While UD is not a semantic resource, our method extracts useful structural information that aligns with the UMR formalism, thereby facilitating the annotation process. By leveraging UD’s broad typological coverage, this approach offers a scalable way to support UMR annotation across different languages.
-
This paper describes the development of the first Universal Dependencies (UD) treebank for St. Lawrence Island Yupik, an endangered language spoken in the Bering Strait region. While the UD guidelines provided a general framework for our annotations, language-specific decisions were made necessary by the rich morphology of the polysynthetic language. Most notably, we annotated a corpus at the morpheme level as well as the word level. The morpheme level annotation was conducted using an existing morphological analyzer and manual disambiguation. By comparing the two resulting annotation schemes, we argue that morpheme-level annotation is essential for polysynthetic languages like St. Lawrence Island Yupik. Word-level annotation results in degenerate trees for some Yupik sentences and often fails to capture syntactic relations that can be manifested at the morpheme level. Dependency parsing experiments provide further support for morpheme-level annotation. Implications for UD annotation of other polysynthetic languages are discussed.
-
Workplace environments are characterized by frequent interruptions that can lead to stress. However, measures of stress due to interruptions are typically obtained through self-reports, which can be affected by memory and emotional biases. In this paper, we use a thermal imaging system to obtain objective measures of stress and investigate personality differences in contexts of high and low interruptions. Since a major source of workplace interruptions is email, we studied 63 participants while multitasking in a controlled office environment with two different email contexts: managing email in batch mode or with frequent interruptions. We discovered that people who score high in Neuroticism are significantly more stressed in batching environments than those low in Neuroticism. People who are more stressed finish emails faster. Last, using Linguistic Inquiry Word Count on the email text, we find that higher stressed people in multitasking environments use more anger in their emails. These findings help to disambiguate prior conflicting results on email batching and stress.
-
Bijker, R; Marín_Lámbarri, DJ; Yépez_Martínez, TC (Ed.) The Cabibbo-Kobayashi-Maskawa quark mixing matrix currently does not satisfy unitarity at the 2σ-level. This could be the result of an inaccurate value of one or both of its largest matrix elements Vus and Vud. In the case of Vud, the most precise measurement is obtained from the ft-value measurements of superallowed beta-transitions between 0+ states. The accuracy of this determination can, in turn, be tested by extracting Vud in other transitions including superallowed transitions between mirror nuclei. The Superallowed Transition Beta-Neutrino Decay Ion Coincidence Trap (St. Benedict) is currently under construction at the Nuclear Science Laboratory of the University of Notre Dame to perform such a determination, with the goal of shedding more light on this tension with unitarity. St. Benedict will take a radioactive ion beam produced by TwinSol, thermalize it in a large volume gas catcher, then transport it in two separate differentially-pumped volumes using a radio-frequency (RF) carpet and a radio-frequency quadrupole (RFQ) ion guide before injecting it in an RFQ trap to create cool ion bunches for injection in the measurement Paul trap. In this paper, we detail the installation of the beam preparation components of St. Benedict, and present the results of the first RIBs successfully stopped and extracted from its gas catcher.

