Title: Using Universal Dependencies in cross-linguistic complexity research
We evaluate corpus-based measures of linguistic complexity obtained using Universal Dependencies (UD) treebanks. We propose a method of estimating robustness of the complexity values obtained using a given measure and a given treebank. The results indicate that measures of syntactic complexity might be on average less robust than those of morphological complexity. We also estimate the validity of complexity measures by comparing the results for very similar languages and checking for unexpected differences. We show that some of those differences that arise can be diminished by using parallel treebanks and, more importantly from the practical point of view, by harmonizing the language-specific solutions in the UD annotation.
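As a rough illustration of what a corpus-based complexity measure over a UD treebank can look like, the sketch below computes two simple values from CoNLL-U text: a type-token ratio (a crude morphological proxy) and mean dependency length (a crude syntactic proxy). These two measures, the function names, and the two-sentence sample are assumptions chosen for illustration; the abstract does not specify which measures the paper uses, and this is not the authors' method.

```python
from statistics import mean

# Hypothetical two-sentence sample in CoNLL-U format (10 tab-separated columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).
SAMPLE = """\
# text = Dogs bark.
1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
2\tbark\tbark\tVERB\t_\tNumber=Plur\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# text = The dog barks loudly.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_
3\tbarks\tbark\tVERB\t_\tNumber=Sing\t0\troot\t_\t_
4\tloudly\tloudly\tADV\t_\t_\t3\tadvmod\t_\t_
5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def read_tokens(conllu_text):
    """Yield (id, form, head) for each basic token.

    Comment lines, blank lines, multiword-token ranges (IDs like 1-2)
    and empty nodes (IDs like 1.1) are skipped.
    """
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        yield int(cols[0]), cols[1], int(cols[6])


def type_token_ratio(conllu_text):
    """Distinct lowercased word forms divided by total tokens."""
    forms = [form.lower() for _, form, _ in read_tokens(conllu_text)]
    return len(set(forms)) / len(forms)


def mean_dependency_length(conllu_text):
    """Average linear distance between a token and its head (root excluded)."""
    lengths = [
        abs(token_id - head)
        for token_id, _, head in read_tokens(conllu_text)
        if head != 0
    ]
    return mean(lengths)


if __name__ == "__main__":
    print(type_token_ratio(SAMPLE))
    print(mean_dependency_length(SAMPLE))
```

Punctuation is counted as a token here for simplicity. Whether to include it is an annotation-dependent choice of the kind that can make values differ between treebanks.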
Award ID(s):
1734260
PAR ID:
10119528
Journal Name:
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
Page Range / eLocation ID:
8-17
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper discusses the need for including morphological features in Japanese Universal Dependencies (UD). In the current version (v2.11) of the Japanese UD treebanks, sentences are tokenized at the morpheme level, and almost no morphological feature annotation is used. However, Japanese is not an isolating language that lacks morphological inflection but is an agglutinative language. Given this situation, we introduce a tentative scheme for retokenization and morphological feature annotation for Japanese UD. Then, we measure and compare the morphological complexity of Japanese with other languages to demonstrate that the proposed tokenizations show similarities to synthetic languages reflecting the linguistic typology. 
  2. A rapid fatigue characterization method using full-field temporal surface temperature measurements has been used to study the effect of microstructural modification in unidirectional carbon fiber reinforced plastics (UD-CFRP) via electrically aligned Z-threaded carbon nanofibers (CNF). 1 wt% CNF were aligned in the Z-direction via electric means using a patented roll-to-roll process, enabling ZT-CNF-CFRP prepreg production. Three configurations were tested under fatigue: ZT-CNF-UD-CFRP (ZTE), UD-CFRPs with Unaligned CNF, and UD-CFRPs without CNF (Control). Mean surface temperatures measured via passive infrared thermography (IRT) were used to estimate the fatigue limit for these materials using a staircase loading method. Further, harmonic analysis of the obtained temporal full-field temperature data was used to monitor the damage evolution. Finally, the fatigue limit was also determined using the residual threshold method based on the second harmonic signal. Fatigue limits obtained for the three configurations via the bi-linear method were 62.36 ± 0.42 % σ_uts for ZTE, 64.7 ± 1.83 % σ_uts for Unaligned, and 49.29 ± 2.47 % σ_uts for Control. While the presence of 1 wt% CNF improves the fatigue limit, the effect of Z-threading could not be accurately quantified since the Z-threading manufacturing process was found to increase the matrix content of the composite. CNF Z-threads increased thermal conductivity, enabling better in situ damage monitoring. Different failure modes were found and discussed to understand the roles of CNF in the fatigue behavior of UD-CFRP laminates.
  3. Uniform Meaning Representation (UMR) is a semantic annotation framework designed to be applicable across typologically diverse languages. However, UMR annotation is a labor-intensive task, requiring significant effort and time especially when no prior annotations are available. In this paper, we present a method for bootstrapping UMR graphs by leveraging Universal Dependencies (UD), one of the most comprehensive multilingual resources, encompassing languages across a wide range of language families. Given UMR’s strong typological and cross-linguistic orientation, UD serves as a particularly suitable starting point for the conversion. We describe and evaluate an approach that automatically derives partial UMR graphs from UD trees, providing annotators with an initial representation to build upon. While UD is not a semantic resource, our method extracts useful structural information that aligns with the UMR formalism, thereby facilitating the annotation process. By leveraging UD’s broad typological coverage, this approach offers a scalable way to support UMR annotation across different languages. 
  4. Workplace environments are characterized by frequent interruptions that can lead to stress. However, measures of stress due to interruptions are typically obtained through self-reports, which can be affected by memory and emotional biases. In this paper, we use a thermal imaging system to obtain objective measures of stress and investigate personality differences in contexts of high and low interruptions. Since a major source of workplace interruptions is email, we studied 63 participants while multitasking in a controlled office environment with two different email contexts: managing email in batch mode or with frequent interruptions. We discovered that people who score high in Neuroticism are significantly more stressed in batching environments than those low in Neuroticism. People who are more stressed finish emails faster. Last, using Linguistic Inquiry Word Count on the email text, we find that more highly stressed people in multitasking environments express more anger in their emails. These findings help to disambiguate prior conflicting results on email batching and stress.
  5. This paper describes the development of the first Universal Dependencies (UD) treebank for St. Lawrence Island Yupik, an endangered language spoken in the Bering Strait region. While the UD guidelines provided a general framework for our annotations, language-specific decisions were made necessary by the rich morphology of the polysynthetic language. Most notably, we annotated a corpus at the morpheme level as well as the word level. The morpheme level annotation was conducted using an existing morphological analyzer and manual disambiguation. By comparing the two resulting annotation schemes, we argue that morpheme-level annotation is essential for polysynthetic languages like St. Lawrence Island Yupik. Word-level annotation results in degenerate trees for some Yupik sentences and often fails to capture syntactic relations that can be manifested at the morpheme level. Dependency parsing experiments provide further support for morpheme-level annotation. Implications for UD annotation of other polysynthetic languages are discussed.