Title: USING INTERLINEAR GLOSS TEXTS TO IMPROVE LANGUAGE DESCRIPTION
Interlinear-glossed text (IGT) is a method of representing semantic, morphological, and phonological information about lexemes along with phrase- and clause-level translations of connected text. While the Leipzig Glossing Rules (LGR) provide general standards and principles for IGT, we argue here that language-family specific guidelines are necessary to facilitate rapid creation of new interpretable IGT that can be used for language description, typological discovery, and cross-language comparison. Using selected examples of Tibeto-Burman IGTs, we demonstrate how linguists create their own terminology and conventions for representing linguistic phenomena that fall outside the scope of the LGR. To date, few researchers, at least within the Sino-Tibetan linguistics community, have discussed language-family specific IGT conventions, so new annotators lack guidance on IGT creation. This paper examines how typical Tibeto-Burman constructions (e.g., reduplication, verb stem alternation, directionals) are represented in IGT from several South Central Tibeto-Burman languages. We offer some remarks on the purposes of IGT and some principles for new IGT creators.
Award ID(s):
1953296
PAR ID:
10383818
Editor(s):
Pappuswamy, Umarani
Journal Name:
Indian linguistics
Volume:
82
Issue:
1-2
ISSN:
0378-0759
Page Range / eLocation ID:
1-25
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Interlinear glossed text (IGT) is a popular format in language documentation projects, where each morpheme is labeled with a descriptive annotation. Automating the creation of interlinear glossed text would be desirable to reduce annotator effort and maintain consistency across annotated corpora. Prior research has explored a number of statistical and neural methods for automatically producing IGT. As large language models (LLMs) have shown promising results across multilingual tasks, even for rare, endangered languages, it is natural to wonder whether they can be utilized for the task of generating IGT. We explore whether LLMs can be effective at the task of interlinear glossing with in-context learning, without any traditional training. We propose new approaches for selecting examples to provide in context, observing that targeted selection can significantly improve performance. We find that LLM-based methods beat standard transformer baselines, despite requiring no training at all. These approaches still underperform state-of-the-art supervised systems for the task, but are highly practical for researchers outside of the NLP community, requiring minimal effort to use.
  2. Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.)
    Uniform Meaning Representation (UMR) is a semantic labeling system in the AMR family designed to be uniformly applicable to typologically diverse languages. The UMR labeling system is quite thorough and can be time-consuming to execute, especially if annotators are starting from scratch. In this paper, we focus on methods for bootstrapping UMR annotations for a given language from existing resources, and specifically from typical products of language documentation work, such as lexical databases and interlinear glossed text (IGT). Using Arapaho as our test case, we present and evaluate a bootstrapping process that automatically generates UMR subgraphs from IGT. Additionally, we describe and evaluate a method for bootstrapping valency lexicon entries from lexical databases for both the target language and English. We are able to generate enough basic structure in UMR graphs from the existing Arapaho interlinearized texts to automate UMR labeling to a significant extent. Our method thus has the potential to streamline the process of building meaning representations for new languages without existing large-scale computational resources. 
  3. This paper investigates the homophony/polysemy between a morphological agentive marker and a contrastive focus marker in Sümi, a Tibeto-Burman language of Northeast India. Both are realized by a phrasal suffix -no that attaches to grammatical subjects, but the interpretation of the suffix varies by clause type. The present study examines whether transitive and intransitive subjects in contrastive focus receive any special prosodic marking that is recognizable to native listeners. The study has implications for understanding the development of agentive/focus marking in Sümi, as well as other languages of the Himalayas, and in New Guinea and Australia where similar homophony/polysemy between agentive and focus markers has been found. 
  4. We lay out the conjugation patterns for declarative affirmatives and negatives in Lamkang [lmk], a language of the South Central subgroup of the Tibeto-Burman (a.k.a. Trans-Himalayan) family. As in many languages of this family, conjugation patterns differ according to tense. This includes different patterning with respect to participant prefixes and agreement suffixes as well as stem shape. Lamkang also employs a person hierarchy: with 2nd > 1st, 3rd > 1st, and 3rd > 2nd, an inverse marker t- is used if the verb is in the nonfuture affirmative. The verb template includes tense, negative, and copular auxiliaries, which are inflected for agent except when agent is otherwise indicated. For example, in negative conjugations with an inclusive prefix, the expected PATIENT-Stem Auxiliary-AGENT pattern for the paradigm flips to AGENT-Stem Auxiliary-PATIENT. Within the clusive forms, a great deal of variation exists in which prefixes are used for inclusive and exclusive. We also see variation in the use of plural markers. All this hints at a highly complex system in a state of flux.
  5. An infrequently attested pattern of isomorphism between markers of valence-affecting constructions involves overlap between causatives and applicatives, with a transitivizing function, and middles, which are detransitivizing. Such synchronic overlap must be due to some feature of the development of the constructions; however, so far there has been little detailed argumentation for directionality in such cases. We consider an instance of this pattern in the South Central Tibeto-Burman languages. The representation of multiple stages of the phenomena allows us to posit directionality and motivations for the evolutionary pathways involved. Here, the source morphology developed into a comitative applicative; in some languages causative meanings emerged, and in others, the construction developed an array of middle meanings via a portative applicative bridging context. The study provides new evidence for how patterns may arise in which the same morphology marks both causative or applicative constructions, on the one hand, and middles, on the other.