Title: USING INTERLINEAR GLOSS TEXTS TO IMPROVE LANGUAGE DESCRIPTION
Interlinear-glossed text (IGT) is a method of representing semantic, morphological and phonological information about lexemes along with phrase and clause level translations of connected text. While the Leipzig Glossing Rules (LGR) provide general standards and principles for IGT, we argue here that language-family specific guidelines are necessary to facilitate rapid creation of new interpretable IGT that can be used for language description, typological discovery, and cross-language comparison. Using selected examples of Tibeto-Burman IGTs, we demonstrate how linguists create their own terminology and conventions for representing linguistic phenomena which fall outside the scope of the LGR. To date, there are few, at least within the Sino-Tibetan linguistics community, that have discussed language-family specific IGT conventions, so new annotators lack guidance on IGT creation. This paper examines how typical Tibeto-Burman constructions (e.g., reduplication, verb stem alternation, directionals) are represented in IGT from several South Central Tibeto-Burman languages. We offer some remarks on the purposes of IGT and some principles for new IGT creators.
Award ID(s):
1953296
PAR ID:
10383818
Author(s) / Creator(s):
; ;
Editor(s):
Pappuswamy, Umarani
Date Published:
Journal Name:
Indian linguistics
Volume:
82
Issue:
1-2
ISSN:
0378-0759
Page Range / eLocation ID:
1-25
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.)
    Uniform Meaning Representation (UMR) is a semantic labeling system in the AMR family designed to be uniformly applicable to typologically diverse languages. The UMR labeling system is quite thorough and can be time-consuming to execute, especially if annotators are starting from scratch. In this paper, we focus on methods for bootstrapping UMR annotations for a given language from existing resources, and specifically from typical products of language documentation work, such as lexical databases and interlinear glossed text (IGT). Using Arapaho as our test case, we present and evaluate a bootstrapping process that automatically generates UMR subgraphs from IGT. Additionally, we describe and evaluate a method for bootstrapping valency lexicon entries from lexical databases for both the target language and English. We are able to generate enough basic structure in UMR graphs from the existing Arapaho interlinearized texts to automate UMR labeling to a significant extent. Our method thus has the potential to streamline the process of building meaning representations for new languages without existing large-scale computational resources. 
  2. This paper investigates the homophony/polysemy between a morphological agentive marker and a contrastive focus marker in Sümi, a Tibeto-Burman language of Northeast India. Both are realized by a phrasal suffix -no that attaches to grammatical subjects, but the interpretation of the suffix varies by clause type. The present study examines whether transitive and intransitive subjects in contrastive focus receive any special prosodic marking that is recognizable to native listeners. The study has implications for understanding the development of agentive/focus marking in Sümi, as well as other languages of the Himalayas, and in New Guinea and Australia where similar homophony/polysemy between agentive and focus markers has been found. 
  3. We lay out the conjugation patterns for declarative affirmatives and negatives in Lamkang [lmk], a language of the South Central subgroup of the Tibeto-Burman (a.k.a. Trans-Himalayan) family. As for many languages of this family, conjugation patterns differ according to tense. This includes different patterning with respect to participant prefixes and agreement suffixes as well as stem shape. Lamkang also employs a person hierarchy: with 2nd>1st, 3rd>1st, and 3rd>2nd, an inverse marker t- is used if the verb is in the nonfuture affirmative. The verb template includes tense, negative, and copular auxiliaries which are inflected for agent except when agent is otherwise indicated. For example, in negative conjugations with an inclusive prefix, the expected PATIENT-Stem Auxiliary-AGENT pattern for the paradigm flips to AGENT-Stem Auxiliary-PATIENT. Within the clusive forms a great deal of variation exists for which prefixes are used for inclusive and exclusive. We also see variation in the use of plural markers. All this hints at a highly complex system in a state of flux.
  4. Interlinear glossing provides a vital type of morphosyntactic annotation, both for linguists and language revitalists, and numerous conventions exist for representing it formally and computationally. Some of these formats are human readable; others are machine readable. Some are easy to edit with general-purpose tools. Few represent non-concatenative processes like infixation, reduplication, mutation, truncation, and tonal overwriting in a consistent and formally rigorous way (on par with affixation). We propose an annotation convention, Generalized Glossing Guidelines (GGG), that combines all of these positive properties using an Item-and-Process (IP) framework. We describe the format, demonstrate its linguistic adequacy, and compare it with two other interlinear glossed text annotation schemes.
  5. This is an account of the forms and semantic dimensions of spatial relations in Manange (Tibeto-Burman, Tamangic; Nepal), with comparison to sister language Nar-Phu. Topological relations (“IN/ON/AT/NEAR”) in these languages are encoded by locative enclitics and also by a set of noun-like objects termed “locational nouns.” In Manange, the general locative enclitic is more frequently encountered for a wide range of topological relations, while in Nar-Phu, the opposite pattern is observed, i.e. more frequent use of locational nouns. While the linguistic frame of reference system encoded in these forms is primarily relative (i.e. oriented on the speaker’s own viewing perspective), a more extrinsic/absolute system emerges with certain verbs of motion in these languages, with verbs like “come,” “go,” and certain verbs of placement or posture orienting to arbitrary fixed bearings such as slope. This account also provides some examples of cultural or metaphorical extensions of spatial forms as they are encountered in connected speech.