This paper examines how the lexicon is organized in a typical South Central language. Items like nouns,
verbs, and adverbial expressions belong to open classes; pronominals, demonstratives, numerals, quantifiers,
interjections, and onomatopoetic words form closed classes. Middle markers, case markers, directionals,
tense/aspect markers, valence-changing elements, verbal classifiers, elaborate expressions, and reduplicative
patterns are treated as bound elements.
What do oranges and hammers have in common? The classifier ‘round’ in Wa’ikhana and other East Tukano languages
East Tukano languages are known for their developed nominal classification systems. Wa’ikhana (Piratapuyo) is in this sense a typical member of the family, since it has an open system with a large number of classes and with class markers that serve derivational and agreement functions. Among all the Wa’ikhana inanimate classes, the class ‘round’ stands out for its semantic and morphosyntactic features. It is one of the most extensive classes (if not the most extensive), and includes round objects as well as objects of less prototypical shapes. Its non-plural markers have the largest number of allomorphs, even though classifier allomorphy is not typical of this language. In addition, the class ‘round’ has a distinct plural marker, another feature absent from most classifiers. Comparison between Wa’ikhana and related languages demonstrates that these peculiarities are shared by many East Tukano languages. Thus, the present paper aims to describe the class ‘round’ in Wa’ikhana and other languages of the family, and to show their common features as well as the features that distinguish Wa’ikhana.
- Award ID(s): 1664348
- PAR ID: 10137303
- Date Published:
- Journal Name: Liames
- Volume: 19
- Issue: 1
- ISSN: 2177-7160
- Page Range / eLocation ID: 2-24
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We examine the role of referential properties and lexical stipulation in three closely related languages of eastern Indonesia, the Alor-Pantar languages Abui, Kamang, and Teiwa. Our focus is on the continuum along which event properties (e.g. volitionality, affectedness) are highly important at one extreme or play virtually no role at the other. These languages occupy different points along this continuum. In Abui, event semantics play the greatest role, while in Teiwa they play the smallest role (the lexical property animacy being dominant in the formation of verb classes). Kamang occupies an intermediate position. Teiwa has conventionalised the relation between a verb and its class along the lines of animacy so that classes become associated with the animacy value of the objects with which the verbs in a given class typically occur. Paying attention to a lexical property like animacy, in contrast with event properties, has meant greater potential for arbitrary classes to emerge.
-
In a closed world setting, classifiers are trained on examples from a number of classes and tested with unseen examples belonging to the same set of classes. However, in most real-world scenarios, a trained classifier is likely to come across novel examples that do not belong to any of the known classes. Such examples should ideally be categorized as belonging to an unknown class. The goal of an open set classifier is to anticipate and be ready to handle test examples of classes unseen during training. The classifier should be able to declare that a test example belongs to a class it does not know, and possibly, incorporate it into its knowledge as an example of a new class it has encountered. There is some published research in open world image classification, but open set text classification remains mostly unexplored. In this paper, we investigate the suitability of Convolutional Neural Networks (CNNs) for open set text classification. We find that CNNs are good feature extractors and hence perform better than existing state-of-the-art open set classifiers in smaller domains, although their open set classification abilities in general still need to be investigated.
-
Contemporary machine learning applications often involve classification tasks with many classes. Despite their extensive use, a precise understanding of the statistical properties and behavior of classification algorithms is still missing, especially in modern regimes where the number of classes is rather large. In this paper, we take a step in this direction by providing the first asymptotically precise analysis of linear multiclass classification. Our theoretical analysis allows us to precisely characterize how the test error varies over different training algorithms, data distributions, problem dimensions as well as number of classes, inter/intra class correlations and class priors. Specifically, our analysis reveals that the classification accuracy is highly distribution-dependent with different algorithms achieving optimal performance for different data distributions and/or training/features sizes. Unlike linear regression/binary classification, the test error in multiclass classification relies on intricate functions of the trained model (e.g., correlation between some of the trained weights) whose asymptotic behavior is difficult to characterize. This challenge is already present in simple classifiers, such as those minimizing a square loss. Our novel theoretical techniques allow us to overcome some of these challenges. The insights gained may pave the way for a precise understanding of other classification algorithms beyond those studied in this paper.
-
There has been substantial progress in applying machine learning techniques to classification problems in collider and jet physics. But as these techniques grow in sophistication, they are becoming more sensitive to subtle features of jets that may not be well modeled in simulation. Therefore, relying on simulations for training will lead to sub-optimal performance in data, but the lack of true class labels makes it difficult to train on real data. To address this challenge we introduce a new approach, called Tag N’ Train (TNT), that can be applied to unlabeled data that has two distinct sub-objects. The technique uses a weak classifier for one of the objects to tag signal-rich and background-rich samples. These samples are then used to train a stronger classifier for the other object. We demonstrate the power of this method by applying it to a dijet resonance search. By starting with autoencoders trained directly on data as the weak classifiers, we use TNT to train substantially improved classifiers. We show that Tag N’ Train can be a powerful tool in model-agnostic searches and discuss other potential applications.