Title: Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation
Chinese author names are known to be more difficult to disambiguate than other ethnic names because they tend to share surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names, whose number in bibliographic data continues to grow.
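The counterfactual scenarios described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the records, the `name_key` and `ambiguity` helpers, and the scenario labels are all hypothetical, and the romanised forms are hard-coded rather than produced by a transliteration tool. It only shows how the same set of authors collapses into fewer distinct name strings as Chinese characters are dropped and forenames are initialised.

```python
from collections import defaultdict

# Hypothetical author records: (native-script name, romanised surname, romanised forename).
# Illustrative only; not taken from the study's data. Each record is assumed
# to be a different person.
RECORDS = [
    ("王伟", "Wang", "Wei"),
    ("王薇", "Wang", "Wei"),
    ("王维", "Wang", "Wei"),
    ("王文", "Wang", "Wen"),
    ("李娜", "Li", "Na"),
]


def name_key(record, scenario):
    """Return the name string an index would store under the given scenario."""
    native, surname, forename = record
    if scenario == "native":        # Chinese characters available
        return native
    if scenario == "english":       # transliterated, full forename
        return f"{surname}, {forename}"
    if scenario == "initialised":   # transliterated, forename reduced to an initial
        return f"{surname}, {forename[0]}"
    raise ValueError(f"unknown scenario: {scenario}")


def ambiguity(records, scenario):
    """Map each name string to the number of distinct authors sharing it."""
    groups = defaultdict(int)
    for record in records:
        groups[name_key(record, scenario)] += 1
    return dict(groups)


if __name__ == "__main__":
    for scenario in ("native", "english", "initialised"):
        print(scenario, ambiguity(RECORDS, scenario))
```

In this toy data, every author has a unique name string in native script, three authors share "Wang, Wei" once transliterated, and four share "Wang, W" once forenames are initialised.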
Award ID(s):
1917663
PAR ID:
10292508
Journal Name:
Journal of Information Science
ISSN:
0165-5515
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. How related is skin to a quilt or door to worry? Here, we show that linguistic experience strongly informs people’s judgments of such word pairs. We asked Chinese speakers, English speakers, and Chinese-English bilinguals to rate semantic and visual similarity between pairs of Chinese words and of their English translation equivalents. Some pairs were unrelated; others were unrelated but shared a radical (e.g., “expert” and “dolphin” share the radical meaning “pig”); still others shared a radical that invokes a metaphorical relationship. For example, a quilt covers the body like skin; understand, with a sun radical, invokes understanding as illumination. Importantly, the shared radicals are not part of the pronounced word form. Chinese speakers rated word pairs with metaphorical connections as more similar than other pairs. English speakers did not, even though they were sensitive to shared radicals. Chinese-English bilinguals showed sensitivity to the metaphorical connections even when tested with English words.
  2. Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese; to the best of our knowledge, this is the first Chinese corpus to be broadly annotated with adposition semantics. Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria, though its development focused primarily on English prepositions (Schneider et al., 2018). We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English. On a Mandarin translation of The Little Prince, we achieve high inter-annotator agreement and analyze semantic correspondences of adposition tokens in bitext. 
  3. Previous work has shown that English native speakers interpret sentences as predicted by a noisy-channel model: They integrate both the real-world plausibility of the meaning (the prior) and the likelihood that the intended sentence may be corrupted into the perceived sentence. In this study, we test the noisy-channel model in Mandarin Chinese, a language taxonomically different from English. We present native Mandarin speakers with sentences in a written modality (Experiment 1) and an auditory modality (Experiment 2) in three pairs of syntactic alternations. The critical materials are literally implausible but require differing numbers and types of edits in order to form more plausible sentences. Each sentence is followed by a comprehension question that allows us to infer whether the speakers interpreted the item literally, or made an inference toward a more likely meaning. Similar to previous research on related English constructions, Mandarin participants made the most inferences for implausible materials that could be inferred as plausible by deleting a single morpheme or inserting a single morpheme. Participants were less likely to infer a plausible meaning for materials that could be inferred as plausible by making an exchange across a preposition. And participants were least likely to infer a plausible meaning for materials that could be inferred as plausible by making an exchange across a main verb. Moreover, we found more inferences in written materials than spoken materials, possibly a result of a lack of word boundaries in written Chinese. Overall, the fact that the results were so similar to those found in related constructions in English suggests that the noisy-channel proposal is robust.
  4. Because Chinese reading and writing systems are not phonetic, Mandarin Chinese learners must construct six-way mental connections in order to learn new words, linking characters, meanings, and sounds. Little research has focused on the difficulties inherent to each specific component involved in this process, especially within digital learning environments. The present work examines Chinese word acquisition within ASSISTments, an online learning platform traditionally known for mathematics education. Students were randomly assigned to one of three conditions in which researchers manipulated a learning assignment to exclude one of three bi-directional connections thought to be required for Chinese language acquisition (i.e., sound-meaning and meaning-sound). Researchers then examined whether students’ performance differed significantly when the learning assignment lacked sound-character, character-meaning, or meaning-sound connection pairs, and whether certain problem types were more difficult for students than others. Assessment of problems by component type (i.e., characters, meanings, and sounds) revealed support for the relative ease of problems that provided sounds, with students exhibiting higher accuracy with fewer attempts and less need for system feedback when sounds were included. However, analysis revealed no significant differences in word acquisition by condition, as evidenced by next-day post-test scores or pre- to post-test gain scores. Implications and suggestions for future work are discussed. 
  5. This study offers the first investigation of the normative processes through which Chinese form impressions of others in social interaction. Using affect control theory and its archived sentiment data from China, I estimate the Chinese impression formation models with a new Bayesian method. I then compare the Chinese models to the impression formation dynamics in U.S. English. Results show cross-cultural commonality in the affective processing of cultural concepts, with determinants of impression formation processes being largely universal. Findings also reveal two cultural variations that align with patterns uncovered by comparative cross-cultural research: 1) the Chinese models show less rigidity in the definition of situation; and 2) across two cultural models, the balance term has opposite effects on actor and behavior evaluation. To explore the implications of the impression models, I present a series of simulations, illustrating the predictive power of affect control theory as well as the impact of different cultural rules on social interaction.