Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation

Kim, Jinseok; Kim, Jenna; Kim, Jinmo

doi:10.1177/01655515211018171

Citation Details

Chinese author names are known to be more difficult to disambiguate than names of other ethnic origins because many authors share the same surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For the analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less ambiguous if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names, which continue to grow in number in bibliographic data.
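The counterfactual scenarios described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tiny romanization table `PINYIN`, the helper names `variants` and `group_sizes`, and the four sample names are all hypothetical, chosen only to show how distinct names in Chinese characters can collapse into one string once transliterated and then initialised.

```python
from collections import defaultdict

# Hypothetical toy romanization table (pinyin without tone marks).
# A real pipeline would rely on a full transliteration resource.
PINYIN = {"王": "Wang", "伟": "Wei", "薇": "Wei", "卫": "Wei", "文": "Wen"}


def variants(chinese_name):
    """Return (native, transliterated, initialised) forms of a name.

    Assumes a one-character surname followed by the forename characters.
    """
    surname = PINYIN[chinese_name[0]]
    forename = "".join(PINYIN[ch] for ch in chinese_name[1:]).capitalize()
    full = f"{surname}, {forename}"
    initialised = f"{surname}, {forename[0]}"
    return chinese_name, full, initialised


def group_sizes(names, level):
    """Count how many distinct native names share each string at a level.

    level: 0 = Chinese characters, 1 = full English, 2 = initialised.
    """
    groups = defaultdict(set)
    for name in names:
        groups[variants(name)[level]].add(name)
    return {key: len(members) for key, members in groups.items()}


names = ["王伟", "王薇", "王卫", "王文"]
by_native = group_sizes(names, 0)   # every name is its own group
by_full = group_sizes(names, 1)     # three names share "Wang, Wei"
by_initial = group_sizes(names, 2)  # all four share "Wang, W"
```

In this toy example, the four names are fully distinct in native script, form two groups after transliteration, and merge into a single string after forename initialisation, which is the kind of added ambiguity the counterfactual setups are meant to simulate.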

Award ID(s): 1917663

PAR ID: 10292508

Author(s) / Creator(s): Kim, Jinseok; Kim, Jenna; Kim, Jinmo

Date Published: 2021-01-01

Journal Name: Journal of Information Science

ISSN: 0165-5515

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1177/01655515211018171