For many years, modern software has been developed in multiple languages (hence termed multilingual or multi-language software). Yet, to date, we have only very limited knowledge about how multilingual software systems are constructed. For instance, it is not yet clear how different languages are used and selected together, or why they are selected that way, in multilingual software development. Given that using multiple languages in a single software project has become the norm, understanding language use and selection (i.e., language profile) as a basic element of the multilingual construction in contemporary software engineering is an essential first step. In this article, we set out to fill this gap with a large-scale characterization study on language use and selection in open-source multilingual software. We start by presenting an updated overview of language use in 7,113 GitHub projects spanning the past 5 years by characterizing overall statistics of language profiles, followed by a deeper look into the functionality relevance/justification of language selection in these projects through association rule mining. We proceed with an evolutionary characterization of 1,000 GitHub projects for each of the past 10 years to provide a longitudinal view of how language use and selection have changed over the years, as well as how the association between functionality and language selection has been evolving. Among many other findings, our study revealed a growing trend of using three to five languages in one multilingual software project and a noticeable stability of top language selections. We found a non-trivial association between language selection and certain functionality domains, which was less stable than that with individual languages over time. In a historical context, we also observed major shifts in these characteristics of multilingual systems, both in contrast to earlier peer studies and along the evolutionary timeline.
Our findings offer essential knowledge on the multilingual construction in modern software development. Based on our results, we also provide insights and actionable suggestions for both researchers and developers of multilingual systems.
Does Mobility Drive Language Use? A Dual‐Spatialization Perspective
ABSTRACT Multilingualism refers to a phenomenon where individuals routinely use three or more languages. Spatial processes, such as mobility, may shape the outcome of multilingual linguistic behaviors but are considerably under‐explored. We evaluate the effect of mobility on language use in the framework of dual spatialization in a small‐scale multilingual society. We use a footpath network to characterize mobility in absolute space, and a language network to characterize language use in relational space; we then assess the correspondence between the two networks. Redundancy analysis and the k‐means method are used to support the research goal. We found a high correspondence between mobility and language use. The results identify the absence of regional “centers of gravity” as a distinctive feature in language use, as mobility has fostered local clusters of language use. Conceptually, this study showcases the power of dual spatialization in understanding the mechanisms underlying the space–language connection.
- Award ID(s):
- 1761639
- PAR ID:
- 10550241
- Publisher / Repository:
- Wiley-Blackwell
- Date Published:
- Journal Name:
- Transactions in GIS
- ISSN:
- 1361-1682
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract This study contributes to a growing body of scholarship at the intersection of bilingual education and education policy and examines reclassification, or the transition out of formal English language services in schools, as one potential lever in accelerating or decelerating multilingual learners’ science learning. More specifically, it traces multilingual learners’ science academic achievement vis‐à‐vis science test scores over a six‐year period using the nationally‐representative Early Childhood Longitudinal Study of 2010–11 (ECLS‐K:2011) data set. We use regression analyses with panel data to explore the relationship of reclassification with MLs’ science achievement at a national scale, and then, how variation in contextual factors (including family, school, and individual characteristics) shapes this relationship. Results show that, after controlling for covariates and prior test scores, reclassification is not significantly associated with differential science test scores when compared to students who retain their EL status. Results further show that reclassification is associated with higher science achievement for MLs who were previously in a dual‐language program but lower scores for those with higher prior achievement. We conclude with implications for the reclassification process, as well as directions for future research on reclassification, multilingual learners, and academic achievement.
-
Abstract Commonly recommended methods for documenting endangered languages are built around the assumption that a given documentary project will focus on a single language rather than a multilingual ecology. This hinders the potential usability of documentary materials for the study of language contact. Research in domains such as ethnography and sociolinguistics has developed conceptual and analytical tools for understanding patterns of multilingual usage, but the insights of such work have yet to be translated into concrete recommendations for enhancements to documentary practice. This paper considers how standard documentary approaches can be adapted to multilingual contexts with respect to activities such as the collection of metadata, the use of ethnographic methods, and the recording and annotation of naturalistic multilingual discourse. A particular focus of the discussion is on ways in which documentary projects can create better records of multilingual practices even if these are not the focus of the work.
-
ABSTRACT Automated scoring is a current hot topic in creativity research. However, most research has focused on the English language and popular verbal creative thinking tasks, such as the alternate uses task. Therefore, in this study, we present a large language model approach for automated scoring of a scientific creative thinking task that assesses divergent ideation in experimental tasks in the German language. Participants are required to generate alternative explanations for an empirical observation. This work analyzed a total of 13,423 unique responses. To predict human ratings of originality, we used XLM‐RoBERTa (Cross‐lingual Language Model‐RoBERTa), a large, multilingual model. The prediction model was trained on 9,400 responses. Results showed a strong correlation between model predictions and human ratings in a held‐out test set (n = 2,682; r = 0.80; CI‐95% [0.79, 0.81]). These promising findings underscore the potential of large language models for automated scoring of scientific creative thinking in the German language. We encourage researchers to further investigate automated scoring of other domain‐specific creative thinking tasks.
-
Abstract The brain can be decomposed into large-scale functional networks, but the specific spatial topographies of these networks and the names used to describe them vary across studies. Such discordance has hampered interpretation and convergence of research findings across the field. We have developed the Network Correspondence Toolbox (NCT) to permit researchers to examine and report spatial correspondence between their novel neuroimaging results and multiple widely used functional brain atlases. We provide several exemplar demonstrations to illustrate how researchers can use the NCT to report their own findings. The NCT provides a convenient means for computing Dice coefficients with spin test permutations to determine the magnitude and statistical significance of correspondence among user-defined maps and existing atlas labels. The adoption of the NCT will make it easier for network neuroscience researchers to report their findings in a standardized manner, thus aiding reproducibility and facilitating comparisons between studies to produce interdisciplinary insights.
