For many years now, modern software has commonly been developed in multiple languages (hence termed multilingual or multi-language software). Yet, to date, we still have only limited knowledge about how multilingual software systems are constructed. For instance, it is not yet clear how different languages are used, which languages are selected together, and why they are selected that way in multilingual software development. Given that using multiple languages in a single software project has become the norm, understanding language use and selection (i.e., the language profile) as a basic element of the multilingual construction in contemporary software engineering is an essential first step. In this article, we set out to fill this gap with a large-scale characterization study on language use and selection in open-source multilingual software. We start by presenting an updated overview of language use in 7,113 GitHub projects spanning the past 5 years, characterizing overall statistics of language profiles, followed by a deeper look into the functionality relevance/justification of language selection in these projects through association rule mining. We proceed with an evolutionary characterization of 1,000 GitHub projects for each of the past 10 years to provide a longitudinal view of how language use and selection have changed over the years, as well as how the association between functionality and language selection has evolved. Among many other findings, our study revealed a growing trend of using three to five languages in one multilingual software project and the noticeable stability of top language selections. We found a non-trivial association between language selection and certain functionality domains, which was less stable over time than the association with individual languages. In a historical context, we also observed major shifts in these characteristics of multilingual systems, both in contrast to earlier peer studies and along the evolutionary timeline.
Our findings offer essential knowledge on the multilingual construction in modern software development. Based on our results, we also provide insights and actionable suggestions for both researchers and developers of multilingual systems.
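The abstract mentions association rule mining over projects' language profiles. As a rough illustration only, the sketch below treats each project as a set of languages and derives rules "A -> B" from support (share of projects using both) and confidence (share of A-projects that also use B). The project data, the thresholds, and the restriction to two-language rules are all hypothetical simplifications; the study's actual algorithm and parameters are not stated here.

```python
from itertools import combinations

# Hypothetical toy data: each project's language profile as a set of languages.
# Illustrative only; not data from the study.
projects = [
    {"JavaScript", "HTML", "CSS"},
    {"JavaScript", "HTML", "CSS", "Python"},
    {"Python", "Shell"},
    {"C", "C++", "Shell"},
    {"JavaScript", "HTML", "Shell"},
]

def support(itemset, transactions):
    """Fraction of transactions (projects) containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def pairwise_rules(transactions, min_support=0.4, min_confidence=0.8):
    """Return single-antecedent rules (lhs -> rhs) meeting both thresholds.

    A deliberately small special case of association rule mining: only
    2-item sets are considered, whereas Apriori or FP-growth handle larger ones.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # pair too rare to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

for lhs, rhs, s, c in pairwise_rules(projects):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data, rules such as CSS -> HTML appear because every project using CSS also uses HTML, while the reverse rule is dropped because only two of the three HTML projects use CSS.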
Does Mobility Drive Language Use? A Dual‐Spatialization Perspective
ABSTRACT Multilingualism refers to a phenomenon where individuals routinely use three or more languages. Spatial processes, such as mobility, may shape the outcome of multilingual linguistic behaviors but are considerably under‐explored. We evaluate the effect of mobility on language use in the framework of dual spatialization in a small‐scale multilingual society. We use a footpath network to characterize mobility in absolute space, and a language network to characterize language use in relational space; we then assess the correspondence between the two networks. Redundancy analysis and the k‐means method are used to support the research goal. We found a high correspondence between mobility and language use. The results identify the absence of regional “centers of gravity” as a distinctive feature in language use, as mobility has fostered local clusters of language use. Conceptually, this study showcases the power of dual spatialization in understanding the mechanisms underlying the space–language connection.
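The abstract names the k-means method among its analysis tools. As an illustration only, the following is a minimal sketch of standard k-means (Lloyd's algorithm) in plain Python. The two-dimensional points are hypothetical stand-ins for whatever language-use features were clustered; the study's features, choice of k, initialization, and the redundancy analysis are not reproduced here.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid
    and move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points
        ]
        # Update step: mean of each cluster; keep the old centroid if a cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids, and hence assignments, no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The cluster indices themselves are arbitrary (they depend on the random initialization); what matters is which points share a label.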
- Award ID(s): 1761639
- PAR ID: 10573033
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Transactions in GIS
- Volume: 28
- Issue: 8
- ISSN: 1361-1682
- Page Range / eLocation ID: 2763 to 2774
- Sponsoring Org: National Science Foundation
More Like this
ABSTRACT Automated scoring is a current hot topic in creativity research. However, most research has focused on the English language and popular verbal creative thinking tasks, such as the alternate uses task. Therefore, in this study, we present a large language model approach for automated scoring of a scientific creative thinking task that assesses divergent ideation in experimental tasks in the German language. Participants are required to generate alternative explanations for an empirical observation. This work analyzed a total of 13,423 unique responses. To predict human ratings of originality, we used XLM‐RoBERTa (Cross‐lingual Language Model‐RoBERTa), a large, multilingual model. The prediction model was trained on 9,400 responses. Results showed a strong correlation between model predictions and human ratings in a held‐out test set (n = 2,682; r = 0.80; CI‐95% [0.79, 0.81]). These promising findings underscore the potential of large language models for automated scoring of scientific creative thinking in the German language. We encourage researchers to further investigate automated scoring of other domain‐specific creative thinking tasks.
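The reported interval can be sanity-checked with a common method for correlation confidence intervals, the Fisher z-transformation. The abstract does not say how its interval was computed, so this is an assumed method used only as a consistency check: transform r with atanh, add and subtract 1.96 standard errors of 1/sqrt(n - 3), and transform back with tanh.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate two-sided CI for a Pearson correlation via Fisher's z:
    z = atanh(r), SE = 1 / sqrt(n - 3), bounds mapped back with tanh.
    z_crit = 1.959964 corresponds to a 95% interval."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Values taken from the abstract: r = 0.80 on a held-out test set of n = 2,682.
low, high = pearson_ci(0.80, 2682)
print(f"95% CI: [{low:.2f}, {high:.2f}]")  # prints "95% CI: [0.79, 0.81]"
```

The unrounded bounds are roughly 0.786 and 0.813, which round to the interval given in the abstract; with a test set this large the interval is narrow, so the correlation estimate is quite precise.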
Abstract Commonly recommended methods for documenting endangered languages are built around the assumption that a given documentary project will focus on a single language rather than a multilingual ecology. This hinders the potential usability of documentary materials for the study of language contact. Research in domains such as ethnography and sociolinguistics has developed conceptual and analytical tools for understanding patterns of multilingual usage, but the insights of such work have yet to be translated into concrete recommendations for enhancements to documentary practice. This paper considers how standard documentary approaches can be adapted to multilingual contexts with respect to activities such as the collection of metadata, the use of ethnographic methods, and the recording and annotation of naturalistic multilingual discourse. A particular focus of the discussion is the ways in which documentary projects can create better records of multilingual practices even if these are not the focus of the work.
Abstract The brain can be decomposed into large-scale functional networks, but the specific spatial topographies of these networks and the names used to describe them vary across studies. Such discordance has hampered interpretation and convergence of research findings across the field. We have developed the Network Correspondence Toolbox (NCT) to permit researchers to examine and report spatial correspondence between their novel neuroimaging results and multiple widely used functional brain atlases. We provide several exemplar demonstrations to illustrate how researchers can use the NCT to report their own findings. The NCT provides a convenient means for computing Dice coefficients with spin test permutations to determine the magnitude and statistical significance of correspondence among user-defined maps and existing atlas labels. The adoption of the NCT will make it easier for network neuroscience researchers to report their findings in a standardized manner, thus aiding reproducibility and facilitating comparisons between studies to produce interdisciplinary insights.
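The magnitude part of the correspondence described above is measured with the Dice coefficient, 2|A ∩ B| / (|A| + |B|), for two binary maps A and B. The sketch below computes it for flattened binary masks; the masks are hypothetical, and this is not the NCT's own API. The spin-test permutations that the abstract uses for significance (comparing against Dice scores of spherically rotated maps) are not implemented here.

```python
def dice(mask_a, mask_b):
    """Dice coefficient of two equal-length binary masks: 2|A ∩ B| / (|A| + |B|).

    Returns a value in [0, 1]: 0 means no overlap, 1 means identical masks.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        raise ValueError("Dice is undefined when both masks are empty")
    return 2.0 * overlap / total

# Hypothetical flattened maps: a user-defined result vs. one atlas network label.
user_map = [1, 1, 0, 0, 1, 0]
atlas_network = [1, 0, 0, 0, 1, 1]
print(round(dice(user_map, atlas_network), 3))  # prints 0.667
```

Here the maps share 2 active vertices out of 3 + 3 total, giving 4/6. Raising an error for two empty masks is a design choice for this sketch; conventions for that edge case vary.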
Abstract Realizations of the holographic correspondence in String/M theory typically involve spacetimes of the form AdS × Y, where Y is some internal space that geometrizes an internal symmetry of the dual field theory, hereafter referred to as an “R symmetry”. It has been speculated that areas of Ryu-Takayanagi surfaces anchored on the boundary of a subregion of Y, and smeared over the base space of the dual field theory, quantify entanglement of internal degrees of freedom. Natural candidates for the corresponding operators are linear combinations of operators with definite R charge, with coefficients given by the “spherical harmonics” of the internal space; this is natural when the product spaces appear as IR geometries of higher-dimensional AdS spaces. We study clustering properties of such operators both for pure AdS × Y and for flow geometries, where AdS × Y arises in the IR from a different spacetime in the UV, for example higher-dimensional AdS or asymptotically flat spacetime. We show, in complete generality, that the two-point functions of such operators separated along the internal space obey clustering properties at scales sufficiently larger than the AdS scale. For non-compact Y, this provides a notion of approximate locality. When Y is compact, clustering happens only when the size of Y is parametrically larger than the AdS scale. This latter situation is realized in flow geometries where the product spaces arise in the IR from an asymptotically AdS geometry in the UV, but not typically when they arise near black hole horizons in asymptotically flat spacetimes. We discuss the significance of this result for entanglement and comment on the role of color degrees of freedom.

