Title: The Linguistics of Programming
Research in programming languages and software engineering is broadly concerned with the study of aspects of computer programs: their syntactic structure, the relationship between form and meaning (semantics), empirical properties of how they are constructed and deployed, and more. We could equally well apply this description to the range of ways in which linguistics studies the form, meaning, and use of natural language. We argue that despite some notable examples of PL and SE research drawing on ideas from natural language processing, there is still a wealth of concepts, techniques, and conceptual framings originating in linguistics which would be of use to PL and SE research. Moreover, we show that beyond mere parallels, there are cases where linguistics research has complementary methodologies, may help explain or predict study outcomes, or may offer new perspectives on established research areas in PL and SE. Broadly, we argue that researchers across PL and SE are investigating close cousins of problems actively studied for years by linguists, and familiarity with linguistics research seems likely to bear fruit for many PL and SE researchers.
Award ID(s):
2220991
PAR ID:
10553785
Author(s) / Creator(s):
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400712159
Page Range / eLocation ID:
162 to 182
Format(s):
Medium: X
Location:
Pasadena CA USA
Sponsoring Org:
National Science Foundation
More Like this
  1. This article reviews the study of language across disciplines. We focus on epistemological and methodological frameworks for the study of language, both within linguistics departments and across disciplinary areas, concentrating on organizational structure across departments, degree programs, organizations, and professions. We then emphasize emerging transdisciplinary trends in the study of language and communication. We highlight pressing research required to recenter humans and the human communicative experience. We use examples from lexical and morphological investigations to illustrate the complexity and relevance of the study of language across areas and paradigms.
  3. In standard models of language production or comprehension, the elements which are retrieved from memory and combined into a syntactic structure are “lemmas” or “lexical items.” Such models implicitly take a “lexicalist” approach, which assumes that lexical items store meaning, syntax, and form together, that syntactic and lexical processes are distinct, and that syntactic structure does not extend below the word level. Across the last several decades, linguistic research examining a typologically diverse set of languages has provided strong evidence against this approach. These findings suggest that syntactic processes apply both above and below the “word” level, and that both meaning and form are partially determined by the syntactic context. This has significant implications for psychological and neurological models of language processing as well as for the way that we understand different types of aphasia and other language disorders. As a consequence of the lexicalist assumptions of these models, many kinds of sentences that speakers produce and comprehend—in a variety of languages, including English—are challenging for them to account for. Here we focus on language production as a case study. In order to move away from lexicalism in psycho- and neuro-linguistics, it is not enough to simply update the syntactic representations of words or phrases; the processing algorithms involved in language production are constrained by the lexicalist representations that they operate on, and thus also need to be reimagined. We provide an overview of the arguments against lexicalism, discuss how lexicalist assumptions are represented in models of language production, and examine the types of phenomena that they struggle to account for as a consequence. 
    We also outline what a non-lexicalist alternative might look like, as a model that does not rely on a lemma representation, but instead represents that knowledge as separate mappings between (a) meaning and syntax and (b) syntax and form, with a single integrated stage for the retrieval and assembly of syntactic structure. By moving away from lexicalist assumptions, this kind of model provides better cross-linguistic coverage and aligns better with contemporary syntactic theory.
  4. Chasins, Sarah; Glassman, Elena; Sunshine, Joshua (Ed.)
    A domain-specific language (DSL) design space describes a collection of related languages via a series of often orthogonal dimensions. While PL and HCI researchers have independently developed methods for working with design spaces, the communities have yet to fully benefit from each other's insights. In pursuit of new approaches informed by both PL and HCI, we first review existing approaches researchers employ to conceptualize, develop, and use design spaces in DSL design across the two disciplines. For example, HCI researchers, when developing interfaces backed by DSLs, often treat the design process as core to their research contributions and theory-building. In PL, researchers have explored formal approaches to design spaces that help automate design space exploration and provide powerful conceptual clarity to language design tradeoffs. We then discuss areas where the two fields share common methods and highlight opportunities for researchers to combine knowledge across PL and HCI.
  5. Concern regarding users’ data privacy has risen to its highest level due to the massive increase in communication platforms and social networking sites and greater user participation in online public discourse. An increasing number of people exchange private information via emails, text messages, and social media without being aware of the risks and implications. Because a substantial amount of data is exchanged in textual form, researchers in the field of Natural Language Processing (NLP) have concentrated on creating tools and strategies to identify, categorize, and sanitize private information in text data. However, most of the detection methods rely solely on the existence of pre-identified keywords in the text and disregard the inference of the underlying meaning of the utterance in a specific context. Hence, in some situations these tools and algorithms fail to detect disclosure, or the produced results are misclassified. In this paper, we propose a multi-input, multi-output hybrid neural network which utilizes transfer learning, linguistics, and metadata to learn the hidden patterns. Our goal is to better classify disclosure/non-disclosure content in terms of the context of situation. We trained and evaluated our model on a human-annotated ground truth dataset containing a total of 5,400 tweets. The results show that the proposed model was able to identify privacy disclosure through tweets with an accuracy of 77.4% while classifying the information type of those tweets with an impressive accuracy of 99%, by jointly learning for two separate tasks.