When programmers use or create new programming languages, locating syntax errors in code can be difficult. Languages with little or no documentation and incomprehensible exception handling and error reports are frustrating to work with, and they make it hard to locate where in the code the program has faulted. In this paper we present CodeBlock, a parser generator and syntax checker for arbitrary programming languages. CodeBlock is a block-based grammar builder that constructs a parsing expression grammar for a language from user-built expressions. This grammar can then be used within the CodeBlock website or in the CodeBlock Node.js application to check the syntax of either written code or files containing code in the language, reporting comprehensible error messages if syntax errors are found. Our eventual goal is to incorporate CodeBlock into a compiler design tutoring system, called CompiTS, in which it will play a central role in teaching students how to design new programming languages and test the effectiveness of a new language using rapid prototyping and a translational approach to implementation. This is emerging research, and in this paper we focus only on the syntax checking component of the CompiTS system.
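To make the idea of syntax checking with a parsing expression grammar more concrete, the following is a minimal sketch in Python. It is not CodeBlock's implementation: the toy grammar (`Expr <- Term ('+' Term)*`, `Term <- Number / '(' Expr ')'`), the `SyntaxChecker` class, and the `check` function are all assumptions made for illustration. The sketch uses one common way to produce readable messages, which is to remember the farthest position at which any rule failed and what was expected there.

```python
import re


class SyntaxChecker:
    """Hypothetical PEG-style checker for a toy grammar (illustration only)."""

    def __init__(self, text):
        self.text = text
        self.far = 0           # farthest position at which a token failed
        self.expected = set()  # what was expected at that position

    def fail(self, pos, what):
        # Record the failure only if it is at least as far as any earlier one.
        if pos > self.far:
            self.far, self.expected = pos, {what}
        elif pos == self.far:
            self.expected.add(what)
        return None

    def token(self, pos, pattern, what):
        # Skip leading whitespace, then match the token pattern at pos.
        m = re.compile(r"\s*(" + pattern + ")").match(self.text, pos)
        return m.end() if m else self.fail(pos, what)

    def expr(self, pos):
        # Expr <- Term ('+' Term)*
        pos = self.term(pos)
        while pos is not None:
            plus = self.token(pos, r"\+", "'+'")
            nxt = self.term(plus) if plus is not None else None
            if nxt is None:
                break  # repetition ends; keep the position before this attempt
            pos = nxt
        return pos

    def term(self, pos):
        # Term <- Number / '(' Expr ')'   (ordered choice: try Number first)
        end = self.token(pos, r"\d+", "number")
        if end is not None:
            return end
        p = self.token(pos, r"\(", "'('")
        if p is None:
            return None
        p = self.expr(p)
        if p is None:
            return None
        return self.token(p, r"\)", "')'")


def check(text):
    """Return 'ok', or a message pointing at the farthest failure."""
    c = SyntaxChecker(text)
    end = c.expr(0)
    if end is not None:
        end = c.token(end, r"$", "end of input")
    if end is not None:
        return "ok"
    line = text.count("\n", 0, c.far) + 1
    col = c.far - (text.rfind("\n", 0, c.far) + 1) + 1
    wanted = " or ".join(sorted(c.expected))
    return f"syntax error at line {line}, column {col}: expected {wanted}"


print(check("1 + (2 + 3)"))
print(check("1 + "))
```

Under these assumptions, the first call reports that the input is well formed, and the second reports the position after the dangling `+` together with the tokens that could legally follow it.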
Emerging languages: An alternative approach to teaching programming languages
Abstract: We challenge the idea that a course intended to convey principles of languages should be structured according to those principles, and present an alternate approach to teaching a programming language course. The approach involves teaching emerging programming languages. This approach results in a variety of course desiderata including scope for instructor customization; alignment with current trends in language evolution, practice, and research; and congruence with industrial needs. We discuss the rationale for, the course mechanics supporting, and the consequences of this approach.
- Award ID(s): 1712406
- PAR ID: 10179044
- Date Published:
- Journal Name: Journal of Functional Programming
- Volume: 29
- ISSN: 0956-7968
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Computational thinking can be regarded as thinking in an algorithmic way, with which one can transpose given problems into computer algorithms. Since computational thinking requires abstract reasoning, it should not depend on particular programming languages. Unfortunately, introductory programming courses (CS1) often give students the false impression that their goal is to teach a particular programming language. This study shares the design of a new pedagogy for CS1 that removes dependency on a particular language and promotes computational thinking by teaching multiple programming languages simultaneously. Specifically, the chosen programming languages range from low-level to high-level to expose students to different levels of abstraction from the details of computer architecture. Initial student survey responses from both trial and control groups show that there are significant improvements for the trial groups.
Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml, Racket, and several others). This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. MultiPL-T translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. 3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter out obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language.
Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. Using datasets generated with MultiPL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural language to code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that MultiPL-T continues to outperform other fine-tuning approaches for low-resource languages. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.
The expansion of computer science (CS) into K-12 contexts has resulted in a diverse ecosystem of curricula designed for various grade levels, teaching a variety of concepts, and using a wide array of different programming languages and environments. Many students will learn more than one programming language over the course of their studies. There is a growing need for computer science assessment that can measure student learning over time, but the multilingual learning pathways create two challenges for assessment in computer science. First, there are no validated assessments for many of the programming languages used in CS classrooms. Second, it is difficult to measure growth in student understanding over time when students move between programming languages as they progress in their CS education. In this position paper, we argue that the field of computing education research needs to develop methods and tools to better measure students' learning over time and across the different programming languages they learn along the way. In presenting this position, we share data showing that students approach assessment problems differently depending on the programming language, even when the problems are conceptually isomorphic, and discuss some approaches for developing multilingual assessments of student learning over time.
Background and Context: In this theory paper, we explore the concept of translanguaging from bilingual education, and its implications for teaching and learning programming and computing, especially in computer science (CS) for all initiatives. Objective: We use translanguaging to examine how programming is and isn't like using human languages. We frame CS as computational literacies. We describe a pedagogical approach for teaching computational literacies. Method: We review theory from applied linguistics, literacy, and computational literacy. We provide a design narrative of our pedagogical approach by describing activities from bilingual middle school classrooms integrating Scratch into academic subjects. Findings: Translanguaging pedagogy can leverage learners' (bilingual and otherwise) full linguistic repertoires as they engage with computational literacies. Implications: Our data help demonstrate how translanguaging can be mobilized to do CS, which has implications for increasing equitable participation in computer science.

