Title: LensKit for Python: Next-Generation Software for Recommender Systems Experiments
LensKit is an open-source toolkit for building, researching, and learning about recommender systems. First released in 2010 as a Java framework, it has supported diverse published research, small-scale production deployments, and education in both MOOC and traditional classroom settings. In this paper, I present the next generation of the LensKit project, re-envisioning the original tool's objectives as a flexible Python package for supporting recommender systems research and development. LensKit for Python (LKPY) enables researchers and students to build robust, flexible, and reproducible experiments that make use of the large and growing PyData and Scientific Python ecosystem, including scikit-learn and TensorFlow. To that end, it provides classical collaborative filtering implementations, recommender system evaluation metrics, data preparation routines, and tools for efficiently batch running recommendation algorithms, all usable in any combination with each other or with other Python software. This paper describes the design goals, use cases, and capabilities of LKPY, contextualized in a reflection on the successes and failures of the original LensKit for Java software.
Award ID(s):
1751278
PAR ID:
10199450
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 29th ACM International Conference on Information and Knowledge Management
Page Range / eLocation ID:
2999 to 3006
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Call graphs have many applications in software engineering, including bug-finding, security analysis, and code navigation in IDEs. However, the construction of call graphs requires significant investment in program analysis infrastructure. An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing JVM bytecode. This approach has been shown to work well when applied to bytecode produced from Java code. In this paper, we show that it also works well for diverse other JVM-hosted languages: dynamically-typed functional Scheme, statically-typed object-oriented Scala, and polymorphic functional OCaml. Effectively, we get call graph construction for these languages for free, using existing analysis infrastructure for Java, with only minor challenges to soundness. This, in turn, suggests that bytecode-based analysis could serve as an implementation vehicle for bug-finding, security analysis, and IDE features for these languages. We present qualitative and quantitative analyses of the soundness and precision of call graphs constructed from JVM bytecodes for these languages, and also for Groovy, Clojure, Python, and Ruby. However, we also show that implementation details matter greatly. In particular, the JVM-hosted implementations of Groovy, Clojure, Python, and Ruby produce very unsound call graphs, due to the pervasive use of reflection, invokedynamic instructions, and run-time code generation. Interestingly, the dynamic translation schemes employed by these languages, which result in unsound static call graphs, tend to be correlated with poor performance at run time. 
  2. Software applications and workloads, especially within the domains of Cloud computing and large-scale AI model training, exert considerable demand on computing resources, thus contributing significantly to the overall energy footprint of the IT industry. In this paper, we present an in-depth analysis of certain software coding practices that can play a substantial role in increasing the application’s overall energy consumption, primarily stemming from the suboptimal utilization of computing resources. Our study encompasses a thorough investigation of 16 distinct code smells and other coding malpractices across 31 real-world open-source applications written in Java and Python. Through our research, we provide compelling evidence that various common refactoring techniques, typically employed to rectify specific code smells, can unintentionally escalate the application’s energy consumption. We illustrate that a discerning and strategic approach to code smell refactoring can yield substantial energy savings. For selective refactorings, this yields a reduction of up to 13.1% of energy consumption and 5.1% of carbon emissions per workload on average. These findings underscore the potential of selective and intelligent refactoring to substantially increase energy efficiency of Cloud software systems. 
  3. In our previous research we found that teaching novice programmers introductory programming in Java using subgoal labels led to deeper knowledge [2] and increased persistence for students potentially at risk of dropping out or failing their first undergraduate course in CS [3]. Given the increasing number of universities using Python in introductory programming classes, we have begun the process of defining the subgoals to use in Python-based introductory courses. We have repeated the Task Analysis by Problem Solving (TAPS) development process [1] that we used to create Java-based instructional materials. As with the Java-based instructional materials, we have created subgoals for both evaluating programs written in Python and writing programs in Python. The poster will present an overview of the TAPS development process for identifying the Python subgoals and the proposed experimental design for assessing the effectiveness of the instructional materials. 
  4. Over the past few decades, the measurement precision of some pulsar timing experiments has advanced from ∼10 μs to ∼10 ns, revealing many subtle phenomena. Such high precision demands both careful data handling and sophisticated timing models to avoid systematic error. To achieve these goals, we present PINT (PINT Is Not Tempo3), a high-precision Python pulsar timing data analysis package, which is hosted on GitHub and available on the Python Package Index (PyPI) as pint-pulsar. PINT is well tested, validated, object oriented, and modular, enabling interactive data analysis and providing an extensible and flexible development platform for timing applications. It utilizes well-debugged public Python packages (e.g., the NumPy and Astropy libraries) and modern software development schemes (e.g., version control and efficient development with git and GitHub) and a continually expanding test suite for improved reliability, accuracy, and reproducibility. PINT is developed and implemented without referring to, copying, or transcribing the code from other traditional pulsar timing software packages (e.g., Tempo/Tempo2) and therefore provides a robust tool for cross-checking timing analyses and simulating pulse arrival times. In this paper, we describe the design, use, and validation of PINT, and we compare timing results between it and Tempo and Tempo2.
  5. Synthetic data is a useful resource for algorithmic research. It allows for the evaluation of systems under a range of conditions that might be difficult to achieve in real world settings. In recommender systems, the use of synthetic data is somewhat limited; some work has concentrated on building user-item interaction data at large scale. We believe that fairness-aware recommendation research can benefit from simulated data as it allows the study of protected groups and their interactions without depending on sensitive data that needs privacy protection. In this paper, we propose a novel type of data for fairness-aware recommendation: synthetic recommender system outputs that can be used to study re-ranking algorithms. 