skip to main content


Title: LensKit for Python: Next-Generation Software for Recommender Systems Experiments
LensKit is an open-source toolkit for building, researching, and learning about recommender systems. First released in 2010 as a Java framework, it has supported diverse published research, small-scale production deployments, and education in both MOOC and traditional classroom settings. In this paper, I present the next generation of the LensKit project, re-envisioning the original tool's objectives as flexible Python package for supporting recommender systems research and development. LensKit for Python (LKPY) enables researchers and students to build robust, flexible, and reproducible experiments that make use of the large and growing PyData and Scientific Python ecosystem, including scikit-learn, and TensorFlow. To that end, it provides classical collaborative filtering implementations, recommender system evaluation metrics, data preparation routines, and tools for efficiently batch running recommendation algorithms, all usable in any combination with each other or with other Python software. This paper describes the design goals, use cases, and capabilities of LKPY, contextualized in a reflection on the successes and failures of the original LensKit for Java software.  more » « less
Award ID(s):
1751278
NSF-PAR ID:
10199450
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 29th ACM International Conference on Information and Knowledge Management
Page Range / eLocation ID:
2999 to 3006
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The evolution of Mechatronics and Robotics Engineering (MRE) has enabled numerous technological advancements since the early 20th century. Professionals in this field are reshaping the world by designing smart and autonomous systems aiming to improve human well-being. Recognizing the need for preparing highly-educated MRE professionals, many universities and colleges are adopting MRE as a distinct degree program. One of the cornerstones of MRE education is laboratory- and project-based learning to provide a hands-on and engaging experience for the students. To this end, numerous software and hardware platforms have been developed and utilized in MRE courses and laboratories. Commercial products can provide a rich hands-on experience for the students, but they can be cost-prohibitive. On the other hand, open-source platforms are low-cost alternatives to their commercial counterparts and are being increasingly used in industry. Developing open-source laboratory platforms will be a more feasible option for a wider range of institutions and would enable familiarizing the students with recent technological trends in industry and exposing them to the development details of a real-world system. However, adoption of open-source platforms in MRE courses can be lengthy and time consuming. Educators who wish to utilize such systems typically lack the expertise in all aspects of their implementation which can make them difficult to troubleshoot. Debugging open-source systems can also be challenging because most of the troubleshooting is done through forum discussions which appear to be very noisy and unfocused. The flip side of this chaotic nature of the open-source world is that there is a vast amount of information available, including tutorials, examples, and commentary and, with some focused searching, debugging and usage questions can often get answered. There is also a disconnect between the forum participants, typically computer scientists and hobbyists, and MRE educators and students. Finally, the available resources and documentation for utilizing open-source platforms in MRE education are insufficient and incomprehensive. Therefore, the main goal of this paper is to increase awareness and familiarity with the use of open-source software and hardware packages in MRE education and practice towards accelerating their adoption. To this end, open-source software packages such as Python, GNU Octave, OpenFOAM, Java, Modelica, Gazebo, SPICE, Scilab, and Gnuplot, which have the potential to be useful in the modeling and analysis of MRE systems are introduced. Furthermore, low-cost and powerful open-source hardware packages such as Arduino, Raspberry Pi, and BeagleBone which can be used as the main processing unit for data acquisition and control implementation in a wide range of MRE systems are reviewed and their limitations and potentials are investigated. This paper provides a valuable resource for MRE students and faculty who would like to utilize open-source hardware and software platforms in their education and research. 
    more » « less
  2. Call graphs have many applications in software engineering, including bug-finding, security analysis, and code navigation in IDEs. However, the construction of call graphs requires significant investment in program analysis infrastructure. An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing JVM bytecode. This approach has been shown to work well when applied to bytecode produced from Java code. In this paper, we show that it also works well for diverse other JVM-hosted languages: dynamically-typed functional Scheme, statically-typed object-oriented Scala, and polymorphic functional OCaml. Effectively, we get call graph construction for these languages for free, using existing analysis infrastructure for Java, with only minor challenges to soundness. This, in turn, suggests that bytecode-based analysis could serve as an implementation vehicle for bug-finding, security analysis, and IDE features for these languages. We present qualitative and quantitative analyses of the soundness and precision of call graphs constructed from JVM bytecodes for these languages, and also for Groovy, Clojure, Python, and Ruby. However, we also show that implementation details matter greatly. In particular, the JVM-hosted implementations of Groovy, Clojure, Python, and Ruby produce very unsound call graphs, due to the pervasive use of reflection, invokedynamic instructions, and run-time code generation. Interestingly, the dynamic translation schemes employed by these languages, which result in unsound static call graphs, tend to be correlated with poor performance at run time. 
    more » « less
  3. Summary

    Researchers and practitioners have designed and implemented various automated test case generators to support effective software testing. Such generators exist for various languages (e.g., Java, C#, or Python) and various platforms (e.g., desktop, web, or mobile applications). The generators exhibit varying effectiveness and efficiency, depending on the testing goals they aim to satisfy (e.g., unit‐testing of libraries versus system‐testing of entire applications) and the underlying techniques they implement. In this context, practitioners need to be able to compare different generators to identify the most suited one for their requirements, while researchers seek to identify future research directions. This can be achieved by systematically executing large‐scale evaluations of different generators. However, executing such empirical evaluations is not trivial and requires substantial effort to select appropriate benchmarks, setup the evaluation infrastructure, and collect and analyse the results. In this Software Note, we present ourJUnit Generation Benchmarking Infrastructure(JUGE) supporting generators (search‐based, random‐based, symbolic execution, etc.) seeking to automate the production of unit tests for various purposes (validation, regression testing, fault localization, etc.). The primary goal is to reduce the overall benchmarking effort, ease the comparison of several generators, and enhance the knowledge transfer between academia and industry by standardizing the evaluation and comparison process. Since 2013, several editions of a unit testing tool competition, co‐located with the Search‐Based Software Testing Workshop, have taken place whereJUGEwas used and evolved. As a result, an increasing amount of tools (over 10) from academia and industry have been evaluated onJUGE, matured over the years, and allowed the identification of future research directions. Based on the experience gained from the competitions, we discuss the expected impact ofJUGEin improving the knowledge transfer on tools and approaches for test generation between academia and industry. Indeed, theJUGEinfrastructure demonstrated an implementation design that is flexible enough to enable the integration of additional unit test generation tools, which is practical for developers and allows researchers to experiment with new and advanced unit testing tools and approaches.

     
    more » « less
  4. Limitations in the applicability, accuracy, and precision of individual structure characterization methods can sometimes be overcome via an integrative modeling approach that relies on information from all available sources, including all available experimental data and prior models. The open-source Integrative Modeling Platform (IMP) is one piece of software that implements all computational aspects of integrative modeling. To maximize the impact of integrative structures, the coordinates should be made publicly available, as is already the case for structures based on X-ray crystallography, NMR spectroscopy, and electron microscopy. Moreover, the associated experimental data and modeling protocols should also be archived, such that the original results can easily be reproduced. Finally, it is essential that the integrative structures are validated as part of their publication and deposition. A number of research groups have already developed software to implement integrative modeling and have generated a number of structures, prompting the formation of an Integrative/Hybrid Methods Task Force. Following the recommendations of this task force, the existing PDBx/mmCIF data representation used for atomic PDB structures has been extended to address the requirements for archiving integrative structural models. This IHM-dictionary adds a flexible model representation, including coarse graining, models in multiple states and/or related by time or other order, and multiple input experimental information sources. A prototype archiving system called PDB-Dev ( https://pdb-dev.wwpdb.org ) has also been created to archive integrative structural models, together with a Python library to facilitate handling of integrative models in PDBx/mmCIF format. 
    more » « less
  5. null (Ed.)
    Virtual conversational assistants designed specifically for software engineers could have a huge impact on the time it takes for software engineers to get help. Research efforts are focusing on virtual assistants that support specific software development tasks such as bug repair and pair programming. In this paper, we study the use of online chat platforms as a resource towards collecting developer opinions that could potentially help in building opinion Q&A systems, as a specialized instance of virtual assistants and chatbots for software engineers. Opinion Q&A has a stronger presence in chats than in other developer communications, thus mining them can provide a valuable resource for developers in quickly getting insight about a specific development topic (e.g., What is the best Java library for parsing JSON?). We address the problem of opinion Q&A extraction by developing automatic identification of opinion-asking questions and extraction of participants’ answers from public online developer chats. We evaluate our automatic approaches on chats spanning six programming communities and two platforms. Our results show that a heuristic approach to opinion-asking questions works well (.87 precision), and a deep learning approach customized to the software domain outperforms heuristics-based, machine-learning-based and deep learning for answer extraction in community question answering. 
    more » « less