Title: AsterixDB Mid-Flight: A Case Study in Building Systems in Academia
Building large software systems is always a challenging venture, but it is especially so in academia. This paper describes the experiences that the author and his (mostly UC-based) partners in software crime have had that culminated in the Big Data Management System now available as Apache AsterixDB. It covers a mix of the history and technical content of the nearly ten-year-old project, starting with its inception during the MapReduce craze. It describes the phases that the effort has gone through and some of the lessons learned along the way. The paper also covers some personal reflections and opinions about the challenges of systems-building, as well as writing about it, in our current academic culture. Included is the case for doing this sort of work at all – discussing the pitfalls of doing “systems” research in the absence of an actual system, and why the gain outweighs the pain of building and sharing database software in academia. As of late 2018, Apache AsterixDB is also having a commercial impact as the storage and parallel query engine underlying a new offering called Couchbase Analytics. The last part of the paper explains how we are attempting to balance the uses of AsterixDB as (i) a generally available open source Apache software platform, (ii) an end-to-end research testbed for universities, and (iii) the technology powering a commercial NoSQL product.
Award ID(s):
1925610
NSF-PAR ID:
10184925
Journal Name:
2019 IEEE 35th International Conference on Data Engineering (ICDE)
Page Range / eLocation ID:
1 to 12
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In September 2019, the fourth and final workshop on the Future of Mechatronics and Robotics Education (FoMRE) was held at Lawrence Technological University in Southfield, MI. This workshop was organized by faculty at several universities with financial support from industry partners and the National Science Foundation. The purpose of the workshops was to create a cohesive effort among mechatronics and robotics courses, minors and degree programs. Mechatronics and Robotics Engineering (MRE) is an integration of mechanics, controls, electronics, and software, which provides a unique opportunity for engineering students to function on multidisciplinary teams. Due to its multidisciplinary nature, it attracts diverse and innovative students, and graduates better-prepared professional engineers. In this fast-growing field, there is a great need to standardize educational material and make MRE education more widely available and easier to adopt. This can only be accomplished if the community comes together to speak with one clear voice about not only the benefits, but also the best ways to teach it. These efforts would also aid in establishing more of these degree programs and integrating minors or majors into existing computer science, mechanical engineering, or electrical engineering departments. The final workshop was attended by approximately 50 practitioners from industry and academia. Participants identified many practical skills required for students to succeed in an MRE curriculum and as practicing engineers after graduation. These skills were then organized into the following categories: professional, independent learning, controller design, numerical simulation and analysis, electronics, software development, and system design. For example, professional skills include technical reports, presentations, and documentation. Independent learning includes reading data sheets, performing internet searches, doing a literature review, and having a maker mindset.
Numerical simulation skills include understanding data, presenting data graphically, and solving and simulating in software such as MATLAB, Simulink and Excel. Controller design involves selecting a controller, tuning a controller, designing to meet specifications, and understanding when the results are good enough. Electronics skills include selecting sensors, interfacing sensors, interfacing actuators, creating printed circuit boards, wiring on a breadboard, soldering, installing drivers, using integrated circuits, and using microcontrollers. Software development of embedded systems includes agile program design, state machines, analyzing and evaluating code results, commenting code, troubleshooting, debugging, AI and machine learning. Finally, system design includes prototyping, creating CAD models, design for manufacturing, breaking a system down into subsystems, integrating and interfacing subcomponents, having a multidisciplinary perspective, robustness, evaluating tradeoffs, testing, validation, and verification, and failure mode and effects analysis. A survey was prepared and sent out to the participants from all four workshops as well as other robotics faculty, researchers and industry personnel in order to elicit a broader community response. Because one of the biggest challenges in mechatronics and robotics education is the absence of standardized curricula, textbooks, platforms, syllabi, assignments, and learning outcomes, this was a vital part of the process to achieve some level of consensus. This paper presents an introduction to MRE education, related work on existing programs, methods, results of the practical skills survey, and then draws conclusions based upon these results. It aims to create the foundation for standardizing the development of student skills in mechatronics and robotics curricula across institutions, disciplines, majors and minors.
The survey was completed by 94 participants, and there was a clear consensus that the primary skills students should have upon completing MRE courses or a program are a broader multidisciplinary systems-level perspective, an ability to solve problems, and an ability to design a system to meet specifications.
  2. Analyzing the increasingly large volumes of data that are available today, possibly including the application of custom machine learning models, requires the utilization of distributed frameworks. This can result in serious productivity issues for “normal” data scientists. This paper introduces AFrame, a new scalable data analysis package powered by a Big Data management system that extends the data scientists' familiar DataFrame operations to efficiently operate on managed data at scale. AFrame is implemented as a layer on top of Apache AsterixDB, transparently scaling out the execution of DataFrame operations and machine learning model invocation through a parallel, shared-nothing big data management system. AFrame incrementally constructs SQL++ queries and leverages AsterixDB's semistructured data management facilities, user-defined function support, and live data ingestion support. In order to evaluate the proposed approach, this paper also introduces an extensible micro-benchmark for use in evaluating DataFrame performance in both single-node and distributed settings via a collection of representative analytic operations. This paper presents the architecture of AFrame, describes the underlying capabilities of AsterixDB that efficiently support modern data analytic operations, and utilizes the proposed benchmark to evaluate and compare the performance and support for large-scale data analyses provided by alternative DataFrame libraries.
  3. The evolution of Mechatronics and Robotics Engineering (MRE) has enabled numerous technological advancements since the early 20th century. Professionals in this field are reshaping the world by designing smart and autonomous systems aiming to improve human well-being. Recognizing the need for preparing highly-educated MRE professionals, many universities and colleges are adopting MRE as a distinct degree program. One of the cornerstones of MRE education is laboratory- and project-based learning to provide a hands-on and engaging experience for the students. To this end, numerous software and hardware platforms have been developed and utilized in MRE courses and laboratories. Commercial products can provide a rich hands-on experience for the students, but they can be cost-prohibitive. On the other hand, open-source platforms are low-cost alternatives to their commercial counterparts and are being increasingly used in industry. Developing open-source laboratory platforms will be a more feasible option for a wider range of institutions and would enable familiarizing the students with recent technological trends in industry and exposing them to the development details of a real-world system. However, adoption of open-source platforms in MRE courses can be lengthy and time consuming. Educators who wish to utilize such systems typically lack the expertise in all aspects of their implementation which can make them difficult to troubleshoot. Debugging open-source systems can also be challenging because most of the troubleshooting is done through forum discussions which appear to be very noisy and unfocused. The flip side of this chaotic nature of the open-source world is that there is a vast amount of information available, including tutorials, examples, and commentary and, with some focused searching, debugging and usage questions can often get answered. 
There is also a disconnect between the forum participants, typically computer scientists and hobbyists, and MRE educators and students. Finally, the available resources and documentation for utilizing open-source platforms in MRE education are insufficient and not comprehensive. Therefore, the main goal of this paper is to increase awareness and familiarity with the use of open-source software and hardware packages in MRE education and practice towards accelerating their adoption. To this end, open-source software packages such as Python, GNU Octave, OpenFOAM, Java, Modelica, Gazebo, SPICE, Scilab, and Gnuplot, which have the potential to be useful in the modeling and analysis of MRE systems, are introduced. Furthermore, low-cost and powerful open-source hardware packages such as Arduino, Raspberry Pi, and BeagleBone, which can be used as the main processing unit for data acquisition and control implementation in a wide range of MRE systems, are reviewed, and their limitations and potential are investigated. This paper provides a valuable resource for MRE students and faculty who would like to utilize open-source hardware and software platforms in their education and research.
  4. LensKit is an open-source toolkit for building, researching, and learning about recommender systems. First released in 2010 as a Java framework, it has supported diverse published research, small-scale production deployments, and education in both MOOC and traditional classroom settings. In this paper, I present the next generation of the LensKit project, re-envisioning the original tool's objectives as a flexible Python package for supporting recommender systems research and development. LensKit for Python (LKPY) enables researchers and students to build robust, flexible, and reproducible experiments that make use of the large and growing PyData and Scientific Python ecosystem, including scikit-learn and TensorFlow. To that end, it provides classical collaborative filtering implementations, recommender system evaluation metrics, data preparation routines, and tools for efficiently batch running recommendation algorithms, all usable in any combination with each other or with other Python software. This paper describes the design goals, use cases, and capabilities of LKPY, contextualized in a reflection on the successes and failures of the original LensKit for Java software.
  5. With the rise of data science, there has been a sharp increase in data-driven techniques that rely on both real and synthetic data. At the same time, there is a growing interest from the scientific community in the reproducibility of results. Some conferences include this explicitly in their review forms or give special badges to reproducible papers. This tutorial describes two systems that facilitate the design of reproducible experiments on both real and synthetic data. UCR-Star is an interactive repository that hosts terabytes of open geospatial data. In addition to the ability to explore and visualize this data, UCR-Star makes it easy to share all or parts of these datasets in many standard formats ensuring that other researchers can get the same exact data mentioned in the paper. Spider is a spatial data generator that generates standardized spatial datasets with full control over the data characteristics which further promotes the reproducibility of results. This tutorial will be organized into two parts. The first part will exhibit the key features of UCR-Star and Spider where participants can get hands-on experience in interacting with real spatial datasets, generating synthetic data with varying distributions, and downloading them to a local machine or a remote server. The second part will explore the integration of both UCR-Star and Spider into existing systems such as QGIS and Apache AsterixDB.