skip to main content


Title: Containerizing an eTextbook Infrastructure
The CS Education community has developed many educational tools in recent years, such as interactive exercises. Often the developer makes them freely available for use, hosted on their own server, and usually they are directly accessible within the instructor's LMS through the LTI protocol. As convenient as this can be, instructors using these third-party tools for their courses can experience issues related to data access and privacy concerns. The tools typically collect clickstream data on student use. But they might not make it easy for the instructor to access these data, and the institution might be concerned about privacy violations. While the developers might allow and even support local installation of the tool, this can be a difficult process unless the tool carefully designed for third-party installation. And integration of small tools within larger frameworks (like a type of interactive exercise within an eTextbook framework) is also difficult without proper design. This paper describes an ongoing containerization effort for the OpenDSA eTextbook project. Our goal is both to serve our needs by creating an easier-to-manage decomposition of the many tools and sub-servers required by this complex system, and also to provide an easily installable production environment that instructors can run locally. This new system provides better access to developer-level data analysis tools and potentially removes many FERPA-related privacy concerns. We also describe our efforts to integrate Caliper Analytics into OpenDSA to expand the data collection and analysis services. We hope that our containerization architecture can help provide a roadmap for similar projects to follow  more » « less
Award ID(s):
1740765
NSF-PAR ID:
10294502
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 5th Educational Data Mining in Computer Science Education (CSEDM) Workshop
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Researchers in many disciplines are developing novel interactive smart learning objects like exercises and visualizations. Meanwhile, Learning Management Systems (LMS) and eTextbook systems are also becoming more sophisticated in their ability to use standard protocols to make use of third party smart learning objects. But at this time, educational tool developers do not always make best use of the interoperability standards and need exemplars to guide and motivate their development efforts. In this paper we present a case study where the two large educational ecosystems use the Learning Tools Interoperability (LTI) standard to allow cross-sharing of their educational materials. At the end of our development process, Virginia Tech’s OpenDSA eTextbook system became able to import materials from Aalto University’s ACOS smart learning content server, such as python programming exercises and Parsons problems. Meanwhile, University of Pittsburgh’s Mastery Grids (which already uses the ACOS exercises) was made to support CodeWorkout programming exercises (a system already used within OpenDSA). Thus, four major projects in CS Education became inter-operable. 
    more » « less
  2. The Amazon Alexa voice assistant provides convenience through automation and control of smart home appliances using voice commands. Amazon allows third-party applications known as skills to run on top of Alexa to further extend Alexa's capability. However, as multiple skills can share the same invocation phrase and request access to sensitive user data, growing security and privacy concerns surround third-party skills. In this paper, we study the availability and effectiveness of existing security indicators or a lack thereof to help users properly comprehend the risk of interacting with different types of skills. We conduct an interactive user study (inviting active users of Amazon Alexa) where participants listen to and interact with real-world skills using the official Alexa app. We find that most participants fail to identify the skill developer correctly (i.e., they assume Amazon also develops the third-party skills) and cannot correctly determine which skills will be automatically activated through the voice interface. We also propose and evaluate a few voice-based skill type indicators, showcasing how users would benefit from such voice-based indicators. 
    more » « less
  3. null (Ed.)
    Amazon's voice-based assistant, Alexa, enables users to directly interact with various web services through natural language dialogues. It provides developers with the option to create third-party applications (known as Skills) to run on top of Alexa. While such applications ease users' interaction with smart devices and bolster a number of additional services, they also raise security and privacy concerns due to the personal setting they operate in. This paper aims to perform a systematic analysis of the Alexa skill ecosystem. We perform the first large-scale analysis of Alexa skills, obtained from seven different skill stores totaling to 90,194 unique skills. Our analysis reveals several limitations that exist in the current skill vetting process. We show that not only can a malicious user publish a skill under any arbitrary developer/company name, but she can also make backend code changes after approval to coax users into revealing unwanted information. We, next, formalize the different skill-squatting techniques and evaluate the efficacy of such techniques. We find that while certain approaches are more favorable than others, there is no substantial abuse of skill squatting in the real world. Lastly, we study the prevalence of privacy policies across different categories of skill, and more importantly the policy content of skills that use the Alexa permission model to access sensitive user data. We find that around 23.3% of such skills do not fully disclose the data types associated with the permissions requested. We conclude by providing some suggestions for strengthening the overall ecosystem, and thereby enhance transparency for end-users. 
    more » « less
  4. In December, 2020, Apple began requiring developers to disclose their data collection and use practices to generate a “privacy label” for their application. The use of mobile application Software Development Kits (SDKs) and third-party libraries, coupled with a typical lack of expertise in privacy, makes it challenging for developers to accurately report their data collection and use practices. In this work we discuss the design and evaluation of a tool to help iOS developers generate privacy labels. The tool combines static code analysis to identify likely data collection and use practices with interactive functionality designed to prompt developers to elucidate analysis results and carefully reflect on their applications’ data practices. We conducted semi-structured interviews with iOS developers as they used an initial version of the tool. We discuss how these results motivated us to develop an enhanced software tool, Privacy Label Wiz, that more closely resembles interactions developers reported to be most useful in our semi-structured interviews. We present findings from our interviews and the enhanced tool motivated by our study. We also outline future directions for software tools to better assist developers communicating their mobile app’s data practices to different audiences. 
    more » « less
  5. It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution to produce computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). This is because these approaches do not scale to all biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both manual and machine learning approaches face great challenges: the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other) in manual curation, and keywords to ontology concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardized vocabularies (ontologies). We argue that the authors describing characters are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of the descriptions from the moment of publication. In this presentation, we will introduce the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists, which consists of three components: a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and provides authors with the flexibility of using their preferred terminology in recording characters for a set of specimens (this application also facilitates semantic clarity and consistency across species descriptions); a set of services that produce RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. Fig. 1 shows the system diagram of the platform. The presentation will consist of: a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken. We also report on how a custom color palette of RGB values obtained from real specimens or high-quality specimen images, can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken. We also report on how a custom color palette of RGB values obtained from real specimens or high-quality specimen images, can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration, where we show how Character Recorder and Conflict Resolver can work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference of this system from other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. The software modules currently incorporated in Character Recorder and Conflict Resolver have undergone formal usability studies. We are actively recruiting Carex experts to participate in a 3-day usability study of the entire system of the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists. Participants will use the platform to record 100 characters about one Carex species. In addition to usability data, we will collect the terms that participants submit to the underlying ontology and the data related to conflict resolution. Such data allow us to examine the types and the quantities of logical conflicts that may result from the terms added by the users and to use Discrete Event Simulation models to understand if and how term additions and conflict resolutions converge. We look forward to a discussion on how the tools (Character Recorder is online at http://shark.sbs.arizona.edu/chrecorder/public) described in our presentation can contribute to producing and publishing FAIR data in taxonomic studies. 
    more » « less