skip to main content


Title: EURECA: Enhanced Understanding of Real Environments via Crowd Assistance
Indoor robots hold the promise of automatically handling mundane daily tasks, helping to improve access for people with disabilities, and providing on-demand access to remote physical environments. Unfortunately, the ability to understand never-before-seen objects in scenes where new items may be added (e.g., purchased) or altered (e.g., damaged) on a regular basis remains an open challenge for robotics. In this paper, we introduce EURECA, a mixed-initiative system that leverages online crowds of human contributors to help robots robustly identify 3D point cloud segments corresponding to user-referenced objects in near real-time. EURECA allows robots to understand multi-object 3D scenes on-the-fly (in ∼40 seconds) by providing groups of non-expert crowd workers with intelligent tools that can segment objects more quickly (∼70% faster) and more accurately than individuals. More broadly, EURECA introduces the first real-time crowdsourcing tool that addresses the challenge of learning about new objects in real-world settings, creating a new source of data for training robots online, as well as a platform for studying mixed-initiative crowdsourcing workflows for understanding 3D scenes.  more » « less
Award ID(s):
1638047
NSF-PAR ID:
10066635
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
AAAI Conference on Human Computation (HCOMP 2018)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Robotics has emerged as one of the most popular subjects in STEM (Science, Technology, Engineering, and Mathematics) education for students in elementary, middle, and high schools, providing them with an opportunity to gain knowledge of engineering and technology. In recent years, flying robots (or drones) have also gained popularity as teaching tool to impart the fundamentals of computer programming to high school students. However, despite completing the programming course, students may still lack an understanding of the working principle of drones. This paper proposes an approach to teach students the basic principles of drone aeronautics through laboratory programming. This course was designed by professors from Vaughn College of Aeronautics and Technology for high school students who work on after-school and weekend programs during the school year or summer. In early 2021, the college applied for and was approved to offer a certificate program in UAS (Unmanned Aerial Systems) Designs, Applications, and Operations to college students by the Education Department of New York State. Later that year, the college also received a grant from the Federal Aviation Administration (FAA) to provide tuition-free early higher education for high school students, allowing them to complete the majority of the credits in the UAS certificate program while still enrolled in high school. The program aims to equip students with the hands-on skills necessary for successful careers as versatile engineers and technicians. Most of the courses in the certificate program are introductory or application-oriented, such as Introduction to Drones, Drone Law, Part 107 License, or Fundamentals of Land Surveying and Photogrammetry. However, one of the courses, Introduction to Drone Aeronautics, is more focused on the theory of drone flight and control. Organizing the lectures and laboratory of the course for high school students who are interested in pursuing the certificate can be a challenge. To create the Introduction to Drone Aeronautics course, a variety of school courses and online resources were examined. After careful consideration, the Robolink Co-drone [1] was chosen as the experimental platform for students to study drone flight, and control and stabilize a drone. However, developing a set of comprehensible lectures proved to be a difficult task. Based on the requirements of the certificate program, the lectures were designed to cover the following topics: (a) an overview of fundamentals of drone flight principles, including the forces acting on a drone such as lift, weight, drag, and thrust, as well as the selection of on-board components and trade-offs for proper payload and force balance; (b) an introduction to the proportional-integral-directive (PID) controller and its role in stabilizing a drone and reducing steady-state errors; (c) an explanation of the forces acting on a drone in different coordinates, along with coordinate transformations; and (d) an opportunity for students to examine the dynamic model of a 3D quadcopter with control parameters, but do not require them to derive the 3D drone dynamic equations. In the future, the course can be improved to cater to the diverse learning needs of the students. More interactive and accessible tools can be developed to help different types of students understand drone aeronautics. For instance, some students may prefer to apply mathematical skills to derive results, while others may find it easier to comprehend the stable flight of a drone by visualizing the continuous changes in forces and balances resulting from the control of DC motor speeds. Despite the differences in students’ mathematical abilities, the course has helped high school students appreciate that mathematics is a powerful tool for solving complex problems in the real world, rather than just a subject of abstract numbers. 
    more » « less
  2. Researchers, educators, and multimedia designers need to better understand how mixing physical tangible objects with virtual experiences affects learning and science identity. In this novel study, a 3D-printed tangible that is an accurate facsimile of the sort of expensive glassware that chemists use in real laboratories is tethered to a laptop with a digitized lesson. Interactive educational content is increasingly being placed online, it is important to understand the educational boundary conditions associated with passive haptics and 3D-printed manipulables. Cost-effective printed objects would be particularly welcome in rural and low Socio-Economic (SES) classrooms. A Mixed Reality (MR) experience was created that used a physical 3D-printed haptic burette to control a computer-based chemistry titration experiment. This randomized control trial study with 136 college students had two conditions: 1) low-embodied control (using keyboard arrows), and 2) high-embodied experimental (physically turning a valve/stopcock on the 3D-printed burette). Although both groups displayed similar significant gains on the declarative knowledge test, deeper analyses revealed nuanced Aptitude by Treatment Interactions (ATIs). These interactionsfavored the high-embodied experimental group that used the MR devicefor both titration-specific posttest knowledge questions and for science efficacy and science identity. Those students with higher prior science knowledge displayed higher titration knowledge scores after using the experimental 3D-printed haptic device. A multi-modal linguistic and gesture analysis revealed that during recall the experimental participants used the stopcock-turning gesture significantly more often, and their recalls created a significantly different Epistemic Network Analysis (ENA). ENA is a type of 2D projection of the recall data, stronger connections were seen in the high embodied group mainly centering on the key hand-turning gesture. Instructors and designers should consider the multi-modal and multi-dimensional nature of the user interface, and how the addition of another sensory-based learning signal (haptics) might differentially affect lower prior knowledge students. One hypothesis is that haptically manipulating novel devices during learning may create more cognitive load. For low prior knowledge students, it may be advantageous for them to begin learning content on a more ubiquitous interface (e.g., keyboard) before moving them to more novel, multi-modal MR devices/interfaces.

     
    more » « less
  3. Manipulation of deformable objects is a desired skill in making robots ubiquitous in manufacturing, service, healthcare, and security. Common deformable objects (e.g., wires, clothes, bed sheets, etc.) are significantly more difficult to model than rigid objects. In this research, we contribute to the model-based manipulation of linear flexible objects such as cables. We propose a 3D geometric model of the linear flexible object that is subject to gravity and a physical model with multiple links connected by revolute joints and identified model parameters. These models enable task automation in manipulating linear flexible objects both in simulation and real world. To bridge the gap between simulation and real world and build a close-to-reality simulation of flexible objects, we propose a new strategy called Simulation-to-Real-to-Simulation (Sim2Real2Sim). We demonstrate the feasibility of our approach by completing the Plug Task used in the 2015 DARPA Robotics Challenge Finals both in simulation and real world, which involves unplugging a power cable from one socket and plugging it into another. Numerical experiments are implemented to validate our approach. 
    more » « less
  4. While current vision algorithms excel at many challenging tasks, it is unclear how well they understand the physical dynamics of real-world environments. Here we introduce Physion, a dataset and benchmark for rigorously evaluating the ability to predict how physical scenarios will evolve over time. Our dataset features realistic simulations of a wide range of physical phenomena, including rigid and soft-body collisions, stable multi-object configurations, rolling, sliding, and projectile motion, thus providing a more comprehensive challenge than previous benchmarks. We used Physion to benchmark a suite of models varying in their architecture, learning objective, input-output structure, and training data. In parallel, we obtained precise measurements of human prediction behavior on the same set of scenarios, allowing us to directly evaluate how well any model could approximate human behavior. We found that vision algorithms that learn object-centric representations generally outperform those that do not, yet still fall far short of human performance. On the other hand, graph neural networks with direct access to physical state information both perform substantially better and make predictions that are more similar to those made by humans. These results suggest that extracting physical representations of scenes is the main bottleneck to achieving human-level and human-like physical understanding in vision algorithms. We have publicly released all data and code to facilitate the use of Physion to benchmark additional models in a fully reproducible manner, enabling systematic evaluation of progress towards vision algorithms that understand physical environments as robustly as people do. 
    more » « less
  5. null (Ed.)
    The DeepLearningEpilepsyDetectionChallenge: design, implementation, andtestofanewcrowd-sourced AIchallengeecosystem Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊ , Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi† , Gustavo Stolovitzky† , Mahtab Mirmomeni† , Stefan Harrer† * These authors contributed equally to this work † Corresponding authors: rkhalaf@us.ibm.com, rosen@il.ibm.com, gustavo@us.ibm.com, mahtabm@au1.ibm.com, sharrer@au.ibm.com ◊ Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in USA, Israel and Australia. Introduction This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend reached the medical domain, with applications reaching from cancer diagnosis [1] to the development of brain-machine-interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands of participants to be expert data scientists, deeply knowledgeable in at least one other scientific domain, and competent software engineers with access to large compute resources. People who match this description are few and far between, unfortunately leading to a shrinking pool of possible participants and a loss of experts dedicating their time to solving important problems. Participation is even further restricted in the context of any challenge run on confidential use cases or with sensitive data. Recently, we designed and ran a deep learning challenge to crowd-source the development of an automated labelling system for brain recordings, aiming to advance epilepsy research. A focus of this challenge, run internally in IBM, was the development of a platform that lowers the barrier of entry and therefore mitigates the risk of excluding interested parties from participating. The challenge: enabling wide participation With the goal to run a challenge that mobilises the largest possible pool of participants from IBM (global), we designed a use case around previous work in epileptic seizure prediction [3]. In this “Deep Learning Epilepsy Detection Challenge”, participants were asked to develop an automatic labelling system to reduce the time a clinician would need to diagnose patients with epilepsy. Labelled training and blind validation data for the challenge were generously provided by Temple University Hospital (TUH) [4]. TUH also devised a novel scoring metric for the detection of seizures that was used as basis for algorithm evaluation [5]. In order to provide an experience with a low barrier of entry, we designed a generalisable challenge platform under the following principles: 1. No participant should need to have in-depth knowledge of the specific domain. (i.e. no participant should need to be a neuroscientist or epileptologist.) 2. No participant should need to be an expert data scientist. 3. No participant should need more than basic programming knowledge. (i.e. no participant should need to learn how to process fringe data formats and stream data efficiently.) 4. No participant should need to provide their own computing resources. In addition to the above, our platform should further • guide participants through the entire process from sign-up to model submission, • facilitate collaboration, and • provide instant feedback to the participants through data visualisation and intermediate online leaderboards. The platform The architecture of the platform that was designed and developed is shown in Figure 1. The entire system consists of a number of interacting components. (1) A web portal serves as the entry point to challenge participation, providing challenge information, such as timelines and challenge rules, and scientific background. The portal also facilitated the formation of teams and provided participants with an intermediate leaderboard of submitted results and a final leaderboard at the end of the challenge. (2) IBM Watson Studio [6] is the umbrella term for a number of services offered by IBM. Upon creation of a user account through the web portal, an IBM Watson Studio account was automatically created for each participant that allowed users access to IBM's Data Science Experience (DSX), the analytics engine Watson Machine Learning (WML), and IBM's Cloud Object Storage (COS) [7], all of which will be described in more detail in further sections. (3) The user interface and starter kit were hosted on IBM's Data Science Experience platform (DSX) and formed the main component for designing and testing models during the challenge. DSX allows for real-time collaboration on shared notebooks between team members. A starter kit in the form of a Python notebook, supporting the popular deep learning libraries TensorFLow [8] and PyTorch [9], was provided to all teams to guide them through the challenge process. Upon instantiation, the starter kit loaded necessary python libraries and custom functions for the invisible integration with COS and WML. In dedicated spots in the notebook, participants could write custom pre-processing code, machine learning models, and post-processing algorithms. The starter kit provided instant feedback about participants' custom routines through data visualisations. Using the notebook only, teams were able to run the code on WML, making use of a compute cluster of IBM's resources. The starter kit also enabled submission of the final code to a data storage to which only the challenge team had access. (4) Watson Machine Learning provided access to shared compute resources (GPUs). Code was bundled up automatically in the starter kit and deployed to and run on WML. WML in turn had access to shared storage from which it requested recorded data and to which it stored the participant's code and trained models. (5) IBM's Cloud Object Storage held the data for this challenge. Using the starter kit, participants could investigate their results as well as data samples in order to better design custom algorithms. (6) Utility Functions were loaded into the starter kit at instantiation. This set of functions included code to pre-process data into a more common format, to optimise streaming through the use of the NutsFlow and NutsML libraries [10], and to provide seamless access to the all IBM services used. Not captured in the diagram is the final code evaluation, which was conducted in an automated way as soon as code was submitted though the starter kit, minimising the burden on the challenge organising team. Figure 1: High-level architecture of the challenge platform Measuring success The competitive phase of the "Deep Learning Epilepsy Detection Challenge" ran for 6 months. Twenty-five teams, with a total number of 87 scientists and software engineers from 14 global locations participated. All participants made use of the starter kit we provided and ran algorithms on IBM's infrastructure WML. Seven teams persisted until the end of the challenge and submitted final solutions. The best performing solutions reached seizure detection performances which allow to reduce hundred-fold the time eliptologists need to annotate continuous EEG recordings. Thus, we expect the developed algorithms to aid in the diagnosis of epilepsy by significantly shortening manual labelling time. Detailed results are currently in preparation for publication. Equally important to solving the scientific challenge, however, was to understand whether we managed to encourage participation from non-expert data scientists. Figure 2: Primary occupation as reported by challenge participants Out of the 40 participants for whom we have occupational information, 23 reported Data Science or AI as their main job description, 11 reported being a Software Engineer, and 2 people had expertise in Neuroscience. Figure 2 shows that participants had a variety of specialisations, including some that are in no way related to data science, software engineering, or neuroscience. No participant had deep knowledge and experience in data science, software engineering and neuroscience. Conclusion Given the growing complexity of data science problems and increasing dataset sizes, in order to solve these problems, it is imperative to enable collaboration between people with differences in expertise with a focus on inclusiveness and having a low barrier of entry. We designed, implemented, and tested a challenge platform to address exactly this. Using our platform, we ran a deep-learning challenge for epileptic seizure detection. 87 IBM employees from several business units including but not limited to IBM Research with a variety of skills, including sales and design, participated in this highly technical challenge. 
    more » « less