

Title: Real-Time Multi-Modal Human–Robot Collaboration Using Gestures and Speech
Abstract

As artificial intelligence and industrial automation continue to develop, human–robot collaboration (HRC) with advanced interaction capabilities has become an increasingly significant area of research. In this paper, we design and develop a real-time, multi-modal HRC system using speech and gestures. A set of 16 dynamic gestures is designed for communication from a human to an industrial robot. A dataset of the dynamic gestures is constructed and will be shared with the community. A convolutional neural network is developed to recognize the dynamic gestures in real time using motion history images and deep learning methods. An improved open-source speech recognizer is used for real-time speech recognition of the human worker. An integration strategy is proposed to combine the gesture and speech recognition results, and a software interface is designed for system visualization. A multi-threading architecture is constructed for simultaneously operating multiple tasks, including gesture and speech data collection and recognition, data integration, robot control, and software interface operation. The various methods and algorithms are integrated to develop the HRC system, and a platform is constructed to demonstrate the system performance. The experimental results validate the feasibility and effectiveness of the proposed algorithms and the HRC system.
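The multi-threading architecture described in the abstract can be pictured with a small sketch. The Python snippet below is a minimal, hypothetical illustration (not the authors' implementation): one thread stands in for the gesture recognizer, one for the speech recognizer, and an integration thread fuses their outputs into robot commands. The recognizers are replaced by random stubs so the sketch runs on its own; the gesture vocabulary and fusion rule are assumptions.

```python
import queue
import random
import threading
import time

gesture_q: queue.Queue = queue.Queue()
speech_q: queue.Queue = queue.Queue()

GESTURES = ["start", "stop", "speed_up", "slow_down"]   # illustrative labels only

def gesture_worker(stop: threading.Event) -> None:
    # Stand-in for the real pipeline: capture frames, build a motion history
    # image, run the CNN classifier, and publish the predicted gesture label.
    while not stop.is_set():
        time.sleep(0.5)                      # simulate inference latency
        gesture_q.put(random.choice(GESTURES))

def speech_worker(stop: threading.Event) -> None:
    # Stand-in for the streaming speech recognizer.
    while not stop.is_set():
        time.sleep(0.8)
        speech_q.put(random.choice(GESTURES))

def integration_worker(stop: threading.Event) -> None:
    # Toy fusion rule: act on whichever modality produces a command first,
    # preferring the most recent gesture when both arrive together.
    while not stop.is_set():
        try:
            command = gesture_q.get(timeout=0.1)
        except queue.Empty:
            try:
                command = speech_q.get_nowait()
            except queue.Empty:
                continue
        print(f"robot command: {command}")   # replace with the robot interface

if __name__ == "__main__":
    stop = threading.Event()
    workers = [threading.Thread(target=f, args=(stop,), daemon=True)
               for f in (gesture_worker, speech_worker, integration_worker)]
    for w in workers:
        w.start()
    time.sleep(3)
    stop.set()
```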
Award ID(s):
1954548
NSF-PAR ID:
10352879
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Manufacturing Science and Engineering
Volume:
144
Issue:
10
ISSN:
1087-1357
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    With the development of industrial automation and artificial intelligence, robotic systems are becoming an essential part of factory production, and human-robot collaboration (HRC) has become a new trend in the industrial field. In our previous work, ten dynamic gestures were designed for communication between a human worker and a robot in manufacturing scenarios, and a dynamic gesture recognition model based on Convolutional Neural Networks (CNN) was developed. Building on that model, this study aims to design and develop a new real-time HRC system based on a multi-threading method and the CNN. This system enables real-time interaction between a human worker and a robotic arm based on dynamic gestures. Firstly, a multi-threading architecture is constructed for high-speed operation and fast response while scheduling more than one task at the same time. Next, a real-time dynamic gesture recognition algorithm is developed, in which a human worker's behavior and motion are continuously monitored and captured, and motion history images (MHIs) are generated in real time. The generation of the MHIs and their identification using the classification model are accomplished synchronously. If a designated dynamic gesture is detected, it is immediately transmitted to the robotic arm to trigger a real-time response. A Graphical User Interface (GUI) is developed for the integration of the proposed HRC system and for the visualization of the real-time motion history and the classification results of the gesture identification. A series of actual collaboration experiments are carried out between a human worker and a six-degree-of-freedom (6 DOF) Comau industrial robot, and the experimental results show the feasibility and robustness of the proposed system.
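    As context for the real-time MHI generation step, the following is a minimal sketch (assumed details, not the authors' code) of how a motion history image can be updated frame by frame with plain NumPy: pixels where motion is detected are set to a maximum value and all other pixels decay toward zero, so recent motion stays bright while older motion fades.

```python
import numpy as np

def update_mhi(mhi: np.ndarray, prev: np.ndarray, curr: np.ndarray,
               tau: float = 1.0, decay: float = 0.05, thresh: int = 30) -> np.ndarray:
    """One MHI update step: set moving pixels to tau, decay the rest toward zero."""
    motion = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > thresh
    return np.where(motion, tau, np.maximum(mhi - decay, 0.0))

# Demo on synthetic grayscale frames; in the real system the frames would come
# from the camera stream and the resulting MHI would be fed to the CNN classifier.
h, w = 64, 64
mhi = np.zeros((h, w), dtype=np.float32)
prev = np.zeros((h, w), dtype=np.uint8)
for t in range(10):
    curr = np.zeros((h, w), dtype=np.uint8)
    curr[:, (5 * t) % w] = 255          # a moving vertical bar
    mhi = update_mhi(mhi, prev, curr)
    prev = curr
print(mhi.max(), (mhi > 0).sum())
```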

     
  2. Abstract

    Human-robot collaboration (HRC) is a challenging task in modern industry, and gesture communication in HRC has attracted much interest. This paper proposes and demonstrates a dynamic gesture recognition system based on the Motion History Image (MHI) and Convolutional Neural Networks (CNN). Firstly, ten dynamic gestures are designed for a human worker to communicate with an industrial robot. Secondly, the MHI method is adopted to extract gesture features from video clips and generate static images of the dynamic gestures as inputs to the CNN. Finally, a CNN model is constructed for gesture recognition. The experimental results show very promising classification accuracy with this method.
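    A small convolutional classifier over single-channel MHI images is enough to convey the idea. The PyTorch model below is an illustrative stand-in for the paper's CNN; the layer sizes, input resolution, and 10-class output are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class MHIGestureCNN(nn.Module):
    """Small CNN over single-channel MHI images for a 10-gesture vocabulary."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One MHI image in, class scores out; in practice the model would be trained
# on the generated MHI dataset with a standard cross-entropy objective.
model = MHIGestureCNN()
dummy_mhi = torch.rand(1, 1, 64, 64)
print(model(dummy_mhi).shape)   # torch.Size([1, 10])
```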

     
  3. null (Ed.)
    The Deep Learning Epilepsy Detection Challenge: design, implementation, and test of a new crowd-sourced AI challenge ecosystem

    Isabell Kiral*, Subhrajit Roy*, Todd Mummert*, Alan Braz*, Jason Tsay, Jianbin Tang, Umar Asif, Thomas Schaffter, Eren Mehmet, The IBM Epilepsy Consortium◊, Joseph Picone, Iyad Obeid, Bruno De Assis Marques, Stefan Maetschke, Rania Khalaf†, Michal Rosen-Zvi†, Gustavo Stolovitzky†, Mahtab Mirmomeni†, Stefan Harrer†

    * These authors contributed equally to this work. † Corresponding authors: rkhalaf@us.ibm.com, rosen@il.ibm.com, gustavo@us.ibm.com, mahtabm@au1.ibm.com, sharrer@au.ibm.com. ◊ Members of the IBM Epilepsy Consortium are listed in the Acknowledgements section. J. Picone and I. Obeid are with Temple University, USA. T. Schaffter is with Sage Bionetworks, USA. E. Mehmet is with the University of Illinois at Urbana-Champaign, USA. All other authors are with IBM Research in USA, Israel and Australia.

    Introduction

    This decade has seen an ever-growing number of scientific fields benefitting from the advances in machine learning technology and tooling. More recently, this trend reached the medical domain, with applications ranging from cancer diagnosis [1] to the development of brain-machine-interfaces [2]. While Kaggle has pioneered the crowd-sourcing of machine learning challenges to incentivise data scientists from around the world to advance algorithm and model design, the increasing complexity of problem statements demands that participants be expert data scientists, deeply knowledgeable in at least one other scientific domain, and competent software engineers with access to large compute resources. People who match this description are few and far between, unfortunately leading to a shrinking pool of possible participants and a loss of experts dedicating their time to solving important problems. Participation is even further restricted in the context of any challenge run on confidential use cases or with sensitive data. Recently, we designed and ran a deep learning challenge to crowd-source the development of an automated labelling system for brain recordings, aiming to advance epilepsy research. A focus of this challenge, run internally in IBM, was the development of a platform that lowers the barrier of entry and therefore mitigates the risk of excluding interested parties from participating.

    The challenge: enabling wide participation

    With the goal of running a challenge that mobilises the largest possible pool of participants from IBM (global), we designed a use case around previous work in epileptic seizure prediction [3]. In this "Deep Learning Epilepsy Detection Challenge", participants were asked to develop an automatic labelling system to reduce the time a clinician would need to diagnose patients with epilepsy. Labelled training and blind validation data for the challenge were generously provided by Temple University Hospital (TUH) [4]. TUH also devised a novel scoring metric for the detection of seizures that was used as the basis for algorithm evaluation [5]. In order to provide an experience with a low barrier of entry, we designed a generalisable challenge platform under the following principles: 1. No participant should need to have in-depth knowledge of the specific domain (i.e., no participant should need to be a neuroscientist or epileptologist). 2. No participant should need to be an expert data scientist. 3. No participant should need more than basic programming knowledge (i.e., no participant should need to learn how to process fringe data formats and stream data efficiently). 4. No participant should need to provide their own computing resources. In addition to the above, our platform should further guide participants through the entire process from sign-up to model submission, facilitate collaboration, and provide instant feedback to the participants through data visualisation and intermediate online leaderboards.

    The platform

    The architecture of the platform that was designed and developed is shown in Figure 1. The entire system consists of a number of interacting components. (1) A web portal serves as the entry point to challenge participation, providing challenge information, such as timelines and challenge rules, and scientific background. The portal also facilitated the formation of teams and provided participants with an intermediate leaderboard of submitted results and a final leaderboard at the end of the challenge. (2) IBM Watson Studio [6] is the umbrella term for a number of services offered by IBM. Upon creation of a user account through the web portal, an IBM Watson Studio account was automatically created for each participant that allowed users access to IBM's Data Science Experience (DSX), the analytics engine Watson Machine Learning (WML), and IBM's Cloud Object Storage (COS) [7], all of which will be described in more detail in further sections. (3) The user interface and starter kit were hosted on IBM's Data Science Experience platform (DSX) and formed the main component for designing and testing models during the challenge. DSX allows for real-time collaboration on shared notebooks between team members. A starter kit in the form of a Python notebook, supporting the popular deep learning libraries TensorFlow [8] and PyTorch [9], was provided to all teams to guide them through the challenge process. Upon instantiation, the starter kit loaded the necessary Python libraries and custom functions for the invisible integration with COS and WML. In dedicated spots in the notebook, participants could write custom pre-processing code, machine learning models, and post-processing algorithms. The starter kit provided instant feedback about participants' custom routines through data visualisations. Using the notebook only, teams were able to run the code on WML, making use of a compute cluster of IBM's resources. The starter kit also enabled submission of the final code to a data storage to which only the challenge team had access. (4) Watson Machine Learning provided access to shared compute resources (GPUs). Code was bundled up automatically in the starter kit and deployed to and run on WML. WML in turn had access to shared storage from which it requested recorded data and to which it stored the participant's code and trained models. (5) IBM's Cloud Object Storage held the data for this challenge. Using the starter kit, participants could investigate their results as well as data samples in order to better design custom algorithms. (6) Utility functions were loaded into the starter kit at instantiation. This set of functions included code to pre-process data into a more common format, to optimise streaming through the use of the NutsFlow and NutsML libraries [10], and to provide seamless access to all the IBM services used. Not captured in the diagram is the final code evaluation, which was conducted in an automated way as soon as code was submitted through the starter kit, minimising the burden on the challenge organising team.

    Figure 1: High-level architecture of the challenge platform

    Measuring success

    The competitive phase of the "Deep Learning Epilepsy Detection Challenge" ran for 6 months. Twenty-five teams, comprising 87 scientists and software engineers from 14 global locations, participated. All participants made use of the starter kit we provided and ran algorithms on IBM's WML infrastructure. Seven teams persisted until the end of the challenge and submitted final solutions. The best-performing solutions reached seizure detection performance that would allow a hundred-fold reduction in the time epileptologists need to annotate continuous EEG recordings. Thus, we expect the developed algorithms to aid in the diagnosis of epilepsy by significantly shortening manual labelling time. Detailed results are currently in preparation for publication. Equally important to solving the scientific challenge, however, was to understand whether we managed to encourage participation from non-expert data scientists.

    Figure 2: Primary occupation as reported by challenge participants

    Out of the 40 participants for whom we have occupational information, 23 reported Data Science or AI as their main job description, 11 reported being a Software Engineer, and 2 people had expertise in Neuroscience. Figure 2 shows that participants had a variety of specialisations, including some that are in no way related to data science, software engineering, or neuroscience. No participant had deep knowledge and experience in data science, software engineering and neuroscience.

    Conclusion

    Given the growing complexity of data science problems and increasing dataset sizes, solving these problems requires enabling collaboration between people with different expertise, with a focus on inclusiveness and a low barrier of entry. We designed, implemented, and tested a challenge platform to address exactly this. Using our platform, we ran a deep-learning challenge for epileptic seizure detection. 87 IBM employees from several business units, including but not limited to IBM Research, with a variety of skills, including sales and design, participated in this highly technical challenge.
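    To make the starter-kit idea concrete, here is a hypothetical, self-contained sketch of the "dedicated spots" pattern described above: participants fill in pre-processing, model, and post-processing functions, and a surrounding harness (a plain loop here, a WML job in the challenge) applies them to windows of EEG data. All names, shapes, and the toy scorer are assumptions for illustration; this is not the actual IBM starter kit.

```python
import numpy as np

def preprocess(window: np.ndarray) -> np.ndarray:
    # Participant-defined feature extraction, e.g. per-channel signal power.
    return np.log1p((window ** 2).mean(axis=1))

def model(features: np.ndarray) -> float:
    # Participant-defined scorer; a trained classifier would go here.
    return float(features.mean() > 1.0)

def postprocess(scores: list) -> list:
    # Participant-defined smoothing of per-window seizure scores.
    return [round(s) for s in scores]

eeg = np.random.randn(19, 2560)                      # 19 channels, 10 s at 256 Hz
windows = np.split(eeg, 10, axis=1)                  # 1-second windows
labels = postprocess([model(preprocess(w)) for w in windows])
print(labels)
```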
  4. Abstract

    Directing groups of unmanned air vehicles (UAVs) is a task that typically requires the full attention of several operators. This can be prohibitive in situations where an operator must pay attention to their surroundings. In this paper we present a gesture device that assists operators in commanding UAVs in focus-constrained environments. The operator influences the UAVs’ behavior by using intuitive hand gesture movements. Gestures are captured using an accelerometer and gyroscope and then classified using a logistic regression model. Ten gestures were chosen to provide behaviors for a group of fixed-wing UAVs. These behaviors specified various searching, following, and tracking patterns that could be used in a dynamic environment. A novel variant of the Monte Carlo Tree Search algorithm was developed to autonomously plan the paths of the cooperating UAVs. These autonomy algorithms were executed when their corresponding gesture was recognized by the gesture device. The gesture device was trained to classify the ten gestures and accurately identified them 95% of the time. Each of the behaviors associated with the gestures was tested in hardware-in-the-loop simulations and the ability to dynamically switch between them was demonstrated. The results show that the system can be used as a natural interface to assist an operator in directing a fleet of UAVs.
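    The classification stage can be sketched compactly: each gesture recording is reduced to a fixed-length feature vector computed from the accelerometer and gyroscope streams and classified with multinomial logistic regression. The snippet below uses synthetic features and scikit-learn as an illustrative stand-in for the authors' pipeline; the feature count and data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic IMU feature vectors: 10 gesture classes, a handful of summary
# statistics per axis (e.g. mean/std/range) standing in for real features.
rng = np.random.default_rng(0)
n_gestures, n_per_class, n_features = 10, 30, 12
X = np.vstack([rng.normal(loc=k, scale=1.0, size=(n_per_class, n_features))
               for k in range(n_gestures)])
y = np.repeat(np.arange(n_gestures), n_per_class)

# Multinomial logistic regression over the feature vectors.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```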

    Article highlights

    A gesture device was created that enables operators to command a group of UAVs in focus-constrained environments.

    Each gesture triggers high-level commands that direct a UAV group to execute complex behaviors.

    Software simulations and hardware-in-the-loop testing show the device is effective in directing UAV groups.

     
  5. null (Ed.)
    ABSTRACT

    Introduction: Short response time is critical for future military medical operations in austere settings or remote areas. Such effective patient care at the point of injury can greatly benefit from the integration of semi-autonomous robotic systems. To achieve autonomy, robots would require massive libraries of maneuvers collected with the goal of training machine learning algorithms. Although this is attainable in controlled settings, obtaining surgical data in austere settings can be difficult. Hence, in this article, we present the Dexterous Surgical Skill (DESK) database for knowledge transfer between robots. The peg transfer task was selected as it is one of the six main tasks of laparoscopic training. In addition, we provide a machine learning framework to evaluate novel transfer learning methodologies on this database.

    Methods: A set of surgical gestures was collected for a peg transfer task, composed of seven atomic maneuvers referred to as surgemes. The collected Dexterous Surgical Skill dataset comprises a set of surgical robotic skills using the four robotic platforms: Taurus II, simulated Taurus II, YuMi, and the da Vinci Research Kit. We then explored two different learning scenarios: no-transfer and domain-transfer. In the no-transfer scenario, the training and testing data were obtained from the same domain; in the domain-transfer scenario, the training data are a blend of simulated and real robot data, and the models are tested on a real robot.

    Results: Using simulation data to train the learning algorithms enhances performance on the real robot where limited or no real data are available. The transfer model showed an accuracy of 81% for the YuMi robot when the ratio of real-to-simulated data was 22% to 78%. For the Taurus II and the da Vinci, the model showed accuracies of 97.5% and 93%, respectively, when training only with simulation data.

    Conclusions: The results indicate that simulation can be used to augment training data to enhance the performance of learned models in real scenarios. This shows potential for the future use of surgical data from the operating room in deployable surgical robots in remote areas.
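    The domain-transfer scenario can be illustrated with a toy sketch: training data blend a large simulated set with a smaller real set, and evaluation uses held-out real data only. The snippet below uses synthetic features and a scikit-learn classifier purely for illustration; it is not the paper's framework, and the features, classifier, and the loose 22%/78% split are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def make_domain(n, shift):
    # Synthetic feature vectors standing in for surgeme descriptors; "shift"
    # introduces a small domain gap between simulated and real data.
    X = rng.normal(size=(n, 8)) + shift
    y = (X[:, 0] + X[:, 1] > 2 * shift).astype(int)
    return X, y

X_sim, y_sim = make_domain(780, shift=0.0)        # ~78% simulated training data
X_real, y_real = make_domain(220, shift=0.3)      # ~22% real training data
X_test, y_test = make_domain(200, shift=0.3)      # held-out real data for testing

# Train on the blend of simulated and real data, evaluate on real data only.
X_train = np.vstack([X_sim, X_real])
y_train = np.concatenate([y_sim, y_real])
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("accuracy on real test data:", clf.score(X_test, y_test))
```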