

Title: Mechanical intelligence for learning embodied sensor-object relationships
Abstract

Intelligence involves processing sensory experiences into representations useful for prediction. Understanding sensory experiences and building these contextual representations without prior knowledge of the sensor model and the environment is a challenging unsupervised learning problem. Current machine learning methods process new sensory data using prior knowledge defined by either domain expertise or datasets. When datasets are not available, data must be acquired, yet automating exploration in support of learning remains an unsolved problem. Here we develop a method that enables agents to efficiently collect data for learning a predictive sensor model, without requiring domain knowledge, human input, or previously existing data, using ergodicity to specify the data acquisition process. This approach is based entirely on data-driven sensor characteristics rather than predefined knowledge of the sensor model and its physical characteristics. We learn higher-quality models with lower energy expenditure during exploration for data acquisition than competing approaches, including both random sampling and information maximization. Beyond applications in autonomy, our approach provides a potential model of how animals use their motor control to develop high-quality models of their sensors (sight, sound, touch) before having knowledge of their sensor capabilities or their surrounding environment.
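The exploration objective here is built on ergodicity. As a rough illustration of the kind of quantity involved (not the authors' implementation), the Python sketch below computes a spectral ergodic metric in the style of Mathew and Mezić, which scores how closely a trajectory's time-averaged statistics match a target spatial distribution; the cosine basis, truncation level, grid discretization, and function names are all our own assumptions.

```python
import numpy as np
from itertools import product

def ergodic_metric(traj, target_pdf, n_coeffs=8, L=1.0):
    """Spectral ergodic metric: a weighted sum of squared differences between
    the cosine-basis coefficients of the trajectory's time-averaged statistics
    and those of a target spatial distribution on [0, L]^d."""
    d = traj.shape[1]
    # Evaluate the target density on a grid and normalize it to sum to 1.
    axes = [np.linspace(0.0, L, 50)] * d
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    pdf = target_pdf(grid)
    pdf = pdf / pdf.sum()
    eps = 0.0
    for k in product(range(n_coeffs), repeat=d):
        k = np.asarray(k, dtype=float)
        basis = np.prod(np.cos(np.pi * k * grid / L), axis=-1)
        hk = np.sqrt((basis ** 2).mean())                # basis normalization
        phi_k = (basis * pdf).sum() / hk                 # target coefficient
        c_k = np.prod(np.cos(np.pi * k * traj / L), axis=-1).mean() / hk  # trajectory coefficient
        lam = (1.0 + k @ k) ** (-(d + 1) / 2.0)          # Sobolev-type weight
        eps += lam * (c_k - phi_k) ** 2
    return eps

# Tiny demo: a random trajectory scored against a uniform target on the unit square.
traj = np.random.rand(500, 2)
print(ergodic_metric(traj, lambda g: np.ones(g.shape[:-1])))
```

An ergodic exploration scheme would then choose controls that drive this metric down, so the agent allocates its time across the workspace in proportion to the target distribution, here the data-driven estimate of where measurements are informative.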

 
Award ID(s): 1837515
NSF-PAR ID: 10369039
Author(s) / Creator(s):
Publisher / Repository: Nature Publishing Group
Date Published:
Journal Name: Nature Communications
Volume: 13
Issue: 1
ISSN: 2041-1723
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Previous studies have convincingly shown that traditional, content-centered, didactic teaching methods are not effective for developing deep understanding and knowledge transfer, nor do they adequately address the development of critical problem-solving skills. Active and collaborative instruction, coupled with effective means to encourage student engagement, invariably leads to better student learning outcomes irrespective of academic discipline. Despite these findings, existing construction engineering programs, for the most part, consist of a series of fragmented courses that focus mainly on procedural skills rather than on the fundamental and conceptual knowledge that helps students become innovative problem-solvers. In addition, these courses rely heavily on traditional lecture-based teaching methods centered on well-structured, closed-ended problems that prepare students to plug variables into equations to get the answer. Existing programs rarely offer a systematic approach that allows students to develop a deep understanding of engineering core concepts and discover systematic solutions to fundamental problems. Without properly understanding these core concepts, contextualized in domain-specific settings, students cannot develop the holistic view that helps them recognize the big picture and think outside the box to devise creative solutions to emerging problems. The long history of empirical learning in construction engineering shows the significant potential of cognitive development through direct experience and reflection on what works in particular situations. Of course, the complex nature of the construction industry in the twenty-first century cannot afford an education through trial and error in the real environment. However, recent advances in computer science allow educators to develop virtual environments and gamification platforms in which students can explore various scenarios and learn from their experiences. This study addresses this need by assessing the effectiveness of guided active exploration in a digital game environment on students' ability to discover systematic solutions to fundamental problems in construction engineering. To this end, through a research project funded by the NSF Division of Engineering Education and Centers (EEC), we designed and developed a scenario-based interactive digital game, called Zebel, to guide students in solving fundamental problems in construction scheduling. The proposed gamified pedagogical approach was designed based on constructivist learning theory and a framework consisting of six essential elements: (1) modeling; (2) reflection; (3) strategy formation; (4) scaffolded exploration; (5) debriefing; and (6) articulation. We also designed a series of pre- and post-assessment instruments for empirical data collection to assess the effectiveness of the proposed approach. The gamified method was implemented in a graduate-level construction planning and scheduling course. The outcomes indicated that students with no prior knowledge of construction scheduling methods were able to discover systematic solutions to fundamental scheduling problems through their experience with the proposed gamified learning method.
  2. Human activity recognition (HAR) from wearable sensor data has become ubiquitous due to the widespread proliferation of IoT and wearable devices. However, recognizing human activity in heterogeneous environments, for example, with sensors of different makes and models, across different persons and their on-body sensor placements, introduces wide-ranging discrepancies in the data distributions and therefore leads to an increased error margin. Transductive transfer learning techniques such as domain adaptation have been quite successful in mitigating the domain discrepancies between the source and target domain distributions without costly target domain data annotations. However, little exploration has been done when multiple distinct source domains are present and the optimum mapping to the target domain from each source is not apparent. In this paper, we propose a deep Multi-Source Adversarial Domain Adaptation (MSADA) framework that opportunistically helps select the most relevant feature representations from multiple source domains and establishes such mappings to the target domain by learning perplexity scores. We showcase that the learned mappings can actually reflect our prior knowledge of the semantic relationships between the domains, indicating that MSADA can be employed as a powerful tool for exploratory activity data analysis. We empirically demonstrate that our proposed multi-source domain adaptation approach achieves a 2% improvement on the OPPORTUNITY dataset (cross-person heterogeneity, 4 ADLs) and a 13% improvement on the DSADS dataset (cross-position heterogeneity, 10 ADLs and sports activities).
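    For context on the adversarial machinery such frameworks typically build on, here is a minimal PyTorch sketch (not the authors' MSADA code) of multi-source adversarial adaptation: a shared feature extractor, one classifier and one domain discriminator per source, and target predictions combined with per-source weights. The layer sizes, the three-source setup, and the confusion-based weighting rule are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class GradReverse(Function):
    """Identity in the forward pass; scales and flips gradients in backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

n_sources, feat_dim, n_classes = 3, 77, 10   # illustrative sizes only
feature_extractor = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
classifiers = nn.ModuleList(nn.Linear(128, n_classes) for _ in range(n_sources))
discriminators = nn.ModuleList(nn.Linear(128, 2) for _ in range(n_sources))

def predict_target(x, lambd=1.0):
    """Combine per-source classifiers, weighting each source by how strongly
    its discriminator confuses target features with that source."""
    z = feature_extractor(x)
    confusion = torch.stack(
        [d(GradReverse.apply(z, lambd)).softmax(-1)[:, 0] for d in discriminators],
        dim=-1,
    )                                                    # (batch, n_sources)
    weights = confusion.softmax(dim=-1)
    preds = torch.stack([c(z).softmax(-1) for c in classifiers], dim=-1)
    return (preds * weights.unsqueeze(1)).sum(-1)        # (batch, n_classes)
```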
  3. As augmented and virtual reality (AR/VR) technology matures, methods are needed to represent real-world persons visually and aurally in a virtual scene with high fidelity, crafting an immersive and realistic user experience. Current technologies leverage camera and depth sensors to render visual representations of subjects through avatars, and microphone arrays are employed to localize and separate high-quality subject audio through beamforming. However, challenges remain in both realms. In the visual domain, avatars can only map key features (e.g., pose, expression) to a predetermined model, rendering them incapable of capturing the subjects' full details. Alternatively, high-resolution point clouds can be utilized to represent human subjects. However, such three-dimensional data is computationally expensive to process. In the realm of audio, sound source separation requires prior knowledge of the subjects' locations, but sound source localization algorithms may take unacceptably long to provide this knowledge, and their estimates can still be error-prone, especially with moving subjects. These challenges make it difficult for AR systems to produce real-time, high-fidelity representations of human subjects for applications such as AR/VR conferencing that mandate negligible system latency. We present Acuity, a real-time system capable of creating high-fidelity representations of human subjects in a virtual scene, both visually and aurally. Acuity isolates subjects from high-resolution input point clouds. It reduces the processing overhead by performing background subtraction at a coarse resolution, then applying the detected bounding boxes to fine-grained point clouds. Meanwhile, Acuity leverages an audiovisual sensor fusion approach to expedite sound source separation: the estimated object location in the visual domain guides the acoustic pipeline to isolate the subjects' voices without running sound source localization. Our results demonstrate that Acuity can isolate multiple subjects' high-quality point clouds with a maximum latency of 70 ms and an average throughput of over 25 fps, while separating audio in less than 30 ms. We provide the source code of Acuity at: https://github.com/nesl/Acuity.
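    The coarse-to-fine background subtraction described above can be illustrated with a small NumPy sketch; this is not the released implementation (see the repository link above for the real code). It compares occupancy at a coarse voxel resolution and applies the resulting box to the full-resolution cloud; the voxel size, the single-box simplification, and the function names are our own assumptions.

```python
import numpy as np

def voxel_set(points, voxel=0.1):
    """Coarse occupancy: map 3-D points to the set of voxel indices they fill."""
    return set(map(tuple, np.floor(points / voxel).astype(int)))

def foreground_box(frame, background, voxel=0.1):
    """Voxels occupied in the live frame but not in the empty-scene
    background are foreground; return their axis-aligned bounding box."""
    fg = voxel_set(frame, voxel) - voxel_set(background, voxel)
    if not fg:
        return None
    idx = np.array(sorted(fg))
    return idx.min(axis=0) * voxel, (idx.max(axis=0) + 1) * voxel

def crop_subject(points, box):
    """Apply the coarse bounding box to the full-resolution point cloud."""
    lo, hi = box
    return points[np.all((points >= lo) & (points < hi), axis=1)]
```

    The point is that set differencing over a few thousand voxels is far cheaper than per-point background comparison over millions of points, which is what makes the subsequent fine-grained crop affordable in real time.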
  4. Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, computing these distances has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, which allows for fast computation of distances. This framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. The training of this model is domain-oblivious in that the model can be trained purely on synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without the need for retraining. These binary codes, when compared using the fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and compare the quality and performance of our approach to the state-of-the-art. In addition, we show the scalability of our approach on a dataset with 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster, with the potential for lower memory usage, while retaining comparable or better comparison quality.
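    The speed advantage of binary codes comes from the Hamming comparison step, which the following NumPy sketch illustrates; the codes themselves would come from the trained GAN-based hashing model, which is not reproduced here, so random stand-ins are used.

```python
import numpy as np

def pack_codes(bits):
    """Pack {0,1} codes of shape (n, code_len) into bytes for compact storage."""
    return np.packbits(bits.astype(np.uint8), axis=1)

def hamming_matrix(packed_a, packed_b):
    """All-pairs Hamming distances via XOR plus a per-byte popcount table."""
    table = np.unpackbits(np.arange(256, dtype=np.uint8)[:, None], axis=1).sum(axis=1)
    xor = packed_a[:, None, :] ^ packed_b[None, :, :]    # (n_a, n_b, n_bytes)
    return table[xor].sum(axis=-1)                       # (n_a, n_b)

# Demo with random stand-ins for learned 64-bit diagram codes.
codes = np.random.randint(0, 2, size=(4, 64))
packed = pack_codes(codes)
print(hamming_matrix(packed, packed))  # zero diagonal, symmetric
```

    Because each comparison reduces to XOR and popcount over a few bytes, all-pairs distances over tens of thousands of diagrams stay tractable, which is what enables the 10k-diagram clustering experiment the abstract mentions.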
  5. Speech processing is highly incremental. It is widely accepted that human listeners continuously use the linguistic context to anticipate upcoming concepts, words, and phonemes. However, previous evidence supports two seemingly contradictory models of how a predictive context is integrated with the bottom-up sensory input: classic psycholinguistic paradigms suggest a two-stage process, in which acoustic input initially leads to local, context-independent representations, which are then quickly integrated with contextual constraints. This contrasts with the view that the brain constructs a single coherent, unified interpretation of the input, which fully integrates available information across representational hierarchies and thus uses contextual constraints to modulate even the earliest sensory representations. To distinguish these hypotheses, we tested magnetoencephalography responses to continuous narrative speech for signatures of local and unified predictive models. The results provide evidence that listeners employ both types of models in parallel. Two local context models uniquely predict some part of early neural responses, one based on sublexical phoneme sequences and one based on the phonemes in the current word alone; at the same time, even early responses to phonemes also reflect a unified model that incorporates sentence-level constraints to predict upcoming phonemes. Neural source localization places the anatomical origins of the different predictive models in nonidentical parts of the superior temporal lobes bilaterally, with the right hemisphere showing a relative preference for more local models. These results suggest that speech processing recruits both local and unified predictive models in parallel, reconciling previous disparate findings. Parallel models might make the perceptual system more robust, facilitate processing of unexpected inputs, and serve a function in language acquisition.
     MEG Data: The MEG data are in FIFF format and can be opened with MNE-Python. The data were converted directly from the acquisition device's native format without any preprocessing. Events contained in the data indicate the stimuli in numerical order. Subjects R2650 and R2652 heard stimulus 11b instead of 11.
     Predictor Variables: The original audio files are copyrighted and cannot be shared, but the make_audio folder contains make_clips.py, which can be used to extract the exact clips from the commercially available audiobook (ISBN 978-1480555280). The predictors directory contains all the predictors used in the original study as pickled eelbrain objects; they can be loaded in Python with the eelbrain.load.unpickle function. The TextGrids directory contains the TextGrids aligned to the audio files.
     Source Localization: The localization.zip file contains the files needed for source localization. The structural brain models used in the published analysis are reconstructed by scaling the FreeSurfer fsaverage brain (distributed with FreeSurfer) based on each subject's `MRI scaling parameters.cfg` file; this can be done using the `mne.scale_mri` function. Each subject's MEG folder contains a `subject-trans.fif` file with the coregistration between MEG sensor space and (scaled) MRI space, which is used to compute the forward solution.
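    Based on the loading instructions above, a minimal Python sketch for getting started with the data might look as follows; the file names are placeholders (the actual files live in each subject's folders), and mne.find_events assumes the recording's default stimulus channel.

```python
import mne
import eelbrain

# Placeholder paths: substitute the actual files from a subject's MEG folder.
raw = mne.io.read_raw_fif("R2650/R2650-raw.fif")  # unpreprocessed FIFF recording
events = mne.find_events(raw)                      # event codes index the stimuli in numerical order

# Any pickled eelbrain predictor from the predictors/ directory (name hypothetical).
predictor = eelbrain.load.unpickle("predictors/example-predictor.pickle")
```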