

Search for: All records

Award ID contains: 1723869


  1. Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations for 6D poses are difficult to collect, which complicates machine learning solutions; and (iii) incremental error drift often accumulates in long-term tracking, necessitating re-initialization of the object’s pose. This work proposes a data-driven optimization approach for long-term 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object’s model. The key contributions in this context are a novel neural network architecture, which disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via Lie algebra. Consequently, even a network trained only on synthetic data can work effectively on real images. Comprehensive experiments on benchmarks, both existing ones and a new dataset with significant occlusions related to object manipulation, show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9 Hz.
  2. Picking an item in the presence of other objects can be challenging as it involves occlusions and partial views. Given object models, one approach is to perform object pose estimation and use the most likely candidate pose per object to pick the target without collisions. This approach, however, ignores the uncertainty of the perception process regarding both the target’s and the surrounding objects’ poses. This work first proposes a perception process for 6D pose estimation, which returns a discrete distribution of object poses in a scene. Then, an open-loop planning pipeline is proposed to return safe and effective solutions for moving a robotic arm to pick the target; the pipeline (a) minimizes the probability of collision with the obstructing objects and (b) maximizes the probability of reaching the target item. The planning framework models the challenge as a stochastic variant of the Minimum Constraint Removal (MCR) problem. The effectiveness of the methodology is verified on both simulated and real data in different scenarios. The experiments demonstrate the importance of considering the uncertainty of the perception process for safe execution. The results also show that the methodology is more effective than conservative MCR approaches, which avoid all possible object poses regardless of the reported uncertainty.
  3. Predicting crowd behavior in complex environments is a key requirement for crowd and disaster management, architectural design, and urban planning. Given a crowd’s immediate state, current approaches must be successively repeated over multiple time-steps for long-term predictions, leading to computationally expensive and error-prone results. However, most applications require the ability to accurately predict hundreds of possible simulation outcomes (e.g., under different environment and crowd situations) at real-time rates, for which these approaches are prohibitively expensive. We propose the first deep framework to instantly predict the long-term flow of crowds in arbitrarily large, realistic environments. Central to our approach are a novel representation, CAGE, which efficiently encodes crowd scenarios into compact, fixed-size representations that losslessly represent the environment, and a modified SegNet architecture for instant long-term crowd flow prediction. We conduct comprehensive experiments on novel synthetic and real datasets. Our results indicate that our approach is able to capture the essence of real crowd movement over very long time periods, while generalizing to never-before-seen environments and crowd contexts. The associated Supplementary Material, models, and datasets are available at github.com/SSSohn/LTCF.
  4. Multiscale modeling has yielded immense success on various machine learning tasks. However, it has not been properly explored for the prominent task of information diffusion, which aims to understand how information propagates among users in online social networks. For a specific user, whether and when to adopt a piece of information propagated from another user is affected by complex interactions and is thus very challenging to model. Current state-of-the-art techniques invoke deep neural models with vector representations of users. In this paper, we present a Hierarchical Information Diffusion (HID) framework that integrates user representation learning and multiscale modeling. The proposed framework can be layered on top of all information diffusion techniques that leverage user representations, so as to boost the predictive power and learning efficiency of the original technique. Extensive experiments on three real-world datasets showcase the superiority of our method.
  5. Many manipulation tasks, such as placement or within-hand manipulation, require the object’s pose relative to a robot hand. The task is difficult when the hand significantly occludes the object. It is especially hard for adaptive hands, for which it is not easy to detect the finger’s configuration. In addition, RGB-only approaches face issues with texture-less objects or when the hand and the object look similar. This paper presents a depth-based framework, which aims for robust pose estimation and short response times. The approach detects the adaptive hand’s state via an efficient parallel search for the highest overlap between the hand’s model and the point cloud. The hand’s point cloud is pruned and robust global registration is performed to generate object pose hypotheses, which are clustered. False hypotheses are pruned via physical reasoning. The quality of the remaining poses is evaluated by their agreement with the observed data. Extensive evaluation on synthetic and real data demonstrates the accuracy and computational efficiency of the framework when applied to challenging, highly occluded scenarios for different object types. An ablation study identifies how the framework’s components contribute to performance. This work also provides a dataset for in-hand 6D object pose estimation. Code and dataset are available at: https://github.com/wenbowen123/icra20-hand-object-pose
  6. Dense crowds in public spaces have often caused serious security issues at large events. In this paper, we study the 2010 Love Parade disaster, for which a large amount of data (e.g., research papers, professional reports and video footage) exists. We reproduce the Love Parade disaster in a three-dimensional computer simulation calibrated with data from the actual event and using the social force model for pedestrian behaviour. Moreover, we simulate several crowd management strategies and investigate their ability to prevent the disaster. We evaluate these strategies in virtual reality (VR) by measuring the response and arousal of participants while they experience the simulated event from a festival attendee’s perspective. Overall, we find that opening an additional exit and removing the police cordons could have significantly reduced the number of casualties. We also find that this strategy affects the physiological responses of the participants in VR.
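
Entry 1 mentions a 3D orientation representation via Lie algebra. As a rough illustration of what that usually means, the sketch below maps a 3-vector in so(3) (an axis-angle vector) to a rotation matrix in SO(3) with Rodrigues' formula, then composes it with a previous rotation estimate. This is not the paper's network or code: the function names (`hat`, `so3_exp`, `compose_rotation`) are hypothetical, and the left-multiplication convention for applying the relative rotation is an assumption.

```python
import math


def hat(w):
    """Map a 3-vector to its skew-symmetric matrix, an element of so(3)."""
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]


def matmul(A, B):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(w):
    """Exponential map so(3) -> SO(3) using Rodrigues' formula."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    if theta < 1e-8:
        # Limits of the two coefficients as theta -> 0 (avoids division by ~0).
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    K2 = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def compose_rotation(R_prev, w):
    """Apply a relative rotation given in so(3) to a previous estimate.

    Assumption: the relative rotation is applied by left-multiplication.
    """
    return matmul(so3_exp(w), R_prev)
```

For example, `so3_exp([0.0, 0.0, math.pi / 2])` gives a quarter turn about the z-axis. A tracker that predicts a small so(3) vector per frame could call `compose_rotation` to update its running orientation estimate, which keeps the result a valid rotation without any re-normalization step.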
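
Entry 6 relies on the social force model for pedestrian behaviour. The sketch below shows one time step of a simplified 2D variant: each pedestrian is pulled toward a goal at a desired speed and pushed away from neighbours by an exponentially decaying repulsion. It is only an illustration of the general technique, not the calibrated simulation from the study; the function name `social_force_step` and every parameter value (`desired_speed`, `tau`, `A`, `B`, `radius`, `dt`) are assumptions, unit mass is assumed, and walls and contact forces are omitted.

```python
import math


def social_force_step(positions, velocities, goals, dt=0.05,
                      desired_speed=1.3, tau=0.5, A=2.0, B=0.3, radius=0.3):
    """Advance all pedestrians by one step of a simplified social force model.

    positions, velocities, goals: lists of (x, y) tuples, one per pedestrian.
    All default parameter values are illustrative assumptions.
    """
    new_positions, new_velocities = [], []
    for i, ((x, y), (vx, vy), (gx, gy)) in enumerate(
            zip(positions, velocities, goals)):
        # Driving term: relax toward the desired velocity within time tau.
        dx, dy = gx - x, gy - y
        dist = math.hypot(dx, dy) or 1.0  # at the goal the direction is zero
        ax = (desired_speed * dx / dist - vx) / tau
        ay = (desired_speed * dy / dist - vy) / tau

        # Repulsive term: exponential push away from every other pedestrian.
        for j, (ox, oy) in enumerate(positions):
            if j == i:
                continue
            rx, ry = x - ox, y - oy
            d = math.hypot(rx, ry) or 1e-9
            f = A * math.exp((2.0 * radius - d) / B)
            ax += f * rx / d
            ay += f * ry / d

        # Semi-implicit Euler: update velocity first, then position.
        vx, vy = vx + ax * dt, vy + ay * dt
        new_velocities.append((vx, vy))
        new_positions.append((x + vx * dt, y + vy * dt))
    return new_positions, new_velocities
```

Calling the function repeatedly in a loop produces trajectories. In a sketch like this, a crowd management strategy such as an extra exit would show up as a change in the `goals` assigned to pedestrians, with obstacles needing additional force terms not shown here.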