

Search for: All records

Award ID contains: 1734266


  1. El Asmar, Mounir; Grau, David; Tang, Pingbo (Eds.)
    As a proactive means of preventing struck-by accidents in construction, many studies have presented proximity-monitoring applications using wireless sensors (e.g., RFID, UWB, and GPS) or computer vision methods. Most prior research has emphasized proximity detection rather than prediction. However, prediction can be more effective for contact-driven accident prevention: the sooner workers (e.g., equipment operators and workers on foot) are informed of their proximity to each other, the more likely they are to avoid an impending collision. In earlier studies, the authors presented a trajectory prediction method leveraging a deep neural network to examine the feasibility of proximity prediction in real-world applications. In this study, we enhance the trajectory prediction accuracy. Specifically, we improve the trajectory prediction model by tuning its pre-trained weight parameters with construction data. Moreover, an inherent-movement-driven post-processing algorithm is developed to refine the trajectory prediction of a target in accordance with its inherent movement patterns, such as the final position, predominant direction, and average velocity. In a test on real-site operations data, the proposed approach demonstrates improved accuracy: for a 5.28-second prediction horizon, it achieves a 0.39-meter average displacement error, a 51.43% improvement over the previous model (0.84 meters). The improved trajectory prediction method can support predicting potential contact-driven hazards in advance, allowing prompt feedback (e.g., visible, acoustic, and vibration alarms) to equipment operators and workers on foot. Such proactive intervention can lead workers to take prompt evasive action, thereby reducing the chance of an impending collision.
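The accuracy figure above, average displacement error (ADE), is the mean Euclidean distance between predicted and ground-truth trajectory points. A minimal sketch of the metric (the function and toy data are illustrative, not the paper's model or dataset):

```python
import numpy as np

def average_displacement_error(pred, truth):
    """Mean Euclidean distance between predicted and ground-truth
    trajectory points, in the coordinates' units (here, meters)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.linalg.norm(pred - truth, axis=-1).mean())

# A predicted path drifting a constant 0.3 m sideways from the truth
# has an ADE of 0.3 m; the paper reports 0.39 m over a 5.28 s horizon.
truth = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = truth + np.array([0.0, 0.3])
print(round(average_displacement_error(pred, truth), 2))  # 0.3
```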
  2.
  3.
    We introduce UniLoss, a unified framework to generate surrogate losses for training deep networks with gradient descent, reducing the amount of manual design of task-specific surrogate losses. Our key observation is that in many cases, evaluating a model with a performance metric on a batch of examples can be refactored into four steps: from inputs to real-valued scores, from scores to comparisons of pairs of scores, from comparisons to binary variables, and from binary variables to the final performance metric. Using this refactoring, we generate differentiable approximations for each non-differentiable step through interpolation. Using UniLoss, we can optimize for different tasks and metrics in one unified framework, achieving performance comparable to that of task-specific losses. We validate the effectiveness of UniLoss on three tasks and four datasets. Code is available at this https URL.
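The four-step refactoring can be illustrated on binary classification accuracy, where the only non-differentiable step, the hard comparison of a score against a threshold, is interpolated with a sigmoid. A hedged sketch (not the authors' code; the variable names and temperature `tau` are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_accuracy(scores, labels):
    """Scores -> comparison with 0 -> binary correctness -> metric.
    The comparison step makes this non-differentiable."""
    preds = (scores > 0).astype(float)
    return float((preds == labels).mean())

def soft_accuracy(scores, labels, tau=1.0):
    """Differentiable surrogate: the hard comparison is interpolated
    with a sigmoid of temperature tau, so gradients reach the scores."""
    signed = scores * (2.0 * labels - 1.0)  # > 0 iff the prediction is correct
    return float(sigmoid(signed / tau).mean())

scores = np.array([2.0, -1.5, 0.5, -3.0])
labels = np.array([1.0, 0.0, 0.0, 0.0])
print(hard_accuracy(scores, labels))  # 0.75
```

As `tau` shrinks, the soft metric approaches the hard one, at the cost of vanishing gradients away from the decision boundary.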
  4.
    Understanding spatial relations (e.g., laptop on table) in visual input is important for both humans and robots. Existing datasets are insufficient as they lack large-scale, high-quality 3D ground truth information, which is critical for learning spatial relations. In this paper, we fill this gap by constructing Rel3D: the first large-scale, human-annotated dataset for grounding spatial relations in 3D. Rel3D enables quantifying the effectiveness of 3D information in predicting spatial relations on large-scale human data. Moreover, we propose minimally contrastive data collection, a novel crowdsourcing method for reducing dataset bias. The 3D scenes in our dataset come in minimally contrastive pairs: two scenes in a pair are almost identical, but a spatial relation holds in one and fails in the other. We empirically validate that minimally contrastive examples can diagnose issues with current relation detection models as well as lead to sample-efficient training. Code and data are available at https://github.com/princeton-vl/Rel3D.
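For intuition, a relation such as "on" can be grounded in 3D box geometry roughly as follows (an illustrative test only, not Rel3D's annotation or model; the box format and tolerance `eps` are assumptions). Moving the laptop aside yields the second half of a minimally contrastive pair:

```python
def is_on(top, bottom, eps=0.05):
    """Rough geometric test for 'top is on bottom' (illustrative only,
    not Rel3D's definition). Boxes are dicts with a center (x, y, z)
    and a size (sx, sy, sz); the z axis points up."""
    top_base = top["center"][2] - top["size"][2] / 2
    bottom_top = bottom["center"][2] + bottom["size"][2] / 2
    if abs(top_base - bottom_top) > eps:   # base must touch the top face
        return False
    for axis in (0, 1):                    # footprints must overlap in x and y
        gap = abs(top["center"][axis] - bottom["center"][axis])
        if gap > (top["size"][axis] + bottom["size"][axis]) / 2:
            return False
    return True

table = {"center": (0.0, 0.0, 0.395), "size": (1.2, 0.8, 0.79)}
laptop = {"center": (0.0, 0.0, 0.80), "size": (0.3, 0.2, 0.02)}
# Minimally contrastive pair: the same scene with the laptop moved aside.
laptop_aside = {"center": (1.0, 0.0, 0.80), "size": (0.3, 0.2, 0.02)}
print(is_on(laptop, table), is_on(laptop_aside, table))  # True False
```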
  5.
    Recent advances have spurred incredible progress in self-supervised pretraining for vision. We investigate what factors may play a role in the utility of these pretraining methods for practitioners. To do this, we evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks. We prepare a suite of synthetic data that enables an endless supply of annotated images as well as full control over dataset difficulty. Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows, as well as how the utility changes as a function of the downstream task and the properties of the training data. We also find that linear evaluation does not correlate with finetuning performance. Code and data are available at \href{this https URL}{this http URL}.
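Linear evaluation, mentioned above, fits only a linear classifier head on frozen pretrained features. A self-contained sketch using a toy numpy logistic-regression probe (illustrative; not the paper's evaluation code, and the toy clusters stand in for real embeddings):

```python
import numpy as np

def linear_probe(features, labels, lr=0.1, steps=500):
    """Fit a logistic-regression head on frozen features by gradient
    descent; the backbone that produced the features is never updated.
    Returns the head's training accuracy."""
    w = np.zeros(features.shape[1])
    b = 0.0
    n = len(labels)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))  # sigmoid
        grad = p - labels                              # dL/dz for log loss
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    preds = (features @ w + b > 0).astype(float)
    return float((preds == labels).mean())

# Toy "frozen features": two well-separated clusters standing in for
# the embeddings of two classes.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-2, 0.5, (50, 8)), rng.normal(2, 0.5, (50, 8))])
labels = np.array([0.0] * 50 + [1.0] * 50)
print(linear_probe(feats, labels))  # 1.0
```

Finetuning, by contrast, would also update the backbone weights, which is why the two protocols can rank pretraining methods differently.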
  6.
    The ability to jointly understand the geometry of objects and plan actions for manipulating them is crucial for intelligent agents. We refer to this ability as geometric planning. Recently, many interactive environments have been proposed to evaluate intelligent agents on various skills; however, none of them caters to the needs of geometric planning. We present PackIt, a virtual environment to evaluate and potentially learn the ability to do geometric planning, where an agent needs to take a sequence of actions to pack a set of objects into a box with limited space. We also construct a set of challenging packing tasks using an evolutionary algorithm. Further, we study various baselines for the task that include model-free learning-based and heuristic-based methods, as well as search-based optimization methods that assume access to the model of the environment. Code and data are available at this https URL.
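A heuristic baseline of the kind mentioned can be as simple as a greedy shelf packer over axis-aligned cuboids (an illustrative sketch, not one of the paper's baselines; the box/item representation is an assumption):

```python
def greedy_shelf_pack(box, items):
    """Greedy shelf heuristic (illustrative, not a method from the
    paper): sort cuboids by footprint area and place them left to
    right in rows on the box floor. `box` and each item are (x, y, z)
    dimensions; returns {item index: (x, y, z) position} or None."""
    order = sorted(enumerate(items), key=lambda it: -(it[1][0] * it[1][1]))
    placements = {}
    x = y = row_depth = 0.0
    for idx, (sx, sy, sz) in order:
        if sz > box[2]:
            return None                    # taller than the box
        if x + sx > box[0]:                # no room left in this row
            x, y, row_depth = 0.0, y + row_depth, 0.0
        if x + sx > box[0] or y + sy > box[1]:
            return None                    # does not fit at all
        placements[idx] = (x, y, 0.0)
        x += sx
        row_depth = max(row_depth, sy)
    return placements

box = (10.0, 10.0, 5.0)
items = [(4.0, 3.0, 2.0), (6.0, 3.0, 1.0), (5.0, 4.0, 2.0)]
placed = greedy_shelf_pack(box, items)
print(placed is not None and len(placed) == len(items))  # True
```

Such a heuristic ignores stacking and rotation entirely, which is exactly the kind of gap that learning-based and search-based baselines aim to close.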
  7. Struck-by accidents are potential safety concerns on construction sites and require robust machine pose estimation. The development of deep learning methods has enhanced human pose estimation, which can be adapted for articulated machines. These methods require abundant data for training, which is challenging and time-consuming to obtain on-site. This paper proposes a fast data collection approach to build a dataset for excavator pose estimation. It uses two industrial robot arms, serving as the excavator and the camera monopod, to collect data on different excavator poses. The 3D annotation can be obtained from the robots' embedded encoders, while the 2D pose is annotated manually. For evaluation, 2,500 pose images were collected and used to train a stacked hourglass network. The results showed that the dataset is suitable for training an excavator pose estimation network in a controlled environment, which opens the potential of augmenting the dataset with real construction-site images.
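Pose-estimation quality on such a dataset is commonly scored with the Percentage of Correct Keypoints (PCK); the abstract does not state the paper's exact metric, and the joint names below are illustrative:

```python
import numpy as np

def pck(pred, truth, threshold):
    """Percentage of Correct Keypoints: a predicted joint counts as
    correct if it lies within `threshold` pixels of the annotated 2D
    position. (A standard pose metric, used here for illustration.)"""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    dists = np.linalg.norm(pred - truth, axis=-1)
    return float((dists <= threshold).mean())

# Illustrative 2D keypoints for one excavator: cab, boom, stick, bucket.
truth = np.array([[100, 200], [150, 120], [220, 140], [260, 210]])
pred = np.array([[103, 204], [151, 118], [240, 160], [261, 209]])
print(pck(pred, truth, threshold=10))  # 0.75 (the stick joint misses)
```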
  8. Construction robots have drawn increased attention as a potential means of improving construction safety and productivity. However, it is still challenging to ensure safe human-robot collaboration in dynamic and unstructured construction workspaces. On construction sites, multiple entities collaborate dynamically with each other, and the situational context between them evolves continually. Construction robots must therefore be equipped to visually understand a scene's context (i.e., semantic relations to surrounding entities) and thereby collaborate safely with humans, as a human vision system does. Toward this end, this study builds a unique deep neural network architecture and develops a construction-specialized model by experimenting with multiple fine-tuning scenarios. This study also evaluates its performance on real construction operations data to examine its potential for real-world applications. The results showed the promising performance of the tuned model: recall@5 on the training and validation datasets reached 92% and 67%, respectively. The proposed method, which equips construction co-robots with holistic scene understanding, is expected to contribute to promoting safer human-robot collaboration in construction.
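The recall@5 scores reported above count a prediction as correct when the ground-truth relation appears among the model's five highest-scoring candidates. A minimal sketch (the scores and index are made up for illustration):

```python
import numpy as np

def recall_at_k(scores, target, k=5):
    """1.0 if the ground-truth index is among the k highest-scoring
    candidates, else 0.0; averaging over a dataset gives recall@k."""
    topk = np.argsort(scores)[::-1][:k]
    return float(target in topk)

# Illustrative scores over 10 candidate relations; the annotated
# relation is at index 3 and ranks 5th, so it counts for recall@5
# but not for recall@3.
scores = np.array([0.1, 0.05, 0.2, 0.15, 0.3, 0.02, 0.4, 0.01, 0.08, 0.25])
print(recall_at_k(scores, target=3, k=5), recall_at_k(scores, target=3, k=3))
# 1.0 0.0
```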