NSF PAR Search | NSF Public Access Repository

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

Majumder, Sagnik; Al-Halah, Ziad; Grauman, Kristen (June 2024, IEEE Conference on Computer Vision and Pattern Recognition (CVPR))

NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

Ramakrishnan, Santhosh; Al-Halah, Ziad; Grauman, Kristen (January 2023, IEEE Conference on Computer Vision and Pattern Recognition)

SpotEM: Efficient Video Search for Episodic Memory

Ramakrishnan, Santhosh; Al-Halah, Ziad; Grauman, Kristen (January 2023, International Conference on Machine Learning)

Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

https://doi.org/10.1109/CVPR52688.2022.01652

Al-Halah, Ziad; Ramakrishnan, Santhosh K.; Grauman, Kristen (June 2022, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

In reinforcement learning for visual navigation, it is common to develop a model for each new task and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive: massive amounts of interaction are needed for the model to generalize well. Moreover, the process is repeated whenever the task type or the goal modality changes. We present a unified approach to visual navigation using a novel modular transfer learning model. Our model can effectively leverage its experience from one source task and apply it to multiple target tasks (e.g., ObjectNav, RoomNav, ViewNav) with various goal modalities (e.g., image, sketch, audio, label). Furthermore, our model enables zero-shot experience learning, whereby it can solve the target tasks without receiving any task-specific interactive training. Our experiments on multiple photorealistic datasets and challenging tasks show that our approach learns faster, generalizes better, and outperforms SoTA models by a significant margin.
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation

Al-Halah, Ziad; Ramakrishnan, Santhosh; Grauman, Kristen (January 2022, IEEE Conference on Computer Vision and Pattern Recognition)

Few-Shot Audio-Visual Learning of Environment Acoustics

Majumder, Sagnik; Chen, Changan; Al-Halah, Ziad; Grauman, Kristen (January 2022, Advances in neural information processing systems)

Move2Hear: Active Audio-Visual Source Separation

https://doi.org/10.1109/ICCV48922.2021.00034

Majumder, Sagnik; Al-Halah, Ziad; Grauman, Kristen (October 2021, IEEE/CVF International Conference on Computer Vision (ICCV))

We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources simultaneously (e.g., a person speaking down the hall in a noisy household) and it must use its eyes and ears to automatically separate out the sounds originating from a target object within a limited time budget. Towards this goal, we introduce a reinforcement learning approach that trains movement policies controlling the agent’s camera and microphone placement over time, guided by the improvement in predicted audio separation quality. We demonstrate our approach in scenarios motivated by both augmented reality (system is already co-located with the target object) and mobile robotics (agent begins arbitrarily far from the target object). Using state-of-the-art realistic audio-visual simulations in 3D environments, we demonstrate our model’s ability to find minimal movement sequences with maximal payoff for audio source separation.
Semantic Audio-Visual Navigation

Chen, Changan; Al-Halah, Ziad; Grauman, Kristen (January 2021, IEEE Conference on Computer Vision and Pattern Recognition)

A domain-agnostic approach for characterization of lifelong learning systems

https://doi.org/10.1016/j.neunet.2023.01.007

Baker, Megan M.; New, Alexander; Aguilar-Simon, Mario; Al-Halah, Ziad; Arnold, Sébastien M.R.; Ben-Iwhiwhu, Ese; Brna, Andrew P.; Brooks, Ethan; Brown, Ryan C.; Daniels, Zachary; et al. (March 2023, Neural Networks)
