Creators/Authors contains: "Pan, Jiayi"

  1. Vision-Language Models (VLMs) are trained on vast amounts of human-captured data, emulating our understanding of the world. However, human perception of reality is not always faithful to the physical world, as visual illusions show. This raises a key question: do VLMs exhibit the same kinds of illusions as humans, or do they faithfully learn to represent reality? To investigate this question, we build a dataset containing five types of visual illusions and formulate four tasks to examine visual illusions in state-of-the-art VLMs. Our findings show that although the overall alignment is low, larger models are closer to human perception and more susceptible to visual illusions. Our dataset and initial findings will promote a better understanding of visual illusions in humans and machines and provide a stepping stone for future computational models that can better align humans and machines in perceiving and communicating about the shared visual world.
    Free, publicly-accessible full text available November 1, 2024
  2. The ability to connect language units to their referents in the physical world, referred to as grounding, is crucial to learning and understanding the grounded meanings of words. While humans demonstrate fast mapping in new word learning, it remains unclear whether modern vision-language models can truly represent language with its grounded meanings, and how grounding may further bootstrap new word learning. To this end, we introduce Grounded Open Vocabulary Acquisition (GOVA) to examine grounding and bootstrapping in open-world language learning. As an initial attempt, we propose object-oriented BERT (OctoBERT), a novel visually grounded language model pre-trained on image-text pairs with grounding as a training objective. Through extensive experiments and analysis, we demonstrate that OctoBERT is a more coherent and faster grounded word learner, and that the grounding ability acquired during pre-training helps the model learn unseen words more rapidly and robustly.
    Free, publicly-accessible full text available July 1, 2024
  3. Free, publicly-accessible full text available May 29, 2024
  4. The eruption of the Hunga Tonga-Hunga Ha'apai volcano on 15 January 2022 provided a rare opportunity to understand the global tsunami impacts of explosive volcanism and to evaluate future hazards, including dangers from "volcanic meteotsunamis" (VMTs) induced by the atmospheric shock waves that followed the eruption. The propagation of the volcanic and marine tsunamis was analyzed using globally distributed 1 min measurements of air pressure and water level (WL) from both tide gauges and deep-water buoys. The marine tsunami propagated primarily throughout the Pacific, reaching nearly 2 m at some locations, though most Pacific locations recorded maxima lower than 1 m. However, the VMT resulting from the atmospheric shock wave arrived before the marine tsunami and propagated globally, producing water level perturbations in the Indian Ocean, the Mediterranean, and the Caribbean. The resulting water level response at many Pacific Rim gauges was amplified, likely because of wave interaction with bathymetry. The meteotsunami repeatedly boosted tsunami wave energy as it circled the planet several times. In some locations, the VMT was amplified by as much as 35-fold relative to the inverse barometer response due to near-Proudman resonance and topographic effects. Thus, a meteotsunami from a larger eruption (such as the Krakatoa eruption of 1883) could yield atmospheric pressure changes of 10 to 30 mb, producing a 3–10 m near-field tsunami that would occur in advance of the (usually) larger marine tsunami waves, posing additional hazards to local populations. Present tsunami warning systems do not consider this threat.
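The 3–10 m figure in the fourth abstract can be loosely sanity-checked with the static inverse barometer relation, |Δη| ≈ Δp / (ρg), under which a 1 mb (100 Pa) pressure change moves still water by roughly 1 cm. The sketch below is a back-of-envelope illustration only, not the authors' method. It assumes a seawater density of 1025 kg/m³, g = 9.81 m/s², and a purely static response, and it treats the quoted 35-fold amplification as a single constant multiplier. The function names are hypothetical.

```python
# Back-of-envelope check of the meteotsunami numbers quoted in the abstract.
# Assumptions (not from the paper): seawater density 1025 kg/m^3, g = 9.81 m/s^2,
# a linear static inverse barometer (IB) response, and a constant amplification factor.

RHO_SEAWATER = 1025.0  # kg/m^3, assumed typical seawater density
G = 9.81               # m/s^2, gravitational acceleration
PA_PER_MB = 100.0      # 1 mb = 1 hPa = 100 Pa


def inverse_barometer_response(delta_p_mb: float) -> float:
    """Magnitude (m) of the static sea-level change for a pressure change in mb."""
    return delta_p_mb * PA_PER_MB / (RHO_SEAWATER * G)


def amplified_response(delta_p_mb: float, amplification: float) -> float:
    """IB response scaled by a dynamic amplification factor (resonance, topography)."""
    return amplification * inverse_barometer_response(delta_p_mb)


if __name__ == "__main__":
    # Pressure changes of 10 and 30 mb, as quoted for a larger eruption.
    for dp in (10.0, 30.0):
        ib = inverse_barometer_response(dp)
        amp = amplified_response(dp, 35.0)
        print(f"{dp:.0f} mb: IB ~ {ib * 100:.1f} cm, 35x ~ {amp:.1f} m")
```

Under these assumptions, 10 mb and 30 mb give static responses of roughly 10 cm and 30 cm, or about 3.5 m and 10.4 m after 35-fold amplification, broadly in line with the 3–10 m range stated above. Real amplification depends on resonance conditions and local bathymetry, so a constant factor gives only an order-of-magnitude estimate.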