skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text
Images and text in advertisements interact in complex, non-literal ways. The two channels are usually complementary, with each channel telling a different part of the story. Current approaches, such as image captioning methods, only examine literal, redundant relationships, where image and text show exactly the same content. To understand more complex relationships, we first collect a dataset of advertisement interpretations for whether the image and slogan in the same visual advertisement form a parallel (conveying the same message without literally saying the same thing) or non-parallel relationship, with the help of workers recruited on Amazon Mechanical Turk. We develop a variety of features that capture the creativity of images and the specificity or ambiguity of text, as well as methods that analyze the semantics within and across channels. We show that our method outperforms standard image-text alignment approaches on predicting the parallel/non-parallel relationship between image and text.  more » « less
Award ID(s):
1718262
PAR ID:
10104819
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
British Machine Vision Conference (BMVC)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Neural Machine Translation (NMT) models have been observed to produce poor translations when there are few/no parallel sentences to train the models. In the absence of parallel data, several approaches have turned to the use of images to learn translations. Since images of words, e.g., horse may be unchanged across languages, translations can be identified via images associated with words in different languages that have a high degree of visual similarity. However, translating via images has been shown to improve upon text-only models only marginally. To better understand when images are useful for translation, we study image translatability of words, which we define as the translatability of words via images, by measuring intra- and inter-cluster similarities of image representations of words that are translations of each other. We find that images of words are not always invariant across languages, and that language pairs with shared culture, meaning having either a common language family, ethnicity or religion, have improved image translatability (i.e., have more similar images for similar words) compared to its converse, regardless of their geographic proximity. In addition, in line with previous works that show images help more in translating concrete words, we found that concrete words have improved image translatability compared to abstract ones. 
    more » « less
  2. Parallel-laser photogrammetry is growing in popularity as a way to collect non-invasive body size data from wild mammals. Despite its many appeals, this method requires researchers to hand-measure (i) the pixel distance between the parallel laser spots (inter-laser distance) to produce a scale within the image, and (ii) the pixel distance between the study subject’s body landmarks (inter-landmark distance). This manual effort is time-consuming and introduces human error: a researcher measuring the same image twice will rarely return the same values both times (resulting in within-observer error), as is also the case when two researchers measure the same image (resulting in between-observer error). Here, we present two independent methods that automate the inter-laser distance measurement of parallel-laser photogrammetry images. One method uses machine learning and image processing techniques in Python, and the other uses image processing techniques in ImageJ. Both of these methods reduce labor and increase precision without sacrificing accuracy. We first introduce the workflow of the two methods. Then, using two parallel-laser datasets of wild mountain gorilla and wild savannah baboon images, we validate the precision of these two automated methods relative to manual measurements and to each other. We also estimate the reduction of variation in final body size estimates in centimeters when adopting these automated methods, as these methods have no human error. Finally, we highlight the strengths of each method, suggest best practices for adopting either of them, and propose future directions for the automation of parallel-laser photogrammetry data. 
    more » « less
  3. null (Ed.)
    We propose JECL, a method for clustering image-caption pairs by training parallel encoders with regularized clustering and alignment objectives, simultaneously learning both representations and cluster assignments. These image-caption pairs arise frequently in high-value applications where structured training data is expensive to produce, but free-text descriptions are common. JECL trains by minimizing the Kullback-Leibler divergence between the distribution of the images and text to that of a combined joint target distribution and optimizing the Jensen-Shannon divergence between the soft cluster assignments of the images and text. Regularizers are also applied to JECL to prevent trivial solutions. Experiments show that JECL outperforms both single-view and multi-view methods on large benchmark image-caption datasets, and is remarkably robust to missing captions and varying data sizes. 
    more » « less
  4. We explore the use of deep convolutional neural networks (CNNs) trained on overhead imagery of biomass sorghum to ascertain the relationship between single nucleotide polymorphisms (SNPs), or groups of related SNPs, and the phenotypes they control. We consider both CNNs trained explicitly on the classification task of predicting whether an image shows a plant with a reference or alternate version of various SNPs as well as CNNs trained to create data-driven features based on learning features so that images from the same plot are more similar than images from different plots, and then using the features this network learns for genetic marker classification. We characterize how efficient both approaches are at predicting the presence or absence of a genetic markers, and visualize what parts of the images are most important for those predictions. We find that the data-driven approaches give somewhat higher prediction performance, but have visualizations that are harder to interpret; and we give suggestions of potential future machine learning research and discuss the possibilities of using this approach to uncover unknown genotype × phenotype relationships. 
    more » « less
  5. Point-set classification for multiplexed pathology images aims to distinguish between the spatial configurations of cells within multiplexed immuno-fluorescence (mIF) images of different diseases. This problem is important towards aiding pathologists in diag- nosing diseases (e.g., chronic pancreatitis and pancreatic ductal adenocarcinoma). This problem is challenging because crucial spa- tial relationships are implicit in point sets and the non-uniform distribution of points makes the relationships complex. Manual morphologic or cell-count based methods, the conventional clinical approach for studying spatial patterns within mIF images, is limited by inter-observer variability. The current deep neural network methods for point sets (e.g., PointNet) are limited in learning the representation of implicit spatial relationships between categorical points. To overcome the limitation, we propose a new deep neural network (DNN) architecture, namely spatial-relationship aware neural networks (SRNet), with a novel design of representation learning layers. Experimental results with a University of Michigan mIF dataset show that the proposed method significantly outperforms the competing DNN methods, by 80%, reaching 95% accuracy. 
    more » « less