skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Predicting Video Affect via Induced Affection in the Wild
Curating large and high quality datasets for studying affect is a costly and time consuming process, especially when the labels are continuous. In this paper, we examine the potential to use unlabeled public reactions in the form of textual comments to aid in classifying video affect. We examine two popular datasets used for affect recognition and mine public reactions for these videos. We learn a representation of these reactions by using the video ratings as a weakly supervised signal. We show that our model can learn a fine-grained prediction of comment affect when given a video alone. Furthermore, we demonstrate how predicting the affective properties of a comment can be a potentially useful modality to use in multimodal affect modeling.  more » « less
Award ID(s):
1845587
PAR ID:
10332229
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
Page Range / eLocation ID:
442 to 451
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Language-guided smart systems can help to design next-generation human-machine interactive applications. The dense text description is one of the research areas where systems learn the semantic knowledge and visual features of each video frame and map them to describe the video's most relevant subjects and events. In this paper, we consider untrimmed sports videos as our case study. Generating dense descriptions in the sports domain to supplement journalistic works without relying on commentators and experts requires more investigation. Motivated by this, we propose an end-to-end automated text-generator, SpecTextor, that learns the semantic features from untrimmed videos of sports games and generates associated descriptive texts. The proposed approach considers the video as a sequence of frames and sequentially generates words. After splitting videos into frames, we use a pre-trained VGG-16 model for feature extraction and encoding the video frames. With these encoded frames, we posit a Long Short-Term Memory (LSTM) based attention-decoder pipeline that leverages soft-attention mechanism to map the semantic features with relevant textual descriptions to generate the explanation of the game. Because developing a comprehensive description of the game warrants training on a set of dense time-stamped captions, we leverage two available public datasets: ActivityNet Captions and Microsoft Video Description. In addition, we utilized two different decoding algorithms: beam search and greedy search and computed two evaluation metrics: BLEU and METEOR scores. 
    more » « less
  2. null (Ed.)
    The digital divide—and, in particular, the homework gap— have been exacerbated by the COVID-19 pandemic, laying bare not only the inequities in broadband Internet access but also how these inequities ultimately affect citizens’ ability to learn, work, and play. Addressing these inequities ultimately requires having holistic, “full stack” data on the nature of the gaps in infrastructure and uptake—from the physical infrastructure (e.g., fiber, cable) to speed and application performance to affordability and neighborhood effects that ultimately affect whether a technology is adopted. This paper surveys how various existing datasets can (and cannot) shed light on these gaps, the limitations of these datasets, what we know from existing data about how the Internet responded to shifts in traffic during COVID-19, and—importantly for the future—what data we need to better understand these problems moving forward and how the research community, policymakers, and the public might gain access to various data. Keywords: digital divide,iInternet, mapping, performance 
    more » « less
  3. null (Ed.)
    Neural methods are state-of-the-art for urban prediction problems such as transportation resource demand, accident risk, crowd mobility, and public safety. Model performance can be improved by integrating exogenous features from open data repositories (e.g., weather, housing prices, traffic, etc.), but these uncurated sources are often too noisy, incomplete, and biased to use directly. We propose to learn integrated representations, called EquiTensors, from heterogeneous datasets that can be reused across a variety of tasks. We align datasets to a consistent spatio-temporal domain, then describe an unsupervised model based on convolutional denoising autoencoders to learn shared representations. We extend this core integrative model with adaptive weighting to prevent certain datasets from dominating the signal. To combat discriminatory bias, we use adversarial learning to remove correlations with a sensitive attribute (e.g., race or income). Experiments with 23 input datasets and 4 real applications show that EquiTensors could help mitigate the effects of the sensitive information embodied in the biased data. Meanwhile, applications using EquiTensors outperform models that ignore exogenous features and are competitive with "oracle" models that use hand-selected datasets. 
    more » « less
  4. Discourse involves two perspectives: a person’s intention in making an utterance and others’ perception of that utterance. The misalignment between these perspectives can lead to undesirable outcomes, such as misunderstandings, low productivity and even overt strife. In this work, we present a computational framework for exploring and comparing both perspectives in online public discussions. We combine logged data about public comments on Facebook with a survey of over 16,000 people about their intentions in writing these comments or about their perceptions of comments that others had written. Unlike previous studies of online discussions that have largely relied on third-party labels to quantify properties such as sentiment and subjectivity, our approach also directly captures what the speakers actually intended when writing their comments. In particular, our analysis focuses on judgments of whether a comment is stating a fact or an opinion, since these concepts were shown to be often confused. We show that intentions and perceptions diverge in consequential ways. People are more likely to perceive opinions than to intend them, and linguistic cues that signal how an utterance is intended can differ from those that signal how it will be perceived. Further, this misalignment between intentions and perceptions can be linked to the future health of a conversation: when a comment whose author intended to share a fact is misperceived as sharing an opinion, the subsequent conversation is more likely to derail into uncivil behavior than when the comment is perceived as intended. Altogether, these findings may inform the design of discussion platforms that better promote positive interactions. 
    more » « less
  5. Computer vision is a data hungry field. Researchers and practitioners who work on human-centric computer vision, like facial recognition, emphasize the necessity of vast amounts of data for more robust and accurate models. Humans are seen as a data resource which can be converted into datasets. The necessity of data has led to a proliferation of gathering data from easily available sources, including public data from the web. Yet the use of public data has significant ethical implications for the human subjects in datasets. We bridge academic conversations on the ethics of using publicly obtained data with concerns about privacy and agency associated with computer vision applications. Specifically, we examine how practices of dataset construction from public data-not only from websites, but also from public settings and public records-make it extremely difficult for human subjects to trace their images as they are collected, converted into datasets, distributed for use, and, in some cases, retracted. We discuss two interconnected barriers current data practices present to providing an ethics of traceability for human subjects: awareness and control. We conclude with key intervention points for enabling traceability for data subjects. We also offer suggestions for an improved ethics of traceability to enable both awareness and control for individual subjects in dataset curation practices. 
    more » « less