News media structure their reporting of events or issues using certain perspectives. When describing an incident involving gun violence, for example, some journalists may focus on mental health or gun regulation, while others may emphasize the discussion of gun rights. Such perspectives are called “frames” in communication research. We study, for the first time, the value of combining lead images and their contextual information with text to identify the frame of a given news article. We observe that using multiple modes of information(article- and image-derived features) improves prediction of news frames over any single mode of information when the images are relevant to the frames of the headlines. We also observe that frame image relevance is related to the ease of conveying frames via images, which we call frame concreteness. Additionally, we release the first multimodal news framing dataset related to gun violence in the U.S., curated and annotated by communication researchers. The dataset will allow researchers to further examine the use of multiple information modalities for studying media framing.
more »
« less
Automated Filtering Big Visual Data from Drones for Enhanced Visual Analytics in Construction
Nowadays, to assess and document construction and building performance, large amount of visual data are captured and stored through camera equipped platforms such as wearable cameras, unmanned aerial/ground vehicles, and smart phones. However, due to the nonstop fashion in recording such visual data, not all of the frames in captured consecutive footages are intentionally taken, and thus not every frame is worthy of being processed for construction and building performance analysis. Since many frames will simply have non-construction related contents, before processing the visual data, the content of each recorded frame should be manually investigated depending on the association with the goal of the visual assessment. To address such challenges, this paper aims to automatically filter construction big visual data that requires no human annotations. To overcome challenges in pure discriminative approach using manually labeled images, we construct a generative model with unlabeled visual dataset, and use it to find construction-related frames in big visual dataset from jobsites. First, through composition-based snap point detection together with domain adaptation, we filter and remove most of accidently recorded frames in the footage. Then, we create discriminative classifier trained with visual data from jobsites to eliminate non-construction related images. To evaluate the reliability of the proposed method, we have obtained the ground truth based on human judgment for each photo in our testing dataset. Despite learning without any explicit labels, the proposed method shows a reasonable practical range of accuracy, which generally outperforms prior snap point detection. Through the case studies, the fidelity of the algorithm is discussed in detail. By being able to focus on selective visual data, practitioners will spend less time on browsing large amounts of visual data; rather spend more time on looking at how to leverage the visual data to facilitate decision-makings in built environments.
more »
« less
- PAR ID:
- 10081679
- Date Published:
- Journal Name:
- ASCE Construction Research Congress 2018
- Page Range / eLocation ID:
- 398 to 409
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)High spatiotemporal resolution can offer high precision for vision applications, which is particularly useful to capture the nuances of visual features, such as for augmented reality. Unfortunately, capturing and processing high spatiotemporal visual frames generates energy-expensive memory traffic. On the other hand, low resolution frames can reduce pixel memory throughput, but reduce also the opportunities of high-precision visual sensing. However, our intuition is that not all parts of the scene need to be captured at a uniform resolution. Selectively and opportunistically reducing resolution for different regions of image frames can yield high-precision visual computing at energy-efficient memory data rates. To this end, we develop a visual sensing pipeline architecture that flexibly allows application developers to dynamically adapt the spatial resolution and update rate of different “rhythmic pixel regions” in the scene. We develop a system that ingests pixel streams from commercial image sensors with their standard raster-scan pixel read-out patterns, but only encodes relevant pixels prior to storing them in the memory. We also present streaming hardware to decode the stored rhythmic pixel region stream into traditional frame-based representations to feed into standard computer vision algorithms. We integrate our encoding and decoding hardware modules into existing video pipelines. On top of this, we develop runtime support allowing developers to flexibly specify the region labels. Evaluating our system on a Xilinx FPGA platform over three vision workloads shows 43 − 64% reduction in interface traffic and memory footprint, while providing controllable task accuracy.more » « less
-
Wildfires are one of the deadliest and dangerous natural disasters in the world. Wildfires burn millions of forests and they put many lives of humans and animals in danger. Predicting fire behavior can help firefighters to have better fire management and scheduling for future incidents and also it reduces the life risks for the firefighters. Recent advance in aerial images shows that they can be beneficial in wildfire studies. Among different methods and technologies for aerial images, Unmanned Aerial Vehicles (UAVs) and drones are beneficial to collect information regarding the fire. This study provides an aerial imagery dataset using drones during a prescribed pile fire in Northern Arizona, USA. This dataset consists of different repositories including raw aerial videos recorded by drones' cameras and also raw heatmap footage recorded by an infrared thermal camera. To help researchers, two well-known studies; fire classification and fire segmentation are defined based on the dataset. For approaches such as Neural Networks (NNs) and fire classification, 39,375 frames are labeled ("Fire" vs "Non-Fire") for the training phase. Also, another 8,617 frames are labeled for the test data. 2,003 frames are considered for the fire segmentation and regarding that, 2,003 masks are generated for the purpose of Ground Truth data with pixel-wise annotation.more » « less
-
null (Ed.)First-person-view videos of hands interacting with tools are widely used in the computer vision industry. However, creating a dataset with pixel-wise segmentation of hands is challenging since most videos are captured with fingertips occluded by the hand dorsum and grasped tools. Current methods often rely on manually segmenting hands to create annotations, which is inefficient and costly. To relieve this challenge, we create a method that utilizes thermal information of hands for efficient pixel-wise hand segmentation to create a multi-modal activity video dataset. Our method is not affected by fingertip and joint occlusions and does not require hand pose ground truth. We show our method to be 24 times faster than the traditional polygon labeling method while maintaining high quality. With the segmentation method, we propose a multi-modal hand activity video dataset with 790 sequences and 401,765 frames of "hands using tools" videos captured by thermal and RGB-D cameras with hand segmentation data. We analyze multiple models for hand segmentation performance and benchmark four segmentation networks. We show that our multi-modal dataset with fusing Long-Wave InfraRed (LWIR) and RGB-D frames achieves 5% better hand IoU performance than using RGB frames.more » « less
-
Visual qualitative methodologies enhance the richness of data and makes participants experts on the object of interest. Visual data brings another dimension to the evaluation process, besides surveys and interviews, as well as depth and breadth to participants reactions to specific program activities. Visual data consists of images, such as photos, drawings, artwork, among others. Exploring a different approach to assess impact of an educational activity, an exercise was designed where participants were asked to take photos to document a site visit to an area impacted by a swarm of earthquakes in 2019. The exercise required taking five photos of either objects, persons, scenery, structures, or any other thing that captured their attention during the visit and write a reflective essay to answer three questions: 1) How do these photos represent your site visit experience? 2) Based on the content of your photos, write about what you learned, discovered, new knowledge acquired, emotions, changes in your way of thinking, etc., and 3) What did you learned or discovered from doing this exercise? Twenty-two undergraduate engineering and architecture students from the RISE-UP Program, enrolled in a curricular sequence in design and construction of resilient and sustainable structures, completed the exercise. Analyses of obtained data includes frequency of captured images and content analysis of reflective essays to determine instances where each of the four proposed learning objectives was present. Results show that across essays, 32% of the essays include text that demonstrate impact related to the first objective, 59% for the second, 73% for the third, and 86% for the fourth objective. Forty-five percent of essays included text considered relevant but not related to an objective. Personal, social, and career insights were categorized as unintended results. Photos taken by students represent what they considered relevant during the visit and also evidence the achievement of the proposed learning objectives. In general, three mayor categories emerged from the content in photos: 1) photos related to the design and construction of the structure and specific damage observed from earthquakes; 2) photos of classmates, professors, and group activities; and 3) other photos that do not share a theme. Both photos and essays demonstrate that the learning objectives were successfully achieved and encourage the use of visual data as an alternative for the evaluation of educational activities.more » « less