Many mobile applications (i.e., apps) include UI widgets to use or collect users' sensitive data. Thus, to identify suspicious sensitive data usage such as UI-permission mismatch, it is crucial to understand the intentions of UI widgets. However, many UI widgets leverage icons of specific shapes (object icons) and icons embedded with text (text icons) to express their intentions, posing challenges for existing detection techniques that analyze only textual data to identify sensitive UI widgets. In this work, we propose a novel app analysis framework, ICONINTENT, that synergistically combines program analysis and icon classification to identify sensitive UI widgets in Android apps. ICONINTENT automatically associates UI widgets and icons via static analysis on an app's UI layout files and code, and then adapts computer vision techniques to classify the associated icons into eight categories of sensitive data. Our evaluations of ICONINTENT on 150 apps from Google Play show that ICONINTENT can detect 248 sensitive UI widgets in 97 apps, achieving a precision of 82.4%. When combined with SUPOR, the state-of-the-art sensitive UI widget identification technique based on text analysis, SUPOR + ICONINTENT can detect 487 sensitive UI widgets (101.2% improvement over SUPOR only), and reduces the suspicious permissions to be inspected by 50.7% (129.4% improvement over SUPOR only).
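One step the abstract describes is associating UI widgets with the icons they display via static analysis of an app's layout files. The snippet below is a minimal sketch of that idea only, not ICONINTENT's actual implementation: it parses a single Android layout XML (the file path and the widget/attribute coverage are illustrative assumptions) and maps image widgets to the drawable resources they reference, which would then be handed to an icon classifier.

```python
# Minimal sketch (not ICONINTENT's implementation) of the layout-analysis step:
# parse an Android layout XML and map each image widget to the drawable icon it
# displays. The file path and widget/attribute coverage are illustrative.
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def widget_icon_pairs(layout_path):
    """Return (widget_id, icon_resource) pairs found in one layout file."""
    tree = ET.parse(layout_path)
    pairs = []
    for elem in tree.iter():
        # Only widgets that render drawables are of interest here.
        if elem.tag not in ("ImageView", "ImageButton"):
            continue
        widget_id = elem.get(ANDROID_NS + "id", "<anonymous>")
        icon = elem.get(ANDROID_NS + "src")
        if icon and icon.startswith("@drawable/"):
            pairs.append((widget_id, icon.removeprefix("@drawable/")))
    return pairs

if __name__ == "__main__":
    # Hypothetical layout path; each icon would next be classified as sensitive or not.
    for wid, icon in widget_icon_pairs("res/layout/activity_main.xml"):
        print(f"{wid} -> {icon}")
```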
Parsing and Summarizing Infographics with Synthetically Trained Icon Detection
Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics, including "ways to conserve the environment" and "coronavirus prevention". The computational understanding of infographics required for future applications like automatic captioning, summarization, search, and question-answering will depend on being able to parse the visual and textual elements contained within them. However, being composed of stylistically and semantically diverse visual and textual elements, infographics pose challenges for current AI systems. While automatic text extraction works reasonably well on infographics, standard object detection algorithms fail to identify the stand-alone visual elements in infographics that we refer to as "icons". In this paper, we propose a novel approach to train an object detector using synthetically generated data, and show that it succeeds at generalizing to detecting icons within in-the-wild infographics. We further pair our icon detection approach with an icon classifier and a state-of-the-art text detector to demonstrate three demo applications: topic prediction, multi-modal summarization, and multi-modal search. Parsing the visual and textual elements within infographics provides us with the first steps towards automatic infographic understanding.
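The core idea of training the detector on synthetically generated data can be illustrated with a small compositing script. The sketch below is a simplification under stated assumptions, not the paper's pipeline: it pastes randomly scaled icons onto a background canvas and records their bounding boxes as detector ground truth; the directory names and scaling ranges are hypothetical.

```python
# A minimal sketch (assumes directories of icon PNGs and background images exist;
# the paper's actual augmentation pipeline is more elaborate) of generating
# synthetic detection data by compositing icons onto canvases.
import random
from pathlib import Path
from PIL import Image

def synthesize(background_path, icon_paths, n_icons=5):
    """Paste random icons onto a background; return the image and its boxes."""
    canvas = Image.open(background_path).convert("RGBA")
    boxes = []  # (x0, y0, x1, y1) ground-truth boxes for the detector
    for icon_path in random.sample(icon_paths, k=min(n_icons, len(icon_paths))):
        icon = Image.open(icon_path).convert("RGBA")
        scale = random.uniform(0.05, 0.2)  # icons are small relative to the canvas
        w = max(1, int(canvas.width * scale))
        h = max(1, int(icon.height * w / icon.width))
        if w >= canvas.width or h >= canvas.height:
            continue  # skip icons that would not fit on this canvas
        icon = icon.resize((w, h))
        x = random.randint(0, canvas.width - w)
        y = random.randint(0, canvas.height - h)
        canvas.alpha_composite(icon, (x, y))
        boxes.append((x, y, x + w, y + h))
    return canvas, boxes

if __name__ == "__main__":
    icons = sorted(Path("icons").glob("*.png"))       # hypothetical icon library
    image, boxes = synthesize("backgrounds/canvas.png", icons)
    image.save("synthetic_000.png")
    print(boxes)
```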
- Award ID(s): 1901030
- PAR ID: 10300546
- Date Published:
- Journal Name: 2021 IEEE 14th Pacific Visualization Symposium (PacificVis)
- Page Range / eLocation ID: 31 to 40
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Increasingly, icons are being proposed to concisely convey privacy-related information and choices to users. However, complex privacy concepts can be difficult to communicate. We investigate which icons effectively signal the presence of privacy choices. In a series of user studies, we designed and evaluated icons and accompanying textual descriptions (link texts) conveying choice, opting-out, and sale of personal information; the latter is an opt-out mandated by the California Consumer Privacy Act (CCPA). We identified icon-link text pairings that conveyed the presence of privacy choices without creating misconceptions, with a blue stylized toggle icon paired with "Privacy Options" performing best. The two CCPA-mandated link texts ("Do Not Sell My Personal Information" and "Do Not Sell My Info") accurately communicated the presence of do-not-sell opt-outs with most icons. Our results provide insights for the design of privacy choice indicators and highlight the necessity of incorporating user testing into policy making.
-
Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.
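As a rough illustration of what a visual concreteness score can look like, the sketch below scores a word by how often the nearest visual neighbors of images tagged with that word are also tagged with it, normalized by the word's overall frequency. This only mirrors the intuition in the abstract; the authors' exact scoring function may differ, and the features, tags, and neighborhood size here are assumptions.

```python
# Hedged sketch of one possible visual concreteness score (not necessarily the
# paper's formula). Assumes precomputed image features and per-image word sets.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def concreteness(word, image_feats, image_words, k=25):
    """image_feats: (N, D) array; image_words: list of word sets, one per image."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(image_feats)
    tagged = [i for i, ws in enumerate(image_words) if word in ws]
    if not tagged:
        return 0.0
    _, idx = nn.kneighbors(image_feats[tagged])
    # Fraction of neighbors (excluding the query image itself) that share the word,
    # normalized by how frequent the word is across the whole collection.
    hits = np.mean([[word in image_words[j] for j in row[1:]] for row in idx])
    base = len(tagged) / len(image_words)
    return hits / base if base > 0 else 0.0
```

Under this formulation, a higher score means images associated with the word cluster tightly in visual feature space, which matches the abstract's claim that concrete concepts are easier for multimodal models to learn.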
-
We present a multimodal deep learning framework that can generate summarization text supporting the main idea of an information graphic for presentation to a person who is blind or visually impaired. The framework utilizes the visual, textual, positional, and size characteristics extracted from the image to create the summary. Different and complementary neural architectures are optimized for each task using crowdsourced training data. From our quantitative experiments and results, we explain the reasoning behind our framework and show the effectiveness of our models. Our qualitative results showcase text generated from our framework and show that Mechanical Turk participants favor it over other automatic and human-generated summarizations. We describe the design and results of an experiment to evaluate the utility of our system for people who have visual impairments in the context of understanding Twitter Tweets containing line graphs.
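The framework is described as combining visual, textual, positional, and size characteristics of graphic elements. The sketch below is not the authors' architecture; it only illustrates one plausible fusion step, concatenating per-element feature vectors (with illustrative dimensions) and projecting them into a shared embedding that a text-generation decoder could condition on.

```python
# Minimal sketch (not the authors' architecture) of fusing visual, textual,
# positional, and size features into one embedding per graphic element.
# All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, visual_dim=2048, text_dim=300, geom_dim=4, hidden=512):
        super().__init__()
        # geom_dim covers position (x, y) and size (width, height) of each element
        self.proj = nn.Linear(visual_dim + text_dim + geom_dim, hidden)

    def forward(self, visual, text, geometry):
        fused = torch.cat([visual, text, geometry], dim=-1)
        return torch.relu(self.proj(fused))  # one fused vector per graphic element

if __name__ == "__main__":
    fusion = MultimodalFusion()
    v = torch.randn(8, 2048)   # e.g. CNN features for each element crop
    t = torch.randn(8, 300)    # e.g. averaged word embeddings of element text
    g = torch.randn(8, 4)      # normalized position and size
    print(fusion(v, t, g).shape)  # torch.Size([8, 512])
```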