Android mobile applications collect information in various ways to provide users with functionalities and services. An Android app's permission manifest and privacy policy are documents that provide users with guidelines about what information types are being collected. However, the information types mentioned in these files are often abstract and do not include the fine-grained information types collected through user input fields in applications. Existing approaches focus on API calls in the application code and can reveal what information types are being collected. However, they are unable to identify information types based on direct user input, which is a major source of private information. In this paper, we propose to directly apply a natural language processing approach to Android layout code to identify information types associated with input fields in applications.
SPT-code: sequence-to-sequence pre-training for learning source code representations

- Award ID(s):
- 2034508
- PAR ID:
- 10343376
- Date Published:
- Journal Name:
- 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)
- Page Range / eLocation ID:
- 2006 to 2018
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Lightweight syntactic analysis tools like Semgrep and Comby leverage the tree structure of code, making them more expressive than string and regex search. Unlike traditional language frameworks (e.g., ESLint) that analyze codebases via explicit syntax tree manipulations, these tools use query languages that closely resemble the source language. However, state-of-the-art matching techniques for these tools require queries to be complete and parsable snippets, which makes in-progress query specifications useless. We propose a new search architecture that relies only on tokenizing (not parsing) a query. We introduce a novel language and matching algorithm to support tree-aware wildcards on this architecture by building on tree automata. We also present stsearch, a syntactic search tool leveraging our approach. In contrast to past work, our approach supports syntactic search even for previously unparsable queries. We show empirically that stsearch can support all tokenizable queries, while still providing results comparable to Semgrep for existing queries. Our work offers evidence that lightweight syntactic code search can accept in-progress specifications, potentially improving support for interactive settings. CCS Concepts: • Software and its engineering → Formal language definitions; Software maintenance tools; • Information systems → Query representation; • Theory of computation → Tree languages.
- 
            Code and data for "Large language models design sequence-defined macromolecules via evolutionary optimization". Note that this repository contains code and data files for the manuscript. This is a snapshot of the repository, frozen at the time of submission. Code: LLM code, other algorithms, postprocessing, visualization. Data files: prompts, models, embeddings, LLM responses.
 An official website of the United States government