Cross-site scripting (XSS) vulnerabilities are the most frequently reported web application vulnerability. As complex JavaScript applications become more widespread, DOM (Document Object Model) XSS vulnerabilities, a type of XSS vulnerability located in client-side JavaScript rather than server-side code, are becoming more common. As the first contribution of this work, we empirically assess the impact of DOM XSS on the web using a browser with taint tracking embedded in the JavaScript engine. Building on the methodology used in a previous study that crawled popular websites, we collect a current dataset of potential DOM XSS vulnerabilities. We improve on the methodology for confirming XSS vulnerabilities, and using this improved methodology, we find 83% more vulnerabilities than the previous methodology applied to the same dataset. As a second contribution, we identify the causes of DOM XSS vulnerabilities and discuss how to prevent them. One example of our findings is that custom HTML templating designs (a design pattern that could prevent DOM XSS vulnerabilities, analogous to parameterized SQL) can be buggy in practice, allowing DOM XSS attacks. As our third contribution, we evaluate the error rates of three static-analysis tools in detecting DOM XSS vulnerabilities found with dynamic analysis techniques, using in-the-wild examples. We find that static-analysis tools miss 90% of bugs found by our dynamic analysis, though some tools can have very few false positives and at the same time find vulnerabilities not found using the dynamic analysis.
A Transfer Learning Scheme for Time Series Forecasting Using Facebook Prophet
We describe our methodology to support time-series forecasts over spatial datasets using the Prophet library. Our approach, underpinned by our transfer learning scheme, ensures that model instances capture subtle regional variations and converge faster while using fewer resources. Our benchmarks demonstrate the suitability of our methodology.
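The abstract does not spell out the transfer learning scheme, so the following is only a generic illustration of why reusing parameters from an already-fitted model can speed up convergence. It is a minimal sketch in plain Python that does not use Prophet: a linear trend is fitted by gradient descent for a "source" region, and the fit for a similar "target" region is then started either from zeros (cold start) or from the source parameters (warm start). The function name `fit_linear_trend`, the synthetic data, and all numeric settings are assumptions made for this example, not details from the paper.

```python
def fit_linear_trend(t, y, init=(0.0, 0.0), lr=0.5, tol=1e-6, max_iter=10_000):
    """Fit y ~ a + b*t by gradient descent on the mean squared error.

    Returns ((a, b), iterations), where iterations is the number of update
    steps taken before the gradient norm dropped below tol.
    """
    a, b = init
    n = len(t)
    for iteration in range(max_iter):
        residuals = [a + b * ti - yi for ti, yi in zip(t, y)]
        grad_a = sum(residuals) / n
        grad_b = sum(r * ti for r, ti in zip(residuals, t)) / n
        if (grad_a ** 2 + grad_b ** 2) ** 0.5 < tol:
            return (a, b), iteration
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b), max_iter


# Hypothetical synthetic data: two regions whose trends differ only slightly.
t = [i / 49 for i in range(50)]
source_y = [10.0 + 2.0 * ti for ti in t]   # "source" region
target_y = [10.5 + 2.1 * ti for ti in t]   # "target" region, a subtle variation

# Fit the source region once, then fit the target region two ways.
source_params, _ = fit_linear_trend(t, source_y)
cold_params, cold_iters = fit_linear_trend(t, target_y)
warm_params, warm_iters = fit_linear_trend(t, target_y, init=source_params)

print(f"cold start: {cold_iters} iterations")
print(f"warm start: {warm_iters} iterations")
```

Both runs reach essentially the same parameters for the target region, but the warm start begins much closer to them and therefore needs fewer update steps. How the authors transfer knowledge between Prophet model instances may differ substantially from this toy setup.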
- Award ID(s): 1931363
- PAR ID: 10352253
- Date Published:
- Journal Name: 2021 IEEE International Conference on Cluster Computing (CLUSTER)
- Page Range / eLocation ID: 809 to 810
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In this paper we develop a methodology for analyzing transportation data at different levels of temporal and spatial granularity, and apply our methodology to the TLC Trip Record Dataset, made publicly available by the NYC Taxi & Limousine Commission. This data is naturally represented by a set of trajectories, annotated with time and with additional information such as passenger count and cost. We analyze TLC data to identify hotspots, which point to a lack of convenient public transportation options, and popular routes, which motivate ride-sharing solutions or the addition of a bus route. Our methodology is based on an open-source system called Portal that supports an algebraic query language for analyzing evolving property graphs. Portal is implemented as an Apache Spark library and is interoperable with other Spark libraries like SparkSQL, which we also use in our analysis.
- We address the problem of generating high-quality question-answer pairs for educational materials. Previous work on this problem showed that using summaries as input improves the quality of question generation (QG) over original textbook text and that human-written summaries result in higher quality QG than automatic summaries. In this paper, a) we show that advances in Large Language Models (LLMs) are not yet sufficient to generate quality summaries for QG and b) we introduce a new methodology for enhancing bullet point student notes into fully fledged summaries and find that our methodology yields higher quality QG. We conducted a large-scale human annotation study of generated question-answer pairs for the evaluation of our methodology. In order to aid in future research, we release a new dataset of 9.2K human annotations of generated questions.
- This paper presents a framework for embedding watermarks into DNN hardware accelerators. Unlike previous works that have looked at protecting the algorithmic intellectual properties of deep learning systems, this work proposes a methodology for defending deep learning hardware. Our methodology embeds modifications into the hardware accelerator's functional blocks that can be revealed with the rightful owner's key DNN and corresponding key sample, verifying the legitimate owner. We propose an Lp-box ADMM based algorithm to co-optimize the watermark's hardware overhead and its impact on the design's algorithmic functionality. We evaluate the performance of the hardware watermarking scheme on popular image classifier models using various accelerator designs. Our results demonstrate that the proposed methodology effectively embeds watermarks while preserving the original functionality of the hardware architecture. Specifically, we can successfully embed watermarks into the deep learning hardware and reliably execute a ResNet ImageNet classifier with an accuracy degradation of only 0.009%.
- We present a novel AI-based methodology that identifies phases of a host-level cyber attack simply from system call logs. System calls emanating from cyber attacks on hosts such as honeypots are often recorded in audit logs. Our methodology first involves efficiently loading, caching, processing, and querying system events contained in audit logs in support of computer forensics. Output of queries remains at the system call level and is difficult to process. The next step is to infer a sequence of abstracted actions, which we colloquially call a storyline, from the system calls given as observations to a latent-state probabilistic model. These storylines are then accurately identified with class labels using a learned classifier. We qualitatively and quantitatively evaluate methods and models for each step of the methodology using 114 different attack phases collected by logging the attacks of a red team on a server, on some likely benign sequences containing regular user activities, and on traces from a recent DARPA project. The resulting end-to-end system, which we call Cyberian, identifies the attack phases with a high level of accuracy, illustrating the benefit that this machine learning-based methodology brings to security forensics.

