NSF PAR Search | NSF Public Access Repository

Covariate Software Vulnerability Discovery Model to Support Cybersecurity Test & Evaluation (Practical Experience Report)

https://doi.org/10.1109/ISSRE55969.2022.00025

Sorrentino, Julia; Silva, Priscila; Baye, Gaspard; Kul, Gokhan; Fiondella, Lance (October 2022, IEEE International Symposium on Software Reliability Engineering)

Full Text Available
Quantitative assessment of machine learning reliability and resilience

https://doi.org/10.1111/risa.14666

Faddi, Zakaria; da Mata, Karen; Silva, Priscila; Nagaraju, Vidhyashree; Ghosh, Susmita; Kul, Gokhan; Fiondella, Lance (July 2024, Risk Analysis)

Advances in machine learning (ML) have led to applications in safety-critical domains, including security, defense, and healthcare. These ML models face the dynamically changing and actively hostile conditions characteristic of real-world applications, so systems incorporating ML must be reliable and resilient. Many studies propose techniques to improve the robustness of ML algorithms. However, fewer consider quantitative techniques to assess changes in the reliability and resilience of these systems over time. To address this gap, this study demonstrates how to collect data during the training and testing of ML that are suitable for applying software reliability models, with and without covariates, and resilience models, and how to interpret the resulting analyses. The proposed approach promotes quantitative risk assessment of ML technologies. It provides the ability to track and predict degradation and improvement in ML model performance, and it gives ML and system engineers an objective way to compare the relative effectiveness of alternative training and testing methods. The approach is illustrated with an image recognition model that is subjected to two generative adversarial attacks and then iteratively retrained to improve the system's performance. Our results indicate that software reliability models incorporating covariates characterized the misclassification discovery process more accurately than models without covariates. Moreover, the resilience model based on multiple linear regression with interactions between covariates tracked and predicted the degradation and recovery of performance best. Thus, software reliability and resilience models offer rigorous quantitative assurance methods for ML-enabled systems and processes.
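The abstract names multiple linear regression with interactions between covariates as the best-performing resilience model. The following is a minimal sketch of that general technique only, not the authors' model: it fits y = b0 + b1*x1 + b2*x2 + b3*x1*x2 by ordinary least squares using the normal equations. The function names (`solve`, `fit_with_interaction`), the two covariates, and the numbers are all hypothetical and chosen purely for illustration.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting. Assumes the system is non-singular."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_with_interaction(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 + b3*x1*x2.
    Returns [b0, b1, b2, b3]."""
    # Design matrix: intercept, both covariates, and their product.
    rows = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated from known coefficients,
# so the fit should recover them up to floating-point error.
x1 = [0.0, 1.0, 0.0, 1.0, 2.0, 2.0]
x2 = [0.0, 0.0, 1.0, 1.0, 1.0, 3.0]
y = [1.0 + 2.0 * a - 3.0 * b + 0.5 * a * b for a, b in zip(x1, x2)]
coefficients = fit_with_interaction(x1, x2, y)
```

The product column `a * b` is what lets the effect of one covariate depend on the level of the other. Solving the normal equations directly is adequate for a small illustration like this one; a numerically sturdier solver would be preferable for real data.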
Similarity Measures for SQL Query Clustering

https://doi.org/10.1109/TKDE.2018.2831214

Kul, Gokhan; Luong, Duc Thanh; Xie, Ting; Chandola, Varun; Kennedy, Oliver; Upadhyaya, Shambhu (July 2018, IEEE Transactions on Knowledge and Data Engineering)

Database access logs are the starting point for many forms of database administration, including performance tuning, security auditing, and benchmark design. Unfortunately, query logs are also large and unwieldy, and it can be difficult for an analyst to extract broad patterns from the queries they contain. Clustering is a natural first step towards understanding massive query logs. However, many clustering methods rely on a notion of pairwise similarity, which is challenging to compute for SQL queries, especially when the underlying data and database schema are unavailable. We investigate the problem of computing similarity between queries using only the query structure. We conduct a rigorous evaluation of three query similarity heuristics proposed in the literature, applied to query clustering on multiple query log datasets that represent different types of query workloads. To improve the accuracy of the three heuristics, we propose a generic feature engineering strategy that uses classical query rewrites to standardize query structure. The proposed strategy significantly improves the performance of all three similarity heuristics.
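To make the idea of structure-only similarity concrete, here is a minimal sketch that compares two SQL strings without any access to the data or schema. It is not one of the three heuristics evaluated in the paper: it uses plain Jaccard similarity over token sets, and replacing literals with a placeholder is only a simple stand-in for the query rewrites the abstract mentions. The function names `normalize`, `features`, and `jaccard` are hypothetical.

```python
import re


def normalize(query: str) -> str:
    """Lowercase the query, replace literals with '?', and collapse whitespace."""
    q = query.strip().rstrip(";").lower()
    q = re.sub(r"'[^']*'", "?", q)             # single-quoted string literals
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)   # standalone numeric literals
    return re.sub(r"\s+", " ", q)


def features(query: str) -> frozenset:
    """Set of identifiers, keywords, comparison operators, and placeholders."""
    pattern = r"[a-z_][a-z0-9_.]*|[<>=!]+|\?|\*"
    return frozenset(re.findall(pattern, normalize(query)))


def jaccard(query_a: str, query_b: str) -> float:
    """Jaccard similarity of the two feature sets, in the range [0, 1]."""
    fa, fb = features(query_a), features(query_b)
    if not fa and not fb:
        return 1.0  # two empty queries are treated as identical
    return len(fa & fb) / len(fa | fb)


q1 = "SELECT name FROM users WHERE id = 42"
q2 = "select name from users where id = 7;"
q3 = "SELECT total FROM orders WHERE status = 'open'"

same_shape = jaccard(q1, q2)       # differ only in case and a constant
different_shape = jaccard(q1, q3)  # share keywords but not tables or columns
```

Because the literals are replaced before comparison, `q1` and `q2` yield identical feature sets, while `q1` and `q3` overlap only on keywords, the operator, and the placeholder. A pairwise score like this is the kind of input a clustering method would consume.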
Full Text Available
Ettu: Analyzing Query Intents in Corporate Databases

https://doi.org/10.1145/2872518.2888608

Kul, Gokhan; Luong, Duc; Xie, Ting; Coonan, Patrick; Chandola, Varun; Kennedy, Oliver; Upadhyaya, Shambhu (January 2016, Proceedings of the 25th International Conference Companion on World Wide Web)

Insider threats to databases in the financial sector have become a serious and pervasive security problem. This paper proposes a framework that analyzes database access patterns by clustering the SQL queries issued to the database. Our system, Ettu, groups each query with other similarly structured queries. The resulting small number of intent groups can then be labeled efficiently by human operators. We describe the system's design and how its components work. Our preliminary results show that the system accurately models user intent.
Full Text Available
