Search for: All records

Creators/Authors contains: "Joshi, Anupam"

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

PrivComp-KG: Leveraging KG and LLM for Compliance Verification

https://doi.org/10.1109/TPS-ISA62245.2024.00021

Garza, Leon; Elluri, Lavanya; Piplai, Aritran; Kotal, Anantaa; Gupta, Deepti; Joshi, Anupam (October 2024, IEEE)

Regulatory documents are complex and lengthy, making full compliance a challenging task for businesses. Similarly, privacy policies provided by vendors frequently fall short of the necessary legal standards due to insufficient detail. To address these issues, we propose a solution that leverages a Large Language Model (LLM) in combination with Semantic Web technology. This approach aims to clarify regulatory requirements and ensure that organizations’ privacy policies align with the relevant legal frameworks, ultimately simplifying the compliance process, reducing privacy risks, and improving efficiency. In this paper, we introduce a novel tool, the Privacy Policy Compliance Verification Knowledge Graph, referred to as PrivComp-KG. PrivComp-KG is designed to efficiently store and retrieve comprehensive information related to privacy policies, regulatory frameworks, and domain-specific legal knowledge. By utilizing LLM and Retrieval Augmented Generation (RAG), we can accurately identify relevant sections in privacy policies and map them to the corresponding regulatory rules. Our LLM-based retrieval system has demonstrated a high level of accuracy, achieving a correctness score of 0.9, outperforming other models in privacy policy analysis. The extracted information from individual privacy policies is then integrated into the PrivComp-KG. By combining this data with contextual domain knowledge and regulatory rules, PrivComp-KG can be queried to assess each vendor’s compliance with applicable regulations. We demonstrate the practical utility of PrivComp-KG by verifying the compliance of privacy policies across various organizations. This approach not only helps policy writers better understand legal requirements but also enables them to identify gaps in existing policies and update them in response to evolving regulations.
more » « less
Full Text Available
Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules for Secure and Confidential Data Synthesis

https://doi.org/10.1109/BigData59044.2023.10386276

Kotal, Anantaa; Elluri, Lavanya; Gupta, Deepti; Mandalapu, Varun; Joshi, Anupam (December 2023, IEEE International Conference on Big Data)

Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy regulations, such as the European Union’s General Data Protection Regulation (GDPR), the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Synthetic data that retains the statistical properties of the original data but does not include actual individuals’ information provides a suitable alternative to sharing sensitive datasets. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, there is a lack of compliance with documented data privacy policies in such privacy-preserving efforts. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats, such as re-identification and linkage issues, and secure data based on applicable regulatory policy rules.
more » « less
Full Text Available
Change Management using Generative Modeling on Digital Twins

https://doi.org/10.1109/ISI58743.2023.10297181

Das, Nilanjana; Kotal, Anantaa; Roseberry, Daniel; Joshi, Anupam (October 2023, IEEE International Conference on Intelligence and Security Informatics)

A key challenge faced by small and medium-sized business entities is securely managing software updates and changes. Specifically, with rapidly evolving cybersecurity threats, changes/updates/patches to software systems are necessary to stay ahead of emerging threats and are often mandated by regulators or statutory authorities to counter these. However, security patches/updates require stress testing before they can be released in the production system. Stress testing in production environments is risky and poses security threats. Large businesses usually have a non-production environment where such changes can be made and tested before being released into production. Smaller businesses do not have such facilities. In this work, we show how “digital twins”, especially for a mix of IT and IoT environments, can be created on the cloud. These digital twins act as a non-production environment where changes can be applied, and the system can be securely tested before patch release. Additionally, the non-production digital twin can be used to collect system data and run stress tests on the environment, both manually and automatically. In this paper, we show how using a small sample of real data/interactions, Generative Artificial Intelligence (AI) models can be used to generate testing scenarios to check for points of failure.
more » « less
Full Text Available
Knowledge Infusion in Privacy Preserving Data Generation

Kotal, Anantaa; Das, Nilanjana; Joshi, Anupam (August 2023, KDD Workshop on Knowledge-infused Learning, 29TH ACM SIGKDD)

Security monitoring is crucial for maintaining a strong IT infrastructure by protecting against emerging threats, identifying vulnerabilities, and detecting potential points of failure. It involves deploying advanced tools to continuously monitor networks, systems, and configurations. However, organizations face challenges in adapting modern techniques like Machine Learning (ML) due to privacy and security risks associated with sharing internal data. Compliance with regulations like GDPR further complicates data sharing. To promote external knowledge sharing, a secure and privacy-preserving method for organizations to share data is necessary. Privacy-preserving data generation involves creating new data that maintains privacy while preserving key characteristics and properties of the original data so that it is still useful in creating downstream models of attacks. Generative models, such as Generative Adversarial Networks (GAN), have been proposed as a solution for privacy preserving synthetic data generation. However, standard GANs are limited in their capabilities to generate realistic system data. System data have inherent constraints, e.g., the list of legitimate I.P. addresses and port numbers are limited, and protocols dictate a valid sequence of network events. Standard generative models do not account for such constraints and do not utilize domain knowledge in their generation process. Additionally, they are limited by the attribute values present in the training data. This poses a major privacy risk, as sensitive discrete attribute values are repeated by GANs. To address these limitations, we propose a novel model for Knowledge Infused Privacy Preserving Data Generation. A privacy preserving Generative Adversarial Network (GAN) is trained on system data for generating synthetic datasets that can replace original data for downstream tasks while protecting sensitive data. Knowledge from domain-specific knowledge graphs is used to guide the data generation process, check for the validity of generated values, and enrich the dataset by diversifying the values of attributes. We specifically demonstrate this model by synthesizing network data captured by the network capture tool, Wireshark. We establish that the synthetic dataset holds up to the constraints of the network-specific datasets and can replace the original dataset in downstream tasks.
more » « less
Full Text Available
Knowledge-Enhanced Neurosymbolic Artificial Intelligence for Cybersecurity and Privacy

https://doi.org/10.1109/MIC.2023.3299435

Piplai, Aritran; Kotal, Anantaa; Mohseni, Seyedreza; Gaur, Manas; Mittal, Sudip; Joshi, Anupam (September 2023, IEEE Internet Computing)

Neurosymbolic artificial intelligence (AI) is an emerging and quickly advancing field that combines the subsymbolic strengths of (deep) neural networks and the explicit, symbolic knowledge contained in knowledge graphs (KGs) to enhance explainability and safety in AI systems. This approach addresses a key criticism of current generation systems, namely, their inability to generate human-understandable explanations for their outcomes and ensure safe behaviors, especially in scenarios with unknown unknowns (e.g., cybersecurity, privacy). The integration of neural networks, which excel at exploring complex data spaces, and symbolic KGs representing domain knowledge, allows AI systems to reason, learn, and generalize in a manner understandable to experts. This article describes how applications in cybersecurity and privacy, two of the most demanding domains in terms of the need for AI to be explainable while being highly accurate in complex environments, can benefit from neurosymbolic AI.
more » « less
Full Text Available
Offline RL+CKG: A hybrid AI model for cybersecurity tasks

Piplai, Aritran; Joshi, Anupam; Finin, Tim (April 2023, Proceedings of the AAAI 2023 Spring Symposium on Challenges Requiring the Combination of Machine Learning and Knowledge Engineering (AAAI-MAKE 2023))
Martin, A; Hinkelmann, K; Fill, H.-G.; Gerber, A.; Lenat, D.; Stolle, R.; van Harmelen, F. (Ed.)
AI models for cybersecurity have to detect and defend against constantly evolving cyber threats. Much effort is spent building defenses for zero days and unseen variants of known cyber-attacks. Current AI models for cybersecurity struggle with these yet unseen threats due to the constantly evolving nature of threat vectors, vulnerabilities, and exploits. This paper shows that cybersecurity AI models will be improved and more general if we include semi-structured representations of background knowledge. This could include information about the software and systems, as well as information obtained from observing the behavior of malware samples captured and detonated in honeypots. We describe how we can transfer this knowledge into forms that the RL models can directly use for decision-making purposes.
more » « less
Full Text Available
Towards Semantic Exploration of Tables in Scientific Documents

Mulwad, Varish; Kumar, Vijay S; Williams, Jenny Weisenberg; Finin, Tim; Dixit, Sharad; Joshi, Anupam (May 2023, Workshop on Semantic Technologies for Scientific, Technical and Legal Data, Extended Semantic Web Conference)

Structured data artifacts such as tables are widely used in scientific literature to organize and concisely communicate important statistical information. Discovering relevant information in these tables remains a significant challenge owing to their structural heterogeneity, dense and often implicit semantics, and diffuse context. This paper describes how we leverage semantic technologies to enable technical experts to search and explore tabular data embedded within scientific documents. We present a system for the on-demand construction of knowledge graphs representing scientific tables (drawn from online scholarly articles hosted by PubMed Central) and for synthesizing tabular responses to semantic search requests against such graphs. We discuss key differentiators in our overall approach, including a two-stage semantic table interpretation that relies on an extensive structural and syntactic characterization of scientific tables and a prototype knowledge discovery engine that uses automatically inferred semantics of scientific tables to serve search requests by potentially fusing information from multiple tables on the fly. We evaluate our system on a real-world dataset of approximately 120,000 tables extracted from over 62,000 COVID-19-related scientific articles.
more » « less
Knowledge Guided Two-player Reinforcement Learning for Cyber Attacks and Defenses

https://doi.org/10.1109/icmla55696.2022.00213

Piplai, Aritran; Anoruo, Mike; Fasaye, Kayode; Joshi, Anupam; Finin, Tim; Ridley, Ahmad (December 2022, 21st IEEE International Conference on Machine Learning and Applications)

Cyber defense exercises are an important avenue to understand the technical capacity of organizations when faced with cyber-threats. Information derived from these exercises often leads to finding unseen methods to exploit vulnerabilities in an organization. These often lead to better defense mechanisms that can counter previously unknown exploits. With recent developments in cyber battle simulation platforms, we can generate a defense exercise environment and train reinforcement learning (RL) based autonomous agents to attack the system described by the simulated environment. In this paper, we describe a two-player game-based RL environment that simultaneously improves the performance of both the attacker and defender agents. We further accelerate the convergence of the RL agents by guiding them with expert knowledge from Cybersecurity Knowledge Graphs on attack and mitigation steps. We have implemented and integrated our proposed approaches into the CyberBattleSim system.
more » « less
Full Text Available
CAPD: a context-aware, policy-driven framework for secure and resilient IoBT operations

https://doi.org/10.1117/12.2618106

Chukkapalli, Sai Sree; Joshi, Anupam; Finin, Tim; Erbacher, Robert F. (June 2022, Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications IV, SPIE Defense + Commercial Sensing)
Pham, Tien; Solomon, Latasha; Hohil, Myron E. (Ed.)
The Internet of Battlefield Things (IoBT) will advance the operational effectiveness of infantry units. However, this requires autonomous assets such as sensors, drones, combat equipment, and uncrewed vehicles to collaborate, securely share information, and be resilient to adversary attacks in contested multi-domain operations. CAPD addresses this problem by providing a context-aware, policy-driven framework supporting data and knowledge exchange among autonomous entities in a battlespace. We propose an IoBT ontology that facilitates controlled information sharing to enable semantic interoperability between systems. Its key contributions include providing a knowledge graph with a shared semantic schema, integration with background knowledge, efficient mechanisms for enforcing data consistency and drawing inferences, and supporting attribute-based access control. The sensors in the IoBT provide data that create populated knowledge graphs based on the ontology. This paper describes using CAPD to detect and mitigate adversary actions. CAPD enables situational awareness using reasoning over the sensed data and SPARQL queries. For example, adversaries can cause sensor failure or hijacking and disrupt the tactical networks to degrade video surveillance. In such instances, CAPD uses an ontology-based reasoner to see how alternative approaches can still support the mission. Depending on bandwidth availability, the reasoner initiates the creation of a reduced frame rate grayscale video by active transcoding or transmits only still images. This ability to reason over the mission sensed environment, and attack context permits the autonomous IoBT system to exhibit resilience in contested conditions.
more » « less
Full Text Available
Combating Fake Cyber Threat Intelligence using Provenance in Cybersecurity Knowledge Graphs

https://doi.org/10.1109/BigData52589.2021.9671867

Mitra, Shaswata; Piplai, Aritran; Mittal, Sudip; Joshi, Anupam (December 2021, 2021 IEEE International Conference on Big Data (Big Data))

Today there is a significant amount of fake cybersecurity related intelligence on the internet. To filter out such information, we build a system to capture the provenance information and represent it along with the captured Cyber Threat Intelligence (CTI). In the cybersecurity domain, such CTI is stored in Cybersecurity Knowledge Graphs (CKG). We enhance the exiting CKG model to incorporate intelligence provenance and fuse provenance graphs with CKG. This process includes modifying traditional approaches to entity and relation extraction. CTI data is considered vital in securing our cyberspace. Knowledge graphs containing CTI information along with its provenance can provide expertise to dependent Artificial Intelligence (AI) systems and human analysts.
more » « less
Full Text Available

« Prev Next »