
Search for: All records

Award ID contains: 2115040


  1. Website privacy policies are often lengthy and intricate. Privacy assistants help simplify policies and make them more accessible and user-friendly. The emergence of generative AI (genAI) offers new opportunities to build privacy assistants that can answer users’ questions about privacy policies. However, genAI’s reliability is a concern due to its potential for producing inaccurate information. This study introduces GenAIPABench, a benchmark for evaluating Generative AI-based Privacy Assistants (GenAIPAs). GenAIPABench includes: 1) a set of curated questions about privacy policies, along with annotated answers, for various organizations and regulations; 2) metrics to assess the accuracy, relevance, and consistency of responses; and 3) a tool for generating prompts that introduce privacy policies and paraphrased variants of the curated questions. We used GenAIPABench to evaluate three leading genAI systems (ChatGPT-4, Bard, and Bing AI) and gauge their effectiveness as GenAIPAs. Our results show significant promise for genAI in the privacy domain while also highlighting challenges in handling complex queries, ensuring consistency, and verifying source accuracy.
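    As a rough illustration of the kind of evaluation loop GenAIPABench describes (curated questions with annotated answers, paraphrased variants, and accuracy/consistency scoring), here is a minimal Python sketch; the class, function, and metric names are illustrative assumptions, not the benchmark's actual API.

        from dataclasses import dataclass, field

        @dataclass
        class BenchmarkItem:
            question: str                  # curated privacy-policy question
            reference_answer: str          # annotated ground-truth answer
            paraphrases: list = field(default_factory=list)  # reworded variants

        def evaluate(assistant, policy_text, items, score_fn):
            """Score a genAI privacy assistant on curated questions and paraphrases."""
            results = []
            for item in items:
                answers = []
                for q in [item.question] + item.paraphrases:
                    # Introduce the policy, then ask the (possibly paraphrased) question.
                    answers.append(assistant(f"Privacy policy:\n{policy_text}\n\nQuestion: {q}"))
                accuracy = score_fn(answers[0], item.reference_answer)
                # Consistency: paraphrased questions should get equivalent answers.
                others = answers[1:]
                consistency = (sum(score_fn(a, answers[0]) for a in others) / len(others)) if others else 1.0
                results.append({"question": item.question, "accuracy": accuracy, "consistency": consistency})
            return results

    Any answer-similarity callable can stand in for score_fn; the key point is that consistency is measured across paraphrases of the same question rather than against the reference alone.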
  2. High-quality knowledge graphs (KGs) play a crucial role in many applications. However, KGs created by automated information extraction systems can suffer from erroneous extractions or be inconsistent with their provenance/source text. It is important to identify and correct such problems. In this paper, we study leveraging the emergent reasoning capabilities of large language models (LLMs) to detect inconsistencies between extracted facts and their provenance. Focusing on "open" LLMs that can be run and trained locally, we find that few-shot approaches can yield an absolute performance gain of 2.5-3.4% over the state-of-the-art method with only 9% of the training data. We examine the effect of LLM architecture and show that decoder-only models underperform encoder-decoder approaches. We also explore how model size impacts performance and, counterintuitively, find that larger models do not yield consistent performance gains. Our detailed analyses suggest that while LLMs can improve KG consistency, different LLM models learn different aspects of KG consistency and are sensitive to the number of entities involved.
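    A minimal sketch of what a few-shot fact-vs-provenance consistency check like the one studied here might look like, assuming the locally run LLM is wrapped as a plain prompt-to-text callable; the prompt template and demonstration examples below are illustrative, not the paper's actual ones.

        # Two worked demonstrations steer the model; real few-shot sets would be larger.
        FEW_SHOT = [
            ("(Marie Curie, born_in, Warsaw)", "Marie Curie was born in Warsaw in 1867.", "CONSISTENT"),
            ("(Marie Curie, born_in, Paris)", "Marie Curie was born in Warsaw in 1867.", "INCONSISTENT"),
        ]

        def build_prompt(triple, provenance):
            """Assemble a few-shot prompt asking whether a fact matches its source text."""
            blocks = ["Decide whether the extracted fact is supported by the source text."]
            for fact, source, label in FEW_SHOT:
                blocks.append(f"Fact: {fact}\nSource: {source}\nAnswer: {label}")
            blocks.append(f"Fact: {triple}\nSource: {provenance}\nAnswer:")
            return "\n\n".join(blocks)

        def is_consistent(llm, triple, provenance):
            # llm is any prompt -> text callable backed by a locally run model
            return llm(build_prompt(triple, provenance)).strip().upper().startswith("CONSISTENT")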
  3. Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources, such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy regulations, such as the European Union’s General Data Protection Regulation (GDPR), the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Synthetic data that retains the statistical properties of the original data but does not include actual individuals’ information provides a suitable alternative to sharing sensitive datasets. Deep-learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, such privacy-preserving efforts rarely comply with documented data privacy policies. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can mitigate potential threats, such as re-identification and linkage attacks, and secure data according to applicable regulatory policy rules.
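    A minimal sketch, under stated assumptions, of how policy rules extracted from a code of conduct could bound a generative model's output; the rule vocabulary (suppress/generalize/clip), field names, and numeric bounds are hypothetical placeholders, not the paper's actual framework.

        # Hypothetical rules extracted from an agricultural code of conduct.
        POLICY_RULES = {
            "farm_id": "suppress",     # direct identifier: never released
            "location": "generalize",  # coarsen "Region, District" to region level
            "yield_kg": "clip",        # bound to a plausible range
        }
        YIELD_BOUNDS = (0, 20000)

        def enforce(record, rules=POLICY_RULES):
            """Apply extracted privacy constraints to one synthetic record."""
            out = dict(record)
            for column, action in rules.items():
                if action == "suppress":
                    out.pop(column, None)
                elif action == "generalize" and column in out:
                    out[column] = out[column].split(",")[0]  # keep only the region
                elif action == "clip" and column in out:
                    lo, hi = YIELD_BOUNDS
                    out[column] = min(max(out[column], lo), hi)
            return out

        def sample_synthetic(generator, n, rules=POLICY_RULES):
            # generator() stands in for a trained deep generative model's sampler
            return [enforce(generator(), rules) for _ in range(n)]

    Enforcing the rules at sampling time, as sketched here, is one option; the paper's framework defines the privacy bounds inside the generative model itself.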
  4. A key challenge faced by small and medium-sized businesses is securely managing software updates and changes. With rapidly evolving cybersecurity threats, changes, updates, and patches to software systems are necessary to stay ahead of emerging threats and are often mandated by regulators or statutory authorities. However, security patches and updates require stress testing before they can be released into the production system. Stress testing in production environments is risky and poses security threats. Large businesses usually have a non-production environment where such changes can be made and tested before being released into production; smaller businesses do not have such facilities. In this work, we show how “digital twins”, especially for a mix of IT and IoT environments, can be created on the cloud. These digital twins act as a non-production environment where changes can be applied and the system can be securely tested before a patch is released. Additionally, the non-production digital twin can be used to collect system data and run stress tests on the environment, both manually and automatically. In this paper, we show how, using a small sample of real data and interactions, generative Artificial Intelligence (AI) models can produce testing scenarios that check for points of failure.
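    The scenario-generation step could look roughly like the following sketch, where the LLM call and the twin's interface are generic placeholders rather than any specific product API.

        def generate_scenarios(llm, sample_interactions, n=20):
            """Ask a generative model to extrapolate stress-test cases from real traces."""
            prompt = (
                "Here are sample IT/IoT interaction logs:\n"
                + "\n".join(sample_interactions)
                + f"\n\nGenerate {n} varied test scenarios, one per line, "
                  "including edge cases likely to expose failure points."
            )
            return [line for line in llm(prompt).splitlines() if line.strip()]

        def stress_test(twin, scenarios):
            """Replay generated scenarios against the non-production digital twin."""
            failures = []
            for scenario in scenarios:
                try:
                    twin.run(scenario)        # exercise the patched system copy
                except Exception as err:      # faults surface in the twin, not production
                    failures.append((scenario, err))
            return failures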
  5. Neurosymbolic artificial intelligence (AI) is an emerging and quickly advancing field that combines the subsymbolic strengths of (deep) neural networks with the explicit, symbolic knowledge contained in knowledge graphs (KGs) to enhance explainability and safety in AI systems. This approach addresses a key criticism of current-generation systems: their inability to generate human-understandable explanations for their outcomes and to ensure safe behavior, especially in scenarios with unknown unknowns (e.g., cybersecurity, privacy). Integrating neural networks, which excel at exploring complex data spaces, with symbolic KGs representing domain knowledge allows AI systems to reason, learn, and generalize in a manner understandable to experts. This article describes how applications in cybersecurity and privacy, two domains that demand both high accuracy and explainability in complex environments, can benefit from neurosymbolic AI.
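    One common neurosymbolic pattern of the kind the article surveys, in which a symbolic KG vetoes neural predictions that violate domain knowledge, can be sketched as follows; the toy cybersecurity KG and scoring interface are purely illustrative.

        # A toy KG of admissible (subject, relation, object) triples.
        KG_ALLOWED = {
            ("malware", "exfiltrates", "credentials"),
            ("malware", "targets", "endpoint"),
        }

        def neurosymbolic_predict(neural_scores, subject, obj, threshold=0.5):
            """Keep only the neural predictions that the knowledge graph licenses."""
            admissible = [
                (relation, score)
                for relation, score in neural_scores.items()
                if score >= threshold and (subject, relation, obj) in KG_ALLOWED
            ]
            return sorted(admissible, key=lambda rs: rs[1], reverse=True)

        # e.g. neurosymbolic_predict({"exfiltrates": 0.8, "encrypts": 0.7}, "malware", "credentials")
        # -> [("exfiltrates", 0.8)]; "encrypts" is vetoed by the KG despite its high score.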