EvidenceBot: A Privacy-Preserving, Customizable RAG-Based Tool for Enhancing Large Language Model Interactions

Khan, Nafiz Imtiaz (ORCID:0000000301496012); Filkov, Vladimir (ORCID:0000000304924393)

doi:10.1145/3696630.3728607

Large Language Models (LLMs) have become pivotal in reshaping the world by enabling advanced natural language processing tasks such as document analysis, content generation, and conversational assistance. Their ability to process and generate human-like text has unlocked unprecedented opportunities across different domains such as healthcare, education, finance, and more. However, commercial LLM platforms face several limitations, including data privacy concerns, context size restrictions, lack of parameter configurability, and limited evaluation capabilities. These shortcomings hinder their effectiveness, particularly in scenarios involving sensitive information, large-scale document analysis, or the need for customized output. This underscores the need for a tool that combines the power of LLMs with enhanced privacy, flexibility, and usability. To address these challenges, we present EvidenceBot, a local, Retrieval-Augmented Generation (RAG)-based solution designed to overcome the limitations of commercial LLM platforms. Evidence-Bot enables secure and efficient processing of large document sets through its privacy-preserving RAG pipeline, which extracts and appends only the most relevant text chunks as context for queries. The tool allows users to experiment with hyperparameter configurations, optimizing model responses for specific tasks, and includes an evaluation module to assess LLM performance against ground truths using semantic and similarity-based metrics. By offering enhanced privacy, customization, and evaluation capabilities, EvidenceBot bridges critical gaps in the LLM ecosystem, providing a versatile resource for individuals and organizations seeking to leverage LLMs effectively.

More Like this