skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on December 15, 2025

Title: A Report on Financial Regulations Challenge at COLING 2025
Financial large language models (FinLLMs) have been applied to various tasks in business, finance, accounting, and auditing. Complex financial regulations and standards are critical to financial services, which LLMs must comply with. However, FinLLMs’ performance in understanding and interpreting financial regulations has rarely been studied. Therefore, we organize the Regulations Challenge, a shared task at COLING FinNLP-FNP-LLMFinLegal2025. It encourages the academic community to explore the strengths and limitations of popular LLMs. We create 9 novel tasks and corresponding question sets. In this paper, we provide an overview of these tasks and summarize participants’ approaches and results. We aim to raise awareness of FinLLMs’ professional capability in financial regulations.  more » « less
Award ID(s):
2113906
PAR ID:
10600802
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
arXiv preprints
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. na (Ed.)
    Considering the difficulty of financial time series forecasting in financial aid, much of the current research focuses on leveraging big data analytics in financial services. One modern approach is to utilize ”predictive analysis”, analogous to forecasting financial trends. However, many of these time series data in Financial Aid (FA) pose unique challenges due to limited historical datasets and high dimensional financial information, which hinder the development of effective predictive models that balance accuracy with efficient runtime and memory usage. Pre-trained foundation models are employed to address these challenging tasks. We use state-of-the-art time series models including pre-trained LLMs (GPT-2 as the backbone), transformers, and linear models to demonstrate their ability to outperform traditional approaches, even with minimal (”few-shot”) or no fine-tuning (”zero-shot”). Our benchmark study, which includes financial aid with seven other time series tasks, shows the potential of using LLMs for scarce financial datasets. 
    more » « less
  2. Finetuned large language models (LLMs) have shown remarkable performance in financial tasks, such as sentiment analysis and information retrieval. Due to privacy concerns, finetuning and deploying financial LLMs (FinLLMs) locally are crucial for institutions and individuals. In this paper, we employ quantized low-rank adaptation (QLoRA) to finetune FinLLMs, which leverage low-rank structure and quantization technique to significantly reduce computational requirements while maintaining model performance. We also employ data and pipeline parallelism to enable local finetuning on commodity GPUs. Experiments on financial datasets validate the efficacy of our approach in yielding notable improvements over the base models. 
    more » « less
  3. Modern smart buildings and environments rely on sensory infrastructure to capture and process information about their inhabitants. However, it remains challenging to ensure that this infrastructure complies with privacy norms, preferences, and regulations; individuals occupying smart environments are often occupied with their tasks, lack awareness of the surrounding sensing mechanisms, and are non-technical experts. This problem is only exacerbated by the increasing number of sensors being deployed in these environments, as well as services seeking to use their sensory data. As a result, individuals face an unmanageable number of privacy decisions, preventing them from effectively behaving as their own “privacy firewall” for filtering and managing the multitude of personal information flows. These decisions often require qualitative reasoning over privacy regulations, understanding privacy-sensitive contexts, and applying various privacy transformations when necessary We propose the use of Large Language Models (LLMs), which have demonstrated qualitative reasoning over social/legal norms, sensory data, and program synthesis, all of which are necessary for privacy firewalls. We present PrivacyOracle, a prototype system for configuring privacy firewalls on behalf of users using LLMs, enabling automated privacy decisions in smart built environments. Our evaluation shows that PrivacyOracle achieves up to 
    more » « less
  4. Human-conducted rating tasks are resource-intensive and demand significant time and financial commitments. As Large Language Models (LLMs) like GPT emerge and exhibit prowess across various domains, their potential in automating such evaluation tasks becomes evident. In this research, we leveraged four prominent LLMs: GPT-4, GPT-3.5, Vicuna, and PaLM 2, to scrutinize their aptitude in evaluating teacher-authored mathematical explanations. We utilized a detailed rubric that encompassed accuracy, explanation clarity, the correctness of mathematical notation, and the efficacy of problem-solving strategies. During our investigation, we unexpectedly discerned the influence of HTML formatting on these evaluations. Notably, GPT-4 consistently favored explanations formatted with HTML, whereas the other models displayed mixed inclinations. When gauging Inter-Rater Reliability (IRR) among these models, only Vicuna and PaLM 2 demonstrated high IRR using the conventional Cohen’s Kappa metric for explanations formatted with HTML. Intriguingly, when a more relaxed version of the metric was applied, all model pairings showcased robust agreement. These revelations not only underscore the potential of LLMs in providing feedback on student-generated content but also illuminate new avenues, such as reinforcement learning, which can harness the consistent feedback from these models. 
    more » « less
  5. Johnson, Kristin N.; Reyes, Carla L. (Ed.)
    Privacy regulation has traditionally been the remit of consumer protection, and privacy harm is cast as a contractual harm arising from the interpersonal exchanges between data subjects and data collectors. This frames surveillance of people by companies as primarily a consumer harm. In this article, we argue that the modern economy of personal data is better understood as an extension of the financial system. The data economy intersects with capital markets in ways that may increase systemic and systematic financial risks. We contribute a new regulatory approach to privacy harms: as a source of risk correlated across households, firms and the economy as a whole. We consider adapting tools from macroprudential regulations designed to mitigate financial crises to the market for personal data. We identify both promises and pitfalls to viewing individual privacy through the lens of the financial system. 
    more » « less