Title: Transferring Legal Natural Language Inference Model from a US State to Another: What Makes It So Hard?
This study investigates whether a legal natural language inference (NLI) model trained on data from one US state can be transferred to another state. We fine-tuned a pre-trained model on the task of evaluating the validity of legal will statements, once with a dataset of Tennessee wills and once with a dataset of Idaho wills. Each model's performance in the in-domain and out-of-domain settings is compared to see whether the models can transfer across states. We found that a model trained on one US state can mostly be transferred to another state. However, the model's performance clearly drops in the out-of-domain setting. The F1 scores of the Tennessee model and the Idaho model are 96.41 and 92.03 when predicting data from the same state, but they drop to 66.32 and 81.60 when predicting data from the other state. Subsequent error analysis revealed two major sources of errors. First, the model fails to recognize equivalent laws across states when there are stylistic differences between them. Second, differences in the statutory section numbering systems between the states make it difficult for the model to locate the laws relevant to the cases being predicted. This analysis provides insights into how future NLI systems can be improved. Our findings also offer empirical support to legal experts advocating the standardization of legal documents.
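The in-domain versus out-of-domain comparison described above can be sketched with a toy evaluation loop. Everything here is illustrative: the label set, the predictions, and the binary `f1_score` helper are stand-ins for the paper's fine-tuned model and its actual metric implementation.

```python
# Hypothetical sketch: score a will-validity classifier on its own state's
# test set (in-domain) and on another state's test set (out-of-domain).
# Labels and predictions below are invented for illustration.

def f1_score(gold, pred, positive="valid"):
    """Binary F1 for the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy outputs standing in for a model fine-tuned on Tennessee wills.
tn_gold = ["valid", "valid", "invalid", "valid"]
tn_pred = ["valid", "valid", "invalid", "valid"]      # in-domain: strong
id_gold = ["valid", "invalid", "invalid", "valid"]
id_pred = ["valid", "valid", "invalid", "invalid"]    # out-of-domain: weaker

print(f"in-domain F1:     {f1_score(tn_gold, tn_pred):.2f}")
print(f"out-of-domain F1: {f1_score(id_gold, id_pred):.2f}")
```

The same model thus yields two numbers, and the gap between them is the transferability cost the abstract quantifies.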
Award ID(s):
2217215
PAR ID:
10488667
Author(s) / Creator(s):
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Journal Name:
Proceedings of the Natural Legal Language Processing Workshop 2023
Page Range / eLocation ID:
215 to 222
Format(s):
Medium: X
Location:
Singapore
Sponsoring Org:
National Science Foundation
More Like This
  1. This work introduces a natural language inference (NLI) dataset that focuses on the validity of statements in legal wills. This dataset is unique because: (a) each entailment decision requires three inputs: the statement from the will, the law, and the conditions that hold at the time of the testator's death; and (b) the included texts are longer than those in current NLI datasets. We trained eight neural NLI models on this dataset. All the models achieve more than 80% macro F1 and accuracy, which indicates that neural approaches can handle this task reasonably well. However, group accuracy, a stricter evaluation measure that is calculated with a group of positive and negative examples generated from the same statement as a unit, is in the mid-80s at best, which suggests that the models' understanding of the task remains superficial. Further ablative analyses and explanation experiments indicate that all three text segments are used for prediction, but some decisions rely on semantically irrelevant tokens. This indicates that overfitting on these longer texts likely happens, and that additional research is required for this task to be solved.
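The "group accuracy" measure described above can be sketched in a few lines: positive and negative examples generated from the same will statement form one group, and a group counts as correct only if every example in it is predicted correctly. The triple format below is illustrative, not the dataset's actual schema.

```python
from collections import defaultdict

def group_accuracy(examples):
    """examples: list of (group_id, gold_label, predicted_label) triples.
    A group is correct only if all of its examples are predicted correctly."""
    groups = defaultdict(list)
    for group_id, gold, pred in examples:
        groups[group_id].append(gold == pred)
    correct = sum(1 for hits in groups.values() if all(hits))
    return correct / len(groups)

examples = [
    ("stmt1", "entail", "entail"),
    ("stmt1", "contradict", "contradict"),  # whole group correct
    ("stmt2", "entail", "entail"),
    ("stmt2", "contradict", "entail"),      # one miss sinks the group
]
print(group_accuracy(examples))  # 0.5
```

This is why group accuracy is strictly harder than plain accuracy: the toy run above has 3 of 4 examples correct but only 1 of 2 groups.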
  2.
    Natural language inference (NLI) is the task of detecting the existence of entailment or contradiction in a given sentence pair. Although NLI techniques could help numerous information retrieval tasks, most solutions for NLI are neural approaches whose lack of interpretability prohibits both straightforward integration and diagnosis for further improvement. We target the task of generating token-level explanations for NLI from a neural model. Many existing approaches for token-level explanation are either computationally costly or require additional annotations for training. In this article, we first introduce a novel method for training an explanation generator that does not require additional human labels. Instead, the explanation generator is trained with the objective of predicting how the model's classification output will change when parts of the inputs are modified. Second, we propose to build an explanation generator in a multi-task learning setting along with the original NLI task so the explanation generator can utilize the model's internal behavior. The experimental results suggest that the proposed explanation generator outperforms numerous strong baselines. In addition, our method does not require excessive additional computation at prediction time, which renders it an order of magnitude faster than the best-performing baseline.
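The perturbation objective above (predicting how the classifier's output changes when parts of the input are modified) can be illustrated with a simple occlusion baseline: mask each token and record how far the model's score moves. The abstract's method trains a generator to predict these shifts cheaply; `score_fn` here is a stand-in for any model that returns a probability.

```python
def occlusion_importance(tokens, score_fn, mask="[MASK]"):
    """Score each token by the drop in the model's output when it is masked."""
    base = score_fn(tokens)
    importances = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        importances.append(base - score_fn(masked))
    return importances

# Toy "model": entailment score is 0.9 if the cue word "dog" is visible.
def toy_score(tokens):
    return 0.9 if "dog" in tokens else 0.2

imp = occlusion_importance(["a", "dog", "barks"], toy_score)
print(imp)  # "dog" receives the largest importance; the others get 0.0
```

Note the cost that motivates the paper: this baseline needs one extra forward pass per token, whereas a trained explanation generator produces all token scores in a single pass.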
  3. Cire, A.A. (Ed.)
    Wildlife trafficking (WT), the illegal trade of wild fauna, flora, and their parts, directly threatens biodiversity and conservation of trafficked species, while also negatively impacting human health, national security, and economic development. Wildlife traffickers obfuscate their activities in plain sight, leveraging legal, large, and globally linked transportation networks. To complicate matters, defensive interdiction resources are limited, datasets are fragmented and rarely interoperable, and interventions like setting checkpoints place a burden on legal transportation. As a result, interpretable predictions of which routes wildlife traffickers are likely to take can help target defensive efforts and understand what wildlife traffickers may be considering when selecting routes. We propose a data-driven model for predicting trafficking routes on the global commercial flight network, a transportation network for which we have some historical seizure data and a specification of the possible routes that traffickers may take. While seizure data has limitations such as data bias and dependence on the deployed defensive resources, this is a first step towards predicting wildlife trafficking routes on real-world data. Our seizure data documents the planned commercial flight itinerary of trafficked and successfully interdicted wildlife. We aim to provide predictions of highly-trafficked flight paths for known origin-destination pairs with plausible explanations that illuminate how traffickers make decisions based on the presence of criminal actors, markets, and resilience systems. 
We propose a model that first predicts likelihoods of which commercial flights will be taken out of a given airport given input features, and then subsequently finds the highest-likelihood flight path from origin to destination using a differentiable shortest path solver, allowing us to automatically align our model's loss with the overall goal of correctly predicting the full flight itinerary from a given source to a destination. We evaluate the proposed model's predictions and interpretations both quantitatively and qualitatively, showing that the predicted paths are aligned with observed held-out seizures, and can be interpreted by policy-makers.
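One plausible reading of the two-stage pipeline above can be sketched as follows (the airports, probabilities, and four-edge network are invented for illustration): score each outbound flight with a probability, convert probabilities to -log costs, and recover the maximum-likelihood itinerary as a shortest path. The paper uses a differentiable solver so the path objective can be trained end-to-end; plain Dijkstra here captures only the inference step.

```python
import heapq
import math
from collections import defaultdict

# Hypothetical per-edge probabilities P(flight taken | origin airport).
flight_prob = {
    ("JFK", "LHR"): 0.6, ("JFK", "CDG"): 0.4,
    ("LHR", "SIN"): 0.3, ("CDG", "SIN"): 0.9,
}

def most_likely_path(probs, source, target):
    """Max-probability path = shortest path under -log(p) edge costs."""
    graph = defaultdict(list)
    for (u, v), p in probs.items():
        graph[u].append((v, -math.log(p)))
    heap = [(0.0, source, [source])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph[node]:
            if nxt not in seen:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None

print(most_likely_path(flight_prob, "JFK", "SIN"))  # ['JFK', 'CDG', 'SIN']
```

Even though JFK→LHR is the more likely first hop (0.6 vs. 0.4), the full itinerary through CDG wins (0.4 × 0.9 = 0.36 vs. 0.6 × 0.3 = 0.18), which is exactly why the loss must be aligned with the full path rather than individual edges.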
  4.
    Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether a sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of 32K semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models trained on MultiNLI (Williams et al., 2018) learn to make pragmatic inferences. Although MultiNLI appears to contain very few pairs illustrating these inference types, we find that BERT learns to draw pragmatic inferences. It reliably treats scalar implicatures triggered by “some” as entailments. For some presupposition triggers like “only”, BERT reliably recognizes the presupposition as an entailment, even when the trigger is embedded under an entailment canceling operator like negation. BOW and InferSent show weaker evidence of pragmatic reasoning. We conclude that NLI training encourages models to learn some, but not all, pragmatic inferences. 
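Semi-automatic generation of the kind IMPPRES uses can be sketched with templates. The templates and label below are illustrative, not the dataset's actual ones: a premise containing "some" is paired with a hypothesis containing "not all" to probe whether a model treats the scalar implicature as an entailment (the strict logical label would be neutral).

```python
def scalar_implicature_pairs(subjects, predicates):
    """Generate premise/hypothesis pairs probing the 'some -> not all'
    scalar implicature, labeled pragmatically as entailment."""
    pairs = []
    for subj in subjects:
        for pred in predicates:
            premise = f"Some {subj} {pred}."
            hypothesis = f"Not all {subj} {pred}."
            pairs.append((premise, hypothesis, "entailment"))
    return pairs

pairs = scalar_implicature_pairs(["lawyers"], ["signed the will"])
print(pairs[0])
# ('Some lawyers signed the will.', 'Not all lawyers signed the will.', 'entailment')
```

Generating pairs from templates like this is what lets a diagnostic set isolate one inference type at scale, which MultiNLI's naturally occurring pairs cannot do.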
  5. A natural language interface (NLI) to databases is an interface that translates a natural language question to a structured query that is executable by database management systems (DBMS). However, an NLI that is trained in the general domain is hard to apply in the spatial domain due to the idiosyncrasy and expressiveness of the spatial questions. Inspired by the machine comprehension model, we propose a spatial comprehension model that is able to recognize the meaning of spatial entities based on the semantics of the context. The spatial semantics learned from the spatial comprehension model is then injected to the natural language question to ease the burden of capturing the spatial-specific semantics. With our spatial comprehension model and information injection, our NLI for the spatial domain, named SpatialNLI, is able to capture the semantic structure of the question and translate it to the corresponding syntax of an executable query accurately. We also experimentally ascertain that SpatialNLI outperforms state-of-the-art methods. 