Search for: All records

Award ID contains: 2023495


  1. Abstract Latent factor models are widely used in the social and behavioural sciences as scaling tools to map discrete multivariate outcomes into low-dimensional, continuous scales. In political science, dynamic versions of classical factor models have been widely used to study the evolution of justices’ preferences in multi-judge courts. In this paper, we discuss a new dynamic factor model that relies on a latent circular space that can accommodate voting behaviours in which justices commonly understood to be on opposite ends of the ideological spectrum vote together on a substantial number of otherwise closely divided opinions. We apply this model to data on nonunanimous decisions made by the US Supreme Court between 1937 and 2021, and show that for most of this period, voting patterns can be better described by a circular latent space. 
  2. Abstract Statistical relational learning (SRL) frameworks are effective at defining probabilistic models over complex relational data. They often use weighted first-order logical rules where the weights of the rules govern probabilistic interactions and are usually learned from data. Existing weight learning approaches typically attempt to learn a set of weights that maximizes some function of data likelihood; however, this does not always translate to optimal performance on a desired domain metric, such as accuracy or F1 score. In this paper, we introduce a taxonomy of search-based weight learning approaches for SRL frameworks that directly optimize weights on a chosen domain performance metric. To effectively apply these search-based approaches, we introduce a novel projection, referred to as scaled space (SS), that is an accurate representation of the true weight space. We show that SS removes redundancies in the weight space and captures the semantic distance between the possible weight configurations. In order to improve the efficiency of search, we also introduce an approximation of SS which simplifies the process of sampling weight configurations. We demonstrate these approaches on two state-of-the-art SRL frameworks: Markov logic networks and probabilistic soft logic. We perform empirical evaluation on five real-world datasets and evaluate them each on two different metrics. We also compare them against four other weight learning approaches. Our experimental results show that our proposed search-based approaches outperform likelihood-based approaches and yield up to a 10% improvement across a variety of performance metrics. Further, we perform an extensive evaluation to measure the robustness of our approach to different initializations and hyperparameters. The results indicate that our approach is both accurate and robust. 
  3. Abstract Statistical relational learning (SRL) and graph neural networks (GNNs) are two powerful approaches for learning and inference over graphs. Typically, they are evaluated in terms of simple metrics such as accuracy over individual node labels. Complex aggregate graph queries (AGQs) involving multiple nodes, edges, and labels are common in the graph mining community and are used to estimate important network properties such as social cohesion and influence. While graph mining algorithms support AGQs, they typically do not take into account uncertainty, or when they do, make simplifying assumptions and do not build full probabilistic models. In this paper, we examine the performance of SRL and GNNs on AGQs over graphs with partially observed node labels. We show that, not surprisingly, inferring the unobserved node labels as a first step and then evaluating the queries on the fully observed graph can lead to sub-optimal estimates, and that a better approach is to compute these queries as an expectation under the joint distribution. We propose a sampling framework to tractably compute the expected values of AGQs. Motivated by the analysis of subgroup cohesion in social networks, we propose a suite of AGQs that estimate the community structure in graphs. In our empirical evaluation, we show that by estimating these queries as an expectation, SRL-based approaches yield up to a 50-fold reduction in average error when compared to existing GNN-based approaches.
  4. We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems. We demonstrate our framework with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm-starts. This leads to over $100\times$ learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across 8 datasets covering a range of tasks and demonstrate our learning framework achieves up to a $16$ 
    Free, publicly-accessible full text available July 1, 2025
  5. Free, publicly-accessible full text available July 1, 2025
  6. Successful reproduction is critical to the growth and persistence of marine fish populations, yet how changes in the environment influence reproduction remains largely unknown. We explored how shifting ocean conditions influenced larval production in four species of long-lived, live-bearing rockfish (Sebastes spp.) in the California Current. Brood fecundity, body size, and environmental information were analyzed from the mid-1980s through 2020. Interannual variation in brood fecundity was greater than 50% in the single-brooding yellowtail rockfish (S. flavidus) and widow rockfish (S. entomelas). Brood fecundity varied less in chilipepper (S. goodei) and bocaccio (S. paucispinis), two species capable of multiple broods per year. In these two species, interannual fecundity variability is more likely to depend on the number of broods produced than on brood size alone. In all four species, brood fecundity was positively correlated with maternal length and body condition. Variable ocean conditions influenced the strength of maternal size effects by year. These results provide evidence for reproductive plasticity and environmental effects on fecundity, with implications for changes in population reproductive potential with climate change. 
    Free, publicly-accessible full text available June 1, 2025
  7. Machine learning models now automate decisions in applications where we may wish to provide recourse to adversely affected individuals. In practice, existing methods to provide recourse return actions that fail to account for latent characteristics that are not captured in the model (e.g., age, sex, marital status). In this paper, we study how the cost and feasibility of recourse can change across these latent groups. We introduce a notion of group-level plausibility to identify groups of individuals with a shared set of latent characteristics. We develop a general-purpose clustering procedure to identify groups from samples. Further, we propose a constrained optimization approach to learn models that equalize the cost of recourse over latent groups. We evaluate our approach through an empirical study on simulated and real-world datasets, showing that it can produce models that have better performance in terms of overall costs and feasibility at a group level. 
  8. Leading approaches to algorithmic fairness and policy-induced distribution shift are often misaligned with long-term objectives in sequential settings. We aim to correct these shortcomings by ensuring that both the objective and fairness constraints account for policy-induced distribution shift. First, we motivate this problem using an example in which individuals subject to algorithmic predictions modulate their willingness to participate with the policy maker. Fairness in this example is measured by the variance of group participation rates. Next, we develop a method for solving the resulting constrained, non-linear optimization problem and prove that this method converges to a fair, locally optimal policy given first-order information. Finally, we experimentally validate our claims in a semi-synthetic setting. 
  9. Graph representation learning is a fundamental technique for machine learning (ML) on complex networks. Given an input network, these methods represent the vertices by low-dimensional real-valued vectors. These vectors can be used for a multitude of downstream ML tasks. We study one of the most important such tasks, link prediction. Much of the recent literature on graph representation learning has shown remarkable success in link prediction. On closer investigation, we observe that the performance is measured by the AUC (area under the curve), which suffers from biases. Since the ground truth in link prediction is sparse, we design a vertex-centric measure of performance, called the VCMPR@k plots. Under this measure, we show that link predictors using graph representations show poor scores. Despite having extremely high AUC scores, the predictors miss much of the ground truth. We identify a mathematical connection between this performance, the sparsity of the ground truth, and the low-dimensional geometry of the node embeddings. Under a formal theoretical framework, we prove that low-dimensional vectors cannot capture sparse ground truth using dot product similarities (the standard practice in the literature). Our results call into question existing results on link prediction and pose a significant scientific challenge for graph representation learning. The VCMPR plots identify specific scientific challenges for link prediction using low-dimensional node embeddings.
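
To make the vertex-centric evaluation idea in the last abstract concrete, here is a minimal sketch of a per-vertex precision-at-k score for a dot-product link predictor. It is a simplified stand-in, not the VCMPR@k definition from the paper, which the abstract does not spell out. The function name `per_vertex_precision_at_k`, the dense adjacency matrix, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np


def per_vertex_precision_at_k(embeddings, adjacency, k):
    """Hypothetical helper: for each vertex, the fraction of its k
    highest-scoring candidates (by dot product) that are true neighbours.

    Simplified illustration only; not the paper's VCMPR@k measure.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    adjacency = np.asarray(adjacency)
    # Dot-product similarity between every pair of vertices.
    scores = embeddings @ embeddings.T
    # Exclude self-pairs so a vertex is never predicted as its own neighbour.
    np.fill_diagonal(scores, -np.inf)
    # Indices of the k best-scoring candidates for each vertex.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # Look up whether each predicted candidate is a true edge.
    hits = np.take_along_axis(adjacency, top_k, axis=1)
    return hits.sum(axis=1) / k


# Toy data (assumed): vertices 0,1 point one way, vertices 2,3 another.
toy_embeddings = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])

# Ground truth that matches the embedding geometry: edges 0-1 and 2-3.
aligned = np.array([[0, 1, 0, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 1],
                    [0, 0, 1, 0]])

# Ground truth that cuts across the geometry: edges 0-2 and 1-3.
crossed = np.array([[0, 0, 1, 0],
                    [0, 0, 0, 1],
                    [1, 0, 0, 0],
                    [0, 1, 0, 0]])

aligned_scores = per_vertex_precision_at_k(toy_embeddings, aligned, k=1)
crossed_scores = per_vertex_precision_at_k(toy_embeddings, crossed, k=1)
```

Scoring each vertex separately, rather than pooling all vertex pairs into one ranking as AUC does, shows which individual vertices have their true neighbours missed by the predictor.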