Abstract Background: The use of systems science methodologies to understand complex environmental and human health relationships is increasing. Requirements for advanced datasets, models, and expertise limit current application of these approaches by many environmental and public health practitioners. Methods: A conceptual system-of-systems model was applied for children in North Carolina counties that includes example indicators of children’s physical environment (home age, Brownfield sites, Superfund sites), social environment (caregiver’s income, education, insurance), and health (low birthweight, asthma, blood lead levels). The web-based Toxicological Prioritization Index (ToxPi) tool was used to normalize the data, rank the resulting vulnerability index, and visualize impacts from each indicator in a county. Hierarchical clustering was used to sort the 100 North Carolina counties into groups based on similar ToxPi model results. The ToxPi charts for each county were also superimposed over a map of percentage county population under age 5 to visualize spatial distribution of vulnerability clusters across the state. Results: Data-driven clustering for this systems model suggests 5 groups of counties. One group includes 6 counties with the highest vulnerability scores showing strong influences from all three categories of indicators (social environment, physical environment, and health). A second group contains 15 counties with high vulnerability scores driven by strong influences from home age in the physical environment and poverty in the social environment. A third group is driven by data on Superfund sites in the physical environment. Conclusions: This analysis demonstrated how systems science principles can be used to synthesize holistic insights for decision making using publicly available data and computational tools, focusing on a children’s environmental health example. Where more traditional reductionist approaches can elucidate individual relationships between environmental variables and health, the study of collective, system-wide interactions can enable insights into the factors that contribute to regional vulnerabilities and interventions that better address complex real-world conditions.
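The normalize, rank, and cluster workflow described in this abstract can be illustrated with a minimal sketch. It is not the ToxPi tool's implementation: the four counties, their indicator values, the equal indicator weights, and the average-linkage rule are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical indicator values per county (not data from the study).
# Columns: older-home share, poverty rate, asthma rate.
counties = {
    "A": [0.60, 0.30, 0.12],
    "B": [0.58, 0.28, 0.11],
    "C": [0.20, 0.10, 0.05],
    "D": [0.22, 0.12, 0.06],
}

def min_max_normalize(rows):
    """Rescale each indicator (column) to the [0, 1] range."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(p, q):
    """Straight-line distance between two normalized profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def average_linkage(c1, c2, points):
    """Mean pairwise distance between members of two clusters."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters

names = list(counties)
profiles = min_max_normalize([counties[n] for n in names])
# Equal-weight mean of normalized indicators as a simple vulnerability score.
scores = {n: sum(p) / len(p) for n, p in zip(names, profiles)}
ranking = sorted(names, key=scores.get, reverse=True)
groups = [sorted(names[i] for i in c) for c in agglomerative(profiles, 2)]
```

Here `ranking` orders the hypothetical counties from most to least vulnerable, and `groups` holds the two clusters of counties with similar indicator profiles.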
This content will become publicly available on January 31, 2026
AI-assisted discovery of quantitative and formal models in social science
Abstract In social science, formal and quantitative models, ranging from ones that describe economic growth to collective action, are used to formulate mechanistic explanations of the observed phenomena, provide predictions, and uncover new research questions. Here, we demonstrate the use of a machine learning system to aid the discovery of symbolic models that capture non-linear and dynamical relationships in social science datasets. By extending neuro-symbolic methods to find compact functions and differential equations in noisy and longitudinal data, we show that our system can be used to discover interpretable models from real-world data in economics and sociology. Augmenting existing workflows with symbolic regression can help uncover novel relationships and explore counterfactual models during the scientific process. We propose that this AI-assisted framework can bridge parametric and non-parametric models commonly employed in social science research by systematically exploring the space of non-linear models and enabling fine-grained control over expressivity and interpretability.
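As a rough illustration of what symbolic regression does, the sketch below searches a tiny, hand-written set of candidate expressions and keeps the one that balances fit against complexity. It is an assumption-laden toy, not the neuro-symbolic system described in the abstract: the candidate list, the complexity scores, the penalty value, and the synthetic data are all hypothetical.

```python
import math

# Hypothetical candidate expressions: (label, function, complexity).
CANDIDATES = [
    ("x", lambda x: x, 1),
    ("x**2", lambda x: x * x, 2),
    ("sqrt(x)", lambda x: math.sqrt(x), 2),
    ("log(1 + x)", lambda x: math.log1p(x), 3),
    ("exp(-x)", lambda x: math.exp(-x), 3),
]

def fit_scale(f, xs, ys):
    """Least-squares scale a for the one-parameter model y = a * f(x)."""
    fx = [f(x) for x in xs]
    denom = sum(v * v for v in fx)
    a = sum(v * y for v, y in zip(fx, ys)) / denom
    mse = sum((a * v - y) ** 2 for v, y in zip(fx, ys)) / len(xs)
    return a, mse

def discover(xs, ys, penalty=1e-3):
    """Return the candidate minimising error plus a complexity penalty."""
    best = None
    for label, f, complexity in CANDIDATES:
        a, mse = fit_scale(f, xs, ys)
        score = mse + penalty * complexity
        if best is None or score < best[0]:
            best = (score, label, a)
    return best[1], best[2]

# Synthetic data generated from y = 3 * sqrt(x); not from any real dataset.
xs = [0.5 * i for i in range(1, 21)]
ys = [3.0 * math.sqrt(x) for x in xs]
label, scale = discover(xs, ys)
```

The complexity penalty is the knob that trades expressivity against interpretability: a larger penalty favours shorter expressions even when a longer one fits slightly better.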
- Award ID(s): 2019786
- PAR ID: 10588321
- Publisher / Repository: Springer Nature
- Date Published:
- Journal Name: Humanities and Social Sciences Communications
- Volume: 12
- Issue: 1
- ISSN: 2662-9992
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Predictive analytics has been widely used in various domains, including education, to inform decision-making and improve outcomes. However, many predictive models are proprietary and inaccessible for evaluation or modification by researchers and practitioners, limiting their accountability and ethical design. Moreover, predictive models are often opaque and incomprehensible to the officials who use them, reducing their trust and utility. Furthermore, predictive models may introduce or exacerbate bias and inequity, as they have done in many sectors of society. Therefore, there is a need for transparent, interpretable, and fair predictive models that can be easily adopted and adapted by different stakeholders. In this paper, we propose a fair predictive model based on multivariate adaptive regression splines (MARS) that incorporates fairness measures in the learning process. MARS is a non-parametric regression model that performs feature selection, handles non-linear relationships, generates interpretable decision rules, and derives optimal splitting criteria on the variables. Specifically, we integrate fairness into the knot optimization algorithm and provide theoretical and empirical evidence of how it results in a fair knot placement. We apply our fairMARS model to real-world data and demonstrate its effectiveness in terms of accuracy and equity. Our paper contributes to the advancement of responsible and ethical predictive analytics for social good.
-
Theoretical and Empirical Modeling of Identity and Sentiments in Collaborative Groups (THEMIS.COG) was an interdisciplinary research collaboration of computer scientists and social scientists from the University of Waterloo (Canada), Potsdam University of Applied Sciences (Germany), and Dartmouth College (USA). This white paper summarizes the results of our research at the end of the grant term. Funded by the Trans-Atlantic Platform’s Digging Into Data initiative, the project aimed at theoretical and empirical modeling of identity and sentiments in collaborative groups. Understanding the social forces behind self-organized collaboration is important because technological and social innovations are increasingly generated through informal, distributed processes of collaboration, rather than in formal organizational hierarchies or through market forces. Our work used a data-driven approach to explore the social psychological mechanisms that motivate such collaborations and determine their success or failure. We focused on the example of GitHub, currently the world’s largest digital platform for open, collaborative software development. In contrast to most contemporary approaches that leverage computational techniques for social science, which are purely inductive, THEMIS.COG followed a deductive, theory-driven approach. We capitalized on affect control theory, a mathematically formalized theory of symbolic interaction originated by sociologist David R. Heise and further advanced in previous work by some of the THEMIS.COG collaborators, among others. Affect control theory states that people control their social behaviours by intuitively attempting to verify culturally shared feelings about identities, social roles, and behaviour settings. From this principle, implemented in computational simulation models, precise predictions about group dynamics can be derived.
It was the goal of THEMIS.COG to adapt and apply this approach to study the GitHub collaboration ecosystem through a symbolic interactionist lens. The project contributed substantially to the novel endeavor of theory development in social science based on large amounts of naturally occurring digital data.
-
There is considerable interest in networked social science experiments for understanding human behavior at scale. Significant effort is required to perform data analytics on experimental outputs and for computational modeling of custom experiments. Moreover, experiments and modeling are often performed in a cycle, enabling iterative experimental refinement and data modeling to uncover interesting insights and to generate or refute hypotheses about social behaviors. The current practice for social analysts is to develop tailor-made computer programs and analytical scripts for experiments and modeling. This often leads to inefficiencies and duplication of effort. In this work, we propose a pipeline framework to take a significant step towards overcoming these challenges. Our contribution is to describe the design and implementation of a software system to automate many of the steps involved in analyzing social science experimental data, building models to capture the behavior of human subjects, and providing data to test hypotheses. The proposed pipeline framework consists of formal models, formal algorithms, and theoretical models as the basis for the design and implementation. We propose a formal data model, such that if an experiment can be described in terms of this model, then our pipeline software can be used to analyze data efficiently. The merits of the proposed pipeline framework are illustrated by several case studies of networked social science experiments.
-
As power systems evolve with the increasing integration of renewable energy sources and smart grid technologies, there is a growing demand for flexible and scalable modeling approaches capable of capturing the complex dynamics of modern grids. This review focuses on symbolic regression, a powerful methodology for deriving parsimonious and interpretable mathematical models directly from data. Symbolic regression is particularly valuable for power systems due to its ability to uncover governing equations without prior structural assumptions, enabling transparent and data-driven insights into nonlinear system behavior. The paper presents a comprehensive overview of symbolic regression methods, including sparse identification of nonlinear dynamics, automatic regression for governing equations, and deep symbolic regression, highlighting their applications in power systems. Through comparative case studies of the single machine infinite bus system, grid-following, and grid-forming inverters, we analyze the strengths, limitations, and suitability of each symbolic regression method in modeling nonlinear power system dynamics. Additionally, we identify critical research gaps and discuss future directions for leveraging symbolic regression in the optimization, control, and operation of modern power grids. This review aims to provide a valuable resource for researchers and engineers seeking innovative, data-driven solutions for modeling in the context of evolving power system infrastructure.
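Sparse identification of nonlinear dynamics, one of the methods named above, is commonly implemented with sequentially thresholded least squares over a library of candidate terms. The sketch below shows that idea on a toy scalar system. The system dx/dt = -2x, the library (1, x, x^2), the threshold, and every helper name are assumptions for illustration, not the review's case studies or any package's API.

```python
import math

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def least_squares(theta, dx, active):
    """Fit dx using only the active library columns (normal equations)."""
    A = [[sum(row[i] * row[j] for row in theta) for j in active] for i in active]
    b = [sum(row[i] * d for row, d in zip(theta, dx)) for i in active]
    return solve(A, b)

def stlsq(theta, dx, threshold=0.1, iterations=5):
    """Sequentially thresholded least squares over a candidate library."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(iterations):
        if not active:
            break
        fitted = least_squares(theta, dx, active)
        coef = [0.0] * n_terms
        for i, value in zip(active, fitted):
            coef[i] = value
        # Drop terms whose coefficients fall below the threshold, then refit.
        active = [i for i in active if abs(coef[i]) >= threshold]
        coef = [c if abs(c) >= threshold else 0.0 for c in coef]
    return coef

# Synthetic trajectory of dx/dt = -2x (an illustrative system only).
h = 0.01
ts = [h * k for k in range(201)]
xs = [math.exp(-2.0 * t) for t in ts]
# Central differences estimate the derivative at interior points.
dx = [(xs[k + 1] - xs[k - 1]) / (2 * h) for k in range(1, 200)]
theta = [[1.0, xs[k], xs[k] ** 2] for k in range(1, 200)]  # library: 1, x, x^2
coef = stlsq(theta, dx)
```

After thresholding, only the coefficient on the x term survives, with a value close to -2, which recovers the governing equation from the sampled trajectory. The threshold controls sparsity and would need tuning on noisy measurements.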
