In the past decade, topological data analysis has emerged as a powerful algebraic topology approach in data science. Although knot theory and related subjects are a focus of study in mathematics, their success in practical applications is quite limited due to the lack of localization and quantization. We address these challenges by introducing knot data analysis (KDA), a paradigm that incorporates curve segmentation and multiscale analysis into the Gauss link integral. The resulting multiscale Gauss link integral (mGLI) recovers the global topological properties of knots and links at an appropriate scale and offers a multiscale geometric topology approach to capture the local structures and connectivities in data. By integration with machine learning or deep learning, the proposed mGLI significantly outperforms other state-of-the-art methods across various benchmark problems in 13 intricately complex biological datasets, including protein flexibility analysis, protein–ligand interactions, human Ether-à-go-go-Related Gene potassium channel blockade screening, and quantitative toxicity assessment. Our KDA opens a research area—knot deep learning—in data science.
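For reference, the classical single-scale Gauss link integral that mGLI builds on can be approximated numerically for two closed polygonal curves. The following is a minimal sketch, not the authors' mGLI: it omits the curve segmentation and multiscale analysis described above, and the helper names `circle` and `gauss_link`, as well as the midpoint-rule discretization, are assumptions made for illustration.

```python
import math

def circle(center, u, v, radius=1.0, n=200):
    """Sample n points on a circle spanned by orthonormal vectors u and v (hypothetical helper)."""
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        c, s = math.cos(t), math.sin(t)
        pts.append(tuple(center[i] + radius * (c * u[i] + s * v[i]) for i in range(3)))
    return pts

def gauss_link(curve_a, curve_b):
    """Midpoint-rule estimate of the Gauss link integral of two closed polylines.

    Approximates (1 / 4*pi) * sum over segment pairs of
    (r_a - r_b) . (dr_a x dr_b) / |r_a - r_b|^3,
    where r_a, r_b are segment midpoints and dr_a, dr_b are segment vectors.
    """
    total = 0.0
    na, nb = len(curve_a), len(curve_b)
    for i in range(na):
        p0, p1 = curve_a[i], curve_a[(i + 1) % na]  # wrap around: curve is closed
        da = [p1[k] - p0[k] for k in range(3)]
        ma = [(p0[k] + p1[k]) / 2.0 for k in range(3)]
        for j in range(nb):
            q0, q1 = curve_b[j], curve_b[(j + 1) % nb]
            db = [q1[k] - q0[k] for k in range(3)]
            mb = [(q0[k] + q1[k]) / 2.0 for k in range(3)]
            r = [ma[k] - mb[k] for k in range(3)]
            dist = math.sqrt(sum(x * x for x in r))
            cross = (
                da[1] * db[2] - da[2] * db[1],
                da[2] * db[0] - da[0] * db[2],
                da[0] * db[1] - da[1] * db[0],
            )
            total += sum(r[k] * cross[k] for k in range(3)) / dist ** 3
    return total / (4.0 * math.pi)

# Two interlocked unit circles (a Hopf link) and one circle placed far away.
ring_a = circle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
ring_b = circle((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
ring_far = circle((5.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

linked = gauss_link(ring_a, ring_b)
unlinked = gauss_link(ring_a, ring_far)
```

For the interlocked pair the estimate has absolute value close to 1, and for the separated pair it is close to 0; the sign depends on the orientation chosen for each curve.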
Spatially weighted structural similarity index: a multiscale comparison tool for diverse sources of mobility data
Data collected about routine human activity and mobility is used in diverse applications to improve our society. Robust models are needed to address the challenges of our increasingly interconnected world. Methods capable of portraying the dynamic properties of complex human systems, such as simulation modeling, must comply with rigorous data requirements. Modern data sources, like SafeGraph, provide aggregate data collected from location-aware technologies. Opportunities and challenges arise to incorporate the new data into existing analysis and modeling methods. Our research employs a multiscale spatial similarity index to compare diverse origin-destination mobility datasets. Established distance ranges accommodate spatial variability in the model’s datasets. This paper explores how similarity scores change with different aggregations to address discrepancies in the source data’s temporal granularity. We suggest possible explanations for variations in the similarity scores and extract characteristics of human mobility for the study area. The multiscale spatial similarity index may be integrated into a vast array of analysis and modeling workflows, either during preliminary analysis or later evaluation phases as a method of data validation (e.g., agent-based models). We propose that the demonstrated tool has potential to enhance mobility modeling methods in the context of complex human systems.
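The abstract does not give the formula for the spatially weighted index. For orientation only, the sketch below computes the standard global structural similarity (SSIM) statistic between two origin-destination matrices of equal shape. It is not the authors' spatially weighted, multiscale variant: the function name `ssim_global`, the choice of stabilizing constants, and the toy matrices are all assumptions made for illustration.

```python
def ssim_global(x, y, k1=0.01, k2=0.03):
    """Global SSIM between two equally shaped matrices given as lists of rows.

    Uses the usual form
      ((2*mx*my + c1) * (2*cov + c2)) / ((mx^2 + my^2 + c1) * (vx + vy + c2)),
    with c1 = (k1 * L)^2 and c2 = (k2 * L)^2, where L is the value range
    observed across both inputs (an assumption; count data has no fixed range).
    """
    xs = [float(v) for row in x for v in row]
    ys = [float(v) for row in y for v in row]
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("matrices must be non-empty and have the same shape")

    value_range = (max(max(xs), max(ys)) - min(min(xs), min(ys))) or 1.0
    c1 = (k1 * value_range) ** 2
    c2 = (k2 * value_range) ** 2

    mx = sum(xs) / n
    my = sum(ys) / n
    vx = sum((v - mx) ** 2 for v in xs) / n
    vy = sum((v - my) ** 2 for v in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n

    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))

# Made-up trip counts between three zones from two hypothetical data sources.
od_source_a = [[0, 12, 5], [9, 0, 3], [4, 7, 0]]
od_source_b = [[0, 10, 6], [8, 0, 2], [5, 9, 0]]

score = ssim_global(od_source_a, od_source_b)
```

A matrix compared with itself scores 1, and any two differing matrices score below 1. Recomputing the score after summing trips over coarser zones or longer time windows is one way to observe how similarity changes with aggregation.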
- Award ID(s): 2031407
- PAR ID: 10381900
- Editor(s): Ossi, Federico; Hachem, Fatima; Robira, Benjamin; Ellis Soto, Diego; Rutz, Christian; Dodge, Somayeh; Cagnacci, Francesca; Damiani, Maria Luisa
- Date Published:
- Journal Name: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Animal Movement Ecology and Human Mobility
- Page Range / eLocation ID: 19 to 22
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract. Background: The use of systems science methodologies to understand complex environmental and human health relationships is increasing. Requirements for advanced datasets, models, and expertise limit current application of these approaches by many environmental and public health practitioners. Methods: A conceptual system-of-systems model was applied for children in North Carolina counties that includes example indicators of children’s physical environment (home age, Brownfield sites, Superfund sites), social environment (caregiver’s income, education, insurance), and health (low birthweight, asthma, blood lead levels). The web-based Toxicological Prioritization Index (ToxPi) tool was used to normalize the data, rank the resulting vulnerability index, and visualize impacts from each indicator in a county. Hierarchical clustering was used to sort the 100 North Carolina counties into groups based on similar ToxPi model results. The ToxPi charts for each county were also superimposed over a map of percentage county population under age 5 to visualize spatial distribution of vulnerability clusters across the state. Results: Data-driven clustering for this systems model suggests 5 groups of counties. One group includes 6 counties with the highest vulnerability scores showing strong influences from all three categories of indicators (social environment, physical environment, and health). A second group contains 15 counties with high vulnerability scores driven by strong influences from home age in the physical environment and poverty in the social environment. A third group is driven by data on Superfund sites in the physical environment. Conclusions: This analysis demonstrated how systems science principles can be used to synthesize holistic insights for decision making using publicly available data and computational tools, focusing on a children’s environmental health example. Where more traditional reductionist approaches can elucidate individual relationships between environmental variables and health, the study of collective, system-wide interactions can enable insights into the factors that contribute to regional vulnerabilities and interventions that better address complex real-world conditions.
- Abstract. During the 21st century, human–environment interactions will increasingly expose both systems to risks, but also yield opportunities for improvement as we gain insight into these complex, coupled systems. Human–environment interactions operate over multiple spatial and temporal scales, requiring large data volumes of multi‐resolution information for analysis. Climate change, land‐use change, urbanization, and wildfires, for example, can affect regions differently depending on ecological and socioeconomic structures. The relative scarcity of data on both humans and natural systems at the relevant extent can be prohibitive when pursuing inquiries into these complex relationships. We explore the value of multitemporal, high‐density, and high‐resolution LiDAR, imaging spectroscopy, and digital camera data from the National Ecological Observatory Network’s Airborne Observation Platform (NEON AOP) for Socio‐Environmental Systems (SES) research. In addition to providing an overview of NEON AOP datasets and outlining specific applications for addressing SES questions, we highlight current challenges and provide recommendations for the SES research community to improve and expand its use of this platform for SES research. The coordinated, nationwide AOP remote sensing data, collected annually over the next 30 yr, offer exciting opportunities for cross‐site analyses and comparison, upscaling metrics derived from LiDAR and hyperspectral datasets across larger spatial extents, and addressing questions across diverse scales. Integrating AOP data with other SES datasets will allow researchers to investigate complex systems and provide urgently needed policy recommendations for socio‐environmental challenges. We urge the SES research community to further explore questions and theories in social and economic disciplines that might leverage NEON AOP data.
- Abstract. Many mechanical engineering applications call for multiscale computational modeling and simulation. However, solving for complex multiscale systems remains computationally onerous due to the high dimensionality of the solution space. Recently, machine learning (ML) has emerged as a promising solution that can either serve as a surrogate for, accelerate or augment traditional numerical methods. Pioneering work has demonstrated that ML provides solutions to governing systems of equations with comparable accuracy to those obtained using direct numerical methods, but with significantly faster computational speed. These high-speed, high-fidelity estimations can facilitate the solving of complex multiscale systems by providing a better initial solution to traditional solvers. This paper provides a perspective on the opportunities and challenges of using ML for complex multiscale modeling and simulation. We first outline the current state-of-the-art ML approaches for simulating multiscale systems and highlight some of the landmark developments. Next, we discuss current challenges for ML in multiscale computational modeling, such as the data and discretization dependence, interpretability, and data sharing and collaborative platform development. Finally, we suggest several potential research directions for the future.
- Background: Human movement is one of the forces that drive the spatial spread of infectious diseases. To date, reducing and tracking human movement during the COVID-19 pandemic has proven effective in limiting the spread of the virus. Existing methods for monitoring and modeling the spatial spread of infectious diseases rely on various data sources as proxies of human movement, such as airline travel data, mobile phone data, and banknote tracking. However, intrinsic limitations of these data sources prevent us from systematic monitoring and analyses of human movement on different spatial scales (from local to global). Objective: Big data from social media such as geotagged tweets have been widely used in human mobility studies, yet more research is needed to validate the capabilities and limitations of using such data for studying human movement at different geographic scales (eg, from local to global) in the context of global infectious disease transmission. This study aims to develop a novel data-driven public health approach using big data from Twitter coupled with other human mobility data sources and artificial intelligence to monitor and analyze human movement at different spatial scales (from global to regional to local). Methods: We will first develop a database with optimized spatiotemporal indexing to store and manage the multisource data sets collected in this project. This database will be connected to our in-house Hadoop computing cluster for efficient big data computing and analytics. We will then develop innovative data models, predictive models, and computing algorithms to effectively extract and analyze human movement patterns using geotagged big data from Twitter and other human mobility data sources, with the goal of enhancing situational awareness and risk prediction in public health emergency response and disease surveillance systems. Results: This project was funded as of May 2020. We have started the data collection, processing, and analysis for the project. Conclusions: Research findings can help government officials, public health managers, emergency responders, and researchers answer critical questions during the pandemic regarding the current and future infectious risk of a state, county, or community and the effectiveness of social/physical distancing practices in curtailing the spread of the virus. International Registered Report Identifier (IRRID): DERR1-10.2196/24432

