NSF PAR Search | NSF Public Access Repository

Investigating the sources of variable impact of pathogenic variants in monogenic metabolic conditions

https://doi.org/10.1038/s41467-025-60339-7

Wei, Angela; Border, Richard; Fu, Boyang; Cullina, Sinéad; Brandes, Nadav; Jang, Seon-Kyeong; Sankararaman, Sriram; Kenny, Eimear E; Udler, Miriam S; Ntranos, Vasilis; et al. (December 2025, Nature Communications)

Over three percent of people carry a dominant pathogenic variant, yet only a fraction of carriers develop disease. Disease phenotypes from carriers of variants in the same gene range from mild to severe. Here, we investigate underlying mechanisms for this heterogeneity: variable variant effect sizes, carrier polygenic backgrounds, and modulation of carrier effect by genetic background (marginal epistasis). We leveraged exomes and clinical phenotypes from the UK Biobank and the Mt. Sinai BioMe Biobank to identify carriers of pathogenic variants affecting cardiometabolic traits. We employed recently developed methods to study these cohorts, observing strong statistical support and clinical translational potential for all three mechanisms of variable carrier penetrance and disease severity. For example, scores from our recent model of variant pathogenicity were tightly correlated with phenotype amongst clinical variant carriers, predicted effects of variants of unknown significance, and distinguished gain- from loss-of-function variants. We also found that polygenic scores modify phenotypes amongst pathogenic carriers and that genetic background additionally alters the effects of pathogenic variants through interactions.
Free, publicly-accessible full text available December 1, 2026
A scalable adaptive quadratic kernel method for interpretable epistasis analysis in complex traits

https://doi.org/10.1101/gr.279140.124

Fu, Boyang; Anand, Prateek; Anand, Aakarsh; Mefford, Joel; Sankararaman, Sriram (September 2024, Genome Research)

Our knowledge of the contribution of genetic interactions (epistasis) to variation in human complex traits remains limited, partly due to the lack of efficient, powerful, and interpretable algorithms to detect interactions. Recently proposed approaches for set-based association tests show promise in improving the power to detect epistasis by examining the aggregated effects of multiple variants. Nevertheless, these methods either do not scale to large Biobank data sets or lack interpretability. We propose QuadKAST, a scalable algorithm that tests pairwise interaction effects (quadratic effects) within small to medium-sized sets of genetic variants (window size ≤100) on a trait and provides a quantified interpretation of these effects. Comprehensive simulations show that QuadKAST is well-calibrated. Additionally, QuadKAST is highly sensitive in detecting loci with epistatic signals and accurate in its estimation of quadratic effects. We applied QuadKAST to 52 quantitative phenotypes measured in ≈300,000 unrelated white British individuals in the UK Biobank to test for quadratic effects within each of 9515 protein-coding genes. We detect 32 trait-gene pairs across 17 traits and 29 genes that demonstrate statistically significant signals of quadratic effects (accounting for the number of genes and traits tested). Across these trait-gene pairs, the proportion of trait variance explained by quadratic effects is comparable to that explained by additive effects, with five pairs having a ratio >1. Our method enables the detailed investigation of epistasis on a large scale, offering new insights into its role and importance.
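As a rough illustration of what testing for quadratic effects within a variant set involves, the sketch below builds pairwise product features from a small simulated genotype matrix and compares the variance explained by an additive-only regression with that of an additive-plus-pairwise regression. This is not the QuadKAST algorithm, which the abstract describes only at a high level; the function names, the simulated data, and the use of ordinary least squares are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def quadratic_features(genotypes):
    """Standardize genotype columns and build products z_i * z_j for i < j.

    Hypothetical helper: only cross-variant products are included.
    """
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    pairs = list(combinations(range(z.shape[1]), 2))
    q = np.column_stack([z[:, i] * z[:, j] for i, j in pairs])
    return z, q, pairs


def variance_explained(features, trait):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(trait)), features])
    beta, *_ = np.linalg.lstsq(design, trait, rcond=None)
    residual = trait - design @ beta
    return 1.0 - residual.var() / trait.var()


# Simulated example (assumed data, not from the paper): 4 variants,
# one additive effect and one pairwise interaction.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(2000, 4)).astype(float)
Z, Q, pairs = quadratic_features(G)
y = Z[:, 0] + Z[:, 1] * Z[:, 2] + rng.normal(size=2000)

r2_additive = variance_explained(Z, y)
r2_full = variance_explained(np.column_stack([Z, Q]), y)
r2_gain_from_pairs = r2_full - r2_additive
```

Here `r2_gain_from_pairs` is a naive in-sample estimate of the extra variance captured by pairwise terms. A set of 100 variants already yields 4,950 pairs, which hints at why a dedicated scalable method is needed at Biobank scale.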
Full Text Available
Fast kernel-based association testing of non-linear genetic effects for biobank-scale data

https://doi.org/10.1038/s41467-023-40346-2

Fu, Boyang; Pazokitoroudi, Ali; Sudarshan, Mukund; Liu, Zhengtong; Subramanian, Lakshminarayanan; Sankararaman, Sriram (December 2023, Nature Communications)

Our knowledge of non-linear genetic effects on complex traits remains limited, in part, due to the modest power to detect such effects. While kernel-based tests offer a versatile approach to test for non-linear relationships between sets of genetic variants and traits, current approaches cannot be applied to Biobank-scale datasets containing hundreds of thousands of individuals. We propose FastKAST, a kernel-based approach that can test for non-linear effects of a set of variants on a quantitative trait. FastKAST provides calibrated hypothesis tests while enabling analysis of Biobank-scale datasets with hundreds of thousands of unrelated individuals from a homogeneous population. We apply FastKAST to 53 quantitative traits measured across ≈300,000 unrelated white British individuals in the UK Biobank to detect sets of variants with non-linear effects at genome-wide significance.
Full Text Available
Leveraging family data to design Mendelian Randomization that is provably robust to population stratification

https://doi.org/10.1101/gr.277664.123

LaPierre, Nathan; Fu, Boyang; Turnbull, Steven; Eskin, Eleazar; Sankararaman, Sriram (May 2023, Genome Research)

Mendelian Randomization (MR) has emerged as a powerful approach to leverage genetic instruments to infer causality between pairs of traits in observational studies. However, the results of such studies are susceptible to biases due to weak instruments as well as the confounding effects of population stratification and horizontal pleiotropy. Here, we show that family data can be leveraged to design MR tests that are provably robust to confounding from population stratification, assortative mating, and dynastic effects. We demonstrate in simulations that our approach, MR-Twin, is robust to confounding from population stratification and is not affected by weak instrument bias, while standard MR methods yield inflated false positive rates. We then conducted an exploratory analysis of MR-Twin and other MR methods applied to 121 trait pairs in the UK Biobank dataset. Our results suggest that confounding from population stratification can lead to false positives for existing MR methods, while MR-Twin is immune to this type of confounding, and that MR-Twin can help assess whether traditional approaches may be inflated due to confounding from population stratification.
Full Text Available
PrivateBus: Privacy Identification and Protection in Large-Scale Bus WiFi Systems

https://doi.org/10.1145/3380990

Fang, Zhihan; Fu, Boyang; Qin, Zhou; Zhang, Fan; Zhang, Desheng (March 2020, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

Recently, the ubiquity of mobile devices has led to an increasing demand for public network services, e.g., WiFi hot spots. As a part of this trend, modern transportation systems are equipped with public WiFi devices to provide Internet access for passengers, as people spend a large amount of time on public transportation in their daily lives. However, one of the key issues in public WiFi spots is the privacy concern due to their open-access nature. Existing works either studied location privacy risk in human traces or privacy leakage in private networks such as cellular networks based on the data from cellular carriers. To the best of our knowledge, none of these works has focused on bus WiFi privacy based on large-scale real-world data. In this paper, to explore the privacy risk in bus WiFi systems, we focus on two key questions: how likely it is that bus WiFi users can be uniquely re-identified if partial usage information is leaked, and how we can protect users from the leaked information. To answer these questions, we conduct a case study in a large-scale bus WiFi system, which contains 20 million connection records and 78 million location records from 770 thousand bus WiFi users during a two-month period. Technically, we design two models for our uniqueness analyses and protection, i.e., a PB-FIND model to identify the probability that a user can be uniquely re-identified from leaked information, and a PB-HIDE model to protect users from potentially leaked information. Specifically, we systematically measure the user uniqueness on users' finger traces (i.e., connection URL and domain), foot traces (i.e., locations), and hybrid traces (i.e., both finger and foot traces). Our measurement results reveal that (i) 97.8% of users can be uniquely re-identified by 4 random domain records of their finger traces and 96.2% of users can be uniquely re-identified by 5 random locations on buses; (ii) 98.1% of users can be uniquely re-identified by only 2 random records if both their connection records and locations are leaked to attackers. Moreover, the evaluation results show that our PB-HIDE algorithm protects more than 95% of users from the potentially leaked information by inserting only 1.5% synthetic records in the original dataset to preserve their data utility.
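The uniqueness figures above can be read as the fraction of users whose randomly sampled records match no other user's trace. The sketch below shows one simple way to compute such a rate. It is not the PB-FIND model, whose details the abstract does not give; the function names, the set-based trace representation, and the toy data are assumptions for illustration only.

```python
import random


def is_unique(target, sample, traces):
    """True if no user other than `target` has every record in `sample`."""
    return not any(
        user != target and sample <= trace
        for user, trace in traces.items()
    )


def uniqueness_rate(traces, k, seed=0):
    """Fraction of users (with at least k records) that k randomly
    chosen records of their own trace identify uniquely."""
    rng = random.Random(seed)
    eligible = [user for user, trace in traces.items() if len(trace) >= k]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # Sort first so sampling is reproducible for a given seed.
        sample = set(rng.sample(sorted(traces[user]), k))
        if is_unique(user, sample, traces):
            unique += 1
    return unique / len(eligible)


# Toy traces (assumed data): each user maps to a set of visited domains.
toy_traces = {
    "u1": {"news.example", "mail.example"},
    "u2": {"news.example", "video.example"},
    "u3": {"news.example"},
}
rate_k2 = uniqueness_rate(toy_traces, k=2)
```

In the toy data, only `u1` and `u2` have two records, and each pair of records belongs to a single user, so both are uniquely identified at `k=2`. A single leaked record such as `news.example` would identify no one, since all three users share it.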
Full Text Available
MAC: Measuring the Impacts of Anomalies on Travel Time of Multiple Transportation Systems

https://doi.org/10.1145/3328913

Fang, Zhihan; Yang, Yu; Wang, Shuai; Fu, Boyang; Song, Zixing; Zhang, Fan; Zhang, Desheng (June 2019, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

Urban anomalies have a large impact on passengers' travel behavior and city infrastructures, which can cause uncertainty in travel time estimation. Understanding the impact of urban anomalies on travel time is of great value for various applications such as urban planning, human mobility studies and navigation systems. Most existing studies on travel time have focused on the total riding time between two locations on an individual transportation modality. However, passengers often take different modes of transportation, e.g., taxis, subways, buses or private vehicles, and a significant portion of the travel time is spent in uncertain waiting. In this paper, we study the fine-grained travel time patterns in multiple transportation systems under the impact of urban anomalies. Specifically, (i) we investigate implicit components, including waiting and riding time, in multiple transportation systems; (ii) we measure the impact of real-world anomalies on travel time components; (iii) we design a learning-based model for travel time component prediction with anomalies. Unlike existing studies, we implement and evaluate our measurement framework on multiple data sources including four city-scale transportation systems, which are (i) a 14-thousand taxicab network, (ii) a 13-thousand bus network, (iii) a 10-thousand private vehicle network, and (iv) an automatic fare collection system for a public transit network (i.e., subway and bus) with 5 million smart cards.
Full Text Available