ABSTRACT Galaxy clusters have a triaxial matter distribution. The weak-lensing signal, an important ingredient of cosmological studies, measures the projected mass of all matter along the line of sight, and therefore changes with the orientation of the cluster. Studies suggest that the shape of the brightest cluster galaxy (BCG) in the centre of the cluster traces the underlying halo shape, enabling a method to account for projection effects. We use 324 simulated clusters at four redshifts between 0.1 and 0.6 from ‘The Three Hundred Project’ to quantify correlations between the orientation and shape of the BCG and the halo. We find that haloes and their embedded BCGs are aligned, with an average ∼20 degree angle between their major axes. The bias in weak-lensing cluster mass estimates correlates with the orientation of both the halo and the BCG. Mimicking observations, we compute the projected shape of the BCG, as a measure of the BCG orientation, and find that it is most strongly correlated with the weak-lensing mass for relaxed clusters. We also test a 2D cluster relaxation proxy measured from BCG mass isocontours. The concentration of stellar mass in the projected BCG core compared to the total stellar mass provides an alternative proxy for the BCG orientation. We find that the concentration does not correlate with the weak-lensing mass bias, but does correlate with the true halo mass. These results indicate that the BCG shape and orientation for large samples of relaxed clusters can provide information to improve weak-lensing mass estimates.
A Two-Stage Classification for Dealing with Unseen Clusters in the Testing Data
Classification is an important statistical tool that has grown in importance since the emergence of the data science revolution. However, a training data set that does not capture all underlying population subgroups (or clusters) will result in biased estimates or misclassification. In this paper, we introduce a statistical and computational solution to a possible bias in classification when implemented on estimated population clusters. An unseen-cluster problem denotes the case in which the training data do not contain all underlying clusters in the population. Such a scenario may occur for various reasons, such as sampling errors, selection bias, or emerging and disappearing population clusters. When an unseen-cluster problem occurs, a testing observation will be misclassified because a classification rule based on the sample cannot capture a cluster not observed in the training data. To ameliorate the unseen-cluster problem, we propose a two-stage classification method. We also propose a test to identify the unseen-cluster problem and demonstrate the performance of the two-stage tailored classifier using simulations and a public data example.
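The abstract does not spell out the test or the classifier, so the following is only a rough illustration of the general two-stage idea, not the authors' method. It assumes that training clusters are already labelled, that an observation is "flagged" when it lies farther from its nearest training centroid than a slack factor times that cluster's radius, and that unflagged observations take the majority class of the nearest cluster. The names fit, predict, and slack are hypothetical.

```python
import math
from collections import Counter


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit(train_x, train_cluster, train_label):
    """Summarise each training cluster by centroid, radius, and majority class."""
    groups = {}
    for x, c, y in zip(train_x, train_cluster, train_label):
        groups.setdefault(c, []).append((x, y))
    model = {}
    for c, members in groups.items():
        pts = [x for x, _ in members]
        centre = centroid(pts)
        radius = max(math.dist(centre, p) for p in pts)
        majority = Counter(y for _, y in members).most_common(1)[0][0]
        model[c] = (centre, radius, majority)
    return model


def predict(model, x, slack=1.5):
    """Stage 1 flags a possible unseen cluster; stage 2 classifies the rest.

    Returns None when x is flagged (assumed rule: distance to the nearest
    centroid exceeds slack * that cluster's radius), otherwise the majority
    class of the nearest training cluster.
    """
    centre, radius, majority = min(
        model.values(), key=lambda m: math.dist(m[0], x)
    )
    if math.dist(centre, x) > slack * radius:
        return None  # stage 1: possibly from a cluster absent in training
    return majority  # stage 2: ordinary classification


# Toy data: two well-separated training clusters with classes "a" and "b".
train_x = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
train_cluster = [0, 0, 0, 1, 1, 1]
train_label = ["a", "a", "a", "b", "b", "b"]
model = fit(train_x, train_cluster, train_label)
```

With this toy model, predict(model, (0.5, 0.5)) falls inside the first cluster and is classified normally, while a point such as (5, -5) lies far from both centroids and is flagged rather than forced into a known class. A distance threshold is only one of many possible flagging rules; a formal hypothesis test, as the abstract describes, would replace it in practice.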
- Award ID(s):
- 2015320
- PAR ID:
- 10542505
- Publisher / Repository:
- School of Statistics and the Center for Applied Statistics, Renmin University of China.
- Date Published:
- Journal Name:
- Journal of Data Science
- ISSN:
- 1680-743X
- Page Range / eLocation ID:
- 1 to 20
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Airborne remote sensing offers unprecedented opportunities to efficiently monitor vegetation, but methods to delineate and classify individual plant species using the collected data are still actively being developed and improved. The Integrating Data science with Trees and Remote Sensing (IDTReeS) plant identification competition openly invited scientists to create and compare individual tree mapping methods. Participants were tasked with training taxon identification algorithms on two sites and then transferring their methods to a third unseen site, using field-based plant observations in combination with airborne remote sensing image data products from the National Ecological Observatory Network (NEON). These data were captured by a high-resolution digital camera sensitive to red, green, and blue (RGB) light, a hyperspectral imaging spectrometer spanning the visible to shortwave infrared wavelengths, and lidar systems, which together capture the spectral and structural properties of vegetation. As participants in the IDTReeS competition, we developed a two-stage deep learning approach to integrate NEON remote sensing data from all three sensors and classify individual plant species and genera. The first stage was a convolutional neural network that generates taxon probabilities from RGB images, and the second stage was a fusion neural network that “learns” how to combine these probabilities with hyperspectral and lidar data. Our two-stage approach leverages the ability of neural networks to flexibly and automatically extract descriptive features from complex image data with high dimensionality. Our method achieved an overall classification accuracy of 0.51 based on the training set, and 0.32 based on the test set, which contained data from an unseen site with unknown taxa classes. Although transferability of classification algorithms to unseen sites with unknown species and genus classes proved to be a challenging task, developing methods with openly available NEON data that will be collected in a standardized format for 30 years allows for continual improvements and major gains for members of the computational ecology community. We outline promising directions related to data preparation and processing techniques for further investigation, and provide our code to contribute to open reproducible science efforts.
-
Current approaches to A/B testing in networks focus on limiting interference, the concern that treatment effects can “spill over” from treatment nodes to control nodes and lead to biased causal effect estimation. Prominent methods for network experiment design rely on two-stage randomization, in which sparsely-connected clusters are identified and cluster randomization dictates the node assignment to treatment and control. Here, we show that cluster randomization does not ensure sufficient node randomization and it can lead to selection bias in which treatment and control nodes represent different populations of users. To address this problem, we propose a principled framework for network experiment design which jointly minimizes interference and selection bias. We introduce the concepts of edge spillover probability and cluster matching and demonstrate their importance for designing network A/B testing. Our experiments on a number of real-world datasets show that our proposed framework leads to significantly lower error in causal effect estimation than existing solutions.
-
The Safe System Approach (SSA) aims to eliminate fatal and serious injury roadway crashes through a holistic view of the road system, moving away from traditional safety analysis based exclusively on historical crash data. One reason for this is the classification of crashes into broad categories (e.g., head-on, sideswipe), which does not capture crash progression or contributing factors. In this context, this paper applies crash sequence analysis to historical crash data and uses the findings to proactively identify safety issues in similar contexts, in alignment with the SSA framework. The method uses sequence-of-events information from crash data to generate clusters of crashes with similar underlying characteristics. Data from fatal and serious injury crashes from urban intersections in the state of Ohio between 2018 and 2022 were used in the analysis. The results show 12 clusters with unique characteristics that consider the sequence of events of each crash. Although derived from crash data, the clusters offer an in-depth understanding of the factors associated with each one and help identify cluster-specific countermeasures related to various SSA elements. State and local jurisdictions can use the presented methodology in transportation safety programs, by focusing on the clusters that represent local challenges or on countermeasures related to the issues of multiple clusters. Finally, the method can also be associated with site-specific analysis, providing a comprehensive toolkit for practitioners.
-
Avidan, S. (Ed.) The subpopulation shifting challenge, in which some subpopulations of a category are not seen during training, severely limits the classification performance of state-of-the-art convolutional neural networks. To mitigate this practical issue, we explore incremental subpopulation learning (ISL), which adapts the original model by incrementally learning the unseen subpopulations without retaining the seen population data. Striking a good balance between subpopulation learning and seen population forgetting is the main challenge in ISL, yet it is not well studied by existing approaches. These incremental learners simply use a pre-defined and fixed hyperparameter to balance the learning objective and forgetting regularization, but their learning is usually biased towards either side in the long run. In this paper, we propose a novel two-stage learning scheme that explicitly disentangles acquisition and forgetting to achieve a better balance between subpopulation learning and seen population forgetting: in the first “gain-acquisition” stage, we progressively learn a new classifier based on the margin-enforce loss, which gives the hard samples and population a larger weight in classifier updating and avoids uniformly updating all the population; in the second “counter-forgetting” stage, we search for the proper combination of the new and old classifiers by optimizing a novel objective based on proxies of forgetting and acquisition. We benchmark the representative and state-of-the-art non-exemplar-based incremental learning methods on a large-scale subpopulation shifting dataset for the first time. Under almost all the challenging ISL protocols, we outperform other methods by a large margin, demonstrating the ability of our approach to alleviate the subpopulation shifting problem (code is released at https://github.com/wuyujack/ISL).

