

Title: Parameter inference from event ensembles and the top-quark mass
Abstract: One of the key tasks of any particle collider is measurement. In practice, this is often done by fitting data to a simulation, which depends on many parameters. Sometimes, when the effects of varying different parameters are highly correlated, a large ensemble of data may be needed to resolve parameter-space degeneracies. An important example is measuring the top-quark mass, where other physical and unphysical parameters in the simulation must be profiled when fitting the top-quark mass parameter. We compare four methodologies for top-quark mass measurement: a classical histogram fit, similar to one commonly used in experiment, augmented by soft-drop jet grooming; a 2D profile-likelihood fit with a nuisance parameter; a machine-learning method called DCTR; and a linear regression approach, implemented either as a least-squares fit or as a dense, linearly activated neural network. Although individual events are entirely uncorrelated, we find that the linear regression methods work most effectively when given an ensemble of events sorted by mass rather than being trained on individual events. All methods provide robust extraction of the top-quark mass parameter, but the linear network does marginally best and is remarkably simple. For the top study, we conclude that the Monte-Carlo-based uncertainty on current extractions of the top-quark mass from LHC data can be reduced significantly (by perhaps a factor of 2) using networks trained on sorted event ensembles. More generally, machine learning from ensembles for parameter estimation has broad potential for collider physics measurements.
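The ensemble-based linear regression described in the abstract can be illustrated with a toy sketch. This is not the paper's actual pipeline: the per-event observable, the 100-event ensemble size, the Gaussian smearing width, and the mass range below are illustrative assumptions. The sketch only shows the core idea of sorting each ensemble of events and fitting a linear model from the sorted vector to the parameter.

```python
# Toy illustration of parameter inference from sorted event ensembles.
# NOT the paper's analysis: the per-event observable is a hypothetical
# smeared stand-in for a reconstructed top-mass proxy.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_ensembles, n_events = 2000, 100          # ensembles of 100 events each

def make_ensemble(m_true):
    """Draw one ensemble of smeared per-event observables and sort it."""
    events = rng.normal(loc=m_true, scale=10.0, size=n_events)
    return np.sort(events)                  # sorting is the key step

m_train = rng.uniform(170.0, 176.0, size=n_ensembles)
X_train = np.stack([make_ensemble(m) for m in m_train])

model = LinearRegression().fit(X_train, m_train)

# Evaluate on fresh ensembles generated at a known mass value
m_test = 172.5
X_test = np.stack([make_ensemble(m_test) for _ in range(500)])
preds = model.predict(X_test)
print(f"extracted mass: {preds.mean():.2f} +/- {preds.std():.2f} GeV")
```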
Award ID(s):
2019786
NSF-PAR ID:
10299647
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Journal of High Energy Physics
Volume:
2021
Issue:
9
ISSN:
1029-8479
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Assigning weights to a large pool of objects is a fundamental task in a wide variety of applications. In this article, we introduce the concept of structured high-dimensional probability simplexes, in which most components are zero or near zero and the remaining ones are close to each other. Such structure is well motivated by (i) high-dimensional weights that are common in modern applications, and (ii) ubiquitous examples in which equal weights -- despite their simplicity -- often achieve favorable or even state-of-the-art predictive performance. This particular structure, however, presents unique challenges partly because, unlike high-dimensional linear regression, the parameter space is a simplex and pattern switching between partial constancy and sparsity is unknown. To address these challenges, we propose a new class of double spike Dirichlet priors to shrink a probability simplex to one with the desired structure. When applied to ensemble learning, such priors lead to a Bayesian method for structured high-dimensional ensembles that is useful for forecast combination and improving random forests, while enabling uncertainty quantification. We design efficient Markov chain Monte Carlo algorithms for implementation. Posterior contraction rates are established to study large sample behaviors of the posterior distribution. We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters data set and a data set from the UC Irvine Machine Learning Repository (UCI). 
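As a rough illustration of the weight structure described above (most components near zero, the remaining ones near-equal), the sketch below builds such a simplex by hand and uses it to combine a few hypothetical forecasters. It does not implement the double spike Dirichlet prior or its MCMC sampler; the forecast data and the number of active weights are made-up placeholders.

```python
# Minimal sketch of a "structured" probability simplex: a few near-equal
# active weights, the rest zero. Hypothetical forecast data; this does not
# implement the double spike Dirichlet prior or MCMC from the paper.
import numpy as np

rng = np.random.default_rng(1)
p, n_active = 20, 3                         # 20 forecasters, 3 treated as useful

# Build the structured weight vector: equal mass on the active block
w = np.zeros(p)
active = rng.choice(p, size=n_active, replace=False)
w[active] = 1.0 / n_active
assert np.isclose(w.sum(), 1.0) and (w >= 0).all()   # lies on the simplex

# Combine hypothetical forecasts of a target series of length 50
truth = rng.normal(size=50)
forecasts = truth + rng.normal(scale=1.0, size=(p, 50))   # noisy forecasters
combined = w @ forecasts

print("RMSE, structured weights :", np.sqrt(np.mean((combined - truth) ** 2)))
print("RMSE, plain equal weights:", np.sqrt(np.mean((forecasts.mean(0) - truth) ** 2)))
```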
  2. Ensemble-based change detection can improve map accuracies by combining information from multiple datasets. There is a growing literature investigating ensemble inputs and applications for forest disturbance detection and mapping. However, few studies have evaluated ensemble methods other than Random Forest classifiers, which rely on uninterpretable “black box” algorithms with hundreds of parameters. Additionally, most ensemble-based disturbance maps do not utilize independently and systematically collected field-based forest inventory measurements. Here, we compared three approaches for combining change detection results generated from multi-spectral Landsat time series with forest inventory measurements to map forest harvest events at an annual time step. We found that seven-parameter degenerate decision tree ensembles performed at least as well as 500-tree Random Forest ensembles trained and tested on the same LandTrendr segmentation results, and both supervised decision tree methods consistently outperformed the top-performing voting approach (majority). Comparisons with an existing national forest disturbance dataset indicated notable improvements in accuracy that demonstrate the value of developing locally calibrated, process-specific disturbance datasets like the harvest event maps developed in this study. Furthermore, by using multi-date forest inventory measurements, we are able to establish a lower bound of 30% basal area removal on detectable harvests, providing biophysical context for our harvest event maps. Our results suggest that simple interpretable decision trees applied to multi-spectral temporal segmentation outputs can be as effective as more complex machine learning approaches for characterizing forest harvest events ranging from partial clearing to clear cuts, with important implications for locally accurate mapping of forest harvests and other types of disturbances. 
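The comparison reported above (a small, interpretable decision tree versus a large Random Forest trained on the same inputs) can be sketched as follows. The features and labels are synthetic placeholders, not LandTrendr segmentation outputs or forest-inventory measurements, so this only mirrors the shape of the comparison, not the study itself.

```python
# Hedged sketch: compare a shallow, interpretable decision tree with a
# 500-tree Random Forest on the same (synthetic) features. The real study
# used LandTrendr segmentation outputs and forest-inventory labels.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 6))                 # stand-in spectral/segment features
y = (X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)  # toy "harvest" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# A depth-3 tree has at most seven split nodes, i.e. very few parameters
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

print("shallow tree accuracy :", accuracy_score(y_te, tree.predict(X_te)))
print("random forest accuracy:", accuracy_score(y_te, forest.predict(X_te)))
```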
  3. The measurement of the charge asymmetry for highly boosted top quark pairs decaying to a single lepton and jets is presented. The analysis is performed using 138 fb⁻¹ of data collected in pp collisions at √s = 13 TeV with the CMS detector during Run 2 of the Large Hadron Collider. The selection is optimized for top quark-antiquark pairs produced with large Lorentz boosts, resulting in non-isolated leptons and overlapping jets. The top quark charge asymmetry is measured for events with tt̄ invariant mass larger than 750 GeV and corrected for detector and acceptance effects using a binned maximum likelihood fit. The measured top quark charge asymmetry is in good agreement with the standard model prediction at next-to-next-to-leading order in perturbation theory with next-to-leading order electroweak corrections. Differential distributions for two invariant mass ranges are also presented. 
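For context, the top charge asymmetry is conventionally built from the rapidity difference Δ|y| = |y_t| − |y_t̄|. The sketch below evaluates that standard definition on hypothetical per-event rapidities; it does not reproduce the CMS selection, detector correction, or binned likelihood fit.

```python
# Hedged sketch of the standard charge-asymmetry definition
#   A_C = [N(dy > 0) - N(dy < 0)] / [N(dy > 0) + N(dy < 0)],  dy = |y_t| - |y_tbar|,
# evaluated on hypothetical per-event top/antitop rapidities (no detector
# correction or binned maximum-likelihood fit as in the measurement).
import numpy as np

rng = np.random.default_rng(3)
y_top = rng.normal(scale=1.2, size=100_000)        # stand-in rapidities
y_antitop = rng.normal(scale=1.2, size=100_000)

delta_abs_y = np.abs(y_top) - np.abs(y_antitop)
n_pos = np.sum(delta_abs_y > 0)
n_neg = np.sum(delta_abs_y < 0)
a_c = (n_pos - n_neg) / (n_pos + n_neg)
print(f"A_C = {a_c:.4f}")                          # ~0 for this symmetric toy
```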
  4. The rate for Higgs boson (H) production in association with either one (tH) or two (tt̄H) top quarks is measured in final states containing multiple electrons, muons, or tau leptons decaying to hadrons and a neutrino, using proton-proton collisions recorded at a center-of-mass energy of 13 TeV by the CMS experiment. The analyzed data correspond to an integrated luminosity of 137 fb⁻¹. The analysis is aimed at events that contain H → WW, H → ττ, or H → ZZ decays and in which each of the top quark(s) decays either to the lepton+jets or the all-jets channel. Sensitivity to signal is maximized by including ten signatures in the analysis, depending on the lepton multiplicity. The separation among tH, tt̄H, and the backgrounds is enhanced through machine-learning techniques and matrix-element methods. The measured production rates for the tt̄H and tH signals correspond to 0.92 ± 0.19 (stat) +0.17/-0.13 (syst) and 5.7 ± 2.7 (stat) ± 3.0 (syst) times their respective standard model (SM) expectations. The corresponding observed (expected) significance amounts to 4.7 (5.2) standard deviations for tt̄H, and to 1.4 (0.3) for tH production. Assuming that the Higgs boson coupling to the tau lepton is equal in strength to its expectation in the SM, the coupling y_t of the Higgs boson to the top quark divided by its SM expectation, κ_t = y_t / y_t^SM, is constrained to be within -0.9 < κ_t < -0.7 or 0.7 < κ_t < 1.1, at 95% confidence level. This result is the most sensitive measurement of the tt̄H production rate to date. 
  5. Propensity score methods account for selection bias in observational studies. However, the consistency of the propensity score estimators strongly depends on a correct specification of the propensity score model. Logistic regression and, with increasing popularity, machine learning tools are used to estimate propensity scores. We introduce a stacked generalization ensemble learning approach to improve propensity score estimation by fitting a meta learner on the predictions of a suitable set of diverse base learners. We perform a comprehensive Monte Carlo simulation study, implementing a broad range of scenarios that mimic characteristics of typical data sets in educational studies. The population average treatment effect is estimated using the propensity score in Inverse Probability of Treatment Weighting. Our proposed stacked ensembles, especially those using gradient boosting machines as a meta learner trained on a set of 12 base learner predictions, led to a superior reduction of bias compared to the current state of the art in propensity score estimation. Further, our simulations imply that commonly used balance measures (averaged standardized absolute mean differences) might be misleading as propensity score model selection criteria. We apply our proposed model, which we call GBM-Stack, to assess the population average treatment effect of a Supplemental Instruction (SI) program in an introductory psychology (PSY 101) course at San Diego State University. Our analysis provides evidence that moving the whole population to SI attendance would on average lead to 1.69 times higher odds of passing the PSY 101 class compared to not offering SI, with a 95% bootstrap confidence interval of (1.31, 2.20). 
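A minimal sketch of the two ingredients described above (a stacked propensity-score model with a gradient-boosting meta learner, and an IPTW estimate of the average treatment effect) is given below. The covariates, treatment, and outcome are simulated placeholders, only three base learners are stacked rather than the twelve used in the study, and the outcome is continuous rather than a pass/fail indicator.

```python
# Hedged sketch: stacked propensity-score estimation + IPTW on simulated data.
# Three base learners instead of the study's twelve; GBM as the meta learner.
import numpy as np
from sklearn.ensemble import (StackingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
n = 4000
X = rng.normal(size=(n, 5))                                   # covariates
p_treat = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))  # true selection model
T = rng.binomial(1, p_treat)                                  # treatment indicator
Y = 0.5 * T + X[:, 0] + rng.normal(scale=1.0, size=n)         # outcome, true effect 0.5

stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5,
)
e_hat = stack.fit(X, T).predict_proba(X)[:, 1]                # estimated propensity scores

# Inverse Probability of Treatment Weighting estimate of the population ATE
w = T / e_hat + (1 - T) / (1 - e_hat)
ate = np.average(Y, weights=T * w) - np.average(Y, weights=(1 - T) * w)
print(f"IPTW ATE estimate: {ate:.3f} (true simulated effect 0.5)")
```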