Title: Parameter inference from event ensembles and the top-quark mass
Abstract:
One of the key tasks of any particle collider is measurement. In practice, this is often done by fitting data to a simulation, which depends on many parameters. Sometimes, when the effects of varying different parameters are highly correlated, a large ensemble of data may be needed to resolve parameter-space degeneracies. An important example is measuring the top-quark mass, where other physical and unphysical parameters in the simulation must be profiled when fitting the top-quark mass parameter. We compare four different methodologies for top-quark mass measurement: a classical histogram fit similar to one commonly used in experiment, augmented by soft-drop jet grooming; a 2D profile likelihood fit with a nuisance parameter; a machine-learning method called DCTR; and a linear regression approach, either using a least-squares fit or with a dense linearly-activated neural network. Despite the fact that individual events are totally uncorrelated, we find that the linear regression methods work most effectively when we input an ensemble of events sorted by mass, rather than training them on individual events. Although all methods provide robust extraction of the top-quark mass parameter, the linear network does marginally best and is remarkably simple. For the top study, we conclude that the Monte-Carlo-based uncertainty on current extractions of the top-quark mass from LHC data can be reduced significantly (by perhaps a factor of 2) using networks trained on sorted event ensembles. More generally, machine learning from ensembles for parameter estimation has broad potential for collider physics measurements.
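To make the sorted-ensemble idea concrete, here is a minimal toy sketch (not the paper's simulation, event selection, or observables): synthetic events are drawn from a distribution whose location tracks a hypothetical "mass" parameter, each ensemble is sorted, and the parameter is extracted with an ordinary least-squares fit on the sorted values.

```python
# Toy sketch of parameter regression on sorted event ensembles.
# All quantities here are synthetic stand-ins, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_events = 2000, 500, 100

def toy_ensembles(n, rng):
    # Hypothetical per-event observable whose distribution shifts with the mass.
    mass = rng.uniform(170.0, 176.0, size=n)
    events = mass[:, None] + rng.exponential(10.0, size=(n, n_events))
    return mass, events

def design_matrix(events):
    # Key step from the abstract: sort each ensemble before the linear fit.
    return np.hstack([np.sort(events, axis=1), np.ones((len(events), 1))])

mass_train, ev_train = toy_ensembles(n_train, rng)
coef, *_ = np.linalg.lstsq(design_matrix(ev_train), mass_train, rcond=None)

mass_test, ev_test = toy_ensembles(n_test, rng)
pred = design_matrix(ev_test) @ coef
print(f"toy per-ensemble resolution: {np.std(pred - mass_test):.3f} GeV")
```

A dense network with purely linear activations composes to a single linear map, so the least-squares fit and the linear network mentioned in the abstract explore the same function class and differ mainly in how the fit is performed.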
Award ID(s): 2019786
NSF-PAR ID: 10299647
Journal Name: Journal of High Energy Physics
Volume: 2021
Issue: 9
ISSN: 1029-8479
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Assigning weights to a large pool of objects is a fundamental task in a wide variety of applications. In this article, we introduce the concept of structured high-dimensional probability simplexes, in which most components are zero or near zero and the remaining ones are close to each other. Such structure is well motivated by (i) high-dimensional weights that are common in modern applications, and (ii) ubiquitous examples in which equal weights -- despite their simplicity -- often achieve favorable or even state-of-the-art predictive performance. This particular structure, however, presents unique challenges partly because, unlike high-dimensional linear regression, the parameter space is a simplex and pattern switching between partial constancy and sparsity is unknown. To address these challenges, we propose a new class of double spike Dirichlet priors to shrink a probability simplex to one with the desired structure. When applied to ensemble learning, such priors lead to a Bayesian method for structured high-dimensional ensembles that is useful for forecast combination and improving random forests, while enabling uncertainty quantification. We design efficient Markov chain Monte Carlo algorithms for implementation. Posterior contraction rates are established to study large sample behaviors of the posterior distribution. We demonstrate the wide applicability and competitive performance of the proposed methods through simulations and two real data applications using the European Central Bank Survey of Professional Forecasters data set and a data set from the UC Irvine Machine Learning Repository (UCI). 
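A tiny illustration of the weight structure described above (this is not the double spike Dirichlet prior or its MCMC sampler, only the pattern that prior is designed to recover): most weights exactly zero, the remaining ones equal, compared against naive equal weighting of the whole pool.

```python
# Structured simplex weights for forecast combination: most components
# zero, the rest equal. Synthetic data; not the paper's Bayesian model.
import numpy as np

rng = np.random.default_rng(1)
n_obs, p, k = 200, 50, 5            # observations, pool size, informative members

truth = rng.normal(size=n_obs)
informative = truth[:, None] + rng.normal(scale=0.3, size=(n_obs, k))
noise = rng.normal(size=(n_obs, p - k))
forecasts = np.hstack([informative, noise])

w_structured = np.zeros(p)
w_structured[:k] = 1.0 / k          # sparse block of equal weights, sums to 1
w_equal = np.full(p, 1.0 / p)       # equal weights over the entire pool

for name, w in [("structured", w_structured), ("equal", w_equal)]:
    mse = np.mean((forecasts @ w - truth) ** 2)
    print(f"{name:>10} weights -> combination MSE: {mse:.3f}")
```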
  2. Ensemble-based change detection can improve map accuracies by combining information from multiple datasets. There is a growing literature investigating ensemble inputs and applications for forest disturbance detection and mapping. However, few studies have evaluated ensemble methods other than Random Forest classifiers, which rely on uninterpretable “black box” algorithms with hundreds of parameters. Additionally, most ensemble-based disturbance maps do not utilize independently and systematically collected field-based forest inventory measurements. Here, we compared three approaches for combining change detection results generated from multi-spectral Landsat time series with forest inventory measurements to map forest harvest events at an annual time step. We found that seven-parameter degenerate decision tree ensembles performed at least as well as 500-tree Random Forest ensembles trained and tested on the same LandTrendr segmentation results and both supervised decision tree methods consistently outperformed the top-performing voting approach (majority). Comparisons with an existing national forest disturbance dataset indicated notable improvements in accuracy that demonstrate the value of developing locally calibrated, process-specific disturbance datasets like the harvest event maps developed in this study. Furthermore, by using multi-date forest inventory measurements, we are able to establish a lower bound of 30% basal area removal on detectable harvests, providing biophysical context for our harvest event maps. Our results suggest that simple interpretable decision trees applied to multi-spectral temporal segmentation outputs can be as effective as more complex machine learning approaches for characterizing forest harvest events ranging from partial clearing to clear cuts, with important implications for locally accurate mapping of forest harvests and other types of disturbances. 
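As a rough sketch of the kind of comparison described above (synthetic stand-in features, not the study's LandTrendr segmentation outputs or forest inventory plots), a shallow decision tree can be benchmarked against a 500-tree random forest with scikit-learn:

```python
# Shallow, interpretable decision tree vs. 500-tree random forest on
# synthetic stand-in features (not the study's LandTrendr/inventory data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 6))                      # hypothetical spectral-change metrics
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

models = {
    "decision tree (depth 3)": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random forest (500 trees)": RandomForestClassifier(n_estimators=500, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>26}: mean CV accuracy = {acc:.3f}")
```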
  3. The measurement of the charge asymmetry for highly boosted top quark pairs decaying to a single lepton and jets is presented. The analysis is performed using 138 fb⁻¹ of data collected in pp collisions at √s = 13 TeV with the CMS detector during Run 2 of the Large Hadron Collider. The selection is optimized for top quark-antiquark pairs produced with large Lorentz boosts, resulting in non-isolated leptons and overlapping jets. The top quark charge asymmetry is measured for events with tt̄ invariant mass larger than 750 GeV and corrected for detector and acceptance effects using a binned maximum likelihood fit. The measured top quark charge asymmetry is in good agreement with the standard model prediction at next-to-next-to-leading order in perturbation theory with next-to-leading order electroweak corrections. Differential distributions for two invariant mass ranges are also presented.
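For reference, the charge asymmetry in such analyses is conventionally defined from the difference of the absolute top and antitop rapidities; the definition below is the standard convention rather than text taken from this specific paper.

```latex
% Conventional top quark charge asymmetry, with y_t (y_tbar) the
% top (antitop) rapidity; a positive A_C means tops are produced at
% larger absolute rapidity than antitops.
\[
  A_C = \frac{N(\Delta|y| > 0) - N(\Delta|y| < 0)}
             {N(\Delta|y| > 0) + N(\Delta|y| < 0)},
  \qquad \Delta|y| \equiv |y_{t}| - |y_{\bar t}| .
\]
```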
  4.
    Abstract: The rate for Higgs (H) boson production in association with either one (tH) or two (tt̄H) top quarks is measured in final states containing multiple electrons, muons, or tau leptons decaying to hadrons and a neutrino, using proton–proton collisions recorded at a center-of-mass energy of 13 TeV by the CMS experiment. The analyzed data correspond to an integrated luminosity of 137 fb⁻¹. The analysis is aimed at events that contain H → WW, H → ττ, or H → ZZ decays and each of the top quark(s) decays either to lepton+jets or all-jet channels. Sensitivity to signal is maximized by including ten signatures in the analysis, depending on the lepton multiplicity. The separation among tH, tt̄H, and the backgrounds is enhanced through machine-learning techniques and matrix-element methods. The measured production rates for the tt̄H and tH signals correspond to 0.92 ± 0.19 (stat) +0.17/−0.13 (syst) and 5.7 ± 2.7 (stat) ± 3.0 (syst) of their respective standard model (SM) expectations. The corresponding observed (expected) significance amounts to 4.7 (5.2) standard deviations for tt̄H, and to 1.4 (0.3) for tH production. Assuming that the Higgs boson coupling to the tau lepton is equal in strength to its expectation in the SM, the coupling y_t of the Higgs boson to the top quark divided by its SM expectation, κ_t = y_t / y_t^SM, is constrained to be within −0.9 < κ_t < −0.7 or 0.7 < κ_t < 1.1, at 95% confidence level. This result is the most sensitive measurement of the tt̄H production rate to date.
  5. ABSTRACT

    We developed convolutional neural networks (CNNs) to rapidly and directly infer the planet mass from radio dust continuum images. Substructures induced by young planets in protoplanetary discs can be used to infer the potential young planets' properties. Hydrodynamical simulations have been used to study the relationships between the planet's properties and these disc features. However, these attempts either fine-tuned numerical simulations to fit one protoplanetary disc at a time, which was time consuming, or azimuthally averaged simulation results to derive some linear relationships between the gap width/depth and the planet mass, which lost information on asymmetric features in discs. To cope with these disadvantages, we developed Planet Gap neural Networks (PGNets) to infer the planet mass from two-dimensional images. We first fit the gridded data in Zhang et al. as a classification problem. Then, we quadrupled the data set by running additional simulations with near-randomly sampled parameters, and derived the planet mass and disc viscosity together as a regression problem. The classification approach can reach an accuracy of 92 per cent, whereas the regression approach can reach a 1σ precision of 0.16 dex for planet mass and 0.23 dex for disc viscosity. We can reproduce the degeneracy scaling α ∝ M_p³ found in the linear fitting method, which means that the CNN method can even be used to find degeneracy relationships. The gradient-weighted class activation mapping effectively confirms that PGNets use proper disc features to constrain the planet mass. We provide programs for PGNets and the traditional fitting method from Zhang et al., and discuss each method's advantages and disadvantages.
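A minimal CNN regression sketch in the spirit of the description above (not the published PGNets architecture, training set, or preprocessing) maps a 2D continuum-like image to two targets, log planet mass and log disc viscosity; everything below is a synthetic stand-in.

```python
# Tiny CNN regressor sketch (not the actual PGNets model): image in,
# [log10 planet mass, log10 alpha viscosity] out, on random stand-in data.
import torch
import torch.nn as nn

class TinyGapNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)          # two regression targets

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = TinyGapNet()
images = torch.randn(8, 1, 128, 128)          # stand-in dust continuum images
targets = torch.randn(8, 2)                   # stand-in [log mass, log alpha]
loss = nn.MSELoss()(model(images), targets)
loss.backward()                               # one illustrative gradient step (optimizer omitted)
print(f"toy MSE loss: {loss.item():.3f}")
```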

     