skip to main content

This content will become publicly available on June 1, 2022

Title: Secondary vertex finding in jets with neural networks
Abstract Jet classification is an important ingredient in measurements and searches for new physics at particle colliders, and secondary vertex reconstruction is a key intermediate step in building powerful jet classifiers. We use a neural network to perform vertex finding inside jets in order to improve the classification performance, with a focus on separation of bottom vs. charm flavor tagging. We implement a novel, universal set-to-graph model, which takes into account information from all tracks in a jet to determine if pairs of tracks originated from a common vertex. We explore different performance metrics and find our method to outperform traditional approaches in accurate secondary vertex reconstruction. We also find that improved vertex finding leads to a significant improvement in jet classification performance.
; ; ; ; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
The European Physical Journal C
Sponsoring Org:
National Science Foundation
More Like this
  1. Doglioni, C. ; Kim, D. ; Stewart, G.A. ; Silvestris, L. ; Jackson, P. ; Kamleh, W. (Ed.)
    One of the most computationally challenging problems expected for the High-Luminosity Large Hadron Collider (HL-LHC) is finding and fitting particle tracks during event reconstruction. Algorithms used at the LHC today rely on Kalman filtering, which builds physical trajectories incrementally while incorporating material effects and error estimation. Recognizing the need for faster computational throughput, we have adapted Kalman-filterbased methods for highly parallel, many-core SIMD and SIMT architectures that are now prevalent in high-performance hardware. Previously we observed significant parallel speedups, with physics performance comparable to CMS standard tracking, on Intel Xeon, Intel Xeon Phi, and (to a limited extent) NVIDIA GPUs.more »While early tests were based on artificial events occurring inside an idealized barrel detector, we showed subsequently that our mkFit software builds tracks successfully from complex simulated events (including detector pileup) occurring inside a geometrically accurate representation of the CMS-2017 tracker. Here, we report on advances in both the computational and physics performance of mkFit, as well as progress toward integration with CMS production software. Recently we have improved the overall efficiency of the algorithm by preserving short track candidates at a relatively early stage rather than attempting to extend them over many layers. Moreover, mkFit formerly produced an excess of duplicate tracks; these are now explicitly removed in an additional processing step. We demonstrate that with these enhancements, mkFit becomes a suitable choice for the first iteration of CMS tracking, and eventually for later iterations as well. We plan to test this capability in the CMS High Level Trigger during Run 3 of the LHC, with an ultimate goal of using it in both the CMS HLT and offline reconstruction for the HL-LHC CMS tracker.« less
  2. Nowadays, large-scale graph data is being generated in a variety of real-world applications, from social networks to co-authorship networks, from protein-protein interaction networks to road traffic networks. Many existing works on graph mining focus on the vertices and edges, with the first-order Markov chain as the underlying model. They fail to explore the high-order network structures, which are of key importance in many high impact domains. For example, in bank customer personally identifiable information (PII) networks, the star structures often correspond to a set of synthetic identities; in financial transaction networks, the loop structures may indicate the existence of moneymore »laundering. In this paper, we focus on mining user-specified high-order network structures and aim to find a structure-rich subgraph which does not break many such structures by separating the subgraph from the rest. A key challenge associated with finding a structure-rich subgraph is the prohibitive computational cost. To address this problem, inspired by the family of local graph clustering algorithms for efficiently identifying a low-conductance cut without exploring the entire graph, we propose to generalize the key idea to model high-order network structures. In particular, we start with a generic definition of high-order conductance, and define the high-order diffusion core, which is based on a high-order random walk induced by user-specified high-order network structure. Then we propose a novel High-Order Structure-Preserving LOcal Cut (HOSPLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until a subgraph with a small high-order conductance is found. Furthermore, we analyze its performance in terms of both effectiveness and efficiency. The experimental results on both synthetic graphs and real graphs demonstrate the effectiveness and efficiency of our proposed HOSPLOC algorithm.« less
  3. Network embedding is an effective approach to learn the low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods exclusively focus on vertex proximity of networks, while ignoring the network internal community structure. However, the homophily principle indicates that vertices within the same community are more similar to each other than those from different communities, thus vertices within the same community should have similar vertex representations. Motivated by this, we propose a novel network embedding framework NECS to learn the Network Embedding with Community Structuralmore »information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem into a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks including network reconstruction, link prediction and vertex classification.

    « less
  4. Inspired by recent efforts to model cancer evolution with phylogenetic trees, we consider the problem of finding a consensus tumor evolution tree from a set of conflicting input trees. In contrast to traditional phylogenetic trees, the tumor trees we consider contain features such as mutation labels on internal vertices (in addition to the leaves) and allow multiple mutations to label a single vertex. We describe several distance measures between these tumor trees and present an algorithm to solve the consensus problem called GraPhyC. Our approach uses a weighted directed graph where vertices are sets of mutations and edges are weightedmore »using a function that depends on the number of times a parental relationship is observed between their constituent mutations in the set of input trees. We find a minimum weight spanning arborescence in this graph and prove that the resulting tree minimizes the total distance to all input trees for one of our presented distance measures. We evaluate our GraPhyC method using both simulated and real data. On simulated data we show that our method outperforms a baseline method at finding an appropriate representative tree. Using a set of tumor trees derived from both whole-genome and deep sequencing data from a Chronic Lymphocytic Leukemia patient we find that our approach identifies a tree not included in the set of input trees, but that contains characteristics supported by other reported evolutionary reconstructions of this tumor.« less
  5. A key challenge in conformer sampling is finding low-energy conformations with a small number of energy evaluations. We recently demonstrated the Bayesian Optimization Algorithm (BOA) is an effective method for finding the lowest energy conformation of a small molecule. Our approach balances between exploitation and exploration, and is more efficient than exhaustive or random search methods. Here, we extend strategies used on proteins and oligopeptides ( e.g. Ramachandran plots of secondary structure) and study correlated torsions in small molecules. We use bivariate von Mises distributions to capture correlations, and use them to constrain the search space. We validate the performancemore »of our new method, Bayesian Optimization with Knowledge-based Expected Improvement (BOKEI), on a dataset consisting of 533 diverse small molecules, using (i) a force field (MMFF94); and (ii) a semi-empirical method (GFN2), as the objective function. We compare the search performance of BOKEI, BOA with Expected Improvement (BOA-EI), and a genetic algorithm (GA), using a fixed number of energy evaluations. In more than 60% of the cases examined, BOKEI finds lower energy conformations than global optimization with BOA-EI or GA. More importantly, we find correlated torsions in up to 15% of small molecules in larger data sets, up to 8 times more often than previously reported. The BOKEI patterns not only describe steric clashes, but also reflect favorable intramolecular interactions such as hydrogen bonds and π–π stacking. Increasing our understanding of the conformational preferences of molecules will help improve our ability to find low energy conformers efficiently, which will have impact in a wide range of computational modeling applications.« less