<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Automated Collider Event Selection, Plotting, &amp; Machine Learning with AEACuS, RHADAManTHUS, &amp; MInOS</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>07/14/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10351875</idno>
					<idno type="doi">10.22323/1.409.0027</idno>
					<title level='j'>Computational Tools for High Energy Physics and Cosmology (CompTools2021)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Joel W. Walker</author><author>Alexandre Arbey</author><author>G. Bélanger</author><author>Nishita Desai</author><author>Tomas Gonzalo</author><author>Robert V. Harlander</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[A trio of automated collider event analysis tools are described and demonstrated, in the form of a quick-start tutorial. AEACuS interfaces with the standard MadGraph/MadEvent, Pythia, and Delphes simulation chain, via the Root file output. An extensive algorithm library facilitates the computation of standard collider event variables and the transformation of object groups (including jet clustering and substructure analysis). Arbitrary user-defined variables and external function calls are also supported. An efficient mechanism is provided for sorting events into channels with distinct features. RHADAManTHUS generates publication-quality one- and two-dimensional histograms from event statistics computed by AEACuS, calling MatPlotLib on the back end. Large batches of simulation (representing either distinct final states and/or oversampling of a common phase space) are merged internally, and per-event weights are handled consistently throughout. Arbitrary bin-wise functional transformations are readily specified, e.g. for visualizing signal-to-background significance as a function of cut threshold. MInOS implements machine learning on computed event statistics with XGBoost. Ensemble training against distinct background components may be combined to generate composite classifications with enhanced discrimination. ROC curves, as well as score distribution, feature importance, and significance plots are generated on the fly. Each of these tools is controlled via instructions supplied in a reusable cardfile, employing a simple, compact, and powerful meta-language syntax.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>This article provides an introduction to three related programs, AEACuS, RHADAManTHUS, and MInOS, which respectively automate event selection, plotting, and machine learning on simulated collider data. It closely mirrors the content of a live software tutorial <ref type="bibr">[1]</ref> presented by the author at the 2021 Computational Tools for High Energy Physics and Cosmology <ref type="bibr">[2]</ref> workshop, hosted by the IP2I in Lyon, France. Accordingly, it is not intended to provide comprehensive documentation of the associated tools, but rather a quick-start interactive experience in the "learning by doing" style. An archive demo.zip containing all referenced data sets, cardfiles, and software tools (updated slightly from CompTools2021) is available for download from the workshop's Indico <ref type="bibr">[1]</ref> page. While working through this tutorial, one may wish to follow along with a video recording of the live presentation, which is available from the same location. Updates to the core software are distributed at GitHub <ref type="bibr">[3]</ref>, under the GNU General Public License <ref type="bibr">[4]</ref>, v3.</p><p>The operation of each software tool is specified via instructions in a reusable cardfile, employing a simple, compact, and powerful meta-language syntax. In keeping with this design, the tutorial focuses largely on deconstructing the effect of commands stipulated in various exemplar cards. At the end of this tutorial, the reader who has followed along on their own computer should have developed a clear sense of how each software tool operates, along with a broad conception of what features and functional capabilities are available.</p><p>We will proceed in Section 2 by demonstrating how AEACuS interfaces with standard collider simulation utilities. In Section 3, we introduce the AEACuS meta-language.
Section 4 explores an example physics analysis, using AEACuS to generate a lightweight event summary with computation of requested observables from raw simulation data. We next transition to demonstrating analysis tools in the context of a larger pre-generated data set (available in demo.zip), highlighting channel sorting with AEACuS in Section 5, then plotting with RHADAManTHUS in Section 6, and finally machine learning with MInOS in Section 7. We close with conclusions and acknowledgments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Interfacing AEACuS with simulation tools</head><p>AEACuS <ref type="bibr">[3]</ref> automates the generalized analysis of simulated collider events. It contains an extensive algorithm library that facilitates the computation of standard collider event observables and the classification of object groups, including jet clustering and substructure analysis. Arbitrary user-defined functions and external processing calls are also supported. An efficient mechanism is provided for sorting events into channels with distinct features. It is specifically designed to interface with the standard MadGraph/MadEvent <ref type="bibr">[5]</ref>, Pythia <ref type="bibr">[6]</ref>, and Delphes <ref type="bibr">[7]</ref> simulation chain. In particular, the current distribution of AEACuS 4 <ref type="bibr">[3]</ref> (beta release) has been thoroughly tested against MadGraph5_aMC@NLO 3.3.1, Pythia 8.306, and Delphes 3.5.0, with Pythia and Delphes installed using the ./bin/mg5_aMC interface. Delphes also requires prior installation of Root <ref type="bibr">[8,</ref><ref type="bibr">9]</ref>, and AEACuS derives most of its event input from the Delphes Root file output. Tests have been conducted with Root 6.24/06, installed from source with the flag -DPYTHON_EXECUTABLE=$(which python3). This tutorial assumes a similarly configured installation, running on a Unix/Linux flavor variant (including MacOS).</p><p>We will begin with a full-stack simulation of &#119901; &#119901; &#8594; &#119882; &#119885; events, to be processed with the instructions in Card A, which attempts to replicate the CMS Standard Model (SM) study in Ref. <ref type="bibr">[10]</ref>. One should navigate via the command line to the MadGraph installation folder, and call the executable ./bin/mg5_aMC to start an interactive session. In the MadGraph shell, define a multi-particle label incorporating the two charged &#119882;-bosons with the instruction define w = w+ w-. Then, generate the process of interest with generate p p &gt; w z.
Next, stipulate the inclusive simulation of up to one additional light partonic jet, as add process p p &gt; w z j. Finally, output a folder to hold the requested process, as output WZJ, and exit.</p><p>Entering the created folder with cd WZJ, we then navigate with cd Cards to the card folder to make some edits. To ensure that Pythia and Delphes are called, we copy the default cards for those tools into their active forms, as cp pythia8_card_default.dat pythia8_card.dat and cp delphes_card_default.dat delphes_card.dat. A few modifications to the run_card.dat should also be made, using a text editor. In particular, it is important to set the run tag, e.g. WZJ = run_tag, as this identifier will be carried through the full analysis. We may also wish to trim down the event production, e.g. to 25 = nevents, for the sake of efficiency during the demonstration. Note that it is no longer necessary to stipulate False = use_systematics as suggested in the video, if using a current AEACuS distribution. A copy of the analysis instructions in Card A (distributed as cut_card_WZ.dat in the Cards folder of the demo.zip archive from Indico <ref type="bibr">[1]</ref>) should also be placed into the local Cards directory of the MadGraph process folder. Subsequently, we can back out one level with cd ../ and initiate event generation with ./bin/generate_events -f. We suggest resubmitting this command once the first process completes, in order to demonstrate how AEACuS handles oversampling of a repeated phase space, as indicated by duplicative run tags.</p><p>The AEACuS "executable" is delivered as a single Perl script named aeacus.pl within the source archive at GitHub <ref type="bibr">[3]</ref>. Since Perl is an interpreted language, this file serves both as the program source document and runtime portal.
Benefits of this paradigm include transparency, self-sufficiency, and inherent platform independence: AEACuS is ready for immediate use, without the need for installation or compilation, on any computer with a reasonably modern Perl environment (versions 5.8.9 and following). A copy of the script aeacus.pl should be placed in the local binary path ./bin/internal/ and called as ./bin/internal/aeacus.pl cut_card_WZ.dat, with the intended control card (defaulting to cut_card.dat) as an input parameter.</p><p>This invocation sets a number of processes in motion. First, Delphes Root files under the ./Events path are located, converted to the LHCO <ref type="bibr">[11]</ref> format (with extensions for the handling of per-event weights and specified auxiliary data), and stored in the ./Events directory. Requested event observables are then computed, specified cuts are applied, and results are output with statistics summarizing the selection flow as .cut files in the ./Cuts directory.</p></div>
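<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a convenience, the four MadGraph shell instructions quoted above can be collected into a plain-text command script. The following is a minimal Python sketch that only assembles that text; the function name build_mg5_script and the default folder name are illustrative choices and not part of any of the described tools. It assumes, beyond what this tutorial covers, that the mg5_aMC executable will read such a file when its path is supplied as an argument.</p>
<ab><![CDATA[
```python
def build_mg5_script(process_dir="WZJ"):
    """Return the MadGraph shell instructions from this section, one per line.

    Hypothetical helper: it only builds text and does not call MadGraph.
    """
    commands = [
        "define w = w+ w-",         # multi-particle label for both W charges
        "generate p p > w z",       # core process
        "add process p p > w z j",  # up to one additional light partonic jet
        f"output {process_dir}",    # create the process folder
        "exit",
    ]
    return "\n".join(commands) + "\n"


# Print the script so it can be inspected or redirected to a file.
print(build_mg5_script(), end="")
```
]]></ab>
<p>The remaining steps (copying the default cards, editing run_card.dat, and calling ./bin/generate_events -f) are carried out inside the created process folder, as described in the text.</p></div>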
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The AEACuS meta-language</head><p>This section describes generic features of the meta-language syntax, which apply to control cards for all three programs. Perhaps the most distinctive feature of this language is the fact that it does not provide variables in the traditional sense. We will describe the structures which it employs instead as "shelves". The important concept is that a shelf exists independently of whether any goods are currently stored on it. Additionally, a shelf may be labeled with its specified function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>PoS(CompTools2021)027</head><p>Each separate type of shelf is assigned a unique three-character alphanumeric (with leading alpha) key. For example, JET shelves successively refine and gather groups of jet objects, PTM shelves extract and store transverse momentum magnitudes, and MT2 shelves calculate and retain values of the &#119872; &#119879; 2 statistic. Since multiple instances of a given type may be required, e.g. for housing distinct lepton classifications, shelves also carry a three-digit index, from 000 to 999, joined to the key by an underscore, like "LEP_001".</p><p>Each instruction line in an AEACuS cardfile begins with a shelf identifier, e.g. JET_001, followed by an equals sign (=), and then a parameter specification consisting of key and value pairs, individually joined by a colon (:), and separated from adjacent pairs by a comma (,). Each parameter key is a three-character alphanumeric (leading alpha) string, e.g. PRM for pseudo-rapidity magnitude, that uniquely specifies the role of the subsequently input value. The value assigned to each key may be a number (an integer or floating point value, including signed values and values in scientific notation like +9.119E+01), a string enclosed in double quotes (e.g. "$M_Z$"), another key, or a full shelf identifier. There is also a special syntax for function definitions, to be described subsequently. Following MadGraph, the base unit of energy and momentum in AEACuS is the GeV.</p><p>Alternatively, a list of values may be specified, containing elements drawn from the prior classes, individually separated by commas (,) and enclosed in square brackets, e.g. [0,2.5]. A frequently employed formatting idiom is the "[MIN,MAX]" pair, consisting of a two-element list indicating a numeric range.
So long as the provided values are sequentially ordered, the set of matched values consists of all numbers inclusively bounded by the specified range. An undefined value for MIN is treated as indefinitely small, and an undefined value for MAX is treated as indefinitely large. If no numerically valid limits are provided, then all values match. If the upper and lower boundaries are numerically equivalent, then only that single value matches. However, a subtle additional functionality is accessed when the provided values are out of numerical sequence. In this case, a match is achieved if the comparison value is either at least as large as MIN or at least as small as MAX; in other words, the interval from MAX up to MIN is rejected, exclusive of the boundaries.</p><p>Shelf designations must be placed at the beginning of a line, with no leading whitespace, and continuation from the prior line is indicated through simple indentation, with any number of spaces or tabs. Comments are initiated with the hash/pound (#) character, extending to the end of the line. The order in which lines are specified in the cardfile is not material to the sequence in which they are ultimately evaluated. Rather, program execution follows a prescribed order through shelf keys in conjunction with an ascending numerical sort on shelf indices.</p></div>
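<div xmlns="http://www.tei-c.org/ns/1.0"><p>The "[MIN,MAX]" matching rules described above can be restated compactly in code. The following Python sketch is an illustration of those rules only, not an excerpt from the Perl source of AEACuS; the function name in_range is hypothetical, and an undefined bound is represented here by None.</p>
<ab><![CDATA[
```python
import math


def in_range(value, lo=None, hi=None):
    """Apply the [MIN,MAX] matching rules described in the text.

    Illustrative restatement only; None stands for an undefined bound.
    """
    lo = -math.inf if lo is None else lo  # undefined MIN: indefinitely small
    hi = math.inf if hi is None else hi   # undefined MAX: indefinitely large
    if lo <= hi:
        # Ordered (or equal) bounds: inclusive window
        return lo <= value <= hi
    # Out-of-sequence bounds: the open interval (MAX, MIN) is rejected
    return value >= lo or value <= hi


print(in_range(1.0, 0, 2.5))   # inside an ordered window
print(in_range(3.0, 5, 2))     # inside the rejected gap of an inverted pair
print(in_range(5.0, 5, 2))     # boundaries of an inverted pair still match
```
]]></ab>
<p>With ordered bounds the function acts as an inclusive window, while an inverted pair such as [5,2] acts as a veto on the open interval between 2 and 5.</p></div>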
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Event selection with AEACuS</head><p>This section focuses on interpreting instructions from the example analysis in Card A. Starting on Line 3, all retained electron candidates are required to have a transverse momentum magnitude &#119875; &#119879; &gt; 7 GeV and a pseudorapidity magnitude |&#120578;| &lt; 2.5. The cut applied to this line requires at least 0 such leptons. This is therefore not a meaningful cut, but its inclusion will force this shelf to be represented in the output .cut file. Next, candidate muons and jets are similarly restricted.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Card A: Rendering of CMS &#119901; &#119901; &#8594; &#119882; &#119885; study from Ref. <ref type="bibr">[10]</ref>.</p><p>Proceeding to Line 6, we next begin to form compounded sets of jets and leptons. Whereas the zeroth member of each category is automatically supplied, subsequent entries must stipulate the source SRC category or categories from which they are drawn. JET_001 is built from the zeroth jet classification, and members must have a comparative delta-&#119877; of less than 0.4 radians from the comparison set (jets compare against leptons and leptons compare against jets), namely the zeroth lepton shelf. The event is rejected by application of a CUT if the number of members having the described characteristics is not exactly zero. In other words, this line vetoes events where a candidate jet is on top of a poorly isolated candidate lepton. Next, admission to jet classification</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>(electron/muon/tau) restriction of "not 3", which amounts to exclusion of hadronic taus, and the event is vetoed unless there are precisely three electron/muon candidates. The second lepton classification takes the first classification as its source and extracts a subset of member objects using the dilepton selection algorithm. Parameters indicate that this should be an opposite-sign, same-flavor dilepton, with reconstructed mass as close as possible to the &#119885;-boson mass, and not further from this mass than 15 GeV. Members are required to have a transverse momentum magnitude &#119875; &#119879; &gt; 7 GeV (these values are also output), and there must be two such members surviving. The third classification proceeds from the second, extracting the leading member, and increasing the applicable selection to &#119875; &#119879; &gt; 25 GeV. The fourth classification is a compound set, sourcing members from set one (the three leptons) that are not also members of set two (the reconstructed &#119885;). In other words, it represents the lepton associated with the candidate &#119882;-boson. The transverse momentum magnitude selection is again elevated, and that value is output, along with all four-vector coordinates of the associated object. The fifth classification sources from the dilepton, and introduces a new keyword for the creation of an effective object, applying the four-vector summing algorithm, and outputting the resultant mass and transverse momentum magnitude. The sixth classification similarly sums the trilepton, and generates output. Classification seven builds a dilepton candidate from the three classified leptons, with no restrictions on sign or flavor, preferencing a reconstructed mass of zero, within 4 GeV. The applied cut then serves to veto events unless every constructible dilepton pair has a mass of at least 4 GeV.
The fourth jet classification actually constructs a sort of missing transverse momentum vector, summing all jets together with all leptons and inverting the resulting four-vector. The parameters to the SUM algorithm are Boolean flags for various transformations, respectively masslessness, transverseness, and momentum inversion. The resulting &#119909; and &#119910; momentum components are output.</p><p>Starting on Line 22, we transition from object reconstruction to event selection. The calorimetric missing transverse energy (more precisely, the value exported by Delphes) is output, with flags indicating that the components and their associated azimuthal angle should likewise be computed and stored. The missing transverse energy as summed internally by AEACuS is required to satisfy / &#119864; &#119879; &gt; 30 GeV. Likewise, a transverse mass is computed from the zeroth MET and the trilepton system.</p><p>Line 25 and following exhibit a powerful language feature that allows users to compute new observables as arbitrary functions of previously stored values. Such functions are wrapped in curly braces {}, where the zeroth entry defines the functional form, and subsequent entries list the function inputs. The first input is aliased to $1 in the formula, the second to $2, and so on. Referencing prior outputs, the five inputs to the first custom variable function are momentum components (&#119875; &#119882; &#119909; , &#119875; &#119882; &#119910; , &#119875; &#119882; &#119911; ) and ( / &#119875; &#119909; , / &#119875; &#119910; ), respectively associated with the candidate &#119882;-boson and the inverted four-vector sum in the fourth jet classification. In terms of these inputs, together with the known &#119882;-boson mass, the prescribed calculation may be expressed as follows.</p><p>After calling AEACuS in a simulation folder as described in the prior section, an output file is created in the ./Cuts folder for each processed LHCO event record.
This file contains headers summarizing the event selection flow, together with matrices detailing the overlap of any cuts applied in parallel. Each requested event statistic is then tabulated for every surviving event, in a manner that facilitates subsequent reprocessing by all three programs. Multiple statistically independent simulations of a common final state are named with a unique trailing index.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Event channel sorting with AEACuS</head><p>In this section and those following we will be working with a larger preprocessed data set, distributed as part of the described demo.zip archive. These samples were produced with Card B, as adapted from the supersymmetry search with small mass gaps in Ref. <ref type="bibr">[12]</ref>. We will not elaborate the line-by-line operation of this card, but will instead focus on a few features not previously described. For reference, the various event selection shelves are storing values of the missing transverse energy, the ditau mass statistic, cos &#120579; * , &#119872; &#119879; 2 , the MET delta-phi angle, various azimuthal object separations, and a collection of user-defined variables.</p><p>Line 4 of Card B introduces the concept of event selection channels. The zeroth channel encompasses all previously described functionality, and related keys may be referenced explicitly in order to customize its operation. In this case we stipulate directories (two levels up) for the storage of generated .lhco and .cut files. This is particularly useful when doing large simulations in a cluster environment. Several copies of the process subfolder may be deployed simultaneously for oversampling of the target phase space, while output files are accumulated in a common location, and automatically assigned a unique incremented index.</p><p>Starting in Line 48, we present an example of channel sorting. First, an event selection cut is defined (though not yet applied), based on previously computed observables. In this case the event selection retains monojet events, according to the first jet classification. Next, positively indexed channels are defined by subscription to some number of event selections, referenced by their shelf index. Negation of the index is used for anti-selection.
Presently, we are generating two event channels, the first of which contains monojet events, and the second everything else. Specifically, the second set features dijet events, as may be seen by inspecting the definition of the referenced jet classification.</p><p>The CHN_001 monojet outputs are collected under the ./Cuts folder of the demonstration archive, and will form the basis of all subsequent analysis. Readers are encouraged to explore that folder, where they will find records associated with a supersymmetry model featuring a 110 GeV slepton and an 80 GeV neutralino, as well as SM &#119905;&#119905;&#772;, &#119885;, &#119885; &#119885;, &#119882; &#119885;, and &#119882;&#119882; final states, all with inclusive simulation of one or more additional jets. There are five batches of statistically independent simulation for the supersymmetric final state, and eighty batches (to promote sufficient residual statistical power after event selection) of the SM final states, each starting from some tens-of-thousands of events. The downstream RHADAManTHUS and MInOS utilities treat files with distinct base run tags as non-overlapping final states, whereas repeated tags with distinct trailing indices are treated as oversampling. It is recommended that the scattering process and any tranching cuts be embedded in the selected run tag name. It is also a good habit to close with the collision energy, e.g. "_14TeV", since ending on an alpha character in this manner prevents the accidental clipping of otherwise trailing digits.</p><p>It is also possible to perform secondary channel sorting on .cut files after the initial processing phase. This will be exhibited presently, referencing instructions in Card C, which is also distributed as cut_card_flow.dat in the Cards folder of the demonstration archive. In this card, a number of event selection cuts are defined, referencing observables computed with Card B.
Note, as in selection 103, that it is also possible to create an event selection KEY on the fly, using the function syntax. In this case, the MET delta-phi angle is simply rescaled to units of &#120587;. Since we are no longer within an enclosing MadGraph process folder, it is necessary in the channel definitions to stipulate which directories and files are to be reprocessed. For the sake of example, we take all &#119885; plus inclusive jet samples in CHN_001 of the local ./Cuts folder. Three channel sortings are defined by subscription to various defined selections. Readers may test the described functionality by navigating to the folder where demo.zip has been unpacked and issuing the command ./aeacus.pl ./Cards/cut_card_flow.dat. Note that the card location can be anywhere on the system if specified explicitly with leading path information. This operation should result in three new channel subfolders created underneath the original source channel and containing the filtered event records.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Plotting with RHADAManTHUS</head><p>RHADAManTHUS generates publication-quality one- and two-dimensional histograms from event statistics computed by AEACuS, invoking MatPlotLib <ref type="bibr">[13]</ref> on the back end. Large batches of simulation (representing either distinct final states and/or oversampling of a common phase space) are merged internally, and per-event weights are handled consistently throughout. Arbitrary bin-wise functional transformations are readily specified, e.g. for visualizing signal-to-background significance as a function of cut threshold. Figures <ref type="figure">1</ref> and <ref type="figure">2</ref> have been generated from the .cut files in the demonstration archive, based on instructions contained in Card D, which is also distributed as plt_card.dat in the subdirectory Cards. The reader is invited to reproduce these plots by calling ./rhadamanthus.pl from the folder to which demo.zip is unpacked. The cardfile may be omitted because it corresponds to the default name and location. This operation should result in the creation of a new folder named Plots containing the requested plots. If the MatPlotLib library is not installed, then Python script literals will be delivered instead of figures. Execution of these scripts on a compliant system will yield the desired graphics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We proceed now with an analysis of commands itemized in Card D. Starting on Line 1, a number of data sets are defined by referencing file names and source directories. The wildcard character * is used here to select multiple files, and it is also possible to provide multiple entries for FIL. In Line 5, data sets are assembled into channels, together with a plotting KEY computed previously by AEACuS.</p><p>In Line 8, the zeroth histogram key is used to store defaults. In particular, this example disables Python 3, hatching, and filling. It further enables a bolded line for the first three data sets plotted, while turning it back off for the fourth. The output format is set to "PDF". Plots are normalized with a logarithmic vertical axis and no bin summing or bin smoothing. The plot title and legend (with one entry per data set) are provided as character strings, with embedded LaTeX equation formatting. A list of custom plotting colors is also provided, in the form of hexadecimal strings.</p><p>Positively indexed histograms are slated for output next, starting from Line 14. Channel shelves are referenced by index in order to indicate which observables and data sets are to be represented. A binning specification requires defined values for three keys out of the four representing left boundary, right boundary, bin span, and bin count. Minimal and maximal values may be provided explicitly for the plotting range. Horizontal and vertical axis labels are formatted similarly to the plot title and legend. Various text shortcuts are made available, e.g. &lt;DEF&gt;, which renders as the differential event fraction &#119889;&#120590;/&#119889;&#119864; &#247; &#120590;. Finally, a name is provided for the output plot. Normalized signal versus background shape plots such as those produced in histograms 1, 2, and 3 (cf.
panels a, b, and c of Figure <ref type="figure">1</ref>) facilitate the visual identification of concentrations, inflections, and knees in event distributions, informing the construction of optimized discriminants.</p><p>Various advanced features are presented next, starting from the definition of a new event selection cut in Line 32. This particular selection vetoes events where the first mass shelf value lies within a 20 GeV wide window about the &#119885;-boson mass. Two new channels are defined with a common key, each subscribing to the prior cut. The first channel references the three SM data sets, while the second references the signal data. Histogram 101 overrides several of the default assignments, introducing a rightward summation, turning off normalization, smoothing with a width of two bins, and disabling the log axis option. The undefined plot range values indicate that suitable boundaries should be selected automatically. Since the plot is not normalized, it is important to specify the projected luminosity, set here to 30 events per femtobarn. The plotting channel is a functional transformation of previously defined channels corresponding to the bin-wise significance ratio &#119878;/&#8730;(1 + &#119861;). Three such significance curves are plotted, with the single signal data set broadcast across all three data sets for the background. The requested bin summing yields an upper bound selection threshold plot in the horizontal axis observable. Panel d of Figure <ref type="figure">1</ref> suggests that such a cut is not immediately beneficial in the current context.</p><p>Preparation for an example two-dimensional histogram begins in Line 45. Pairs of new data sets and channels are defined, and associated with new event selections. Two plotting keys are provided, one of which is a key function, corresponding to the two plotting axes.
A two-dimensional histogram has a single plotting channel, which is assigned here to a functional transformation of the signal and background. The output in Figure <ref type="figure">2</ref> suggests that further event selections should focus on parameter regions to the upper-left.</p></div>
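<div xmlns="http://www.tei-c.org/ns/1.0"><p>The bin-wise significance transformation described for the unnormalized histogram can be illustrated with a short stand-alone calculation. The following Python sketch is not taken from RHADAManTHUS; the function name and the toy bin contents are hypothetical. It assumes that the rightward summation amounts to a running total accumulated from the left edge of the histogram, and that bin contents are cross sections in femtobarns to be scaled by a luminosity in inverse femtobarns.</p>
<ab><![CDATA[
```python
from itertools import accumulate
from math import sqrt


def cumulative_significance(signal, background, luminosity=1.0):
    """Return S/sqrt(1+B) per bin after a left-to-right running sum.

    signal, background: per-bin cross sections (fb), hypothetical inputs.
    luminosity: integrated luminosity (fb^-1) used to convert to event counts.
    """
    s_sum = accumulate(x * luminosity for x in signal)
    b_sum = accumulate(x * luminosity for x in background)
    return [s / sqrt(1.0 + b) for s, b in zip(s_sum, b_sum)]


# Toy bins chosen so that each 1 + B is a perfect square.
curve = cumulative_significance([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(curve)  # [0.5, 1.0, 1.5]
```
]]></ab>
<p>Each entry of the returned list corresponds to retaining all events at or below the associated bin, which is the sense in which the summed plot acts as an upper bound threshold scan.</p></div>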
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Machine learning with MInOS</head><p>MInOS implements machine learning on computed event statistics with XGBoost <ref type="bibr">[14]</ref>. Ensemble training against distinct background components may be combined to generate composite classifications with enhanced discrimination. ROC curves, as well as score distribution, feature importance, and significance plots are generated on the fly.</p><p>Figures <ref type="figure">3</ref> and <ref type="figure">4</ref> have been generated from the .cut files in the demonstration archive, based on instructions contained in Card E, which is also distributed as min_card.dat in the subdirectory Cards. The reader is invited to reproduce these plots by calling ./minos.pl from the folder to which demo.zip is unpacked. The cardfile may be omitted because it corresponds to the default name and location. This operation should result in the creation of a new folder hierarchy Models/TRN_001. Subdirectories CSV and Plots contain reduced event data and plots. If the XGBoost library is not installed, then a Python script literal will be delivered but not run. Execution of this script on a compliant system will yield the desired training, testing, and graphical output.</p><p>We proceed now with an analysis of commands itemized in Card E. The reader will observe several similarities to the structure of a plotting card. Data sets are defined starting on Line 1, as before. The zeroth training shelf may be used for the specification of defaults, as in Line 5. Presently, it provides the list of observable keys to be included in training and also a list of manual overrides for the LaTeX formatting of selected keys. Event selection cuts are itemized, starting on Line 21. Channels are built from data sets, starting on Line 23. Channels are associated with any desired cuts and assigned a category label, e.g. 0 for background and 1 for signal. Training is requested on Line 25, referencing each of the previously defined channels.</p><p>The Boosted Decision Tree (BDT) assigns each member of the test data set a score on the continuum from 0 (background-like) to 1 (signal-like). Panel a of Figure <ref type="figure">3</ref> suggests that events known to be background (blue curve) are indeed assigned systematically lower training scores than events known to be signal (orange curve), though the two distributions are not well isolated in this case. Depending upon the placement of a threshold cut along the signal classification axis, there will be a varying incidence of false positive and false negative assignments. The significance curves in panel b facilitate visual optimization of this transition point, in terms of the surviving event count &#119878;, the regulated signal to background ratio &#119878;/(1 + &#119861;), which is relevant to controlling systematics, and the regulated signal significance &#119878;/&#8730;(1 + &#119861;) at a certain target luminosity. The ROC curve in panel c provides a standardized metric of signal and background separability, where better performance is indicated by extension of the shaded blue area into the upper-left corner of the plot. BDTs are known to provide good process transparency relative to many other types of machine learning, and the feature importance chart in panel d is used to rank the discriminating power of individual observables.</p><p>The plots in Figure <ref type="figure">3</ref> were established by merging all events (with correct cross-section weighting) within each label category in anticipation of a single joint training.
However, MI OS also generates a matrix of distinct classification scores by training separately on pairings of individual data sets comprising each label category. A composite classification score assembled from this ensemble of trainings can sometimes give better separation than a single joint training. The plots in Figure <ref type="figure">4</ref> are of this latter type.</p></div>
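<div xmlns="http://www.tei-c.org/ns/1.0"><p>The threshold scan behind the significance curves can be illustrated with a minimal Python sketch. This is not the MInOS implementation: the function names scan_thresholds and best_threshold are hypothetical, the toy scores, labels, and weights are invented for illustration, and the per-event weights are assumed to be already scaled to the target luminosity. For each candidate cut, the sketch sums the weights of surviving signal and background events and evaluates the surviving count, the regulated ratio S/(1+B), and the regulated significance S/sqrt(1+B).</p>
<p>
```python
import numpy as np


def scan_thresholds(scores, labels, weights, thresholds):
    """Tabulate (cut, S, B, S/(1+B), S/sqrt(1+B)) for each score threshold.

    scores     : classifier outputs on the interval [0, 1]
    labels     : 1 for signal events, 0 for background events
    weights    : per-event weights, assumed already scaled to the target luminosity
    thresholds : candidate cuts; an event survives if its score is at or above the cut
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    weights = np.asarray(weights, dtype=float)
    rows = []
    for cut in thresholds:
        keep = scores >= cut
        # Weighted counts of surviving signal (S) and background (B) events.
        sig = float(weights[np.logical_and(keep, labels == 1)].sum())
        bkg = float(weights[np.logical_and(keep, labels == 0)].sum())
        rows.append((float(cut), sig, bkg, sig / (1.0 + bkg), sig / np.sqrt(1.0 + bkg)))
    return rows


def best_threshold(rows):
    """Return the cut that maximizes the regulated significance S/sqrt(1+B)."""
    return max(rows, key=lambda row: row[4])[0]


# Toy inputs (invented for illustration only, not taken from the demonstration archive).
toy_scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1]
toy_weights = [2.0, 2.0, 2.0, 1.0, 2.0, 1.0]

table = scan_thresholds(toy_scores, toy_labels, toy_weights, [0.0, 0.5, 0.85])
for cut, sig, bkg, ratio, signif in table:
    print(f"cut={cut:.2f}  S={sig:.1f}  B={bkg:.1f}  S/(1+B)={ratio:.3f}  S/sqrt(1+B)={signif:.3f}")
print("best cut:", best_threshold(table))
```
</p>
<p>The addition of 1 to the background count in each denominator is the regulator: it keeps both figures of merit finite when a hard cut removes all background events, so that a nearly empty selection is not rewarded with a divergent significance.</p></div>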
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Conclusions</head><p>The programs AEACuS, RHADAManTHUS, and MInOS have been introduced through a tutorial that includes execution of code on reference data and elaboration of example process control cards. These programs are maintained on GitHub <ref type="bibr">[3]</ref> and are freely available for public download (GNU GPLv3 <ref type="bibr">[4]</ref>). The author welcomes inquiries, suggestions, and requests for support on any topic regarding application of this software to the physics analysis of simulated collider data.</p><p>Card B: A search for supersymmetry with small mass gaps, cf. Ref. <ref type="bibr">[12]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>LGD:[ "$t\bar{t}j$","Zjj","$VVj$", "$S_{3 }^{11 }jj$" ],</p><p>13 CLR:[ "e41a1c", "377eb8", "4daf4a", "ff7f " ]</p><p>Card E: Machine learning cardfile example.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>requires a heavy flavor tag, but the membership of this class must be zero. In other words, this is a &#119887;-jet veto. Classification</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1"><p>is formed from a subset of the source category using the algorithm LED, which extracts just the leading member by transverse momentum magnitude. The transverse momentum of this jet is extracted and stored on the fourth PTM shelf. On Line 9, a similar subclassification of leptons is begun. The first lepton shelf has a flavor</p></note>
		</body>
		</text>
</TEI>
