Title: Constraints on Future Analysis Metadata Systems in High Energy Physics
Abstract

In high energy physics (HEP), analysis metadata comes in many forms—from theoretical cross-sections, to calibration corrections, to details about file processing. Correctly applying metadata is a crucial and often time-consuming step in an analysis, but designing analysis metadata systems has historically received little direct attention. Among other considerations, an ideal metadata tool should be easy to use by new analysers, should scale to large data volumes and diverse processing paradigms, and should enable future analysis reinterpretation. This document, which is the product of community discussions organised by the HEP Software Foundation, categorises types of metadata by scope and format and gives examples of current metadata solutions. Important design considerations for metadata systems, including sociological factors, analysis preservation efforts, and technical factors, are discussed. A list of best practices and technical requirements for future analysis metadata systems is presented. These best practices could guide the development of a future cross-experimental effort for analysis metadata tools.
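As a concrete illustration of the kind of metadata application the abstract describes, the sketch below shows one common task: looking up a theoretical cross-section and generated-event count from a metadata registry to normalise a simulated sample to a target luminosity. The registry layout, dataset names, and numerical values are purely hypothetical and not drawn from any specific experiment's metadata system.

```python
# Hypothetical sketch: applying cross-section metadata to weight simulated events.
# The registry structure and all values below are illustrative assumptions only.

# A minimal per-dataset metadata "registry", as a metadata tool might serve it:
# theoretical cross-section (pb) and number of generated events per sample.
METADATA = {
    "ttbar_nominal": {"cross_section_pb": 832.0, "n_generated": 1_000_000},
    "wjets_nominal": {"cross_section_pb": 61526.7, "n_generated": 5_000_000},
}

def mc_event_weight(dataset: str, luminosity_ipb: float) -> float:
    """Per-event weight scaling a simulated sample to a target luminosity.

    weight = sigma * L / N_generated
    """
    meta = METADATA[dataset]
    return meta["cross_section_pb"] * luminosity_ipb / meta["n_generated"]

# Scale the hypothetical ttbar sample to 140 fb^-1 = 140,000 pb^-1.
w = mc_event_weight("ttbar_nominal", 140_000.0)
```

Even in this toy form, the example shows why centralised, versioned metadata matters: if the cross-section or event count is copied by hand into each analysis, every update must be propagated manually, which is exactly the error-prone bookkeeping a metadata system is meant to eliminate.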

Authors:
Publication Date:
NSF-PAR ID:
10375325
Journal Name:
Computing and Software for Big Science
Volume:
6
Issue:
1
ISSN:
2510-2036
Publisher:
Springer Science + Business Media
Sponsoring Org:
National Science Foundation
More Like this
  1. Although engineering graduates are well prepared in the technical aspects of engineering, it is widely acknowledged that there is a need for a greater understanding of the socio-economic contexts in which they will practice their profession. The National Academy of Engineering (NAE) reinforces the critical role that engineers should play in addressing both problems and opportunities that are technical, social, economic, and political in nature in solving the grand challenges. This paper provides an overview of a nascent effort to address this educational need. Through a National Science Foundation (NSF) funded program, a team of researchers at West Virginia University has launched a Holistic Engineering Project Experience (HEPE). This undergraduate course provides the opportunity for engineering students to work with social science students from the fields of economics and strategic communication on complex and open-ended transportation engineering problems. The course involves cross-disciplinary teams working under the diverse constraints of real-world social considerations, such as economic impacts, public policy concerns, and public perception and outreach factors, in the context of future autonomous transportation systems. The goal of the HEPE platform is for engineering students to have an opportunity to build non-technical, but highly in-demand, professional skills that promote collaboration with others involved in the socio-economic context of engineering matters. Conversely, the HEPE approach provides an opportunity for non-engineering students to become exposed to key concepts and practices in engineering.
This paper outlines the initial implementation of the HEPE program by placing the effort in the context of broader trends in education, outlining the overall purposes of the program, discussing the course design and structure, reviewing the learning experience and outcomes assessment process, and providing preliminary results of a baseline survey that gauges students' interests and attitudes towards collaborative and interdisciplinary learning.
  2. One of the most costly factors in providing a global computing infrastructure such as the WLCG is the human effort in deployment, integration, and operation of the distributed services supporting collaborative computing, data sharing and delivery, and analysis of extreme scale datasets. Furthermore, the time required to roll out global software updates, introduce new service components, or prototype novel systems requiring coordinated deployments across multiple facilities is often increased by communication latencies, staff availability, and in many cases the expertise required for operations of bespoke services. While the WLCG (and distributed systems implemented throughout HEP) is a global service platform, it lacks the capability and flexibility of a modern platform-as-a-service, including continuous integration/continuous delivery (CI/CD) methods, development-operations capabilities (DevOps, where developers assume a more direct role in the actual production infrastructure), and automation. Most importantly, tooling which reduces required training, bespoke service expertise, and the operational effort throughout the infrastructure, most notably at the resource endpoints (sites), is entirely absent in the current model. In this paper, we explore ideas and questions around potential NoOps models in this context: what is realistic given organizational policies and constraints? How should operational responsibility be organized across teams and facilities? What are the technical gaps? What are the social and cybersecurity challenges? Conversely, what advantages does a NoOps model deliver for innovation and for accelerating the pace of delivery of new services needed for the HL-LHC era? We will describe initial work along these lines in the context of providing a data delivery network supporting IRIS-HEP DOMA R&D.
  3. The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research reproducibility and timely collaboration beyond borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for researchers, policymakers, funders, publishers, and infrastructure providers from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations). Several overarching themes have emerged from this document, such as the need to balance the creation of data adherent to FAIR principles (findable, accessible, interoperable and reusable) with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices of documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe. This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.
  4. Spaceflight presents a multifaceted environment for plants, combining the effects on growth of many stressors and factors, including altered gravity, the influence of experiment hardware, and increased radiation exposure. To help understand the plant response to this complex suite of factors, this study compared transcriptomic analysis of 15 Arabidopsis thaliana spaceflight experiments deposited in the National Aeronautics and Space Administration's GeneLab data repository. These data were reanalyzed for genes showing significant differential expression in spaceflight versus ground controls, using a single common computational pipeline for either the microarray or the RNA-seq datasets. Such a standardized approach to analysis should greatly increase the robustness of comparisons made between datasets. This analysis was coupled with extensive cross-referencing to a curated matrix of metadata associated with these experiments. Our study reveals that factors such as analysis type (i.e., microarray versus RNA-seq) or environmental and hardware conditions have important confounding effects on comparisons seeking to define plant reactions to spaceflight. The metadata matrix allows selection of studies with high similarity scores, i.e., that share multiple elements of experimental design, such as plant age or flight hardware. Comparisons between these studies then help reduce the complexity in drawing conclusions arising from comparisons made between experiments with very different designs.

  5. This paper discusses three aspects of nonlinear dynamic analysis (NDA) practices that are important for evaluating the seismic performance of geotechnical structures affected by liquefaction or cyclic softening: (1) selection and calibration of constitutive models, (2) comparison of NDA results using two or more constitutive models, and (3) documentation. The ability of the selected constitutive models and calibration protocols to approximate the loading responses important to the system being analyzed is one of several technical factors affecting the quality of results from an NDA. Comparisons of single element simulations against empirical data for a broad range of loading conditions are essential for evaluating this factor. Critical comparisons of NDAs using two or more constitutive models are valuable for evaluating modeling uncertainty for specific systems and for identifying modeling limitations that need improvement. The utility of an NDA study depends on the documentation being sufficiently thorough to facilitate effective reviews, advance best practices, and support future reexaminations of a system's seismic performance.