Title: Handling Detector Characterization Data (Metadata) in XENONnT
Effective metadata management is a persistent challenge for many scientific experiments. These challenges are magnified by the evolving needs of the experiment, the intricacies of seamlessly integrating a new system with existing analytical frameworks, and the crucial mandate to maintain database integrity. In this work, we present the challenges faced by experiments that produce large amounts of metadata and describe the solution the XENON experiment uses for metadata management.
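The abstract describes managing versioned detector-characterization metadata. As a rough illustration of the general pattern (not XENONnT's actual schema, database, or API), the sketch below stores correction values with validity intervals and version numbers, and resolves the latest applicable value for a given timestamp:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class CorrectionRecord:
    """One versioned metadata entry with a validity interval."""
    name: str
    value: float
    valid_from: datetime
    valid_to: Optional[datetime]  # None means open-ended validity
    version: int

class MetadataStore:
    """Toy in-memory store: highest version wins for a given timestamp."""
    def __init__(self):
        self._records = []

    def insert(self, record):
        self._records.append(record)

    def lookup(self, name, when):
        """Return the value of the highest-version record covering `when`."""
        matches = [
            r for r in self._records
            if r.name == name
            and r.valid_from <= when
            and (r.valid_to is None or when < r.valid_to)
        ]
        if not matches:
            raise KeyError(f"no record for {name} at {when}")
        return max(matches, key=lambda r: r.version).value

# Hypothetical correction name and values, for illustration only.
store = MetadataStore()
store.insert(CorrectionRecord("electron_lifetime", 550.0,
                              datetime(2023, 1, 1), None, version=1))
store.insert(CorrectionRecord("electron_lifetime", 575.0,
                              datetime(2023, 1, 1), None, version=2))
```

Keeping old versions alongside new ones, rather than overwriting, is what preserves the database integrity the abstract emphasizes: a past analysis can always be rerun against the metadata that was valid at the time.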
Award ID(s):
2112801
PAR ID:
10615378
Editor(s):
De_Vita, R; Espinal, X; Laycock, P; Shadura, O
Publisher / Repository:
EPJ
Journal Name:
EPJ Web of Conferences
Volume:
295
ISSN:
2100-014X
Page Range / eLocation ID:
01033
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. File systems that store metadata on a single machine or via a shared-disk abstraction face scalability challenges, especially in contexts demanding the management of billions of files. Recent work has shown that employing a shared-nothing, distributed database management system (DDBMS) for metadata storage can alleviate these scalability challenges without compromising on high availability guarantees. However, for low-scale deployments -- where metadata can fit in memory on a single machine -- these DDBMS-based systems typically perform an order of magnitude worse than systems that store metadata in memory on a single machine. This has limited the impact of the distributed database approach, since it is currently applicable only to file systems of extreme scale. This paper describes FileScale, a three-tier architecture that incorporates a DDBMS as part of a comprehensive approach to file system metadata management. In contrast to previous approaches, FileScale performs comparably to the single-machine architecture at small scale, while enabling linear scalability as the file system metadata grows.
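The core idea in this abstract -- keeping file-system metadata as rows in a relational database rather than in in-memory structures on one machine -- can be illustrated with a toy example. The sketch below uses SQLite as a stand-in for the shared-nothing DDBMS; the table layout and lookup logic are invented for illustration and are not FileScale's actual schema:

```python
import sqlite3

# Each file or directory is an "inode" row; path resolution walks the
# tree one component at a time, which in a real system would be a query
# against the distributed database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inodes (
        id     INTEGER PRIMARY KEY,
        parent INTEGER,              -- parent directory inode (NULL for root)
        name   TEXT NOT NULL,
        is_dir INTEGER NOT NULL,
        size   INTEGER DEFAULT 0
    )
""")
conn.execute("CREATE INDEX by_parent ON inodes(parent, name)")

def mkdir(parent, name):
    cur = conn.execute(
        "INSERT INTO inodes (parent, name, is_dir) VALUES (?, ?, 1)",
        (parent, name))
    return cur.lastrowid

def create_file(parent, name, size):
    cur = conn.execute(
        "INSERT INTO inodes (parent, name, is_dir, size) VALUES (?, ?, 0, ?)",
        (parent, name, size))
    return cur.lastrowid

def resolve(path):
    """Walk a /-separated path component by component to find its inode."""
    inode = root
    for part in path.strip("/").split("/"):
        row = conn.execute(
            "SELECT id FROM inodes WHERE parent = ? AND name = ?",
            (inode, part)).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        inode = row[0]
    return inode

root = mkdir(None, "/")
data_dir = mkdir(root, "data")
run_file = create_file(data_dir, "run1.bin", 1024)
```

Because every metadata operation is an ordinary database query, the same code would scale out simply by pointing the connection at a distributed database instead of a local file -- which is the trade-off the paper measures at small scale.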
  2. Managing, processing, and sharing research data and experimental context produced on modern scientific instrumentation all present challenges to the materials research community. To address these issues, two MaRDA Working Groups on FAIR Data in Materials Microscopy Metadata and Materials Laboratory Information Management Systems (LIMS) convened and generated recommended best practices regarding data handling in the materials research community. Overall, the Microscopy Metadata Group recommends (1) instruments should capture comprehensive metadata about operators, specimens/samples, instrument conditions, and data formation; and (2) microscopy data and metadata should use standardized vocabularies and community standard identifiers. The LIMS Group produced the following guides and recommendations: (1) a cost and benefit comparison when implementing LIMS; (2) summaries of prerequisite requirements, capabilities, and roles of LIMS stakeholders; and (3) a review of metadata schemas and information-storage best practices in LIMS. Together, the groups hope these recommendations will accelerate breakthrough scientific discoveries via FAIR data. Impact statement: With the deluge of data produced in today's materials research laboratories, it is critical that researchers stay abreast of developments in modern research data management, particularly as it relates to the international effort to make data more FAIR (findable, accessible, interoperable, and reusable). Most crucially, being able to responsibly share research data is a foundational means to increase progress on the materials research problems of high importance to science and society. Operational data management and accessibility are pivotal in accelerating innovation in materials science and engineering and in addressing mounting challenges facing our world, but the materials research community generally lags behind its cognate disciplines in these areas.
To address this issue, the Materials Research Coordination Network (MaRCN) convened two working groups composed of experts from across the materials data landscape to make recommendations to the community on improvements to materials microscopy metadata standards and the use of Laboratory Information Management Systems (LIMS) in materials research. This manuscript contains a set of recommendations from the working groups and reflects the culmination of their 18-month effort, with the hope of promoting discussion and reflection on these areas within the broader materials research community.
  3. This paper reports on a demonstration of YAMZ (Yet Another Metadata Zoo) as a mechanism for building community consensus around metadata terms. The demonstration is motivated by the complexity of the metadata standards environment and the need for more user-friendly approaches for researchers to achieve vocabulary consensus. The paper reviews a series of metadata standardization challenges, explores crowdsourcing factors that offer possible solutions, and introduces the YAMZ system. A YAMZ demonstration is presented with members of the Toberer materials science laboratory at the Colorado School of Mines, where there is a need to confirm and maintain a shared understanding of the vocabulary supporting research documentation, data management, and the laboratory's larger metadata infrastructure. The demonstration involves three key steps: 1) sampling terms for the demonstration, 2) engaging graduate student researchers in the demonstration, and 3) reflecting on the demonstration. The results of these steps, including examples of the dialog provenance among lab members and voting, show the ease with which YAMZ can facilitate building consensus on metadata vocabulary. The conclusion discusses implications and highlights next steps.
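The voting mechanism the demonstration relies on can be sketched in a few lines. This is a hypothetical model of the crowdsourced-consensus idea, not YAMZ's actual algorithm; the threshold value and the "vernacular"/"stable" labels are assumptions:

```python
from collections import defaultdict

class TermRegistry:
    """Toy consensus model: each proposed term definition accumulates
    up/down votes, and a definition is promoted from 'vernacular' to
    'stable' once its net score clears a threshold."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.votes = defaultdict(int)   # (term, definition) -> net score

    def vote(self, term, definition, up=True):
        self.votes[(term, definition)] += 1 if up else -1

    def status(self, term, definition):
        score = self.votes[(term, definition)]
        return "stable" if score >= self.threshold else "vernacular"

# Two researchers endorse one definition; one downvotes a rival.
registry = TermRegistry(threshold=2)
registry.vote("dataset", "a curated collection of data")
registry.vote("dataset", "a curated collection of data")
registry.vote("dataset", "any file on disk", up=False)
```

The point of such a mechanism is that competing definitions coexist until the community, not a single editor, settles which one becomes the shared vocabulary.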
  4.
    Biodiversity image repositories are crucial sources of training data for machine learning approaches to biological research. Metadata, specifically metadata about object quality, is putatively an important prerequisite for selecting sample subsets for these experiments. This study demonstrates the importance of image quality metadata to a species classification experiment involving a corpus of 1935 fish specimen images, which were annotated with 22 metadata quality properties. A small subset of high-quality images produced an F1 score of 0.41, compared to 0.35 for a taxonomically matched subset of low-quality images, when used by a convolutional neural network approach to species identification. Using the full corpus of images revealed that image quality differed between correctly classified and misclassified images. We found that the visibility of all anatomical features was the quality property most important for classification accuracy. We suggest that biodiversity image repositories consider adopting a minimal set of image quality metadata to support future machine learning projects.
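The study's core workflow -- filtering a training corpus by per-image quality metadata before training a classifier -- can be sketched as follows. The property names and threshold here are invented for illustration; they are not the 22 quality properties used in the paper:

```python
def select_training_subset(records, require_all_features=True,
                           min_resolution=1000):
    """Keep only images whose quality metadata meets the criteria.

    `records` is a list of dicts, one per image, carrying quality
    annotations alongside the image identifier.
    """
    subset = []
    for rec in records:
        # Drop images where not every anatomical feature is visible,
        # the property the paper found most important for accuracy.
        if require_all_features and not rec["all_features_visible"]:
            continue
        if rec["width_px"] < min_resolution:
            continue
        subset.append(rec["image_id"])
    return subset

# Hypothetical annotations for three specimen images.
images = [
    {"image_id": "fish_001", "all_features_visible": True,  "width_px": 2400},
    {"image_id": "fish_002", "all_features_visible": False, "width_px": 2400},
    {"image_id": "fish_003", "all_features_visible": True,  "width_px": 640},
]
```

A repository exposing even a minimal quality schema like this would let downstream projects build such filters without re-annotating the images themselves.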
  5. De_Vita, R; Espinal, X; Laycock, P; Shadura, O (Ed.)
    The Large Hadron Collider (LHC) experiments distribute data by leveraging a diverse array of National Research and Education Networks (NRENs), which the experiments' data management systems treat as a "blackbox" resource. After the High-Luminosity upgrade, the Compact Muon Solenoid (CMS) experiment alone will produce roughly 0.5 exabytes of data per year. NRENs are a critical part of the success of CMS and the other LHC experiments. However, during data movement, NRENs are unaware of data priorities, importance, or quality-of-service needs, which makes it challenging for operators to coordinate data movement and achieve predictable data flows across multi-domain networks. The overarching goal of SENSE (the Software-defined network for End-to-end Networked Science at Exascale) is to enable national laboratories and universities to request and provision end-to-end intelligent network services for their application workflows by leveraging SDN (Software-Defined Networking) capabilities. This work aims to allow the LHC experiments and Rucio, the data management software used by the CMS experiment, to allocate and prioritize certain data transfers over the wide-area network. In this paper, we present the current progress of integrating SENSE, a multi-domain end-to-end SDN orchestrator with QoS (Quality of Service) capabilities, with Rucio.
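The prioritization idea described in this abstract -- tagging transfers so that important datasets get the network first -- can be sketched with a simple priority queue. This is a toy model of the concept, not the Rucio or SENSE API; the dataset names and priority values are invented:

```python
import heapq

class TransferQueue:
    """Toy scheduler: drain transfer requests highest-priority first,
    preserving submission order among equal priorities."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps FIFO order within a priority

    def submit(self, dataset, priority):
        # heapq is a min-heap, so negate priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._seq, dataset))
        self._seq += 1

    def next_transfer(self):
        return heapq.heappop(self._heap)[2]

# A low-priority bulk transfer arrives first, but the urgent calibration
# dataset jumps the queue.
queue = TransferQueue()
queue.submit("background-mc", priority=1)
queue.submit("detector-calibration", priority=9)
queue.submit("user-analysis", priority=1)
```

In the integration the paper describes, the priority attached to a transfer would additionally be translated into a network-level QoS request, so the ordering is enforced across the wide-area path rather than only at the submission queue.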