Title: IceCube's Long Term Archive Software
IceCube is a cubic kilometer neutrino detector located at the South Pole. It generates 1 TiB of raw data per day, which must be archived for possible retrieval years or decades later. Other low-level data products are also archived for easy retrieval in the event of a catastrophic data center failure. The Long Term Archive software is IceCube's answer to archiving this data across several computing sites.
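A minimal, hypothetical Python sketch of the kind of bookkeeping such an archive needs: files are packed into checksummed bundles with a manifest that a remote site can verify years later. None of the names or formats below come from the actual LTA codebase.

import hashlib
import json
import os
import tarfile
from datetime import datetime, timezone

def sha512sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-512 so multi-GiB raw files never sit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def bundle_for_archive(files: list[str], bundle_path: str) -> dict:
    """Pack files into a tar bundle and write a sidecar manifest.

    The manifest records per-file sizes and digests so that any archive
    site can verify integrity later, independently of the transfer tooling.
    """
    manifest = {"created": datetime.now(timezone.utc).isoformat(), "files": []}
    with tarfile.open(bundle_path, "w") as tar:
        for path in files:
            manifest["files"].append({
                "name": os.path.basename(path),
                "bytes": os.path.getsize(path),
                "sha512": sha512sum(path),
            })
            tar.add(path, arcname=os.path.basename(path))
    manifest["bundle_sha512"] = sha512sum(bundle_path)  # digest of the closed bundle
    with open(bundle_path + ".manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

On the retrieval side, the same manifest supports an end-to-end integrity check: recompute each unpacked file's SHA-512 and compare it with the recorded digest.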
Award ID(s):
1841479
NSF-PAR ID:
10110671
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning) - PEARC '19
Page Range / eLocation ID:
1 to 5
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Surface Light Scattering Spectroscopy (SLSS) can characterize the dynamics of an interface between two immiscible fluids by measuring the frequency spectrum of coherent light scattered from thermophysical fluctuations ('ripplons'). In principle, and for many interfaces, SLSS can simultaneously measure surface tension and viscosity, with the potential to access higher-order properties such as surface elasticity and bending moments. Previously, this has been challenging. We describe, and present measurements from, an instrument with improvements in optical design, specimen access, vibrational stability, signal-to-noise ratio, electronics, and data processing. Quantitative improvements include total internal reflection at the interface, which enhances the typically available signal by a factor of order 40, and optical changes that minimize the adverse effects of sloshing induced by external vibrations. Information retrieval is based on a comprehensive surface response function, an instrument function that compensates for real geometrical and optical limitations, and near-real-time data processing that reports results together with their likely accuracy. Detailed models may be fit to the power spectrum in real time, and the raw one-dimensional digitized data stream is archived to allow post-experiment processing. This paper reports a system design and implementation that offers substantial improvements in accuracy, simplicity, ease of use, and cost. The data presented are for systems in regions of low viscosity, where the ripplons are underdamped, but the hardware described is more widely applicable.
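    As a rough illustration of the spectral analysis involved, the hypothetical Python sketch below fits a Lorentzian to a measured power spectrum and converts the peak position and width to surface tension and kinematic viscosity using only the lowest-order capillary-wave relations for underdamped ripplons. This is a simplification: the paper's actual analysis uses a comprehensive surface response function and an instrument function, neither of which is modeled here, and all names are illustrative.

    import numpy as np
    from scipy.optimize import curve_fit

    def lorentzian(f, f0, hwhm, amp, offset):
        """Lorentzian line shape: a first-order model of an underdamped ripplon peak."""
        return amp * hwhm**2 / ((f - f0) ** 2 + hwhm**2) + offset

    def fit_ripplon_spectrum(freq, power, k, rho1, rho2):
        """Fit the power spectrum at wavenumber k (rad/m), then convert the peak
        position and width to surface tension and kinematic viscosity via the
        lowest-order capillary-wave relations (underdamped regime only)."""
        i_pk = np.argmax(power)
        p0 = [freq[i_pk], 0.05 * freq[i_pk], power.max(), power.min()]
        (f0, hwhm, _amp, _off), _ = curve_fit(lorentzian, freq, power, p0=p0)
        omega0 = 2.0 * np.pi * f0                   # ripplon angular frequency
        sigma = omega0**2 * (rho1 + rho2) / k**3    # omega0^2 = sigma k^3 / (rho1 + rho2)
        gamma = 2.0 * np.pi * hwhm                  # temporal damping rate
        nu = gamma / (2.0 * k**2)                   # single-fluid, low-viscosity limit: gamma ~ 2 nu k^2
        return sigma, nu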

  2.
    The vastness of the web imposes a prohibitive cost on building large-scale search engines with limited resources. Crawl frontiers thus need to be optimized to improve the coverage and freshness of crawled content. In this paper, we propose an approach for modeling the dynamics of change in the web using archived copies of webpages. To evaluate its utility, we conduct a preliminary study on the scholarly web using 19,977 seed URLs of authors' homepages obtained from their Google Scholar profiles. We first obtain archived copies of these webpages from the Internet Archive (IA) and estimate when their actual updates occurred. Next, we apply maximum likelihood estimation to recover their mean update frequency (λ) values. Our evaluation shows that λ values derived from a short history of archived data provide a good estimate of the true update frequency in the short term, and that our method estimates updates better, at a fraction of the resources, than the baseline models. Based on this, we demonstrate the utility of archived data for optimizing the crawling strategy of web crawlers, and we uncover important challenges that inspire future research directions.
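    A minimal Python sketch of the core estimation step, assuming (as is standard for this kind of model, though the paper may differ in details such as handling multiple updates within one gap) that page updates follow a Poisson process with rate λ: given the gaps between archived snapshots and whether the page changed across each gap, λ is recovered by maximizing the likelihood numerically. All names are illustrative.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def estimate_lambda(intervals, changed):
        """MLE of a page's mean update frequency (lambda, updates/day) under a
        Poisson update model.

        intervals : gaps (in days) between successive archived snapshots
        changed   : booleans, True if the page differed across that gap
        """
        intervals = np.asarray(intervals, dtype=float)
        changed = np.asarray(changed, dtype=bool)

        def neg_log_likelihood(lam):
            # P(>=1 change in gap t) = 1 - exp(-lam t); P(no change) = exp(-lam t)
            p_change = 1.0 - np.exp(-lam * intervals[changed])
            ll = np.log(np.clip(p_change, 1e-12, None)).sum()
            ll += (-lam * intervals[~changed]).sum()
            return -ll

        result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0),
                                 method="bounded")
        return result.x

    # e.g. snapshots 7, 14, and 30 days apart, with a change seen only in the last gap
    lam_hat = estimate_lambda([7, 14, 30], [False, False, True])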
  3. Abstract

    The Caspar Creek Experimental Watersheds are the site of a long-term paired-watershed study in the northern Coast Ranges of California. The watersheds are predominantly forested with coast redwood and Douglas-fir. Old-growth forest was logged between 1860 and 1904. Two harvesting experiments have been completed since then, and a third is currently underway. Caspar Creek data are split into three phases corresponding to the three experiments: Phase 1 (1962–1985) reports on a selection harvest (1971–1973) and initial recovery in the South Fork watershed; Phase 2 (1985–2017) includes clearcut harvesting of ~50% of the North Fork watershed (1985–1992) and recovery; and Phase 3 (2017 onward) corresponds to a second selection harvest in the South Fork watershed, with a range of subwatershed harvest intensities (2017–2019), and recovery. All three experiments included harvest-related road building and relied primarily on measurements of streamflow and sediment delivery from both treated and reference watersheds. Major findings include modest increases in post-harvest peak flows and cumulative flow volumes, post-harvest low flows that initially increased and then decreased 12 to 15 years after harvesting, and the effects of different yarding techniques and road designs on sediment yields. Some of the data for Phases 1 and 2 are available in a USDA Forest Service online archive. The archived data include precipitation, streamflow, suspended sediment concentrations, turbidity, accumulated weir-pond sediment volumes, bedload transport rates, water stable-isotope data, and geospatial data. Archiving activities are ongoing. Phase 3 data are currently being collected and will be archived after a post-harvest monitoring period.

  4. This is the data archive for:
    Meyer et al. 2022. Plant neighborhood shapes diversity and reduces interspecific variation of the phyllosphere microbiome. ISME J. Please cite this article when using these archived data.
    DOI: 10.1038/s41396-021-01184-6

    Included are raw genetic sequences of the V5-V7 region of the 16S rRNA gene derived from experimental leaf surfaces of tomato, pepper, and bean plants.

    Included in this archive are (a minimal loading sketch follows this list):
    Raw sequence data (RawFASTQ.zip)
    Reproducible R scripts (MeyerEtAl2021_RScript.R, VarPartSupplement.R)
    R objects corresponding to archived scripts (.RDS)
    Data for generating certain plots (PermanovaRValues.txt, PermanovaValuesByHost.txt, NeutralModelRValuesByHarvest.txt, VarPartHostEffects.txt)
    Sample metadata (NeighborhoodMetaData.txt)
    Phylogenetic Tree file for sample ASVs (PhyloTree.tre)
    Geographic distance matrix for distances between plots (GeodistNeighborhood.txt)
    ddPCR (microbial abundance) data (ddPCR_Neighborhood.csv)
    R script for rarefaction function (Rarefy_mean.R)
    Taxonomic assignments for all ASVs in study (Taxonomy_Neighborhood.txt)
    R image files to load R environment instead of running script (MeyerEtAl2021_RScript.RData, VarPartSupplement.RData)
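    The tabular files listed above can likely be read with standard tooling; the Python sketch below shows one way, assuming the .txt files are tab-separated (a guess, not documented in the listing). The .RDS/.RData objects and the R scripts are intended to be used from R as archived.

    import pandas as pd

    # Separators and header conventions are assumptions; inspect each file first.
    metadata = pd.read_csv("NeighborhoodMetaData.txt", sep="\t")
    taxonomy = pd.read_csv("Taxonomy_Neighborhood.txt", sep="\t")
    ddpcr = pd.read_csv("ddPCR_Neighborhood.csv")
    geodist = pd.read_csv("GeodistNeighborhood.txt", sep="\t", index_col=0)

    print(metadata.shape, taxonomy.shape, ddpcr.shape, geodist.shape)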
  5. Data-driven methods have attracted increasing attention in materials research since the advent of the Materials Genome Initiative. Combining materials science with computer science, statistics, and data-driven methods aims to expedite materials research and applications, and can utilize both new and archived research data. In this paper, we present a data-driven, deep-learning approach that builds a portion of the structure–property relationship for polymer nanocomposites. Analysis of archived experimental data motivates the development of a computational model that allows us to demonstrate the approach and gives the flexibility to explore a sufficiently wide range of structures. Taking advantage of microstructure reconstruction methods and finite element simulations, we first explore qualitative relationships between microstructure descriptors and mechanical properties, resulting in new findings regarding the interplay of interphase, volume fraction, and dispersion. We then present a novel deep learning approach that combines convolutional neural networks with multi-task learning to build quantitative correlations between microstructures and property values. The performance of the model is compared with other state-of-the-art strategies, including two-point statistics and structure-descriptor-based approaches. Lastly, we investigate the interpretation of the deep learning model and show that it captures physical understanding while learning.
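    The multi-task architecture described in the abstract can be sketched generically: a shared convolutional trunk extracts microstructure features, and one small regression head per property is trained jointly. The PyTorch sketch below is an illustration under that assumption; layer sizes, image resolution, and the two example tasks are not taken from the paper.

    import torch
    import torch.nn as nn

    class MultiTaskCNN(nn.Module):
        """Shared convolutional trunk with one regression head per property."""
        def __init__(self, n_tasks: int = 2):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # One small head per property keeps task-specific capacity separate
            # while the trunk learns shared microstructure features.
            self.heads = nn.ModuleList(
                [nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 1))
                 for _ in range(n_tasks)]
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            features = self.trunk(x)
            return torch.cat([head(features) for head in self.heads], dim=1)

    # e.g. a batch of 8 single-channel 64x64 microstructure images, 2 properties
    model = MultiTaskCNN(n_tasks=2)
    preds = model(torch.randn(8, 1, 64, 64))                  # shape: (8, 2)
    loss = nn.functional.mse_loss(preds, torch.randn(8, 2))   # joint multi-task loss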