Abstract: Surface Light Scattering Spectroscopy (SLSS) can characterize the dynamics of an interface between two immiscible fluids by measuring the frequency spectrum of coherent light scattered from thermally excited surface fluctuations ('ripplons'). In principle, and for many interfaces, SLSS can simultaneously measure surface tension and viscosity, with the potential to reach higher-order properties such as surface elasticity and bending moments. Previously, this has been challenging. We describe, and present measurements from, an instrument with improvements in optical design, specimen access, vibrational stability, signal-to-noise ratio, electronics, and data processing. Quantitative improvements include total internal reflection at the interface, which enhances the typically available signal by a factor of order 40, and optical changes that minimize the adverse effects of sloshing induced by external vibrations. Information retrieval is based on a comprehensive surface response function, an instrument function that compensates for real geometrical and optical limitations, and near-real-time data processing that reports results and their likely accuracy. Detailed models may be fit to the power spectrum in real time. The raw one-dimensional digitized data stream is archived to allow post-experiment processing. This paper reports a system design and implementation that offers substantial improvements in accuracy, simplicity, ease of use, and cost. The data presented are for systems in regions of low viscosity where the ripplons are underdamped, but the hardware described is more widely applicable.
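Since the abstract mentions fitting detailed models to the measured power spectrum, the sketch below shows the simplest version of such a fit: a Lorentzian line shape for an underdamped ripplon peak. The line shape, function names, and initial guesses are illustrative assumptions; the instrument's actual analysis uses the full surface response and instrument functions described above.

```python
# Minimal sketch (not the instrument's actual analysis): fit a Lorentzian
# line shape to a ripplon peak in a measured power spectrum. In the
# underdamped regime the peak frequency and width relate to surface tension
# and viscosity through the capillary-wave dispersion relation.
import numpy as np
from scipy.optimize import curve_fit

def lorentzian(f, f0, gamma, amplitude, background):
    """Damped-oscillator line shape: peak at f0 with half-width gamma."""
    return amplitude * gamma**2 / ((f - f0) ** 2 + gamma**2) + background

def fit_ripplon_peak(freq_hz, psd):
    """Return (f0, gamma) and their 1-sigma uncertainties from the fit."""
    peak = np.argmax(psd)
    p0 = [freq_hz[peak], 0.1 * freq_hz[peak], psd[peak], psd.min()]
    popt, pcov = curve_fit(lorentzian, freq_hz, psd, p0=p0)
    perr = np.sqrt(np.diag(pcov))
    return popt[:2], perr[:2]
```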
                    
                            
                            IceCube's Long Term Archive Software
                        
                    
    
            IceCube is a cubic kilometer neutrino detector located at the South Pole. It generates 1 TiB of raw data per day, which must be archived for possible retrieval years or decades later. Other low-level data products are also archived for easy retrieval in the event of a catastrophic data center failure. The Long Term Archive software is IceCube's answer to archiving this data across several computing sites. 
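As an illustration of the bookkeeping such an archive needs, the sketch below checksums each raw file and builds a small catalog record before the file is shipped to an archival site. This is a generic sketch under assumed names and record layout, not IceCube's actual Long Term Archive code.

```python
# Generic sketch of an archival bookkeeping step (not IceCube's LTA code):
# checksum each raw file and record it in a catalog so the copy can be
# verified after transfer and located years or decades later.
import hashlib
import json
import os
from datetime import datetime, timezone

def sha512_of(path, chunk_size=64 * 1024 * 1024):
    """Stream the file through SHA-512 so multi-GiB files fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def catalog_record(path, destination_site):
    """Build one catalog entry describing a file bound for an archive site."""
    return {
        "logical_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha512": sha512_of(path),
        "destination": destination_site,
        "registered": datetime.now(timezone.utc).isoformat(),
    }

# Example (hypothetical site name): register a day's raw files for transfer.
# records = [catalog_record(p, "archive-site-1") for p in raw_files]
# print(json.dumps(records, indent=2))
```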
        
    
- Award ID(s): 1841479
- PAR ID: 10110671
- Date Published:
- Journal Name: Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning) - PEARC '19
- Page Range / eLocation ID: 1 to 5
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The vastness of the web imposes a prohibitive cost on building large-scale search engines with limited resources. Crawl frontiers thus need to be optimized to improve the coverage and freshness of crawled content. In this paper, we propose an approach for modeling the dynamics of change in the web using archived copies of webpages. To evaluate its utility, we conduct a preliminary study on the scholarly web using 19,977 seed URLs of authors' homepages obtained from their Google Scholar profiles. We first obtain archived copies of these webpages from the Internet Archive (IA) and estimate when their actual updates occurred. Next, we apply maximum likelihood to estimate their mean update frequency (a minimal sketch of this estimator appears after this list). Our evaluation shows that frequencies estimated from a short history of archived data approximate the true update frequency well in the short term, and that our method provides better estimates of updates at a fraction of the resources required by the baseline models. Based on this, we demonstrate the utility of archived data for optimizing the crawling strategy of web crawlers, and uncover important challenges that inspire future research directions.
- Genomic data are being produced and archived at a prodigious rate, and current studies could become historical baselines for future global genetic diversity analyses and monitoring programs. However, when we evaluated the potential utility of genomic data from wild and domesticated eukaryote species in the world's largest genomic data repository, we found that most archived genomic datasets (87%) lacked the spatiotemporal metadata necessary for genetic biodiversity surveillance. Labor-intensive scouring of a subset of published papers yielded geospatial coordinates and collection years for only 39% (51% if place names were considered) of these genomic datasets. Streamlined data input processes, updated metadata deposition policies, and enhanced scientific community awareness are urgently needed to preserve these irreplaceable records of today's genetic biodiversity and to plug the growing metadata gap.
- Long-context LLMs are increasingly in demand for applications such as retrieval-augmented generation. To defray the cost of pretraining LLMs over long contexts, recent work takes an approach of synthetic context extension: fine-tuning LLMs with synthetically generated long-context data in a post-training stage. However, it remains unclear how and why this synthetic context extension imparts abilities for downstream long-context tasks. In this paper, we investigate fine-tuning on synthetic data for three long-context tasks that require retrieval and reasoning. We vary the realism of the "needle" concepts to be retrieved and the diversity of the surrounding "haystack" context, from using LLMs to construct synthetic documents to using templated relations and creating symbolic datasets. We find that models trained on synthetic data fall short of models trained on real data, but, surprisingly, the mismatch can be interpreted and even predicted in terms of a special set of attention heads responsible for retrieval over long contexts: retrieval heads (Wu et al., 2024). The retrieval heads learned on synthetic data have high overlap with those learned on real data, and there is a strong correlation between the recall of the learned heads and the downstream performance of a model. Furthermore, with attention knockout and activation patching, we mechanistically show that retrieval heads are necessary and explain model performance, although they are not totally sufficient. Our results shed light on how to interpret synthetic-data fine-tuning performance and how to approach creating better data for learning real-world capabilities over long contexts. (A small sketch of the head-overlap comparison appears after this list.)
- This archived Paleoclimatology Study is available from the NOAA National Centers for Environmental Information (NCEI), under the World Data Service (WDS) for Paleoclimatology. The associated NCEI study type is Cave. The data include parameters of speleothems with a geographic location of Mexico. The time period coverage is from 62500 to 5858 calendar years before present (BP). See the metadata for parameter and study location details. Please cite this study when using the data.
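The web-crawling item above estimates each page's mean update frequency by maximum likelihood from archived copies. Below is a minimal sketch of that estimator under the common assumption that updates follow a Poisson process and that the actual update times have already been recovered from the archived snapshots; the function name and example numbers are illustrative.

```python
# Minimal sketch: maximum-likelihood estimate of a page's mean update
# frequency, assuming updates are a Poisson process and that update times
# (in days since first observation) were recovered from archived copies.
def mean_update_frequency(update_times_days, observation_span_days):
    """MLE for a Poisson rate: observed updates divided by observation time."""
    if observation_span_days <= 0:
        raise ValueError("observation span must be positive")
    return len(update_times_days) / observation_span_days

# Example: 6 updates detected across a 90-day window of archived snapshots.
rate_per_day = mean_update_frequency([3, 17, 29, 44, 61, 83], 90)
print(f"estimated mean update frequency: {rate_per_day:.3f} updates/day")
```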
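The long-context item above compares retrieval heads identified on synthetic versus real data and correlates head recall with downstream performance. The sketch below is one simple way to run such a comparison; the top-k cutoff, score dictionaries, Jaccard overlap, and Pearson correlation are illustrative choices, not the paper's exact procedure.

```python
# Illustrative sketch: compare retrieval heads found on synthetic vs. real
# data (top-k overlap) and correlate per-model head recall with downstream
# task accuracy. Not the paper's exact procedure.
from statistics import correlation  # Pearson correlation, Python >= 3.10

def top_k_heads(recall_by_head, k=20):
    """Return the k (layer, head) pairs with the highest retrieval recall."""
    ranked = sorted(recall_by_head, key=recall_by_head.get, reverse=True)
    return set(ranked[:k])

def jaccard_overlap(heads_a, heads_b):
    """Fraction of heads shared between the two top-k sets."""
    return len(heads_a & heads_b) / len(heads_a | heads_b)

# synthetic_recall / real_recall would map (layer, head) -> recall score:
# overlap = jaccard_overlap(top_k_heads(synthetic_recall), top_k_heads(real_recall))

# Across several fine-tuned models, correlate mean head recall with accuracy:
# r = correlation(mean_head_recall_per_model, downstream_accuracy_per_model)
```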