Abstract The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files since most existing tools are either general-purpose or specialized for short read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole genome data. For recently basecalled high quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35–0.65 bits per base, which is 3–6× lower than general-purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (>4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
This content will become publicly available on December 1, 2025
The mechanical and sensory signature of plant-based and animal meat
Abstract Eating less meat is associated with a healthier body and planet. Yet, we remain reluctant to switch to a plant-based diet, largely due to the sensory experience of plant-based meat. Food scientists characterize meat using a double compression test, which only probes one-dimensional behavior. Here we use tension, compression, and shear tests, combined with constitutive neural networks, to automatically discover the behavior of eight plant-based and animal meats across the entire three-dimensional spectrum. We find that plant-based sausage and hotdog, with stiffnesses of 95.9 ± 14.1 kPa and 38.7 ± 3.0 kPa, successfully mimic their animal counterparts, with 63.5 ± 45.7 kPa and 44.3 ± 13.2 kPa, while tofurky is twice as stiff, and tofu is twice as soft. Strikingly, a complementary food tasting survey produces nearly identical stiffness rankings for all eight products (ρ = 0.833, p = 0.015). Probing the fully three-dimensional signature of meats is critical to understand subtle differences in texture that may result in a different perception of taste. Our data and code are freely available at https://github.com/LivingMatterLab/CANN
- Award ID(s): 2320933
- PAR ID: 10601102
- Publisher / Repository: Springer
- Date Published:
- Journal Name: npj Science of Food
- Volume: 8
- Issue: 1
- ISSN: 2396-8370
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Clustering is a fundamental task in machine learning. One of the most successful and broadly used algorithms is DBSCAN, a density-based clustering algorithm. DBSCAN requires ϵ-nearest neighbor graphs of the input dataset, which are computed with range-search algorithms and spatial data structures like KD-trees. Despite many efforts to design scalable implementations for DBSCAN, existing work is limited to low-dimensional datasets, as constructing ϵ-nearest neighbor graphs can be expensive in high dimensions. This article introduces a modified DBSCAN, using k-nearest neighbor (kNN) graphs to improve efficiency. We outline conditions for kNN-DBSCAN to match DBSCAN’s results and present a parallel implementation using OpenMP and MPI for shared and distributed memory systems. Testing on datasets up to 32 dimensions, we achieve remarkable scalability. Our implementation clusters one billion 3D points in under one second on 28K cores at TACC’s Frontera system. In a larger run, we cluster 65 billion points in 20 dimensions in under 40 seconds using 114,688 cores. Our method is up to 37× faster than state-of-the-art parallel DBSCAN on a 20-dimensional dataset with 4 million points. Code is available at https://github.com/ut-padas/knndbscan.
- Abstract Numerous artificial intelligence-based weather prediction (AIWP) models have emerged over the past 2 years, mostly in the private sector. There is an urgent need to evaluate these models from a meteorological perspective, but access to the output of these models is limited. We detail two new resources to facilitate access to AIWP model output data in the hope of accelerating the investigation of AIWP models by the meteorological community. First, a 3-yr (and growing) reforecast archive beginning in October 2020 containing twice daily 10-day forecasts for FourCastNet v2-small, Pangu-Weather, and GraphCast Operational is now available via an Amazon Simple Storage Service (S3) bucket through NOAA’s Open Data Dissemination (NODD) program (https://noaa-oar-mlwp-data.s3.amazonaws.com/index.html). This reforecast archive was initialized with both NOAA’s Global Forecast System (GFS) and ECMWF’s Integrated Forecasting System (IFS) initial conditions in the hope that users can begin to perform the feature-based verification of impactful meteorological phenomena. Second, real-time output for these three models is visualized on our web page (https://aiweather.cira.colostate.edu) along with output from the GFS and the IFS. This allows users to easily compare output between each AIWP model and traditional, physics-based models with the goal of familiarizing users with the characteristics of AIWP models and determining whether the output aligns with expectations, is physically consistent and reasonable, and/or is trustworthy. We view these two efforts as a first step toward evaluating whether these new AIWP tools have a place in forecast operations.
- Abstract Nonnegative matrix factorization (NMF) is widely used to analyze high-dimensional count data because, in contrast to real-valued alternatives such as factor analysis, it produces an interpretable parts-based representation. However, in applications such as spatial transcriptomics, NMF fails to incorporate known structure between observations. Here, we present nonnegative spatial factorization (NSF), a spatially-aware probabilistic dimension reduction model based on transformed Gaussian processes that naturally encourages sparsity and scales to tens of thousands of observations. NSF recovers ground truth factors more accurately than real-valued alternatives such as MEFISTO in simulations, and has lower out-of-sample prediction error than probabilistic NMF on three spatial transcriptomics datasets from mouse brain and liver. Since not all patterns of gene expression have spatial correlations, we also propose a hybrid extension of NSF that combines spatial and nonspatial components, enabling quantification of spatial importance for both observations and features. A TensorFlow implementation of NSF is available from https://github.com/willtownes/nsf-paper.
- There is increasing consumer demand for alternative animal protein products that are delicious and sustainably produced to address concerns about the impacts of mass-produced meat on human and planetary health. Cultured meat has the potential to provide a source of nutritious dietary protein that both is palatable and has reduced environmental impact. However, strategies to support the production of cultured meats at the scale required for food consumption will be critical. In this review, we discuss the current challenges and opportunities of using edible scaffolds for scaling up the production of cultured meat. We provide an overview of different types of edible scaffolds, scaffold fabrication techniques, and common scaffold materials. Finally, we highlight potential advantages of using edible scaffolds to advance cultured meat production by accelerating cell growth and differentiation, providing structure to build complex 3D tissues, and enhancing the nutritional and sensory properties of cultured meat.
