Title: iSamples Complete Export Dataset - April 2025
This dataset contains a complete export of all iSamples records as of April 21, 2025, in GeoParquet format. The dataset includes over 6.6 million sample records with rich metadata, including geographic coordinates, material classifications, context categories, and related resources. The data was exported using the iSamples export client with the query 'source:*', capturing the complete state of the iSample.xyz repository. Each record includes sample identifiers, descriptions, classifications, geospatial information (using the WGS 84 coordinate system), timestamps, and various categorical attributes. This GeoParquet file provides an efficient format for analyzing the global distribution and classification of physical samples across scientific domains. The dataset is valuable for researchers working with physical samples in geoscience, material science, biology, and related fields who need to discover, access, or analyze sample collections at scale.
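As a rough illustration of working with such a file, the sketch below reads a GeoParquet file and keeps the records inside a WGS 84 bounding box. It assumes the third-party geopandas library is installed; the function names, the file path, and the bounding box are hypothetical, since this record does not state the file name or its column names.

```python
def in_bbox(lon, lat, bbox):
    """Return True if a WGS 84 point lies inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def load_samples_in_bbox(path, bbox):
    """Read a GeoParquet file and keep rows whose geometry falls inside bbox.

    The path is supplied by the caller; no particular file name is assumed.
    """
    # Third-party dependency, imported lazily so in_bbox works without it.
    import geopandas as gpd

    gdf = gpd.read_parquet(path)
    # Drop rows with missing or empty geometry before testing coordinates.
    gdf = gdf[gdf.geometry.notna() & ~gdf.geometry.is_empty]
    # representative_point() yields one point per geometry, whatever its type.
    points = gdf.geometry.representative_point()
    mask = [in_bbox(pt.x, pt.y, bbox) for pt in points]
    return gdf[mask]
```

A call such as `load_samples_in_bbox("isamples_export.parquet", (-125.0, 32.0, -114.0, 42.0))` would return only the rows located in that box; the path here is a placeholder.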
Award ID(s):
2004839
PAR ID:
10627389
Publisher / Repository:
Zenodo
Date Published:
Subject(s) / Keyword(s):
iSamples, GeoParquet, sample data, geospatial data, material samples, scientific samples, sample repository, Environmental Science, Earth Sciences, Archaeology
Format(s):
Medium: X
Right(s):
Creative Commons Attribution Non Commercial Share Alike 4.0 International
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Material samples are indispensable data sources in many natural science, social science, and humanities disciplines. More and more researchers recognize that samples collected in one discipline can be of great value for another. This has motivated organizations that manage a large number of samples to make their holdings accessible to the world. Currently, multiple projects are working to connect natural history and other samples managed by individual institutions or individuals into a universe of samples that follow FAIR principles. This poster reports the progress of the US NSF‐funded iSamples project, in the context of other efforts initiated by US DOE, DiSSCo, BCoN, and GBIF. By October 2021, we will also be able to present an iSamples prototype. We encourage individual organizations that hold material samples to get to know these projects and help shape them to realize the goal of a global linked sample cloud that connects all material samples and is accessible to all.
  2. Abstract This dataset contains machine learning and volunteer classifications from the Gravity Spy project. It includes glitches from observing runs O1, O2, O3a and O3b that received at least one classification from a registered volunteer in the project. It also indicates glitches that are nominally retired from the project using our default set of retirement parameters, which are described below. See more details in the Gravity Spy Methods paper.

    When a particular subject in a citizen science project (in this case, glitches from the LIGO datastream) is deemed to be classified sufficiently, it is "retired" from the project. For the Gravity Spy project, retirement depends on a combination of both volunteer and machine learning classifications, and a number of parameterizations affect how quickly glitches get retired. For this dataset, we use a default set of retirement parameters, the most important of which are:

      1. A glitch must be classified by at least 2 registered volunteers.
      2. Based on both the initial machine learning classification and volunteer classifications, the glitch has more than a 90% probability of residing in a particular class.
      3. Each volunteer classification (weighted by that volunteer's confusion matrix) contains a weight equal to the initial machine learning score when determining the final probability.

    The choice of these and other parameterizations will affect the accuracy of the retired dataset as well as the number of glitches that are retired, and will be explored in detail in an upcoming publication (Zevin et al. in prep).

    The dataset can be read in using e.g. Pandas:
```
import pandas as pd
# Load the table stored under the 'image_db' key of the HDF5 file into a DataFrame
dataset = pd.read_hdf('retired_fulldata_min2_max50_ret0p9.hdf5', key='image_db')
```
    Each row in the dataframe contains information about a particular glitch in the Gravity Spy dataset.

    Description of series in dataframe

      - ['1080Lines', '1400Ripples', 'Air_Compressor', 'Blip', 'Chirp', 'Extremely_Loud', 'Helix', 'Koi_Fish', 'Light_Modulation', 'Low_Frequency_Burst', 'Low_Frequency_Lines', 'No_Glitch', 'None_of_the_Above', 'Paired_Doves', 'Power_Line', 'Repeating_Blips', 'Scattered_Light', 'Scratchy', 'Tomte', 'Violin_Mode', 'Wandering_Line', 'Whistle']: Machine learning scores for each glitch class in the trained model, which for a particular glitch will sum to unity.
      - ['ml_confidence', 'ml_label']: Highest machine learning confidence score across all classes for a particular glitch, and the class associated with this score.
      - ['gravityspy_id', 'id']: Unique identifier for each glitch on the Zooniverse platform ('gravityspy_id') and in the Gravity Spy project ('id'), which can be used to link a particular glitch to the full Gravity Spy dataset (which contains GPS times among many other descriptors).
      - ['retired']: Marks whether the glitch is retired using our default set of retirement parameters (1=retired, 0=not retired).
      - ['Nclassifications']: The total number of classifications performed by registered volunteers on this glitch.
      - ['final_score', 'final_label']: The final score (weighted combination of machine learning and volunteer classifications) and the most probable type of glitch.
      - ['tracks']: Array of classification weights that were added to each glitch category due to each volunteer's classification.

    For machine learning classifications on all glitches in O1, O2, O3a, and O3b, please see Gravity Spy Machine Learning Classifications on Zenodo.

    For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.

    For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo.
  3. Abstract Sampling the natural world and built environment underpins much of science, yet systems for managing material samples and associated (meta)data are fragmented across institutional catalogs, practices for identification, and discipline-specific (meta)data standards. The Internet of Samples (iSamples) is a standards-based collaboration to uniquely, consistently, and conveniently identify material samples, record core metadata about them, and link them to other samples, data, and research products. iSamples extends existing resources and best practices in data stewardship to render a cross-domain cyberinfrastructure that enables transdisciplinary research, discovery, and reuse of material samples in 21st century natural science.
  4. ABSTRACT We catalogue the 443 bright supernovae (SNe) discovered by the All-Sky Automated Survey for Supernovae (ASAS-SN) in 2018–2020 along with the 519 SNe recovered by ASAS-SN and 516 additional m_peak ≤ 18 mag SNe missed by ASAS-SN. Our statistical analysis focuses primarily on the 984 SNe discovered or recovered in ASAS-SN g-band observations. The complete sample of 2427 ASAS-SN SNe includes earlier V-band samples and unrecovered SNe. For each SN, we identify the host galaxy, its UV to mid-IR photometry, and the SN’s offset from the centre of the host. Updated peak magnitudes, redshifts, spectral classifications, and host galaxy identifications supersede earlier results. With the increase of the limiting magnitude to g ≤ 18 mag, the ASAS-SN sample is nearly complete up to m_peak = 16.7 mag and is 90 per cent complete for m_peak ≤ 17.0 mag. This is an increase from the V-band sample, where it was roughly complete up to m_peak = 16.2 mag and 70 per cent complete for m_peak ≤ 17.0 mag.
  5. Dataset accompanying code and paper: AircraftVerse: A Large-Scale Multimodal Dataset of Aerial Vehicle Designs. We present AircraftVerse, a publicly available aerial vehicle design dataset. AircraftVerse contains 27,714 diverse battery-powered aircraft designs that have been evaluated using state-of-the-art physics models that characterize performance metrics such as maximum flight distance and hover time. This repository contains:

    A zip file "AircraftVerse.zip", where each design_X contains:
      - design_tree.json: The design tree describes the design topology and the choice of propulsion and energy subsystems. The tree also contains continuous parameters such as wing span, wing chord and arm length.
      - design_seq.json: A preorder traversal of the design tree.
      - design_low_level.json: The most low-level representation of the design. This low-level representation includes significant repetition that is avoided in the tree representation through the use of symmetry.
      - Geom.stp: CAD design for the aircraft in composition STP format (ISO 10303 standard).
      - cadfile.stl: CAD design for the aircraft in stereolithographic STL format.
      - output.json: Summary containing the UAV's performance metrics such as maximum flight distance, maximum hover time, flight distance at maximum speed, maximum current draw, and mass.
      - trims.npy: Contains the [Distance, Flight Time, Pitch, Control Input, Thrust, Lift, Drag, Current, Power] at each evaluated trim state (velocity).
      - pointCloud.npy: Numpy array containing the corresponding point clouds for each design.

    corpus_dic: The corpus of components (e.g. batteries, propellers) that make up all aircraft designs. It is structured as a dictionary of dictionaries, with the high-level components ['Servo', 'GPS', 'ESC', 'Wing', 'Sensor', 'Propeller', 'Receiver', 'Motor', 'Battery', 'Autopilot'], each containing a list of dictionaries corresponding to the component type. For example, corpus_dic['Battery']['TurnigyGraphene2200mAh3S75C'] contains the details of this particular battery.

    Corresponding code for this work is included at https://github.com/SRI-CSL/AircraftVerse.

    Acknowledgements: This material is based upon work supported by the United States Air Force and DARPA under Contract No. FA8750-20-C-0002. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Air Force and DARPA.
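
The AircraftVerse entry describes design_seq.json as a preorder traversal of the design tree. The sketch below shows what a preorder traversal looks like on a toy tree. The node layout (a "name" key and an optional "children" list) and the component names are hypothetical and chosen only for illustration; the actual schema of design_tree.json is not given here and may differ.

```python
def preorder(node):
    """Return node names in preorder: the node first, then each child subtree left to right."""
    names = [node["name"]]
    for child in node.get("children", []):
        names.extend(preorder(child))
    return names


# Hypothetical toy tree; the real design_tree.json schema may differ.
toy_tree = {
    "name": "fuselage",
    "children": [
        {"name": "wing", "children": [{"name": "servo"}]},
        {"name": "arm", "children": [{"name": "motor"}, {"name": "propeller"}]},
    ],
}

sequence = preorder(toy_tree)
print(sequence)
# prints ['fuselage', 'wing', 'servo', 'arm', 'motor', 'propeller']
```

Flattening the tree this way gives a sequence in which every component appears after its parent, which is one reason a sequence form can be convenient for sequence models.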
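
For the Gravity Spy entry (item 2 above), the first two default retirement criteria can be sketched as a simple check. This is only an illustration under stated assumptions: the function name and example rows are hypothetical, the keys 'Nclassifications' and 'final_score' follow the series names listed in that entry, and the weighting of volunteer classifications by confusion matrix is not reproduced here.

```python
def is_retired(n_classifications, final_score, min_volunteers=2, threshold=0.9):
    """Apply the two threshold criteria described for the default retirement settings.

    A glitch counts as retired when at least `min_volunteers` registered
    volunteers classified it and its final class probability exceeds `threshold`.
    """
    return n_classifications >= min_volunteers and final_score > threshold


# Hypothetical example rows, keyed by series names from the dataset description.
rows = [
    {"id": "a", "Nclassifications": 3, "final_score": 0.97},
    {"id": "b", "Nclassifications": 1, "final_score": 0.99},
    {"id": "c", "Nclassifications": 5, "final_score": 0.90},
]

retired_ids = [
    row["id"]
    for row in rows
    if is_retired(row["Nclassifications"], row["final_score"])
]
print(retired_ids)
# prints ['a']
```

Row "b" fails the volunteer count and row "c" fails because the score must be strictly greater than 0.9. In the published dataset the 'retired' series already records the outcome, so a check like this is mainly useful for exploring alternative thresholds.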