

Title: The NANOGrav Search for Signals from New Physics: MCMC chains

MCMC chains for the GWB analyses performed in the paper "The NANOGrav 15 yr Data Set: Search for Signals from New Physics". 

The data are provided in pickle format. Each file contains a NumPy array with the MCMC chain (burn-in already removed) and a dictionary with the model parameters' names as keys and their priors as values. You can load them as follows:

import pickle

with open('path/to/file.pkl', 'rb') as pick:
    temp = pickle.load(pick)

params = temp[0]
chain = temp[1]
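Continuing from the snippet above, you can sanity-check the chain dimensions and plot a marginalized posterior. A minimal sketch, assuming the chain columns follow the key order of params:

```
import matplotlib.pyplot as plt

# Rows are posterior samples; columns are assumed to follow the key order of params
print(chain.shape)
names = list(params.keys())

# Marginalized posterior of the first parameter
plt.hist(chain[:, 0], bins=50, density=True)
plt.xlabel(names[0])
plt.ylabel('posterior density')
plt.show()
```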

The naming convention for the files is the following:

  • igw: inflationary gravitational waves (GWs)
  • sigw: scalar-induced GWs
    • sigw_box: assumes a box-like feature in the primordial power spectrum.
    • sigw_delta: assumes a delta-like feature in the primordial power spectrum.
    • sigw_gauss: assumes a Gaussian peak feature in the primordial power spectrum.
  • pt: cosmological phase transitions
    • pt_bubble: assumes that the dominant contribution to the GW production comes from bubble collisions.
    • pt_sound: assumes that the dominant contribution to the GW production comes from sound waves.
  • stable: stable cosmic strings
    • stable-c: stable strings emitting GWs only in the form of GW bursts from cusps on closed loops.
    • stable-k: stable strings emitting GWs only in the form of GW bursts from kinks on closed loops.
    • stable-m: stable strings emitting monochromatic GWs at the fundamental frequency.
    • stable-n: stable strings described by numerical simulations including GWs from cusps and kinks.
  • meta: metastable cosmic strings
    • meta-l: metastable strings with GW emission from loops only.
    • meta-ls: metastable strings with GW emission from loops and segments.
  • super: cosmic superstrings.
  • dw: domain walls
    • dw-sm: domain walls decaying into Standard Model particles.
    • dw-dr: domain walls decaying into dark radiation.

For each model, we provide four files: one for the run where the new-physics signal is assumed to be the only GWB source, and one for the run where the new-physics signal is superimposed on the signal from Supermassive Black Hole Binaries (SMBHB); for the latter, "_bhb" is appended to the model name. Then, for both of these scenarios, the "compare" folder contains the files for the hypermodel runs that were used to derive the Bayes factors.
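For the hypermodel runs, the Bayes factor is typically estimated from the posterior odds of a model-indexing parameter. A minimal sketch, assuming the compare chains follow the usual enterprise_extensions convention of an 'nmodel' parameter (verify the actual name, and which model is which, against the params dictionary of the file you load):

```
import pickle
import numpy as np

with open('path/to/compare/file.pkl', 'rb') as pick:
    temp = pickle.load(pick)
params, chain = temp[0], temp[1]

# 'nmodel' is an assumption; check the keys of params
idx = list(params.keys()).index('nmodel')
nmodel = chain[:, idx]

# Ratio of samples visiting model 1 (nmodel > 0.5) vs model 0 (nmodel < 0.5)
bayes_factor = np.sum(nmodel > 0.5) / np.sum(nmodel < 0.5)
print(bayes_factor)
```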

In addition to the chains for the stochastic models, we also provide data for the two deterministic models considered in the paper (ULDM and DM substructures). For the ULDM model, the naming convention of the files is the following (all the ULDM signals are superimposed on the SMBHB signal; see the discussion in the paper for more details):

  • uldm_e: ULDM Earth signal.
  • uldm_p: ULDM pulsar signal
    • uldm_p_cor: correlated limit
    • uldm_p_unc: uncorrelated limit
  • uldm_c: ULDM combined Earth + pulsar signal, direct coupling
    • uldm_c_cor: correlated limit
    • uldm_c_unc: uncorrelated limit
  • uldm_vecB: vector ULDM coupled to the baryon number
    • uldm_vecB_cor: correlated limit
    • uldm_vecB_unc: uncorrelated limit 
  • uldm_vecBL: vector ULDM coupled to B-L
    • uldm_vecBL_cor: correlated limit
    • uldm_vecBL_unc: uncorrelated limit
  • uldm_c_grav: ULDM combined Earth + pulsar signal for gravitational-only coupling
    • uldm_c_grav_cor: correlated limit
      • uldm_c_cor_grav_low: low-mass region
      • uldm_c_cor_grav_mon: monopole region
      • uldm_c_cor_grav_high: high-mass region
    • uldm_c_grav_unc: uncorrelated limit
      • uldm_c_unc_grav_low: low-mass region
      • uldm_c_unc_grav_mon: monopole region
      • uldm_c_unc_grav_high: high-mass region

For the substructure (static) model, we provide the chain for the marginalized distribution (as for the ULDM signal, the substructure signal is always superimposed on the SMBHB signal).

 
Award ID(s):
2207267 2111738
NSF-PAR ID:
10435015
Publisher / Repository:
Zenodo
Edition / Version:
1.0.0
Sponsoring Org:
National Science Foundation
More Like This
  1. Data files were used in support of the research paper titled “Mitigating RF Jamming Attacks at the Physical Layer with Machine Learning" which has been submitted to the IET Communications journal.

    ---------------------------------------------------------------------------------------------

All data was collected using the SDR implementation shown here: https://github.com/mainland/dragonradio/tree/iet-paper. In particular, for antenna state selection, the files developed for this paper are located in 'dragonradio/scripts/':

    • 'ModeSelect.py': class used to define the antenna state selection algorithm
    • 'standalone-radio.py': SDR implementation for normal radio operation with reconfigurable antenna
    • 'standalone-radio-tuning.py': SDR implementation for hyperparameter tuning
    • 'standalone-radio-onmi.py': SDR implementation for omnidirectional mode only

    ---------------------------------------------------------------------------------------------

    Authors: Marko Jacovic, Xaime Rivas Rey, Geoffrey Mainland, Kapil R. Dandekar
    Contact: krd26@drexel.edu

    ---------------------------------------------------------------------------------------------

    Top-level directories and content will be described below. Detailed descriptions of experiments performed are provided in the paper.

    ---------------------------------------------------------------------------------------------

    classifier_training: files used for training classifiers that are integrated into SDR platform

    • 'logs-8-18' directory contains OTA-collected SDR log files for each jammer type and under normal operation (including congested and weaklink states)
    • 'classTrain.py' is the main parser for training the classifiers
    • 'trainedClassifiers' contains the output classifiers generated by 'classTrain.py'

    post_processing_classifier: contains logs of online classifier outputs and processing script

    • 'class' directory contains .csv logs of each RTE and OTA experiment for each jamming and operation scenario
    • 'classProcess.py' parses the log files and provides a classification report and confusion matrix for the multi-class and binary classifiers for each observed scenario - found in 'results->classifier_performance'

    post_processing_mgen: contains MGEN receiver logs and parser

    • 'configs' contains JSON files to be used with parser for each experiment
    • 'mgenLogs' contains MGEN receiver logs for each OTA and RTE experiment described. Within each experiment logs are separated by 'mit' for mitigation used, 'nj' for no jammer, and 'noMit' for no mitigation technique used. File names take the form *_cj_* for constant jammer, *_pj_* for periodic jammer, *_rj_* for reactive jammer, and *_nj_* for no jammer. Performance figures are found in 'results->mitigation_performance'

    ray_tracing_emulation: contains files related to Drexel area, Art Museum, and UAV Drexel area validation RTE studies.

    • Directory contains detailed 'readme.txt' for understanding.
    • Please note: the processing files and data logs present in 'validation' folder were developed by Wolfe et al. and should be cited as such, unless explicitly stated differently. 
      • S. Wolfe, S. Begashaw, Y. Liu and K. R. Dandekar, "Adaptive Link Optimization for 802.11 UAV Uplink Using a Reconfigurable Antenna," MILCOM 2018 - 2018 IEEE Military Communications Conference (MILCOM), 2018, pp. 1-6, doi: 10.1109/MILCOM.2018.8599696.

    results: contains results obtained from study

    • 'classifier_performance' contains .txt files summarizing binary and multi-class performance of online SDR system. Files obtained using 'post_processing_classifier.'
    • 'mitigation_performance' contains figures generated by 'post_processing_mgen.'
    • 'validation' contains RTE and OTA performance comparison obtained by 'ray_tracing_emulation->validation->matlab->outdoor_hover_plots.m'

    tuning_parameter_study: contains the OTA log files for antenna state selection hyperparameter study

    • 'dataCollect' contains a folder for each jammer considered in the study, and inside each folder there is a CSV file corresponding to a different configuration of the learning parameters of the reconfigurable antenna. The configuration selected was the one that performed the best across all these experiments and is described in the paper.
    • 'data_summary.txt': contains the summaries from all the CSV files for convenience.
     
  2. Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata. The main content of this dataset is in the binder.sqlite file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in the events table, which has the following schema.

    CREATE TABLE events (
        version INTEGER,
        timestamp TEXT,
        provider TEXT,
        spec TEXT,
        origin TEXT,
        ref TEXT,
        guessed_ref TEXT
    );
    CREATE INDEX idx_timestamp ON events(timestamp);
    • version indicates the version of the record as assigned by Binder. The origin field became available with version 3, and the ref field with version 4. Older records where this information was not recorded will have the corresponding fields set to null.
    • timestamp is the ISO timestamp of the launch
    • provider gives the type of source repo being launched ("GitHub" is by far the most common). The rest of the explanations assume GitHub; other providers may differ.
    • spec gives the particular branch/release/commit being built. It consists of <github-id>/<repo>/<branch>.
    • origin indicates which backend was used. Each has its own storage, compute, etc. so this info might be important for evaluating caching and performance. Note that only recent records include this field. May be null.
    • ref specifies the git commit that was actually used, rather than the named branch referenced by spec. Note that this was not recorded from the beginning, so only the more recent entries include it. May be null.
    • For records where ref is not available, we attempted to clone the named reference given by spec rather than the specific commit (see below). The guessed_ref field records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and instead will refer to whatever was available at the time (early 2021). Depending on the application, this might still be useful information. Selecting only records with version 4 (or non-null ref) will exclude these guessed commits. May be null.
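As an illustration of working with this table, launch counts per repo spec can be pulled directly with Python's built-in sqlite3 module (a minimal sketch against the schema above):

```
import sqlite3

con = sqlite3.connect('binder.sqlite')

# Ten most-launched specs; spec has the form <github-id>/<repo>/<branch>
for spec, n in con.execute(
    "SELECT spec, COUNT(*) AS n FROM events "
    "GROUP BY spec ORDER BY n DESC LIMIT 10"
):
    print(n, spec)

con.close()
```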

    The Binder launch dataset identifies the source repos that were used, but doesn't give any indication of their contents. We crawled GitHub to get the actual specification files in the repos which were fed into repo2docker when preparing the notebook environments, as well as filesystem metadata of the repos. Some repos were deleted/made private at some point, and were thus skipped. This is indicated by the absence of any row for the given commit (or absence of both ref and guessed_ref in the events table). The schema is as follows.

    CREATE TABLE spec_files (
        ref TEXT NOT NULL PRIMARY KEY,
        ls TEXT,
        runtime BLOB,
        apt BLOB,
        conda BLOB,
        pip BLOB,
        pipfile BLOB,
        julia BLOB,
        r BLOB,
        nix BLOB,
        docker BLOB,
        setup BLOB,
        postbuild BLOB,
        start BLOB
    );

    Here ref corresponds to ref and/or guessed_ref from the events table. For each repo, we collected spec files into the following fields (see the repo2docker docs for details on what these are). The records in the database are simply the verbatim file contents, with no parsing or further processing performed.

    • runtime: runtime.txt
    • apt: apt.txt
    • conda: environment.yml
    • pip: requirements.txt
    • pipfile: Pipfile.lock or Pipfile
    • julia: Project.toml or REQUIRE
    • r: install.R
    • nix: default.nix
    • docker: Dockerfile
    • setup: setup.py
    • postbuild: postBuild
    • start: start

    The ls field gives a metadata listing of the repo contents (excluding the .git directory). This field is JSON encoded with the following structure based on JSON types:

    • Object: filesystem directory. Keys are file names within it. Values are the contents, which can be regular files, symlinks, or subdirectories.
    • String: symlink. The string value gives the link target.
    • Number: regular file. The number value gives the file size in bytes.
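For example, the listing can be traversed with a small recursive helper (a sketch; ls_json here stands for one repo's ls value):

```
import json

def total_size(node):
    # Directory: recurse into children
    if isinstance(node, dict):
        return sum(total_size(child) for child in node.values())
    # Regular file: the number is its size in bytes
    if isinstance(node, (int, float)):
        return node
    # Symlink (string): no size recorded
    return 0

tree = json.loads(ls_json)  # ls_json: value of the ls column for one repo
print(total_size(tree), 'bytes (excluding .git)')
```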
    CREATE TABLE clean_specs (
        ref TEXT NOT NULL PRIMARY KEY,
        conda_channels TEXT,
        conda_packages TEXT,
        pip_packages TEXT,
        apt_packages TEXT
    );

    The clean_specs table provides parsed and validated specifications for some of the specification files (currently Pip, Conda, and APT packages). Each column gives either a JSON encoded list of package requirements, or null. APT packages have been validated using a regex adapted from the repo2docker source. Pip packages have been parsed and normalized using the Requirement class from the pkg_resources package of setuptools. Conda packages have been parsed and normalized using the conda.models.match_spec.MatchSpec class included with the library form of Conda (distinct from the command line tool). Users might want to use these parsers when working with the package data, as the specifications can become fairly complex.
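For instance, one non-null pip_packages value can be re-parsed the same way (a sketch; pip_packages stands for a value read from clean_specs):

```
import json
from pkg_resources import Requirement

for spec in json.loads(pip_packages):  # JSON-encoded list of requirement strings
    req = Requirement.parse(spec)
    print(req.project_name, req.specs)
```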

    The missing table gives the repos that were not accessible, and event_logs records which log files have already been added. These tables are used for updating the dataset and should not be of interest to users.

     
  3. This dataset contains machine learning and volunteer classifications from the Gravity Spy project. It includes glitches from observing runs O1, O2, O3a and O3b that received at least one classification from a registered volunteer in the project. It also indicates glitches that are nominally retired from the project using our default set of retirement parameters, which are described below. See more details in the Gravity Spy Methods paper. 

    When a particular subject in a citizen science project (in this case, glitches from the LIGO datastream) is deemed to be classified sufficiently it is "retired" from the project. For the Gravity Spy project, retirement depends on a combination of both volunteer and machine learning classifications, and a number of parameterizations affect how quickly glitches get retired. For this dataset, we use a default set of retirement parameters, the most important of which are: 

    1. A glitch must be classified by at least 2 registered volunteers
    2. Based on both the initial machine learning classification and volunteer classifications, the glitch has more than a 90% probability of residing in a particular class
    3. Each volunteer classification (weighted by that volunteer's confusion matrix) contains a weight equal to the initial machine learning score when determining the final probability

The choice of these and other parameterizations will affect the accuracy of the retired dataset as well as the number of glitches that are retired, and will be explored in detail in an upcoming publication (Zevin et al. in prep).

    The dataset can be read in using e.g. Pandas: 
    ```
    import pandas as pd
    dataset = pd.read_hdf('retired_fulldata_min2_max50_ret0p9.hdf5', key='image_db')
    ```
    Each row in the dataframe contains information about a particular glitch in the Gravity Spy dataset. 

    Description of series in dataframe

    • ['1080Lines', '1400Ripples', 'Air_Compressor', 'Blip', 'Chirp', 'Extremely_Loud', 'Helix', 'Koi_Fish', 'Light_Modulation', 'Low_Frequency_Burst', 'Low_Frequency_Lines', 'No_Glitch', 'None_of_the_Above', 'Paired_Doves', 'Power_Line', 'Repeating_Blips', 'Scattered_Light', 'Scratchy', 'Tomte', 'Violin_Mode', 'Wandering_Line', 'Whistle']
      • Machine learning scores for each glitch class in the trained model, which for a particular glitch will sum to unity
    • ['ml_confidence', 'ml_label']
      • Highest machine learning confidence score across all classes for a particular glitch, and the class associated with this score
    • ['gravityspy_id', 'id']
      • Unique identifier for each glitch on the Zooniverse platform ('gravityspy_id') and in the Gravity Spy project ('id'), which can be used to link a particular glitch to the full Gravity Spy dataset (which contains GPS times among many other descriptors)
    • ['retired']
      • Marks whether the glitch is retired using our default set of retirement parameters (1=retired, 0=not retired)
    • ['Nclassifications']
      • The total number of classifications performed by registered volunteers on this glitch
    • ['final_score', 'final_label']
      • The final score (weighted combination of machine learning and volunteer classifications) and the most probable type of glitch
    • ['tracks']
      • Array of classification weights that were added to each glitch category due to each volunteer's classification
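For example, selecting the retired glitches and tallying their most probable types (a minimal sketch using the series described above):

```
import pandas as pd

dataset = pd.read_hdf('retired_fulldata_min2_max50_ret0p9.hdf5', key='image_db')

retired = dataset[dataset['retired'] == 1]
print(retired['final_label'].value_counts())
```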

     

For machine learning classifications on all glitches in O1, O2, O3a, and O3b, please see Gravity Spy Machine Learning Classifications on Zenodo.

    For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.

    For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo. 

     
The historical settlement data compilation for Spain (HISDAC-ES) is a geospatial dataset consisting of over 240 gridded surfaces measuring the physical, functional, age-related, and evolutionary characteristics of the Spanish building stock. We scraped, harmonized, and aggregated cadastral building footprint data for Spain, covering over 12,000,000 building footprints including construction year attributes, to create a multi-faceted series of gridded surfaces (GeoTIFF format) describing the evolution of human settlements in Spain from 1900 to 2020, at 100 m spatial and 5-year temporal resolution. The dataset also contains aggregated characteristics and completeness statistics at the municipality level, in CSV and GeoPackage format.

    !!! UPDATE 08-2023 !!!: We provide a new, improved version of HISDAC-ES. Specifically, we fixed two bugs in the production code that caused an incorrect rasterization of the multitemporal BUFA layers and of the PHYS layers (BUFA, BIA, DWEL, BUNITS sum and mean). Moreover, we added decadal raster datasets measuring residential building footprint and building indoor area (1900-2020), and provide a country-wide, harmonized building footprint centroid dataset in GeoPackage vector data format.

    File descriptions:

Datasets are available in three spatial reference systems, plus municipality-level and vector data:

    1. HISDAC-ES_All_LAEA.zip: Raster data in Lambert Azimuthal Equal Area (LAEA) covering all Spanish territory.
    2. HISDAC-ES_IbericPeninsula_UTM30.zip: Raster data in UTM Zone 30N covering the Iberian Peninsula plus Ceuta and Melilla.
    3. HISDAC-ES_CanaryIslands_REGCAN.zip: Raster data in REGCAN-95, covering the Canary Islands only.
    4. HISDAC-ES_MunicipAggregates.zip: Municipality-level aggregates and completeness statistics (CSV, GeoPackage), in LAEA projection.
    5. ES_building_centroids_merged_spatjoin.gpkg: 7,000,000+ building footprint centroids in GeoPackage format, harmonized from the different cadastral systems, representing the input data for HISDAC-ES. These data can be used for sanity checks or for the creation of further, user-defined gridded surfaces.

    Source data:

    HISDAC-ES is derived from cadastral building footprint data, available from different authorities in Spain:

    • Araba province: https://geo.araba.eus/WFS_Katastroa?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetCapabilities
    • Bizkaia province: https://web.bizkaia.eus/es/inspirebizkaia
    • Gipuzkoa province: https://b5m.gipuzkoa.eus/web5000/es/utilidades/inspire/edificios/
    • Navarra region: https://inspire.navarra.es/services/BU/wfs
    • Other regions: http://www.catastro.minhap.es/INSPIRE/buildings/ES.SDGC.bu.atom.xml
    • Data source of municipality polygons: Centro Nacional de Información Geográfica (https://centrodedescargas.cnig.es/CentroDescargas/index.jsp)

    Technical notes:

    Gridded data

    File nomenclature:

    ./region_projection_theme/hisdac_es_theme_variable_version_resolution[m][_year].tif

    Regions:

    • all: complete territory of Spain
    • can: Canary Islands only
    • ibe: Iberian Peninsula + Ceuta + Melilla

    Projections:

    • laea: Lambert azimuthal equal area (EPSG:3035)
    • regcan: REGCAN95 / UTM zone 28N (EPSG:4083)
    • utm: ETRS89 / UTM zone 30N (EPSG:25830)

    Themes:

    • evolution / evol: multi-temporal physical measurements
    • landuse: multi-temporal building counts per land use (i.e., building function) class
    • physical / phys: physical building characteristics in 2020
    • temporal / temp: temporal characteristics (construction year statistics)

    Variables: evolution

    • budens: building density (count per grid cell area)
    • bufa: building footprint area
    • deva: developed area (any grid cell containing at least one building)
    • resbufa: residential building footprint area
    • resbia: residential building indoor area

    Variables: physical

    • bia: building indoor area
    • bufa: building footprint area
    • bunits: number of building units
    • dwel: number of dwellings

    Variables: temporal

    • mincoy: minimum construction year per grid cell
    • maxcoy: maximum construction year per grid cell
    • meancoy: mean construction year per grid cell
    • medcoy: median construction year per grid cell
    • modecoy: mode (most frequent) construction year per grid cell
    • varcoy: variety of construction years per grid cell

    Variable: landuse

    Counts of buildings per grid cell and land use type.
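As an illustration of this nomenclature, a single evolution layer can be read with rasterio (a sketch; the file name below is a hypothetical instance of the pattern and should be checked against the downloaded archive):

```
import rasterio

# Hypothetical instance of
# ./region_projection_theme/hisdac_es_theme_variable_version_resolution[m][_year].tif
path = 'all_laea_evolution/hisdac_es_evolution_bufa_v1_100m_1950.tif'

with rasterio.open(path) as src:
    bufa = src.read(1)       # building footprint area per grid cell
    print(src.crs, src.res)  # expected EPSG:3035 and 100 m resolution
```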

    Municipality-level data

    • hisdac_es_municipality_stats_multitemporal_longform_v1.csv: This CSV file contains the zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in long form. Note that a value of 0 for the year attribute denotes the statistics for records without construction year information.
    • hisdac_es_municipality_stats_multitemporal_wideform_v1.csv: This CSV file contains the zonal sums of the gridded surfaces (e.g., number of buildings per year and municipality) in wide form. Note that a value of 0 for the year suffix denotes the statistics for records without construction year information.
    • hisdac_es_municipality_stats_completeness_v1.csv: This CSV file contains the missingness rates (in %) of the building attribute per municipality, ranging from 0.0 (attribute exists for all buildings) to 100.0 (attribute exists for none of the buildings) in a given municipality.

    Column names for the completeness statistics tables:

    • NATCODE: National municipality identifier*
    • num_total: number of buildings per municipality
    • perc_bymiss: Percentage of buildings with missing built year (construction year)
    • perc_lumiss: Percentage of buildings with missing landuse attribute
    • perc_luother: Percentage of buildings with landuse type "other"
    • perc_num_floors_miss: Percentage of buildings without valid number of floors attribute
    • perc_num_dwel_miss: Percentage of buildings without valid number of dwellings attribute
    • perc_num_bunits_miss: Percentage of buildings without valid number of building units attribute
    • perc_offi_area_miss: Percentage of buildings without valid official area (building indoor area, BIA) attribute
    • perc_num_dwel_and_num_bunits_miss: Percentage of buildings missing both number of dwellings and number of building units attribute

The same statistics are available as a GeoPackage file including municipality polygons, in Lambert azimuthal equal area (EPSG:3035).

    *From the NATCODE, other regional identifiers can be derived as follows:

    • NATCODE: 34 01 04 04001
    • Country: 34
    • Comunidad autónoma (CA_CODE): 01
    • Province (PROV_CODE): 04
    • LAU code: 04001 (province + municipality code)
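In code, these identifiers are fixed-width substrings of the NATCODE (a minimal sketch using the example above):

```
natcode = '34010404001'       # 34 | 01 | 04 | 04001
country   = natcode[0:2]      # '34'    country
ca_code   = natcode[2:4]      # '01'    comunidad autónoma
prov_code = natcode[4:6]      # '04'    province
lau_code  = natcode[6:]       # '04001' province + municipality code
print(country, ca_code, prov_code, lau_code)
```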
     
Abstract: The 15 yr pulsar timing data set collected by the North American Nanohertz Observatory for Gravitational Waves (NANOGrav) shows positive evidence for the presence of a low-frequency gravitational-wave (GW) background. In this paper, we investigate potential cosmological interpretations of this signal, specifically cosmic inflation, scalar-induced GWs, first-order phase transitions, cosmic strings, and domain walls. We find that, with the exception of stable cosmic strings of field theory origin, all these models can reproduce the observed signal. When compared to the standard interpretation in terms of inspiraling supermassive black hole binaries (SMBHBs), many cosmological models seem to provide a better fit, resulting in Bayes factors in the range from 10 to 100. However, these results strongly depend on modeling assumptions about the cosmic SMBHB population and, at this stage, should not be regarded as evidence for new physics. Furthermore, we identify excluded parameter regions where the predicted GW signal from cosmological sources significantly exceeds the NANOGrav signal. These parameter constraints are independent of the origin of the NANOGrav signal and illustrate how pulsar timing data provide a new way to constrain the parameter space of these models. Finally, we search for deterministic signals produced by models of ultralight dark matter (ULDM) and dark matter substructures in the Milky Way. We find no evidence for either of these signals and thus report updated constraints on these models. In the case of ULDM, these constraints outperform torsion balance and atomic clock constraints for ULDM coupled to electrons, muons, or gluons.