Title: Automated Detection of Antenna Malfunctions in Large‐N Interferometers: A Case Study With the Hydrogen Epoch of Reionization Array
Abstract We present a framework for identifying and flagging malfunctioning antennas in large radio interferometers. We outline two distinct categories of metrics designed to detect outliers along known failure modes of large arrays: cross‐correlation metrics, based on all antenna pairs, and auto‐correlation metrics, based solely on individual antennas. We define and motivate the statistical framework for all metrics used, and present tailored visualizations that aid us in clearly identifying new and existing systematics. We implement these techniques using data from 105 antennas in the Hydrogen Epoch of Reionization Array (HERA) as a case study. Finally, we provide a detailed algorithm for implementing these metrics as flagging tools on real data sets.
Award ID(s):
1836019
PAR ID:
10446848
Author(s) / Creator(s):
Publisher / Repository:
DOI PREFIX: 10.1029
Date Published:
Journal Name:
Radio Science
Volume:
57
Issue:
1
ISSN:
0048-6604
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The Canadian Hydrogen Intensity Mapping Experiment (CHIME) is a drift scan radio telescope operating across the 400–800 MHz band. CHIME is located at the Dominion Radio Astrophysical Observatory near Penticton, BC, Canada. The instrument is designed to map neutral hydrogen over the redshift range 0.8–2.5 to constrain the expansion history of the universe. This goal drives the design features of the instrument. CHIME consists of four parallel cylindrical reflectors, oriented north–south, each 100 m × 20 m and outfitted with a 256-element dual-polarization linear feed array. CHIME observes a two-degree-wide stripe covering the entire meridian at any given moment, observing three-quarters of the sky every day owing to Earth’s rotation. An FX correlator utilizes field-programmable gate arrays and graphics processing units to digitize and correlate the signals, with different correlation products generated for cosmological, fast radio burst, pulsar, very long baseline interferometry, and 21 cm absorber back ends. For the cosmology back end, the $N_{\mathrm{feed}}^2$ correlation matrix is formed for 1024 frequency channels across the band every 31 ms. A data receiver system applies calibration and flagging and, for our primary cosmological data product, stacks redundant baselines and integrates for 10 s. We present an overview of the instrument, its performance metrics based on the first 3 yr of science data, and we describe the current progress in characterizing CHIME’s primary beam response. We also present maps of the sky derived from CHIME data; we are using versions of these maps for a cosmological stacking analysis, as well as for investigation of Galactic foregrounds.
  2. Abstract Motivation: Single-cell RNA sequencing (scRNAseq) technologies allow for measurements of gene expression at a single-cell resolution. This provides researchers with a tremendous advantage for detecting heterogeneity, delineating cellular maps or identifying rare subpopulations. However, a critical complication remains: the low number of single-cell observations due to limitations by rarity of subpopulation, tissue degradation or cost. This absence of sufficient data may cause inaccuracy or irreproducibility of downstream analysis. In this work, we present Automated Cell-Type-informed Introspective Variational Autoencoder (ACTIVA): a novel framework for generating realistic synthetic data using a single-stream adversarial variational autoencoder conditioned with cell-type information. Within a single framework, ACTIVA can enlarge existing datasets and generate specific subpopulations on demand, as opposed to two separate models [such as single-cell GAN (scGAN) and conditional scGAN (cscGAN)]. Data generation and augmentation with ACTIVA can enhance scRNAseq pipelines and analysis, such as benchmarking new algorithms, studying the accuracy of classifiers and detecting marker genes. ACTIVA will facilitate analysis of smaller datasets, potentially reducing the number of patients and animals necessary in initial studies. Results: We train and evaluate models on multiple public scRNAseq datasets. In comparison to GAN-based models (scGAN and cscGAN), we demonstrate that ACTIVA generates cells that are more realistic and harder for classifiers to identify as synthetic which also have better pair-wise correlation between genes. Data augmentation with ACTIVA significantly improves classification of rare subtypes (more than 45% improvement compared with not augmenting and 4% better than cscGAN) all while reducing run-time by an order of magnitude in comparison to both models.
    Availability and implementation: The codes and datasets are hosted on Zenodo (https://doi.org/10.5281/zenodo.5879639). Tutorials are available at https://github.com/SindiLab/ACTIVA. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. In classification problems, mislabeled data can have a dramatic effect on the capability of a trained model. The traditional method of dealing with mislabeled data is through expert review. However, this is not always ideal, due to the large volume of data in many classification datasets, such as image datasets supporting deep learning models, and the limited availability of human experts for reviewing the data. Herein, we propose an ordered sample consensus (ORSAC) method to support data cleaning by flagging mislabeled data. This method is inspired by the random sample consensus (RANSAC) method for outlier detection. In short, the method involves iteratively training and testing a model on different splits of the dataset, recording misclassifications, and flagging data that is frequently misclassified as probably mislabeled. We evaluate the method by purposefully mislabeling subsets of data and assessing the method’s capability to find such data. We demonstrate with three datasets, a mosquito image dataset, CIFAR-10, and CIFAR-100, that this method is reliable in finding mislabeled data with a high degree of accuracy. Our experimental results indicate a high proficiency of our methodology in identifying mislabeled data across these diverse datasets, with performance assessed using different mislabeling frequencies. 
  4. Islands have long represented natural laboratories for studying many aspects of ecology and evolutionary biology, from speciation to community assembly. One aspect that has been well documented is the correlation between island size and taxonomic diversity, likely due to decreased complexity and population size on small islands. This same logic can apply to genetic diversity, which should predictably decrease with effective population size. The island size–diversity correlation has received support over the years but often focuses on single metrics of genetic diversity. Here, we use Zosterops white-eyes in the Solomon Islands to study the correlation between island size and various metrics related to genetic diversity, including runs of homozygosity and fixation of transposable elements. We find that almost all these metrics strongly correlate with island size, and in turn with each other. We infer that island size is independently correlated with these different variables, demonstrating that population size impacts genomic metrics of diversity in a variety of ways across temporal and hierarchical scales.
  5. Zhang, Ying (Ed.)
    ABSTRACT Treponema pallidum, the causative agent of syphilis, poses a significant global health threat. Its strict reliance on host-derived nutrients and difficulties in in vitro cultivation have impeded detailed metabolic characterization. In this study, we present iTP251, the first genome-scale metabolic model of T. pallidum, reconstructed and extensively curated to capture its unique metabolic features. These refinements included the curation of key reactions such as pyrophosphate-dependent phosphorylation and pathways for nucleotide synthesis, amino acid synthesis, and cofactor metabolism. The model demonstrated high predictive accuracy, validated by a MEMOTE score of 92%. To further enhance its predictive capabilities, we developed ec-iTP251, an enzyme-constrained version of iTP251, incorporating enzyme turnover rate and molecular weight information for all reactions having gene-protein-reaction associations. Ec-iTP251 provides detailed insights into protein allocation across carbon sources, showing strong agreement with proteomics data (Pearson’s correlation of 0.88) in the central carbon pathway. Moreover, the thermodynamic analysis revealed that lactate uptake serves as an additional ATP-generating strategy to utilize unused proteomes, albeit at the cost of reducing the driving force of the central carbon pathway by 27%. Subsequent analysis identified glycerol-3-phosphate dehydrogenase as an alternative electron sink, compensating for the absence of a conventional electron transport chain while maintaining cellular redox balance. These findings highlight T. pallidum’s metabolic adaptations for survival and redox balance in nutrient-limited, extracellular host environments, providing a foundation for future research into its unique bioenergetics.
    IMPORTANCE This study advances our understanding of Treponema pallidum, the syphilis-causing pathogen, through the reconstruction of iTP251, the first genome-scale metabolic model for this organism, and its enzyme-constrained version, ec-iTP251. The work addresses the challenges of studying T. pallidum, an extracellular, host-adapted pathogen, due to its strict dependence on host-derived nutrients and challenges in in vitro cultivation. Validated with strong agreement to proteomics data, the model demonstrates high predictive reliability. Key insights include unique metabolic adaptations such as lactate uptake for ATP production and alternative redox-balancing mechanisms. These findings provide a robust framework for future studies aimed at unraveling the pathogen's survival strategies and identifying potential metabolic vulnerabilities.
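The flagging loop that item 3 describes (train and test a model on different splits, record misclassifications, flag samples that are frequently misclassified) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ORSAC authors' implementation: it uses plain random splits, a toy nearest-centroid classifier in place of a deep model, and hypothetical names and defaults (`flag_suspect_labels`, `n_rounds`, `test_fraction`, `threshold`) that do not come from the abstract.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Return {label: centroid} for feature vectors X with labels y."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for k, v in enumerate(xi):
            acc[k] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def flag_suspect_labels(X, y, n_rounds=50, test_fraction=0.3,
                        threshold=0.8, seed=0):
    """Flag indices misclassified in at least `threshold` of their test rounds.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(X)
    tested = [0] * n   # how often each sample landed in a test split
    missed = [0] * n   # how often it was misclassified when tested
    n_test = max(1, int(n * test_fraction))
    for _ in range(n_rounds):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = nearest_centroid_fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        for i in test_idx:
            tested[i] += 1
            if nearest_centroid_predict(model, X[i]) != y[i]:
                missed[i] += 1
    return [i for i in range(n)
            if tested[i] > 0 and missed[i] / tested[i] >= threshold]


# Toy data: two well-separated clusters with two labels flipped on purpose,
# mirroring the abstract's evaluation idea of deliberately mislabeling samples.
X = [[0.1 * i, 0.0] for i in range(20)] + [[10 + 0.1 * i, 0.0] for i in range(20)]
y = [0] * 20 + [1] * 20
y[3], y[25] = 1, 0  # injected label errors
flagged = flag_suspect_labels(X, y)
print(flagged)
```

In this sketch the threshold trades recall against false alarms: a lower value flags more samples for expert review, a higher value flags only samples that are misclassified in nearly every round in which they are tested.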