

Title: Temporal Correlation of Internet Observatories and Outposts
The Internet has become a critical component of modern civilization, requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic is essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gaining insights into these distributions. This work compares observed sources from the largest Internet telescope (the CAIDA darknet telescope) with those from a commercial outpost (the GreyNoise honeyfarm). Neither of these locations actively emits Internet traffic, and each provides distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Newly developed GraphBLAS hypersparse matrices and D4M associative array technologies enable the efficient analysis of these data on significant scales. The CAIDA sources are well approximated by a Zipf-Mandelbrot distribution. Over a 6-month period, 70% of the brightest (highest-frequency) sources in the CAIDA telescope are consistently detected by coeval observations in the GreyNoise honeyfarm. This overlap drops as the sources dim (decrease in frequency) and as the time difference between the observations grows. The probability of seeing a CAIDA source is proportional to the logarithm of its brightness. The temporal correlations are well described by a modified Cauchy distribution. These observations are consistent with a correlated high-frequency beam of sources that drifts on a time scale of a month.
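The abstract names three functional forms (a Zipf-Mandelbrot source distribution, a detection probability that grows with the logarithm of brightness, and a modified Cauchy temporal correlation) without giving their parameterizations. The Python sketch below illustrates them under stated assumptions. The Zipf-Mandelbrot form p(d) ∝ 1/(d + δ)^α is the common textbook version; the exact variant and fitted values used by the authors are not given here. `overlap_by_brightness` treats a source's packet count as its brightness, bins sources by floor(log2(count)), and reports the fraction also seen by a second observatory, which is one plausible way to measure the overlap described above. `fit_log_trend` and `modified_cauchy` (c0 / (1 + |Δt/τ|^β)) are hypothetical illustrations, not the paper's fitting procedure; all function names and parameters are invented for this example.

```python
from collections import Counter


def zipf_mandelbrot_pmf(d_max, alpha, delta):
    """Normalized p(d) proportional to 1 / (d + delta)**alpha for d = 1..d_max.

    Assumed textbook form; the paper's exact variant and fitted values are not given here.
    """
    weights = [1.0 / (d + delta) ** alpha for d in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def overlap_by_brightness(packets_a, sources_b):
    """Per log2 brightness bin, the fraction of observatory A's sources also seen by B.

    packets_a: iterable of source IDs, one entry per packet observed by A.
    sources_b: set of source IDs observed by B over the same time window.
    Returns {k: (number of A sources with 2**k <= count < 2**(k + 1), fraction also in B)}.
    """
    brightness = Counter(packets_a)  # packets per source ("brightness")
    bins = {}
    for src, count in brightness.items():
        k = count.bit_length() - 1  # floor(log2(count)) for count >= 1
        seen, total = bins.get(k, (0, 0))
        bins[k] = (seen + (src in sources_b), total + 1)
    return {k: (total, seen / total) for k, (seen, total) in sorted(bins.items())}


def fit_log_trend(overlap):
    """Least-squares line fraction = a + b * k, with k = floor(log2(brightness)).

    Hypothetical check of a log-brightness trend; needs at least two bins.
    """
    if len(overlap) < 2:
        raise ValueError("need at least two brightness bins")
    ks = list(overlap)
    fs = [overlap[k][1] for k in ks]
    n = len(ks)
    mean_k, mean_f = sum(ks) / n, sum(fs) / n
    sxx = sum((k - mean_k) ** 2 for k in ks)
    sxy = sum((k - mean_k) * (f - mean_f) for k, f in zip(ks, fs))
    b = sxy / sxx
    return mean_f - b * mean_k, b


def modified_cauchy(dt, c0, tau, beta):
    """Hypothetical overlap decay with time lag dt: c0 / (1 + |dt / tau|**beta).

    beta = 2 gives a Cauchy (Lorentzian) profile; other beta values modify it.
    """
    return c0 / (1.0 + abs(dt / tau) ** beta)
```

For example, `overlap_by_brightness(["s1"] * 8 + ["s2"] * 4 + ["s3", "s4"], {"s1", "s3"})` returns `{0: (2, 0.5), 2: (1, 0.0), 3: (1, 1.0)}`: the single brightest source is matched, while only half of the faintest sources are. Logarithmic binning keeps the sparsely populated bright end of a heavy-tailed distribution from being swamped by the many dim sources.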
Award ID(s):
1724853
NSF-PAR ID:
10351411
Journal Name:
2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Page Range / eLocation ID:
247 to 254
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a search for radio afterglows from long gamma-ray bursts using the Australian Square Kilometre Array Pathfinder (ASKAP). Our search used the Rapid ASKAP Continuum Survey, covering the entire celestial sphere south of declination +41°, and three epochs of the Variables and Slow Transients Pilot Survey (Phase 1), covering ∼5000 square degrees per epoch. The observations we used from these surveys spanned a nine-month period from 2019 April 21 to 2020 January 11. We cross-matched radio sources found in these surveys with 779 well-localized (to ≤15 arcsec) long gamma-ray bursts occurring after 2004 and determined whether the associations were more likely afterglow- or host-related through the analysis of optical images. In our search, we detected one radio afterglow candidate associated with GRB 171205A, a local low-luminosity gamma-ray burst with a supernova counterpart, SN 2017iuk, in an ASKAP observation 511 d post-burst. We confirmed this detection with further observations of the radio afterglow using the Australia Telescope Compact Array at 859 and 884 d post-burst. Combining these data with archival data from early-time radio observations, we showed that the evolution of the radio spectral energy distribution alone could reveal clear signatures of a wind-like circumburst medium for the burst. Finally, we derived semi-analytical estimates for the microphysical shock parameters of the burst: electron power-law index p = 2.84, normalized wind-density parameter A_* = 3, fractional energy in electrons ϵ_e = 0.3, and fractional energy in magnetic fields ϵ_B = 0.0002.
  2. The Internet has never been more important to our society, and understanding the behavior of the Internet is essential. The Center for Applied Internet Data Analysis (CAIDA) Telescope observes a continuous stream of packets from an unsolicited darkspace representing 1/256 of the Internet. During 2019 and 2020, over 40,000,000,000,000 unique packets were collected, representing the largest public corpus of Internet traffic ever assembled. Using the combined resources of the Supercomputing Centers at UC San Diego, Lawrence Berkeley National Laboratory, and MIT, the spatial-temporal structure of anonymized source-destination pairs from the CAIDA Telescope data has been analyzed with GraphBLAS hierarchical hypersparse matrices. These analyses provide unique insight into this unsolicited Internet darkspace traffic, with the discovery of many previously unseen scaling relations. The data show a significant sustained increase in unsolicited traffic corresponding to the start of the COVID-19 pandemic, but relatively little change in the underlying scaling relations associated with unique sources, source fan-outs, unique links, destination fan-ins, and unique destinations. This work demonstrates the practical feasibility and benefit of the safe collection and analysis of significant quantities of anonymized Internet traffic.
  3. With the proliferation of Dynamic Spectrum Access (DSA), Internet of Things (IoT), and Mobile Edge Computing (MEC) technologies, various methods have been proposed to deduce key network and user information in cellular systems, such as available cell bandwidths, user locations, and mobility. Such information, controlled by cellular networks, is not only of vital significance to other systems co-located spectrum-wise and/or geographically; applications within cellular systems can also benefit remarkably from inferring it, as exemplified by the efforts of video streaming to predict cell bandwidth. Hence, we are motivated to develop a new tool that uses off-the-shelf products to uncover as much information as possible that used to be closed to outsiders or user devices. Given the widespread deployment of LTE and its continuous evolution to 5G, we design and implement U-CIMAN, a client-side system to accurately UnCover as much Information in Mobile Access Networks as allowed by LTE encryption. Among the many potential applications of U-CIMAN, we highlight one use case: accurately measuring the spectrum tenancy of a commercial LTE cell. Besides measuring spectrum tenancy in units of resource blocks, U-CIMAN discovers user mobility and traffic types associated with spectrum usage through decoded control messages and user data bytes. We conduct a detailed and accurate 4-month spectrum measurement on a commercial LTE cell, and the observations include the predictive power of the Modulation and Coding Scheme on spectrum tenancy and a channel off-time bounded under 10 seconds, to name a few.
  4. Estimating multimodal distributions of travel times (TT) from real-world data is critical for understanding and managing congestion. Mixture models can estimate the overall distribution when distinct peaks exist in the probability density function, but no transfer of mixture information under epistemic uncertainty across different spatiotemporal scales has been considered for capturing unobserved heterogeneity. In this paper, a physics-informed and -regularized (PIR) prediction model is developed that shares observations across similarly distributed network segments over time and space. By grouping similar mixture models, the model uses a particular sample distribution at distant, non-contiguous, unexplored locations and improves TT prediction. The model includes hierarchical Kalman filtering (KF) updates that use the traffic fundamental diagram to regulate any spurious correlation, and it estimates the mixture of TT distributions from observations at the current location and time sampled from the multimodal and multivariate TT distributions at other locations and times. To overcome the limitations of KF, this study develops a dynamic graph neural network (GCN) model that uses time-evolving spatial correlations. The KF model with PIR predicts traffic state with 19% more accuracy than the TMML model in Park et al. (2022), and the GCN model is expected to further reduce the uncertainty in prediction. This study uses information gain from explored correlated links to obtain accurate predictions for unexplored ones.
  5. Frey, Sandor (Ed.)
    The ability to collect unprecedented amounts of astronomical data has enabled the study of scientific questions that were impractical to study in the pre-information era. This study uses large datasets collected by four different robotic telescopes to profile the large-scale distribution of the spin directions of spiral galaxies. These datasets cover the Northern and Southern hemispheres, in addition to data acquired from space by the Hubble Space Telescope. The data were annotated automatically by a fully symmetric algorithm, as well as manually through a long, labor-intensive process, leading to a dataset of nearly 10^6 galaxies. The data show possible patterns of asymmetric distribution of the spin directions, and the patterns agree between the different telescopes. The profiles also agree when using automatic or manual annotation of the galaxies, showing very similar large-scale patterns. Combining all data from all telescopes allows the most comprehensive analysis of its kind to date in terms of both the number of galaxies and the footprint size. The results show a statistically significant profile that is consistent across all telescopes. The instruments used in this study are DECam, HST, SDSS, and Pan-STARRS. The paper also discusses possible sources of bias and analyzes the design of previous work that showed different results. Further research will be required to understand and validate these preliminary observations.
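The second related abstract lists network quantities (unique sources, source fan-outs, unique links, destination fan-ins, unique destinations) derived from anonymized source-destination pairs. As a rough illustration only, the Python sketch below computes them from a small list of (source, destination) pairs, using a dict-of-dicts as a stand-in for a sparse traffic matrix. It is not GraphBLAS and does not reproduce the hierarchical hypersparse approach; the definitions used (for example, fan-out as the number of distinct destinations per source, and unique links as the number of distinct source-destination pairs) are assumptions based on common usage, and `traffic_quantities` is a hypothetical helper.

```python
from collections import defaultdict


def traffic_quantities(pairs):
    """Summarize (source, destination) packet pairs.

    A dict-of-dicts stands in for a sparse traffic matrix A[src][dst] = packet count.
    """
    A = defaultdict(lambda: defaultdict(int))
    for src, dst in pairs:
        A[src][dst] += 1
    fan_out = {src: len(row) for src, row in A.items()}  # distinct destinations per source
    fan_in = defaultdict(int)
    for row in A.values():
        for dst in row:
            fan_in[dst] += 1  # distinct sources per destination
    return {
        "valid_packets": sum(sum(row.values()) for row in A.values()),
        "unique_links": sum(fan_out.values()),  # nonzero entries of A
        "unique_sources": len(A),
        "unique_destinations": len(fan_in),
        "max_source_fan_out": max(fan_out.values(), default=0),
        "max_destination_fan_in": max(fan_in.values(), default=0),
    }
```

For four packets `[("a", "x"), ("a", "x"), ("a", "y"), ("b", "x")]`, this reports 3 unique links, 2 unique sources, 2 unique destinations, and a maximum fan-out and fan-in of 2. A production analysis at telescope scale would use a real sparse-matrix library rather than nested dictionaries.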