skip to main content

This content will become publicly available on December 7, 2021

Title: Trust in 5G Open RANs through Machine Learning: RF Fingerprinting on the POWDER PAWR Platform
5G and open radio access networks (Open RANs) will result in vendor-neutral hardware deployment that will require additional diligence towards managing security risks. This new paradigm will allow the same network infrastructure to support virtual network slices for transmit different waveforms, such as 5G New Radio, LTE, WiFi, at different times. In this multi- vendor, multi-protocol/waveform setting, we propose an additional physical layer authentication method that detects a specific emitter through a technique called as RF fingerprinting. Our deep learning approach uses convolutional neural networks augmented with triplet loss, where examples of similar/dissimilar signal samples are shown to the classifier over the training duration. We demonstrate the feasibility of RF fingerprinting base stations over the large-scale over-the-air experimental POWDER platform in Salt Lake City, Utah, USA. Using real world datasets, we show how our approach overcomes the challenges posed by changing channel conditions and protocol choices with 99.86% detection accuracy for different training and testing days.
; ; ;
Award ID(s):
Publication Date:
Journal Name:
IEEE Global Communications Conference
Sponsoring Org:
National Science Foundation
More Like this
  1. Millimeter wave (mmW) communications is viewed as the key enabler of 5G cellular networks due to vast spectrum availability that could boost peak rate and capacity. Due to increased propagation loss in mmW band, transceivers with massive antenna array are required to meet a link budget, but their power consumption and cost become limiting factors for commercial systems. Radio designs based on hybrid digital and analog array architectures and the usage of radio frequency (RF) signal processing via phase shifters have emerged as potential solutions to improve radio energy efficiency and deliver performances close to the conventional digital antenna arrays.more »In this paper, we provide an overview of the state-of-the-art mmW massive antenna array designs and comparison among three array architectures, namely digital array, partially-connected hybrid array (sub-array), and fully-connected hybrid array. The comparison of performance, power, and area for these three architectures is performed for three representative 5G downlink use cases, which cover a range of pre-beamforming signal-to-noise-ratios (SNR) and multiplexing regimes. This is the first study to comprehensively model and quantitatively analyze all design aspects and criteria including: 1) optimal linear precoder, 2) impact of quantization error in digital-to-analog converter (DAC) and phase shifters, 3) RF signal distribution network, 4) power and area estimation based on state-of-the-art mmW circuits including baseband digital precoding, digital signal distribution network, high-speed DACs, oscillators, mixers, phase shifters, RF signal distribution network, and power amplifiers. Our simulation results show that the fully-digital array architecture is the most power and area efficient compared against optimized designs for sub-array and hybrid array architectures. Our analysis shows that digital array architecture benefits greatly from multi-user multiplexing. The analysis also reveals that sub-array architecture performance is limited by reduced beamforming gain due to array partitioning, while the system bottleneck of the fully-connected hybrid architecture is the excessively complicated and power hungry RF signal distribution network.« less
  2. Website fingerprinting attacks, which use statistical analysis on network traffic to compromise user privacy, have been shown to be effective even if the traffic is sent over anonymity-preserving networks such as Tor. The classical attack model used to evaluate website fingerprinting attacks assumes an on-path adversary, who can observe all traffic traveling between the user’s computer and the secure network. In this work we investigate these attacks under a different attack model, in which the adversary is capable of sending a small amount of malicious JavaScript code to the target user’s computer. The malicious code mounts a cache side-channel attack,more »which exploits the effects of contention on the CPU’s cache, to identify other websites being browsed. The effectiveness of this attack scenario has never been systematically analyzed, especially in the open-world model which assumes that the user is visiting a mix of both sensitive and non-sensitive sites. We show that cache website fingerprinting attacks in JavaScript are highly feasible. Specifically, we use machine learning techniques to classify traces of cache activity. Unlike prior works, which try to identify cache conflicts, our work measures the overall occupancy of the last-level cache. We show that our approach achieves high classification accuracy in both the open-world and the closed-world models. We further show that our attack is more resistant than network-based fingerprinting to the effects of response caching, and that our techniques are resilient both to network-based defenses and to side-channel countermeasures introduced to modern browsers as a response to the Spectre attack. To protect against cache-based website fingerprinting, new defense mechanisms must be introduced to privacy-sensitive browsers and websites. We investigate one such mechanism, and show that generating artificial cache activity reduces the effectiveness of the attack and completely eliminates it when used in the Tor Browser« less
  3. Obeid, Iyad ; Picone, Joseph ; Selesnick, Ivan (Ed.)
    The Neural Engineering Data Consortium (NEDC) is developing a large open source database of high-resolution digital pathology images known as the Temple University Digital Pathology Corpus (TUDP) [1]. Our long-term goal is to release one million images. We expect to release the first 100,000 image corpus by December 2020. The data is being acquired at the Department of Pathology at Temple University Hospital (TUH) using a Leica Biosystems Aperio AT2 scanner [2] and consists entirely of clinical pathology images. More information about the data and the project can be found in Shawki et al. [3]. We currently have a Nationalmore »Science Foundation (NSF) planning grant [4] to explore how best the community can leverage this resource. One goal of this poster presentation is to stimulate community-wide discussions about this project and determine how this valuable resource can best meet the needs of the public. The computing infrastructure required to support this database is extensive [5] and includes two HIPAA-secure computer networks, dual petabyte file servers, and Aperio’s eSlide Manager (eSM) software [6]. We currently have digitized over 50,000 slides from 2,846 patients and 2,942 clinical cases. There is an average of 12.4 slides per patient and 10.5 slides per case with one report per case. The data is organized by tissue type as shown below: Filenames: tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_0a001_00123456_lvl0001_s000.svs tudp/v1.0.0/svs/gastro/000001/00123456/2015_03_05/0s15_12345/0s15_12345_00123456.docx Explanation: tudp: root directory of the corpus v1.0.0: version number of the release svs: the image data type gastro: the type of tissue 000001: six-digit sequence number used to control directory complexity 00123456: 8-digit patient MRN 2015_03_05: the date the specimen was captured 0s15_12345: the clinical case name 0s15_12345_0a001_00123456_lvl0001_s000.svs: the actual image filename consisting of a repeat of the case name, a site code (e.g., 0a001), the type and depth of the cut (e.g., lvl0001) and a token number (e.g., s000) 0s15_12345_00123456.docx: the filename for the corresponding case report We currently recognize fifteen tissue types in the first installment of the corpus. The raw image data is stored in Aperio’s “.svs” format, which is a multi-layered compressed JPEG format [3,7]. Pathology reports containing a summary of how a pathologist interpreted the slide are also provided in a flat text file format. A more complete summary of the demographics of this pilot corpus will be presented at the conference. Another goal of this poster presentation is to share our experiences with the larger community since many of these details have not been adequately documented in scientific publications. There are quite a few obstacles in collecting this data that have slowed down the process and need to be discussed publicly. Our backlog of slides dates back to 1997, meaning there are a lot that need to be sifted through and discarded for peeling or cracking. Additionally, during scanning a slide can get stuck, stalling a scan session for hours, resulting in a significant loss of productivity. Over the past two years, we have accumulated significant experience with how to scan a diverse inventory of slides using the Aperio AT2 high-volume scanner. We have been working closely with the vendor to resolve many problems associated with the use of this scanner for research purposes. This scanning project began in January of 2018 when the scanner was first installed. The scanning process was slow at first since there was a learning curve with how the scanner worked and how to obtain samples from the hospital. From its start date until May of 2019 ~20,000 slides we scanned. In the past 6 months from May to November we have tripled that number and how hold ~60,000 slides in our database. This dramatic increase in productivity was due to additional undergraduate staff members and an emphasis on efficient workflow. The Aperio AT2 scans 400 slides a day, requiring at least eight hours of scan time. The efficiency of these scans can vary greatly. When our team first started, approximately 5% of slides failed the scanning process due to focal point errors. We have been able to reduce that to 1% through a variety of means: (1) best practices regarding daily and monthly recalibrations, (2) tweaking the software such as the tissue finder parameter settings, and (3) experience with how to clean and prep slides so they scan properly. Nevertheless, this is not a completely automated process, making it very difficult to reach our production targets. With a staff of three undergraduate workers spending a total of 30 hours per week, we find it difficult to scan more than 2,000 slides per week using a single scanner (400 slides per night x 5 nights per week). The main limitation in achieving this level of production is the lack of a completely automated scanning process, it takes a couple of hours to sort, clean and load slides. We have streamlined all other aspects of the workflow required to database the scanned slides so that there are no additional bottlenecks. To bridge the gap between hospital operations and research, we are using Aperio’s eSM software. Our goal is to provide pathologists access to high quality digital images of their patients’ slides. eSM is a secure website that holds the images with their metadata labels, patient report, and path to where the image is located on our file server. Although eSM includes significant infrastructure to import slides into the database using barcodes, TUH does not currently support barcode use. Therefore, we manage the data using a mixture of Python scripts and manual import functions available in eSM. The database and associated tools are based on proprietary formats developed by Aperio, making this another important point of community-wide discussion on how best to disseminate such information. Our near-term goal for the TUDP Corpus is to release 100,000 slides by December 2020. We hope to continue data collection over the next decade until we reach one million slides. We are creating two pilot corpora using the first 50,000 slides we have collected. The first corpus consists of 500 slides with a marker stain and another 500 without it. This set was designed to let people debug their basic deep learning processing flow on these high-resolution images. We discuss our preliminary experiments on this corpus and the challenges in processing these high-resolution images using deep learning in [3]. We are able to achieve a mean sensitivity of 99.0% for slides with pen marks, and 98.9% for slides without marks, using a multistage deep learning algorithm. While this dataset was very useful in initial debugging, we are in the midst of creating a new, more challenging pilot corpus using actual tissue samples annotated by experts. The task will be to detect ductal carcinoma (DCIS) or invasive breast cancer tissue. There will be approximately 1,000 images per class in this corpus. Based on the number of features annotated, we can train on a two class problem of DCIS or benign, or increase the difficulty by increasing the classes to include DCIS, benign, stroma, pink tissue, non-neoplastic etc. Those interested in the corpus or in participating in community-wide discussions should join our listserv,, to be kept informed of the latest developments in this project. You can learn more from our project website:« less
  4. There is much interest in integrating millimeter wave radios (mmWave) into wireless LANs and 5G cellular networks to benefit from their multi-GHz of available spectrum. Yet, unlike existing technologies, e.g., WiFi, mmWave radios require highly directional antennas. Since the antennas have pencil-beams, the transmitter and receiver need to align their beams before they can communicate. Existing systems scan the space to find the best alignment. Such a process has been shown to introduce up to seconds of delay and is unsuitable for wireless networks where an access point has to quickly switch between users and accommodate mobile clients. This papermore »presents Agile-Link, a new protocol that can find the best mmWave beam alignment without scanning the space. Given all possible directions for setting the antenna beam, Agile-Link provably finds the optimal direction in logarithmic number of measurements. Further, Agile-Link works within the existing 802.11ad standard for mmWave LAN, and can support both clients and access points. We have implemented Agile-Link in a mmWave radio and evaluated it empirically. Our results show that it reduces beam alignment delay by orders of magnitude. In particular, for highly directional mmWave devices operating under 802.11ad, the delay drops from over a second to 2.5 ms.« less
  5. This paper presents Virginia Tech’s wireless testbed supporting research on long-term evolution (LTE) signaling and radio frequency (RF) spectrum coexistence. LTE is continuously refined and new features released. As the communications contexts for LTE expand, new research problems arise and include operation in harsh RF signaling environments and coexistence with other radios. Our testbed provides an integrated research tool for investigating these and other research problems; it allows analyzing the severity of the problem, designing and rapidly prototyping solutions, and assessing them with standard-compliant equipment and test procedures. The modular testbed integrates general-purpose software-defined radio hardware, LTE-specific test equipment, RFmore »components, free open-source and commercial LTE software, a configurable RF network and recorded radar waveform samples. It supports RF channel emulated and over-the-air radiated modes. The testbed can be remotely accessed and configured. An RF switching network allows for designing many different experiments that can involve a variety of real and virtual radios with support for multiple-input multiple-output (MIMO) antenna operation. We present the testbed, the research it has enabled and some valuable lessons that we learned and that may help designing, developing, and operating future wireless testbeds.« less