-
Point clouds are an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have achieved great success in using inference to perform point cloud analysis, including object part segmentation, shape classification, and so on. However, point cloud applications on the computing edge require more than just the inference step: they require end-to-end (E2E) processing of the point cloud workload, comprising pre-processing of raw data, input preparation, and inference to perform point cloud analysis. Current PCN approaches to end-to-end processing of point cloud workloads cannot meet the real-time latency requirement on the edge, i.e., the ability of the AI service to keep up with the speed of raw data generation by 3D sensors. The end-to-end latency stems from two sources: memory-intensive down-sampling in the pre-processing phase and the data structuring step for input preparation in the inference phase. In this paper, we present HgPCN, an end-to-end heterogeneous architecture for real-time embedded point cloud applications. In HgPCN, we introduce two novel methodologies based on spatial indexing to address the two identified bottlenecks. In the Pre-processing Engine of HgPCN, an Octree-Indexed-Sampling method is used to optimize the memory-intensive down-sampling bottleneck of the pre-processing phase. In the Inference Engine, HgPCN extends a commercial DLA with a customized Data Structuring Unit based on a Voxel-Expanded Gathering method to fundamentally reduce the workload of the data structuring step in the inference phase. The initial prototype of HgPCN has been implemented on an Intel PAC (Xeon+FPGA) platform. Four commonly available point cloud datasets were used for comparison, running on three baseline devices: an Intel Xeon W-2255 CPU, an Nvidia Jetson Xavier NX GPU, and an Nvidia 4060 Ti GPU. These datasets were also run on two existing PCN accelerators for comparison: PointACC and Mesorasi. Our results show that for the inference phase, depending on the dataset size, HgPCN achieves speedups of 1.3× to 10.2× vs. PointACC, 2.2× to 16.5× vs. Mesorasi, and 6.4× to 21× vs. the Jetson NX GPU. Together with the optimization of the memory-intensive down-sampling bottleneck in the pre-processing phase, the overall latency shows that HgPCN meets the real-time requirement by providing end-to-end service that keeps up with the raw data generation rate.
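To illustrate the idea behind octree-indexed down-sampling, the sketch below quantizes each point into an octree cell at a fixed depth and keeps one representative per occupied cell. It is a minimal software analogy for intuition only, not the HgPCN Pre-processing Engine; the depth parameter and numpy representation are assumptions.

```python
import numpy as np

def octree_indexed_sample(points: np.ndarray, depth: int = 5) -> np.ndarray:
    """Keep one representative point per occupied octree leaf at `depth`."""
    # Normalize the cloud into the unit cube so integer cell codes can be derived.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    scaled = (points - mins) / np.maximum(maxs - mins, 1e-9)
    # Quantize each axis into 2**depth bins and pack the indices into one cell key.
    bins = np.minimum((scaled * (2 ** depth)).astype(np.int64), 2 ** depth - 1)
    keys = (bins[:, 0] << (2 * depth)) | (bins[:, 1] << depth) | bins[:, 2]
    # np.unique returns the index of the first point seen in each occupied cell.
    _, keep = np.unique(keys, return_index=True)
    return points[np.sort(keep)]

# Example: down-sample 100k random points to at most one point per occupied cell.
cloud = np.random.rand(100_000, 3).astype(np.float32)
sampled = octree_indexed_sample(cloud, depth=5)
```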
-
SATCOM is crucial for tactical networks, particularly for submarines with sporadic communication requirements. Emerging SATCOM technologies, such as low-earth-orbit (LEO) satellite networks, provide lower latency, greater data reliability, and higher throughput than long-distance geostationary (GEO) satellites. Software-defined networking (SDN) has been introduced to SATCOM networks due to its ability to enhance management while strengthening network control and security. In our previous work, we proposed an SD-LEO constellation for naval submarine communication networks, as well as an extreme gradient boosting (XGBoost) machine-learning (ML) approach for classifying denial-of-service attacks against the constellation. Nevertheless, zero-day attacks have the potential to cause major damage to the SATCOM network, particularly the controller architecture, because their novelty leaves little data for training and testing ML models. This study tackles this challenge by employing a predictive queuing analysis of the SD-SATCOM controller design to rapidly generate ML training data for zero-day attack detection. In addition, we redesign our single-controller architecture as a decentralized controller architecture to eliminate single points of failure. To our knowledge, no prior research has investigated using queuing analysis to predict SD-SATCOM controller architecture network performance for ML training to prevent zero-day attacks. Our queuing analysis accelerates the training of ML models and enhances data adaptability, enabling network operators to defend against zero-day attacks without precollected data. We utilized the CatBoost algorithm to train a multi-output regression model to predict network performance statistics. Our method successfully identified and classified normal, non-attack samples and zero-day cyberattacks with over 94% accuracy, precision, recall, and F1-scores.
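As a rough illustration of the final modeling step, the sketch below trains a CatBoost multi-output regression model on synthetic queuing-style features. The feature names, target statistics, and data are stand-ins for the paper's queuing-analysis outputs, not the actual SD-SATCOM dataset.

```python
import numpy as np
from catboost import CatBoostRegressor

# Synthetic stand-in for queuing-derived features: [arrival rate, service rate,
# buffer size, controller count] -> targets [delay, utilization, drop rate].
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(5000, 4))
utilization = X[:, 0] / X[:, 1]
delay = 1.0 / np.maximum(X[:, 1] - X[:, 0], 1e-3)
drop = np.clip(utilization - 0.8, 0.0, None)
y = np.column_stack([delay, utilization, drop])

# MultiRMSE lets one CatBoost model predict all performance statistics at once.
model = CatBoostRegressor(loss_function="MultiRMSE", iterations=300, verbose=False)
model.fit(X, y)
predicted_stats = model.predict(X[:5])   # rows of [delay, utilization, drop]
```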
-
To effectively provide INaaS (Inference-as-a-Service) for AI applications in resource-limited cloud environments, two major challenges must be overcome: achieving low latency and providing multi-tenancy. This paper presents EIF (Efficient INaaS Framework), which uses a heterogeneous CPU-FPGA architecture and provides three methods to address these challenges: (1) spatial multiplexing via software-hardware co-design virtualization techniques, (2) temporal multiplexing that exploits the sparsity of neural-net models, and (3) streaming-mode inference, which overlaps data transfer and computation. The EIF prototype is implemented on an Intel PAC (shared-memory CPU-FPGA) platform. For evaluation, 12 types of DNN models of varying size and sparsity were used as benchmarks. Based on these experiments, we show that in EIF the temporal multiplexing technique can improve the user density of an AI Accelerator Unit from 2× to 6× with marginal performance degradation. In the prototype system, the spatial multiplexing technique supports eight AI Accelerator Units on one FPGA. By using a streaming mode based on a Mediated Pass-Through architecture, EIF can overcome the FPGA on-chip memory limitation to improve multi-tenancy and optimize the latency of INaaS. To further enhance INaaS, EIF utilizes the MapReduce function to provide a more flexible QoS. Together with the temporal/spatial multiplexing techniques, EIF can support 48 users simultaneously on a single FPGA board in our prototype system. In all tested benchmarks, cold-start latency accounts for only approximately 5% of the total response time.
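The streaming-mode idea of overlapping transfer with computation can be sketched in software as a small producer/consumer pipeline. This is a conceptual analogy under assumed names (tiles, staging queue), not EIF's Mediated Pass-Through implementation.

```python
import queue
import threading

def transfer_worker(tiles, staging: queue.Queue):
    # Stands in for host-to-FPGA DMA: stage the next tile while compute runs.
    for tile in tiles:
        staging.put(tile)
    staging.put(None)                      # sentinel: no more tiles

def streaming_inference(tiles, compute):
    staging = queue.Queue(maxsize=2)       # double buffering: two tiles in flight
    producer = threading.Thread(target=transfer_worker, args=(tiles, staging))
    producer.start()
    results = []
    while (tile := staging.get()) is not None:
        results.append(compute(tile))      # compute tile i while tile i+1 transfers
    producer.join()
    return results

# Example with a stand-in compute kernel.
outputs = streaming_inference(range(8), compute=lambda t: t * t)
```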
-
First responders and other tactical teams rely on mobile tactical networks to coordinate and accomplish emergent time-critical tasks. The information exchanged through these networks is vulnerable to various strategic cyber network attacks. Detecting and mitigating them is a challenging problem due to the volatile and mobile nature of an ad hoc environment. This paper proposes MalCAD, a graph machine-learning-based framework for detecting cyber attacks in mobile tactical software-defined networks. MalCAD operates by observing connectivity features among the various nodes, obtained using graph theory, instead of collecting information at each node. The MalCAD framework is based on the XGBoost classification algorithm and is evaluated for lost versus wasted connectivity and random versus targeted cyber attacks. Results show that, while the initial cyber attacks cause a 30%–60% loss of throughput, MalCAD yields a 25%–50% gain in average throughput, demonstrating successful attack mitigation.
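A minimal sketch of the approach, assuming networkx for graph-theoretic connectivity features and a toy attack model in which attacks remove edges; the specific features and data are illustrative, not the MalCAD feature set.

```python
import networkx as nx
import numpy as np
from xgboost import XGBClassifier

def connectivity_features(graph: nx.Graph) -> np.ndarray:
    # Whole-network connectivity summary instead of per-node telemetry.
    mean_degree = np.mean([d for _, d in graph.degree()])
    clustering = nx.average_clustering(graph)
    components = nx.number_connected_components(graph)
    density = nx.density(graph)
    return np.array([mean_degree, clustering, components, density])

# Toy snapshots: label 0 = normal topology, label 1 = attack modeled as lost edges.
rng = np.random.default_rng(1)
X, y = [], []
for label, edge_prob in [(0, 0.30), (1, 0.10)]:
    for _ in range(200):
        g = nx.gnp_random_graph(30, edge_prob, seed=int(rng.integers(1_000_000)))
        X.append(connectivity_features(g))
        y.append(label)

clf = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
clf.fit(np.array(X), np.array(y))
```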
-
For the next generation of wireless technologies, Orthogonal Frequency Division Multiplexing (OFDM) remains a key signaling technique. Peak-to-Average Power Ratio (PAPR) reduction must be included with OFDM to mitigate the detrimentally high PAPR that OFDM exhibits. The cost of PAPR reduction techniques stems from adding multiple IFFT iterations, which are computationally expensive and increase latency. We propose a novel PAPR estimation technique called PESTNet, which reduces the IFFT operations needed for PAPR reduction by using deep learning to estimate the PAPR before the IFFT is applied. This paper gives a brief background on PAPR in OFDM systems and describes the PESTNet algorithm and its training methodology. A case study of the estimation model is provided, where results demonstrate that PESTNet gives an accurate estimate of the PAPR and can compute large batches of resource grids up to 10 times faster than IFFT-based techniques.
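For context, the sketch below shows the conventional IFFT-based PAPR computation that PESTNet aims to avoid: oversample one OFDM symbol with a zero-padded IFFT and take peak power over mean power. The subcarrier count and modulation are assumptions, not parameters from the paper.

```python
import numpy as np

def papr_db(freq_symbols: np.ndarray, oversample: int = 4) -> float:
    """PAPR (dB) of one OFDM symbol via a zero-padded (oversampled) IFFT."""
    n = len(freq_symbols)
    padded = np.zeros(n * oversample, dtype=complex)
    padded[: n // 2] = freq_symbols[: n // 2]        # positive-frequency subcarriers
    padded[-(n - n // 2):] = freq_symbols[n // 2:]   # negative-frequency subcarriers
    x = np.fft.ifft(padded)                          # time-domain OFDM signal
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

# Example: one 256-subcarrier QPSK symbol.
rng = np.random.default_rng(0)
qpsk = (rng.choice([-1, 1], 256) + 1j * rng.choice([-1, 1], 256)) / np.sqrt(2)
print(f"PAPR ~ {papr_db(qpsk):.2f} dB")
```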
-
Satellite communication (SATCOM) is a critical infrastructure for tactical networks, especially for the intermittent communication of submarines. To ensure data reliability, recent SATCOM research has begun to embrace several advances, such as low earth orbit (LEO) satellite networks, which reduce latency and increase throughput compared to long-distance geostationary (GEO) satellites, and software-defined networking (SDN), which increases network control and security. This paper proposes an SD-LEO constellation for naval submarine communication networks. An SD-LEO architecture is presented, along with Denial-of-Service (DoS) attack detection and classification using the extreme gradient boosting (XGBoost) algorithm. Numerical results demonstrate greater than ninety-eight percent accuracy, precision, recall, and F1-scores.
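A minimal sketch of the classification step only, assuming synthetic flow-level features in place of SD-LEO telemetry and three assumed classes (benign plus two DoS variants); the report at the end produces the accuracy, precision, recall, and F1 metrics cited above.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in data: class 0 = benign traffic, classes 1-2 = DoS variants.
rng = np.random.default_rng(42)
X = rng.normal(size=(3000, 6))
y = rng.integers(0, 3, size=3000)
X[y == 1, 0] += 3.0     # shift attack classes so the toy problem is separable
X[y == 2, 1] -= 3.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = XGBClassifier(objective="multi:softprob", eval_metric="mlogloss")
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))   # accuracy/precision/recall/F1
```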
-
As the design space for high-performance computing (HPC) systems grows larger and more complex, modeling and simulation (MODSIM) techniques become more important for optimizing systems. Furthermore, recent extreme-scale systems and newer technologies can lead to higher system fault rates, which negatively affect system performance and other metrics. It is therefore important for system designers to consider the effects of faults and fault-tolerance (FT) techniques on system design through MODSIM. BE-SST is an existing MODSIM methodology and workflow that facilitates preliminary exploration and reduction of large design spaces, particularly by highlighting areas of the space for detailed study and pruning less optimal areas. This paper presents the overall methodology for adding fault-tolerance awareness (FT-awareness) into BE-SST. We present the process used to extend BE-SST, enabling the creation of models that predict the time needed to perform a checkpoint instance for a given system configuration. Additionally, this paper presents a case study in which a full HPC system is simulated using BE-SST, including application, hardware, and checkpointing. We validate the models and simulation against actual system measurements, finding an average percent error of less than 17% for the instance models and about 20% for the system simulation, a level of accuracy acceptable for initial exploration and pruning of the design space. Finally, we show how FT-aware simulation results are used to compare FT levels in the design space.
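As a small illustration of the validation metric, the sketch below fits a toy linear checkpoint-time model to made-up measurements and reports the average percent error; the model form and the numbers are assumptions, not the BE-SST instance models.

```python
import numpy as np

# Hypothetical measurements: checkpoint size (GB) vs. measured checkpoint time (s).
measured_gb = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
measured_sec = np.array([2.1, 3.9, 7.6, 15.4, 30.2])

# Fit time ~ a * size + b by least squares and predict on the same points.
a, b = np.polyfit(measured_gb, measured_sec, deg=1)
predicted_sec = a * measured_gb + b

# Average percent error, the accuracy measure quoted for the instance models.
percent_error = np.abs(predicted_sec - measured_sec) / measured_sec * 100
print(f"t ~ {a:.2f}*size + {b:.2f}; average percent error {percent_error.mean():.1f}%")
```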
-
Large Convolutional Neural Networks (CNNs) are often pruned and compressed to reduce their parameter count and memory requirement. However, the resulting irregularity of the sparse data makes it difficult for FPGA accelerators that contain systolic arrays of Multiply-and-Accumulate (MAC) units, such as Intel's FPGA-based Deep Learning Accelerator (DLA), to achieve their maximum potential. Moreover, FPGAs with low-bandwidth off-chip memory cannot satisfy the memory bandwidth requirement of sparse matrix computation. In this paper, we present 1) a sparse matrix packing technique that condenses sparse inputs and filters before feeding them into the systolic array of MAC units in the Intel DLA, and 2) a customization of the Intel DLA that allows the FPGA to efficiently utilize the high-bandwidth memory (HBM2) integrated in the same package. For end-to-end inference with randomly pruned ResNet-50/MobileNet CNN models, our experiments demonstrate a 2.7×/3× performance improvement compared to an FPGA with DDR4, a 2.2×/2.1× speedup over a server-class Intel Skylake CPU, and comparable performance with a 1.7×/2× power-efficiency gain compared to an Nvidia V100 GPU.
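The packing idea can be sketched in a few lines: drop the zeros from each filter row and keep (value, column) pairs so only useful multiply-accumulates are issued. This mirrors the concept in software under assumed data layouts, not the Intel DLA's actual on-chip format.

```python
import numpy as np

def pack_sparse_rows(matrix: np.ndarray):
    # Condense each row to its non-zero values plus their column indices.
    packed = []
    for row in matrix:
        cols = np.nonzero(row)[0]
        packed.append((row[cols].astype(np.float32), cols.astype(np.int32)))
    return packed

def sparse_matvec(packed_rows, dense_vec: np.ndarray) -> np.ndarray:
    # Each "MAC lane" multiply-accumulates only the packed non-zero entries.
    return np.array([vals @ dense_vec[cols] for vals, cols in packed_rows])

# Example: a roughly 70%-pruned filter matrix applied to an input vector.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)) * (rng.random((8, 16)) > 0.7)
x = rng.normal(size=16)
assert np.allclose(sparse_matvec(pack_sparse_rows(W), x), W @ x)
```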