Modern high-speed devices (e.g., network adapters, storage, accelerators) use new host interfaces, which expose multiple software queues directly to the device. These multi-queue interfaces allow mutually distrusting applications to access the device without any cross-core interaction, enabling throughput in the order of millions of IOP/s on multicore systems. Unfortunately, while independent device access is scalable, it also introduces a new problem: unfairness. Mechanisms that were used to provide fairness for older devices are no longer tenable in the wake of multi-queue design, and straightforward attempts to re-introduce it would require cross-core synchronization that undermines the scalability for which multiple queues were designed. To address these challenges, we present Multi-Queue Fair Queueing (MQFQ), the first fair, work-conserving scheduler suitable for multi-queue systems. Specifically, we (1) reformulate a classical fair queueing algorithm to accommodate multiqueue designs, and (2) describe a scalable implementation that bounds potential unfairness while minimizing synchronization overhead. Our implementation of MQFQ in Linux 4.15 demonstrates both fairness and high throughput. Evaluation with an NVMe over RDMA fabric (NVMf) device shows that MQFQ can reach up to 3.1 Million IOP/s on a single machine--20× higher than the state-of-the-art Linux Budget Fair Queueing. Compared to a system with no fairness, MQFQmore »
Exploring Faster RCNN for Fabric Defect Detection
This paper presents a fabric defect detection network (FabricNet) for automatic fabric defect detection. Our proposed FabricNet incorporates several effective techniques, such as Feature Pyramid Network (FPN), Deformable Convolution (DC) network, and Distance IoU Loss function, into vanilla Faster RCNN to improve the accuracy and speed of fabric defect detection. Our experiment shows that, when optimizations are combined, the FabricNet achieves 62.07% mAP and 97.37% AP50 on DAGM 2007 dataset, and an average prediction speed of 17 frames per second.
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- Artificial Intelligence for Industries (AI4I)
- Page Range or eLocation-ID:
- 52 to 55
- Sponsoring Org:
- National Science Foundation
More Like this
FABRIC is a unique national research infrastructure to enable cutting-edge andexploratory research at-scale in networking, cybersecurity, distributed computing andstorage systems, machine learning, and science applications. It is an everywhere-programmable nationwide instrument comprised of novel extensible network elementsequipped with large amounts of compute and storage, interconnected by high speed,dedicated optical links. It will connect a number of specialized testbeds for cloudresearch (NSF Cloud testbeds CloudLab and Chameleon), for research beyond 5Gtechnologies (Platforms for Advanced Wireless Research or PAWR), as well as productionhigh-performance computing facilities and science instruments to create a rich fabric fora wide variety of experimental activities.
Efficient Defect Identification via Oxide Memristive Crossbar Array Based Morphological Image ProcessingWang, Huan (Ed.)Defect identification has been a significant task in various fields to prevent the potential problems caused by imperfection. There is great attention for developing technology to accurately extract defect information from the image using a computing system without human error. However, image analysis using conventional computing technology based on Von Neumann structure is facing bottlenecks to efficiently process the huge volume of input data at low power and high speed. Herein efficient defect identification is demonstrated via a morphological image process with minimal power consumption using an oxide transistor and a memristor‐based crossbar array that can be applied to neuromorphic computing. Using a hardware and software codesigned neuromorphic system combined with a dynamic Gaussian blur kernel operation, an enhanced defect detection performance is successfully demonstrated with about 104 times more power‐efficient computation compared to the conventional complementary metal‐oxide semiconductor (CMOS)‐based digital implementation. It is believed the back end of line (BEOL)‐compatible all‐oxide‐based memristive crossbar array provides the unique potential toward universal artificial intelligence of things (AIoT) applications where conventional hardware can hardly be used.
Recent effort to test deep learning systems has produced an intuitive and compelling test criterion called neuron coverage (NC), which resembles the notion of traditional code coverage. NC measures the proportion of neurons activated in a neural network and it is implicitly assumed that increasing NC improves the quality of a test suite. In an attempt to automatically generate a test suite that increases NC, we design a novel diversity promoting regularizer that can be plugged into existing adversarial attack algorithms. We then assess whether such attempts to increase NC could generate a test suite that (1) detects adversarial attacks successfully, (2) produces natural inputs, and (3) is unbiased to particular class predictions. Contrary to expectation, our extensive evaluation finds that increasing NC actually makes it harder to generate an effective test suite: higher neuron coverage leads to fewer defects detected, less natural inputs, and more biased prediction preferences. Our results invoke skepticism that increasing neuron coverage may not be a meaningful objective for generating tests for deep neural networks and call for a new test generation technique that considers defect detection, naturalness, and output impartiality in tandem.
SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous DrivingObject detection is a crucial task for autonomous driving. In addition to requiring high accuracy to ensure safety, object detection for autonomous driving also requires realtime inference speed to guarantee prompt vehicle control, as well as small model size and energy efficiency to enable embedded system deployment. In this work, we propose SqueezeDet, a fully convolutional neural network for object detection that aims to simultaneously satisfy all of the above constraints. In our network we use convolutional layers not only to extract feature maps, but also as the output layer to compute bounding boxes and class probabilities. The detection pipeline of our model only contains a single forward pass of a neural network, thus it is extremely fast. Our model is fully convolutional, which leads to small model size and better energy efficiency. Finally, our experiments show that our model is very accurate, achieving state-of-the-art accuracy on the KITTI  benchmark. The source code of SqueezeDet is open-source released1.