Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
A dual-gate Si photodiode (PD) is proposed as an optoelectronic physical unclonable function (PUF) for its high reconfigurability, resistance to adversarial attacks, and all-silicon construction. Leveraging its high-capacity photoresponse via gating, the figures-of-merit of this PUF (uniformity, uniqueness, reliability) are on par with prior technologies. Furthermore, its reconfigurability (45) yields high-capacity keys for privacy-preserving Internet-of-Things (IoT), which effectively mask user images from attackers. Altogether, PD-based PUFs offer a promising solution for edge cryptosystems in IoT networks with low hardware overhead.more » « less
-
With increasingly deployed cameras and the rapid advances of Computer Vision, large-scale live video analytics becomes feasible. However, analyzing videos is compute-intensive. In addition, live video analytics needs to be performed in real time. In this paper, we design an edge server system for live video analytics. We propose to perform configuration adaptation without profiling video online. We select configurations with a prediction model based on object movement features. In addition, we reduce the latency through resource orchestration on video analytics servers. The key idea of resource orchestration is to batch inference tasks that use the same CNN model, and schedule tasks based on a priority value that estimates their impact on the total latency. We evaluate our system with two video analytic applications, road traffic monitoring and pose detection. The experimental results show that our profiling-free adaptation reduces the workload by 80% of the state-of-the-art adaptation without lowering the accuracy. The average serving latency is reduced by up to 95% comparing with the profiling-based adaptation.more » « less
-
Graph convolutional network (GCN) has been shown effective in many applications with graph structures. However, training a large-scale GCN is still challenging due to the high computation cost that grows with the size of the graph. In this paper, we propose CM-GCN, a distributed GCN framework using cohesive mini-batches to accelerate large-scale GCN training. The cohesive mini-batches group nodes that are tightly connected in the graph. As a result, CM-GCN can reduce the computation required to train a GCN. We propose a computation cost function to quantify the computation required for mini-batches. By exploring the submodular property of the computation cost function, we develop an efficient algorithm to partition nodes into tightly coupled mini-batches. Based on the computation cost function, we evenly distribute the workloads of mini-batches to workers. We design asynchronous computations between GCN layers to further eliminating the waiting among workers. We implement a CM-GCN framework and evaluate its performance with graphs that contain millions of nodes. Our evaluation shows that CM-GCN can achieve up to 3X speedup without compromising the training accuracy.more » « less
-
Machine learning models such as deep neural networks have been shown to be successful in solving a wide range of problems. Training such a model typically requires stochastic gradient descent, and the process is time-consuming and expensive in terms of computing resources. In this paper, we propose a distributed framework that supports the prioritized execution of the gradient computation. Our proposed distributed framework identifies important data points through computing or estimating the priority for each data point. We evaluate the proposed distributed framework with several machine learning models including multi-layer perceptron (MLP) and convolutional neural networks (CNN). Our experimental results show that prioritized SGD accelerates the training of machine learning models by as much as 1.6X over that of the mini-batch SGD. Further, the distributed framework scales linearly with the number of workers.more » « less
-
Data parallel frameworks become essential for training machine learning models. The classic Bulk Synchronous Parallel (BSP) model updates the model parameters through pre-defined synchronization barriers. However, when a worker computes significantly slower than other workers, waiting for the slow worker will lead to excessive waste of computing resources. In this paper, we propose a novel proactive data-parallel (PDP) framework. PDP enables the parameter server to initiate the update of the model parameter. That is, we can perform the update at any time without pre-defined update points. PDP not only initiates the update but also determines when to update. The global decision on the frequency of updates will accelerate the training. We further propose asynchronous PDP to reduce the idle time caused by synchronizing parameter updates. We theoretically prove the convergence property of asynchronous PDP. We implement a distributed PDP framework and evaluate PDP with several popular machine learning algorithms including Multilayer Perceptron, Convolutional Neural Network, K-means, and Gaussian Mixture Model. Our evaluation shows that PDP can achieve up to 20X speedup over the BSP model and scale to large clusters.more » « less
-
Video analytics has many applications in traffic control, security monitoring, action/event analysis, etc. With the adoption of deep neural networks, the accuracy of video analytics in video streams has been greatly improved. However, deep neural networks for performing video analytics are compute-intensive. In order to reduce processing time, many systems switch to the lower frame rate or resolution. State-of-the-art switching approaches adjust configurations by profiling video clips on a large configuration space. Multiple configurations are tested periodically and the cheapest one with a desired accuracy is adopted. In this paper, we propose a method that adapts the configuration by analyzing past video analytics results instead of profiling candidate configurations. Our method adopts a lower/higher resolution or frame rate when objects move slow/fast. We train a model that automatically selects the best configuration. We evaluate our method with two real-world video analytics applications: traffic tracking and pose estimation. Compared to the periodic profiling method, our method achieves 3%-12% higher accuracy with the same resource cost and 8-17x faster with comparable accuracy.more » « less
An official website of the United States government
