As a model of recurrent spiking neural networks, the Liquid State Machine (LSM) offers a powerful brain-inspired computing platform for pattern recognition and machine learning applications. While operated by processing neural spiking activities, the LSM naturally lends itself to an efficient hardware implementation via exploration of typical sparse firing patterns emerged from the recurrent neural network and smart processing of computational tasks that are orchestrated by different firing events at runtime. We explore these opportunities by presenting a LSM processor architecture with integrated on-chip learning and its FPGA implementation. Our LSM processor leverage the sparsity of firing activities to allow for efficient event-driven processing and activity-dependent clock gating. Using the spoken English letters adopted from the TI46 [1] speech recognition corpus as a benchmark, we show that the proposed FPGA-based neural processor system is up to 29% more energy efficient than a baseline LSM processor with little extra hardware overhead.
more »
« less
Machine learning evaluation in the Global Event Processor FPGA for the ATLAS trigger upgrade
Abstract The Global Event Processor (GEP) FPGA is an area-constrained, performance-critical element of the Large Hadron Collider's (LHC) ATLAS experiment. It needs to very quickly determine which small fraction of detected events should be retained for further processing, and which other events will be discarded. This system involves a large number of individual processing tasks, brought together within the overall Algorithm Processing Platform (APP), to make filtering decisions at an overall latency of no more than 8ms. Currently, such filtering tasks are hand-coded implementations of standard deterministic signal processing tasks.In this paper we present methods to automatically create machine learning based algorithms for use within the APP framework, and demonstrate several successful such deployments. We leverage existing machine learning to FPGA flows such ashls4mlandfwXto significantly reduce the complexity of algorithm design. These have resulted in implementations of various machine learning algorithms with latencies of 1.2 μs and less than 5% resource utilization on an Xilinx XCVU9P FPGA. Finally, we implement these algorithms into the GEP system and present their actual performance.Our work shows the potential of using machine learning in the GEP for high-energy physics applications. This can significantly improve the performance of the trigger system and enable the ATLAS experiment to collect more data and make more discoveries. The architecture and approach presented in this paper can also be applied to other applications that require real-time processing of large volumes of data.
more »
« less
- PAR ID:
- 10514579
- Publisher / Repository:
- Journal of Instrumentation
- Date Published:
- Journal Name:
- Journal of Instrumentation
- Volume:
- 19
- Issue:
- 05
- ISSN:
- 1748-0221
- Page Range / eLocation ID:
- P05031
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The processing demands of current and emerging applications, such as image/video processing, are increasing due to the deluge of data, generated by mobile and edge devices. This raises challenges for a vast range of computing systems, starting from smart-phones and reaching cloud and data centers. Heterogeneous computing demonstrates its ability as an efficient computing model due to its capability to adapt to various workload requirements. Field programmable gate arrays (FPGAs) provide power and performance benefits and have been used in many application domains from embedded systems to the cloud. In this paper, we used a closely coupled CPU-FPGA heterogeneous system to accelerate a sliding window based image processing algorithm, Canny edge detector. We accelerated Canny using two different implementations: Code partitioned and data partitioned. In the data partitioned implementation, we proposed a weighted round-robin based algorithm that partitions input images and distributes the load between the CPU and the FPGA based on latency. The paper also compares the performance of the proposed accelerators with separate CPU and FPGA implementations. Using our hybrid CPU-FPGA based algorithm, we achieved a speedup of up to 4.8× over a CPU-only and up to 2.1× over a FPGA-only implementations. Moreover, the estimated total energy consumption of our algorithm is more efficient than a CPU-only implementation. Our results show a significant reduction in energy-delay product (EDP) compared to the CPU-only implementation, and comparable EDP results to the FPGA-only implementation.more » « less
-
null (Ed.)Accessible machine learning algorithms, software, and diagnostic tools for energy-efficient devices and systems are extremely valuable across a broad range of application domains. In scientific domains, real-time near-sensor processing can drastically improve experimental design and accelerate scientific discoveries. To support domain scientists, we have developed hls4ml, an open-source software-hardware codesign workflow to interpret and translate machine learning algorithms for implementation with both FPGA and ASIC technologies. We expand on previous hls4ml work by extending capabilities and techniques towards low-power implementations and increased usability: new Python APIs, quantization-aware pruning, end-to-end FPGA workflows, long pipeline kernels for low power, and new device backends include an ASIC workflow. Taken together, these and continued efforts in hls4ml will arm a new generation of domain scientists with accessible, efficient, and powerful tools for machine-learning-accelerated discovery.more » « less
-
The Genomics Education Partnership (GEP; https://thegep.org) began as a consortium of 16 faculty in 2006 with a goal of providing students with Course-based Undergraduate Research Experiences (CUREs) in genomics. Today, GEP has over 200 faculty from more than 180 institutions and engages more than 3,900 undergraduates in authentic genomics research annually. These faculty joined and continued to participate in the GEP for many reasons, including the collaborative nature of the research, the well-established infrastructure, and the supportive network of like-minded colleagues. Faculty implement GEP materials in diverse settings ? ranging from short modules (2-8 weeks) within a course, to a standalone full-semester course, to independent student research. GEP students show significant gains in scientific knowledge and attitudes toward science. In addition to improving their understanding of the research process and how new knowledge is created in the field, GEP students acquire desirable and transferable skills essential for future participation in the workforce, such as problem solving, independence, application of knowledge, team-work, and collaboration. Students also gain competence in the use of computational algorithms to analyze large biological datasets ? thereby preparing students for a growing need of a workforce trained at applying statistics and computational tools to analyze large datasets. In addition, GEP students and their faculty mentors are eligible to be co-authors on the scientific publications that are based on their work. In this workshop, we will provide an overview of the GEP community, a hands-on guided tour of our introductory curriculum aimed to teach gene structure, transcription, translation, and processing, and a step-by-step walkthrough that illustrates the protocol for annotating a protein-coding gene in Drosophila. Participants will receive information on how to join the GEP community and receive training and resources to enable their implementations.more » « less
-
Abstract The ATLAS experiment at the Large Hadron Collider has a broad physics programme ranging from precision measurements to direct searches for new particles and new interactions, requiring ever larger and ever more accurate datasets of simulated Monte Carlo events. Detector simulation with Geant4 is accurate but requires significant CPU resources. Over the past decade, ATLAS has developed and utilized tools that replace the most CPU-intensive component of the simulation—the calorimeter shower simulation—with faster simulation methods. Here, AtlFast3, the next generation of high-accuracy fast simulation in ATLAS, is introduced. AtlFast3 combines parameterized approaches with machine-learning techniques and is deployed to meet current and future computing challenges, and simulation needs of the ATLAS experiment. With highly accurate performance and significantly improved modelling of substructure within jets, AtlFast3 can simulate large numbers of events for a wide range of physics processes.more » « less
An official website of the United States government

