The current state of neuromorphic computing broadly encompasses domain-specific computing architectures designed to accelerate machine learning (ML) and artificial intelligence (AI) algorithms. AI/ML algorithms are limited by memory bandwidth: they stream large volumes of parameters and activations relative to the arithmetic performed on them, so novel computing architectures are necessary to overcome this limitation. Several options are currently under investigation using both mature and emerging memory technologies. For example, mature memory technologies such as high-bandwidth memory (HBM) are integrated with logic units in the same package to bring memory closer to the computing units. There are also research efforts in which in-memory computing architectures have been implemented using DRAM or flash memory technologies; however, DRAM suffers from scaling limitations, while flash memory devices suffer from endurance issues. In spite of this significant progress, the massive energy consumption of neuromorphic processors, while still meeting the training and inference performance required by future AI/ML applications, remains to be addressed. On the AI/ML algorithm side, several issues are still pending, such as lifelong learning, explainability, context-based decision making, multimodal association of data, adaptation for personalized responses, and resiliency. These unresolved challenges have led researchers to explore brain-inspired computing architectures and paradigms.
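To make the bandwidth argument concrete, the sketch below is a back-of-envelope roofline estimate for a single fully connected layer; the peak-throughput and bandwidth figures are illustrative assumptions, not the numbers of any particular processor.

```python
# Back-of-envelope roofline estimate for a matrix-vector product (y = W @ x),
# the core operation in fully connected inference. All hardware figures below
# are illustrative assumptions, not measurements of any particular chip.

def matvec_roofline(n_rows, n_cols, peak_flops=10e12, mem_bw=100e9):
    """Return compute-bound and memory-bound time estimates in seconds."""
    flops = 2 * n_rows * n_cols          # one multiply + one add per weight
    bytes_moved = 4 * n_rows * n_cols    # 32-bit weights dominate the traffic
    t_compute = flops / peak_flops       # time if limited by arithmetic
    t_memory = bytes_moved / mem_bw      # time if limited by DRAM bandwidth
    return t_compute, t_memory

t_c, t_m = matvec_roofline(4096, 4096)
print(f"compute-bound: {t_c*1e6:.1f} us, memory-bound: {t_m*1e6:.1f} us")
# With these assumed figures the memory-bound time is ~200x the compute-bound
# time: weights are streamed once with little reuse, so DRAM bandwidth, not
# arithmetic, sets the speed -- the motivation for in-/near-memory designs.
```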
Reducing Smart Phone Environmental Footprints with In-Memory Processing
Smart phones have revolutionized the availability of computing to the consumer. Recently, smart phones have been aggressively integrating artificial intelligence (AI) capabilities into their devices. The custom-designed processors for the latest phones integrate highly capable and energy-efficient graphics processing units (GPUs) and tensor processing units (TPUs) to accommodate this emerging AI workload and on-device inference. Unfortunately, smart phones are far from sustainable and have a substantial carbon footprint that continues to be dominated by the environmental impacts of their manufacture and far less so by the energy required to power their operation. In this paper we explore the possibility of reversing the trend of dedicating ever more silicon to emerging application workloads in the phone. Instead, we consider how in-memory processing using the DRAM already present in the phone could be used in place of dedicated GPU/TPU devices for AI inference. We explore the potential savings in embodied carbon that could be possible with this tradeoff and provide some analysis of the potential of in-memory computing to compete with these accelerators. While it may not be possible to achieve the same throughput, we suggest that the responsiveness to the user may be sufficient using in-memory computing, while both the embodied and operational carbon footprints could be improved. Our approach can save circa 10–15 kgCO2e.
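As a back-of-envelope illustration of how such an estimate can be structured (every constant below is an assumed placeholder for exposition, not data from the paper):

```python
# Rough structure of an embodied- vs. operational-carbon estimate for
# replacing a dedicated GPU/TPU block with DRAM-based in-memory inference.
# All constants are illustrative assumptions, not the paper's measurements.

PHONE_EMBODIED_KGCO2E = 70.0   # assumed total embodied footprint of one phone
IC_SHARE = 0.45                # assumed fraction attributable to integrated circuits
ACCEL_SHARE_OF_IC = 0.40       # assumed fraction of IC footprint from GPU/TPU silicon

def embodied_savings():
    """Embodied carbon avoided if the dedicated accelerator is never manufactured."""
    return PHONE_EMBODIED_KGCO2E * IC_SHARE * ACCEL_SHARE_OF_IC

def operational_carbon(j_per_inference, inferences_per_day, years=3,
                       grid_kgco2e_per_kwh=0.4):
    """Lifetime operational carbon of on-device inference (assumed grid mix)."""
    kwh = j_per_inference * inferences_per_day * 365 * years / 3.6e6
    return kwh * grid_kgco2e_per_kwh

print(f"embodied savings: ~{embodied_savings():.0f} kgCO2e")
# Even a slower in-memory engine changes operational carbon by grams, not
# kilograms, so the embodied savings dominate the overall footprint tradeoff.
print(f"operational (slower PIM assumed): "
      f"{operational_carbon(0.08, 2000):.3f} kgCO2e over 3 years")
```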
- PAR ID:
- 10553683
- Publisher / Repository:
- IEEE
- Date Published:
- Subject(s) / Keyword(s):
- sustainability; embodied carbon; in-memory processing; AI inference
- Format(s):
- Medium: X
- Location:
- Raleigh, NC
- Sponsoring Org:
- National Science Foundation
More Like this
-
Neuromorphic computing, commonly understood as a computing approach built upon neurons, synapses, and their dynamics, as opposed to Boolean gates, is gaining large mindshare due to its direct application in solving current and future computing technological problems, such as smart sensing, smart devices, self-hosted and self-contained devices, artificial intelligence (AI) applications, etc. In a largely software-defined implementation of neuromorphic computing, it is possible to throw enormous computational power at a problem or to optimize models and networks depending on the specific nature of the computational task. A hardware-based approach, however, requires identifying well-suited neuronal and synaptic models to obtain high functional and energy efficiency, which is a prime concern in size, weight, and power (SWaP) constrained environments. In this work, we study the characteristics of hardware neuron models (namely, inference errors, generalizability and robustness, practical implementability, and memory capacity) that have been proposed and demonstrated using a plethora of emerging nanomaterials-based physical devices, to quantify the performance of such neurons on classes of problems of great importance in real-time signal processing, in the context of reservoir computing. We find that the answer to which neuron to use for which application depends on the particulars of the application requirements and constraints themselves, i.e., we need not only a hammer but all sorts of tools in our tool chest for high-efficiency, high-quality neuromorphic computing.
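For readers unfamiliar with the setting, here is a minimal echo-state-style reservoir sketch in plain numpy; the leaky-integrator tanh neuron stands in for the hardware neuron models studied above, and the leak rate and activation are exactly the knobs a device-specific model would replace.

```python
# Minimal echo-state reservoir: a fixed random recurrent network driven by an
# input, with only a linear readout trained (here by ridge regression).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 200
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()   # spectral radius below 1

def run_reservoir(u_seq, leak=0.3, activation=np.tanh):
    """Drive the reservoir with input sequence u_seq; return all states."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        pre = W_in @ np.atleast_1d(u) + W @ x
        x = (1 - leak) * x + leak * activation(pre)  # leaky-integrator neuron
        states.append(x.copy())
    return np.array(states)

# One-step-ahead prediction of a toy signal via a ridge-regression readout.
u = np.sin(np.linspace(0, 20 * np.pi, 2000))
X, y = run_reservoir(u[:-1]), u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_res), X.T @ y)
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```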
-
Embedded database libraries provide developers with a common and convenient data persistence layer. They have spread to many systems, including interactive devices like smartphones, appearing in all major mobile systems. Their performance affects the response times and resource consumption of millions of phone apps and billions of phone users. It is thus critical that we better understand how they work, so they can be used more efficiently and so developers can make faster libraries. Mobile databases differ significantly from server-class storage in terms of platform, usage, and measurement. Phones are multi-tenant, end-user devices that the database must share with other apps. Contrary to traditional database design goals, workloads on phones are single-app, bursty, and rarely saturate the CPU. We argue that mobile storage design should refocus on what matters on the mobile platform: latency and energy. As accurate performance measurement tools are necessary for evaluating database designs, this uncovers another issue: traditional database benchmarking methods produce misleading results when applied to mobile devices because they evaluate performance at saturation. Databases and measurement tools designed specifically for the mobile platform are necessary to optimize the user experience of the most common database usage in the world.
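A minimal sketch of the latency-first measurement style argued for above, using Python's built-in sqlite3 as a stand-in embedded database; burst counts and record sizes are illustrative assumptions.

```python
# Bursty, single-app workload measured by per-operation latency percentiles
# rather than throughput at saturation. Uses an in-memory database to stay
# self-contained; a real study would target an on-flash file so that sync
# costs, which dominate phone-app latency, are included.
import sqlite3, time, random

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")

latencies = []
for burst in range(50):                    # 50 short bursts, idle in between
    for _ in range(20):
        t0 = time.perf_counter()
        db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                   (random.randrange(10_000), "x" * 100))
        db.commit()                        # per-operation commit, as many apps do
        latencies.append(time.perf_counter() - t0)
    time.sleep(0.01)                       # idle gap: the CPU never saturates

latencies.sort()
p50 = latencies[len(latencies) // 2]
p99 = latencies[int(len(latencies) * 0.99)]
print(f"p50 = {p50 * 1e6:.0f} us, p99 = {p99 * 1e6:.0f} us")
```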
-
Distributed computing, computer networking, and the Internet of Things (IoT) are all around us, yet only computer science and engineering majors learn the technologies that enable our modern lives. This paper introduces PhoneIoT, a mobile app that makes it possible to teach some of the basic concepts of distributed computation and networked sensing to novices. PhoneIoT turns mobile phones and tablets into IoT devices and makes it possible to create highly engaging projects through NetsBlox, an open-source block-based programming environment focused on teaching distributed computing at the high school level. PhoneIoT lets NetsBlox programs—running in the browser on the student’s computer—access available sensors. Since phones have touchscreens, PhoneIoT also allows building a Graphical User Interface (GUI) remotely from NetsBlox, which can be set to trigger custom code written by the student via NetsBlox’s message system. This approach enables students to create quite advanced distributed projects, such as turning their phone into a game controller or tracking their exercise on top of an interactive Google Maps background with just a few blocks of code.
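The underlying pattern (a program on one machine polling sensors on another) can be illustrated with a self-contained sketch; the plain-UDP transport and message format here are stand-ins for exposition, not the actual NetsBlox/PhoneIoT protocol, which is block-based and browser-hosted.

```python
# One thread plays the "phone", answering sensor queries with (fake)
# accelerometer readings; the main thread plays the student's program
# polling it over the network. Protocol and values are illustrative only.
import json, random, socket, threading, time

def phone_device(port=50007):
    """Pretend phone: replies to each datagram with an accelerometer sample."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", port))
    while True:
        _, addr = srv.recvfrom(64)
        sample = {"accel": [round(random.gauss(0, 1), 3) for _ in range(3)]}
        srv.sendto(json.dumps(sample).encode(), addr)

threading.Thread(target=phone_device, daemon=True).start()
time.sleep(0.1)                            # let the "phone" bind its socket

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(1.0)
for _ in range(3):                         # the "student program" polls the sensor
    client.sendto(b"accelerometer", ("127.0.0.1", 50007))
    data, _ = client.recvfrom(1024)
    print("accel:", json.loads(data)["accel"])
```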
-
Deep convolutional neural networks (CNNs) have achieved outstanding performance in image recognition on large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, which inevitably makes their hardware implementations demand more and more memory and computational resources; this can be interpreted as a CNN "power and memory wall". Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations, and gradients while keeping reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that could be leveraged to implement a "bit-wise in-memory convolution engine" that simultaneously stores network parameters and computes low bit-width convolutions. Such a computing model leverages the in-memory computing concept to accelerate CNN inference and reduce convolution energy consumption, owing to its intrinsic logic-in-memory design and the reduction of data communication.
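As a functional reference for what such an engine computes (a numpy sketch under the common assumption that weights and activations are binarized to ±1; an MRAM array would evaluate the XNOR/popcount in place rather than in software):

```python
# With weights and activations constrained to {-1, +1} and stored as {0, 1}
# bits, a convolution dot product reduces to XNOR followed by a popcount.
import numpy as np

rng = np.random.default_rng(1)
w_bits = rng.integers(0, 2, 9)           # 3x3 binary kernel; bit 1 = +1, 0 = -1
x_bits = rng.integers(0, 2, 9)           # binarized input patch

def xnor_popcount_dot(a_bits, b_bits):
    """Dot product of two {-1,+1} vectors stored as {0,1} bit arrays."""
    matches = np.count_nonzero(~(a_bits ^ b_bits) & 1)  # XNOR, then popcount
    return 2 * matches - len(a_bits)     # map match count back to a +/-1 sum

# Check against the ordinary arithmetic dot product at this kernel position.
w, x = 2 * w_bits - 1, 2 * x_bits - 1
assert xnor_popcount_dot(w_bits, x_bits) == int(w @ x)
print("binary dot product:", xnor_popcount_dot(w_bits, x_bits))
```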