Advances in algorithms and low-power computing hardware imply that machine learning is of potential use in off-grid medical data classification and diagnosis applications such as electrocardiogram interpretation. However, although support vector machine algorithms for electrocardiogram classification show high classification accuracy, hardware implementations for edge applications are impractical due to the complexity and substantial power consumption needed for kernel optimization when using conventional complementary metal–oxide–semiconductor circuits. Here we report reconfigurable mixed-kernel transistors based on dual-gated van der Waals heterojunctions that can generate fully tunable individual and mixed Gaussian and sigmoid functions for analogue support vector machine kernel applications. We show that the heterojunction-generated kernels can be used for arrhythmia detection from electrocardiogram signals with high classification accuracy compared with standard radial basis function kernels. The reconfigurable nature of mixed-kernel heterojunction transistors also allows for personalized detection using Bayesian optimization. A single mixed-kernel heterojunction device can generate the equivalent transfer function of a complementary metal–oxide–semiconductor circuit comprising dozens of transistors and thus provides a low-power approach for support vector machine classification applications.
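As a rough illustration of what a "mixed Gaussian and sigmoid" transfer function can look like, the sketch below blends the two shapes with a single weight. The functional form and every parameter name (`w`, `mu`, `sigma`, `x0`, `k`) are assumptions made for illustration; the abstract does not specify the device model or how the gate voltages map to these quantities.

```python
import math


def gaussian(x, mu, sigma):
    """Unit-height Gaussian bump centred at mu (hypothetical parametrization)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def sigmoid(x, x0, k):
    """Logistic step centred at x0 with steepness k (hypothetical parametrization)."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


def mixed_kernel(x, w, mu=0.0, sigma=1.0, x0=0.0, k=1.0):
    """Convex blend of a Gaussian and a sigmoid response.

    w = 1 gives a pure Gaussian, w = 0 a pure sigmoid, and values in
    between give a mixed response. This is an assumed toy model, not
    the measured transfer characteristic of the reported device.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * gaussian(x, mu, sigma) + (1.0 - w) * sigmoid(x, x0, k)
```

Sweeping `w` from 0 to 1 moves the response continuously from a step-like curve to a bump, which is the kind of reconfigurability the abstract attributes to the dual-gated device.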
Gaussian‐Sigmoid Reinforcement Transistors: Resolving Exploration‐Exploitation Trade‐Off Through Gate Voltage‐Controlled Activation Functions
Abstract Reinforcement learning (RL) relies on Gaussian and sigmoid functions to balance exploration and exploitation, but implementing these functions in hardware typically requires iterative computations, increasing power and circuit complexity. Here, Gaussian‐sigmoid reinforcement transistors (GS‐RTs) are reported that integrate both activation functions into a single device. The transistors feature a vertical n‐p‐i‐p heterojunction stack composed of a‐IGZO and DNTT, with asymmetric source–drain contacts and a parylene interlayer that enables voltage‐tunable transitions between sigmoid, Gaussian, and mixed responses. This architecture emulates the behavior of three transistors in one, reducing the required circuit complexity from dozens of transistors to only a few. The GS‐RT exhibits a peak current of 5.95 µA at VG = −17 V and supports nonlinear transfer characteristics suited for neuromorphic computing. In a multi‐armed bandit task, GS‐RT‐based RL policies demonstrate 20% faster convergence and 30% higher final reward compared to conventional sigmoid‐ or Gaussian‐based approaches. Extending this advantage further, a GS‐RT‐based activation function in deep RL for cartpole balancing significantly outperforms the traditional ReLU‐based activation function, learning faster and tolerating input perturbations better.
- PAR ID: 10613859
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Advanced Functional Materials
- ISSN: 1616-301X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
In reinforcement learning (RL), the ability to utilize prior knowledge from previously solved tasks can allow agents to quickly solve new problems. In some cases, these new problems may be approximately solved by composing the solutions of previously solved primitive tasks (task composition). Otherwise, prior knowledge can be used to adjust the reward function for a new problem, in a way that leaves the optimal policy unchanged but enables quicker learning (reward shaping). In this work, we develop a general framework for reward shaping and task composition in entropy-regularized RL. To do so, we derive an exact relation connecting the optimal soft value functions for two entropy-regularized RL problems with different reward functions and dynamics. We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL. We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL. We validate these theoretical contributions with experiments showing that reward shaping and task composition lead to faster learning in various settings.
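For orientation, the classic potential-based form of reward shaping in standard (non-regularized) RL can be written as follows. This is the well-known textbook construction, not the entropy-regularized result derived in the work above; the potential function \Phi and the discount factor \gamma are generic symbols introduced here for illustration.

```latex
\tilde{r}(s, a, s') = r(s, a, s') + \gamma\,\Phi(s') - \Phi(s),
\qquad
\tilde{Q}^{*}(s, a) = Q^{*}(s, a) - \Phi(s)
```

Because the shift \Phi(s) does not depend on the action, the maximizing action in each state is the same before and after shaping, which is why the optimal policy is left unchanged.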
Atomically thin, 2D, and semiconducting transition metal dichalcogenides (TMDs) are seen as potential candidates for complementary metal oxide semiconductor (CMOS) technology in future nodes. While high‐performance field effect transistors (FETs), logic gates, and integrated circuits (ICs) made from n‐type TMDs such as MoS2 and WS2 grown at wafer scale have been demonstrated, realizing CMOS electronics necessitates integration of large area p‐type semiconductors. Furthermore, the physical separation of memory and logic is a bottleneck of the existing CMOS technology and must be overcome to reduce the energy burden for computation. In this article, the existing limitations are overcome and, for the first time, a heterogeneous integration of large area grown n‐type MoS2 and p‐type vanadium doped WSe2 FETs with non‐volatile and analog memory storage capabilities is introduced to achieve a non–von Neumann 2D CMOS platform. This manufacturing process flow allows for precise positioning of n‐type and p‐type FETs, which is critical for any IC development. Inverters, simplified 2‐input‐1‐output multiplexers, and neuromorphic computing primitives such as Gaussian, sigmoid, and tanh activation functions using this non–von Neumann 2D CMOS platform are also demonstrated. This demonstration shows the feasibility of heterogeneous integration of wafer scale 2D materials.
This paper extends the star set reachability approach to verify the robustness of feed-forward neural networks (FNNs) with sigmoidal activation functions such as Sigmoid and TanH. The main drawbacks of the star set approach in Sigmoid/TanH FNN verification are scalability, feasibility, and optimality issues in some cases due to the linear programming solver usage. We overcome this challenge by proposing a relaxed star (RStar) with symbolic intervals, which allows the usage of the back-substitution technique in DeepPoly to find bounds when overapproximating activation functions while maintaining the valuable features of a star set. RStar can overapproximate a sigmoidal activation function using four linear constraints (RStar4), two linear constraints (RStar2), or only the output bounds (RStar0). We implement our RStar reachability algorithms in NNV and compare them to DeepPoly via robustness verification of image classification DNN benchmarks. The experimental results show that the original star approach (i.e., no relaxation) is the least conservative of all methods yet the slowest. RStar4 is computationally much faster than the original star method and is the second least conservative approach. It certifies up to 40% more images against adversarial attacks than DeepPoly and is on average 51 times faster than the star set. Last but not least, RStar0 is the most conservative method, which could only verify two cases for the CIFAR10 small Sigmoid network, δ = 0.014. However, it is the fastest method that can verify neural networks up to 3528 times faster than the star set and up to 46 times faster than DeepPoly in our evaluation.
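To make "overapproximating a sigmoidal activation with two linear constraints" concrete, here is a minimal sketch of one sound lower and one sound upper line for the logistic sigmoid on an input interval [l, u]. It follows the general chord-or-minimum-slope idea used in DeepPoly-style relaxations; it is an assumption that this resembles RStar2, and the function name `sigmoid_linear_bounds` is hypothetical rather than part of NNV or DeepPoly.

```python
import math


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_linear_bounds(l, u):
    """Return ((a_lo, b_lo), (a_up, b_up)) such that, for all x in [l, u],
    a_lo * x + b_lo <= sigmoid(x) <= a_up * x + b_up.

    The lower line passes through (l, sigmoid(l)) and the upper line
    through (u, sigmoid(u)). Each uses the chord slope where the sigmoid's
    curvature makes the chord sound, and otherwise the smaller of the two
    endpoint derivatives, which is the minimum derivative on [l, u].
    """
    if not l < u:
        raise ValueError("require l < u")
    sl, su = sigmoid(l), sigmoid(u)
    chord = (su - sl) / (u - l)
    # sigmoid'(x) = s(x) * (1 - s(x)); its minimum on [l, u] is at an endpoint.
    min_slope = min(sl * (1.0 - sl), su * (1.0 - su))
    lo_slope = chord if l > 0 else min_slope   # concave region: chord lies below
    up_slope = chord if u <= 0 else min_slope  # convex region: chord lies above
    lower = (lo_slope, sl - lo_slope * l)
    upper = (up_slope, su - up_slope * u)
    return lower, upper
```

A verifier would attach such a pair of constraints to every neuron whose pre-activation range is [l, u]; tighter relaxations add more lines at the cost of more expensive bound computation, which matches the speed versus conservativeness trade-off reported above.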
Internet of Things (IoT) sensors often operate in unknown dynamic environments comprising latency-sensitive data sources, dynamic processing loads, and communication channels of unknown statistics. Such settings represent a natural application domain of reinforcement learning (RL), which enables computing and learning decision policies online, with no a priori knowledge. In our previous work, we introduced a post-decision state (PDS) based RL framework, which considerably accelerates the rate of learning an optimal decision policy. The present paper formulates an efficient hardware architecture for the action evaluation step, which is the most computationally intensive step in the PDS based learning framework. By leveraging the unique characteristics of PDS learning, we optimize its state value expectation and known cost computational blocks to speed up the overall computation. Our experiments show that the optimized circuit is 49 times faster than its software implementation counterpart, and six times faster than a Q-learning hardware accelerator.
