Abstract Marine phytoplankton are a diverse group of photoautotrophic organisms and key mediators in the global carbon cycle. Phytoplankton physiology and biomass accumulation are closely tied to mixed layer depth, but the intracellular metabolic pathways activated in response to changes in mixed layer depth remain less explored. Here, metatranscriptomics was used to characterize the phytoplankton community response to a mixed layer shallowing (from 233 to 5 m) over the course of two days during the late spring in the Northwest Atlantic. Most phytoplankton genera downregulated core photosynthesis, carbon storage, and carbon fixation genes as the system transitioned from a deep to a shallow mixed layer and shifted towards catabolism of stored carbon supportive of rapid cell growth. In contrast, phytoplankton genera exhibited divergent transcriptional patterns for photosystem light harvesting complex genes during this transition. Active virus infection, taken as the ratio of virus to host transcripts, increased in the Bacillariophyta (diatom) phylum and decreased in the Chlorophyta (green algae) phylum upon mixed layer shallowing. A conceptual model is proposed to provide ecophysiological context for our findings, in which integrated light limitation and lower division rates during transient deep mixing are hypothesized to disrupt resource-driven, oscillating transcript levels related to photosynthesis, carbon fixation, and carbon storage. Our findings highlight shared and unique transcriptional response strategies within phytoplankton communities acclimating to the dynamic light environment associated with transient deep mixing and shallowing events during the annual North Atlantic bloom.
Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework
We approach the problem of implementing mixed-datatype support within the general matrix multiplication (gemm) operation of the BLAS-like Library Instantiation Software (BLIS) framework, whereby each matrix operand A, B, and C may be stored as single- or double-precision real or complex values. Another factor of complexity, whereby the matrix product and accumulation are allowed to take place in a precision different from the storage precisions of either A or B, is also discussed. We first break the problem into orthogonal dimensions, considering the mixing of domains separately from mixing precisions. Support for all combinations of matrix operands stored in either the real or complex domain is mapped out by enumerating the cases and describing an implementation approach for each. Supporting all combinations of storage and computation precisions is handled by typecasting the matrices at key stages of the computation (during packing and/or accumulation, as needed). Several optional optimizations are also documented. Performance results gathered on a 56-core Marvell ThunderX2 and a 52-core Intel Xeon Platinum demonstrate that high performance is mostly preserved, with modest slowdowns incurred from unavoidable typecast instructions. The mixed-datatype implementation confirms that combinatorial intractability is avoided, with the framework relying on only two assembly microkernels to implement 128 datatype combinations.
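The idea of typecasting during packing and accumulation can be illustrated with a small sketch. This is not BLIS code and does not reflect its API: the names `pack_as` and `gemm_mixed` are hypothetical, only the real domain is covered, and Python's `array` module (typecodes `'f'` and `'d'`) is assumed as a stand-in for single- and double-precision storage. Arithmetic in the single-precision case is only emulated by rounding on store.

```python
from array import array

def pack_as(typecode, src):
    """Copy a matrix buffer into a new contiguous buffer of the computation
    precision. The typecast happens here, once per element, so the inner
    loop only ever sees one datatype. (Hypothetical helper.)"""
    return array(typecode, src)

def gemm_mixed(m, n, k, a, b, c, comp="d"):
    """Compute C := C + A * B on row-major flat buffers.

    a, b, and c may each be array('f') or array('d'); comp selects the
    computation precision ('f' or 'd'), independent of the storage types.
    """
    a_packed = pack_as(comp, a)  # typecast A during packing
    b_packed = pack_as(comp, b)  # typecast B during packing
    acc = array(comp, [0.0])     # accumulator stored in computation precision
    for i in range(m):
        for j in range(n):
            acc[0] = 0.0
            for p in range(k):
                acc[0] += a_packed[i * k + p] * b_packed[p * n + j]
            # Typecast during accumulation: storing into c rounds the
            # result to whatever precision c is stored in.
            c[i * n + j] = c[i * n + j] + acc[0]
    return c

# A stored in single precision, B in double, C in single; compute in double.
a = array("f", [1.0, 2.0, 3.0, 4.0])   # 2x2
b = array("d", [0.5, 0.0, 0.0, 0.5])   # 2x2
c = array("f", [0.0] * 4)              # 2x2
gemm_mixed(2, 2, 2, a, b, c, comp="d")
```

Because the casts are confined to the packing and accumulation steps, the same inner loop serves every combination of storage types in this sketch, which mirrors in miniature how a small number of kernels could cover many datatype combinations.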
- Award ID(s): 2003921
- PAR ID: 10222810
- Date Published:
- Journal Name: ACM Transactions on Mathematical Software
- Volume: 47
- Issue: 2
- ISSN: 0098-3500
- Page Range / eLocation ID: 1 to 26
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The data precision can significantly affect the accuracy and overhead metrics of hardware accelerators for different applications such as artificial neural networks (ANNs). This paper evaluates the inference and training of multi-layer perceptrons (MLPs), in which initially IEEE standard floating-point (FP) precisions (half, single and double) are utilized separately and then compared with mixed-precision FP formats. The mixed-precision calculations are investigated for three critical propagation modules (activation functions, weight updates, and accumulation units). Compared with applying a simple low-precision format, the mixed-precision format prevents an accuracy loss and the occurrence of overflow/underflow in the MLPs while potentially incurring less hardware overhead in terms of area/power. As the multiply-accumulation is the most dominant operation in trending ANNs, a fully pipelined hardware implementation for the fused multiply-add units is proposed for different IEEE FP formats to achieve a very high operating frequency.
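Why a higher-precision accumulation unit can prevent accuracy loss can be seen in a minimal sketch. It is an illustration only, not the hardware design evaluated in the paper: the helper `to_half` is hypothetical and uses Python's `struct` format `'e'` to round a value to IEEE half precision.

```python
import struct

def to_half(x):
    """Round x to the nearest IEEE half-precision value (hypothetical helper)."""
    return struct.unpack("e", struct.pack("e", x))[0]

values = [1.0] * 4096

# Pure half precision: every partial sum is rounded back to half.
# Above 2048 the spacing between half-precision values is 2, so adding
# 1.0 no longer changes the running sum.
half_sum = 0.0
for v in values:
    half_sum = to_half(half_sum + to_half(v))

# Mixed precision: inputs are half, but the accumulator is kept in a
# wider format (Python floats are doubles) and rounded to half once.
wide_sum = 0.0
for v in values:
    wide_sum += to_half(v)
mixed_sum = to_half(wide_sum)

print(half_sum, mixed_sum)
```

In this sketch the half-precision accumulator stalls at 2048.0, while the mixed-precision result is the exact 4096.0, even though both store their final value in half precision.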
-
In this paper, the tracking problem of multi-agent systems is considered for a particular scenario in which a segment of agents enters a sensing-denied environment or behaves as noncooperative targets. The focus is on determining the optimal sensor precisions while simultaneously promoting sparseness in the sensor measurements to guarantee a specified estimation performance. The problem is formulated in the discrete-time centralized Kalman filtering framework. A semidefinite program subject to linear matrix inequalities is solved to minimize the trace of the precision matrix, which is defined as the inverse of the sensor noise covariance matrix. Simulation results expose a trade-off between sensor precisions and sensing frequency.
-
The mixed Lp-norm, 0 ≤ p ≤ 2, stabilization algorithm is flexible for constructing a suite of subsurface models with either distinct, or a combination of, smooth, sparse, or blocky structures. This general purpose algorithm can be used for the inversion of data from regions with different subsurface characteristics. Model interpretation is improved by simultaneous inversion of multiple data sets using a joint inversion approach. An effective and general algorithm is presented for the mixed Lp-norm joint inversion of gravity and magnetic data sets. The imposition of the structural cross-gradient enforces similarity between the reconstructed models. For efficiency the implementation relies on three crucial realistic details: (i) the data are assumed to be on a uniform grid providing sensitivity matrices that decompose in block Toeplitz Toeplitz block form for each depth layer of the model domain and yield efficiency in storage and computation via 2D fast Fourier transforms; (ii) matrix-free implementation for calculating derivatives of parameters reduces memory and computational overhead; and (iii) an alternating updating algorithm is employed. Balancing of the data misfit terms is imposed to ensure that the gravity and magnetic data sets are fit with respect to their individual noise levels without overfitting of either model. Strategies to find all weighting parameters within the objective function are described. The algorithm is validated on two synthetic but complicated models. It is applied to invert gravity and magnetic data acquired over two kimberlite pipes in Botswana, producing models that are in good agreement with borehole information available in the survey area.
-
As the number of weight parameters in deep neural networks (DNNs) continues growing, the demand for ultra-efficient DNN accelerators has motivated research on non-traditional architectures with emerging technologies. Resistive Random-Access Memory (ReRAM) crossbars have been utilized to perform in situ matrix-vector multiplication of DNNs. DNN weight pruning techniques have also been applied to ReRAM-based mixed-signal DNN accelerators, focusing on reducing weight storage and accelerating computation. However, the existing works capture very few peripheral circuit features, such as analog-to-digital converters (ADCs), during the neural network design. Unfortunately, ADCs have become the main contributor to the power consumption and area cost of current mixed-signal accelerators, and the large overhead of these peripheral circuits is not addressed efficiently. To address this problem, we propose a novel weight pruning framework for ReRAM-based mixed-signal DNN accelerators, named TINYADC, which effectively reduces the required bits for ADC resolution and hence the overall area and power consumption of the accelerator without introducing any computational inaccuracy. Compared to state-of-the-art pruning work on the ImageNet dataset, TINYADC achieves 3.5× and 2.9× power and area reduction, respectively. The TINYADC framework improves the throughput of a state-of-the-art architecture design by 29% and 40% in terms of throughput per square millimeter and per watt (GOPs/s×mm² and GOPs/w), respectively.

