Abstract Topology optimization has emerged as a versatile design tool embraced across diverse domains. This popularity has led to great efforts in the development of education-centric topology optimization codes with various focuses, such as targeting beginners seeking user-friendliness and catering to experienced users emphasizing computational efficiency. In this study, we introduce a novel 2D and 3D topology optimization software, developed in Python and built upon an open-source library, designed to harmonize usability with computational efficiency and post-processing for fabrication. The software employs a modular architecture, offering a unified input script for defining topology optimization problems and six replaceable modules to streamline subsequent optimization tasks. By enabling users to express problems in the weak form, it eliminates the need for matrix manipulations, thereby simplifying the modeling process. The software also integrates automatic differentiation to mitigate the intricacies associated with chain rules in finite element analysis and sensitivity analysis. Furthermore, it provides access to a comprehensive array of readily available solvers and preconditioners, bolstering flexibility in problem-solving. The software is designed for scalability, furnishing robust support for parallel computing that seamlessly adapts to diverse computing platforms, spanning from laptops to distributed computing clusters. It also facilitates effortless transitions between spatial dimensions, mesh geometries, element types and orders, and quadrature degrees. Apart from the computational benefits, the software facilitates the automated exportation of optimized designs, compatible with open-source software for post-processing. This functionality allows for visualizing optimized designs across diverse mesh geometries and element shapes, automatically smoothing 3D designs, and converting smoothed designs into STereoLithography (STL) files for 3D printing.
To illustrate the capabilities of the software, we present five representative examples showcasing topology optimization across 2D and 3D geometries, structured and unstructured meshes, solver switching, and complex boundary conditions. We also assess its parallel computational efficiency by examining its performance across diverse computing platforms, process counts, problem sizes, and solver configurations. Finally, we demonstrate a physical 3D-printed model utilizing the STL file derived from the design optimized by the software. These examples showcase not only the software’s rich functionality but also its parallel computing performance. The open-source code is given in Appendix B and will be available on GitHub.
An optimized transient detection pipeline for the ASKAP Variables and Slow Transients (VAST) survey
ABSTRACT In this paper, we present an optimized version of the detection pipeline for the ASKAP Variables and Slow Transients (VAST) survey, offering significant performance improvement. The key to this optimization is the replacement of the original w-projection algorithm integrated in the Common Astronomy Software Applications package with the w-stacking algorithm implemented in the WSClean software. Our experiments demonstrate that this optimization improves the overall processing efficiency of the pipeline by approximately a factor of 3. Moreover, the residual images generated by the optimized pipeline exhibit lower noise levels and fewer artefact sources, suggesting that our optimized pipeline not only enhances detection accuracy but also improves imaging fidelity. This optimized VAST detection pipeline is integrated into the Data Activated Liu Graph Engine (DALiuGE) execution framework, specifically designed for SKA-scale big data processing. Experimental results show that the performance and scalability advantages of the pipeline using DALiuGE over traditional MPI or BASH techniques increase with the data size. In summary, the optimized transient detection pipeline significantly reduces runtime, increases operational efficiency, and decreases implementation costs, offering a practical optimization solution for other ASKAP imaging pipelines as well.
- Award ID(s): 1816492
- PAR ID: 10545358
- Publisher / Repository: Monthly Notices of the Royal Astronomical Society
- Date Published:
- Journal Name: Monthly Notices of the Royal Astronomical Society
- Volume: 526
- Issue: 2
- ISSN: 0035-8711
- Page Range / eLocation ID: 1809 to 1821
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Algorithm-hardware Co-optimization for Energy-efficient Drone Detection on Resource-constrained FPGA
Convolutional neural network (CNN)-based object detection has achieved very high accuracy; e.g., single-shot multi-box detectors (SSDs) can efficiently detect and localize various objects in an input image. However, they require a high amount of computation and memory storage, which makes it difficult to perform efficient inference on resource-constrained hardware devices such as drones or unmanned aerial vehicles (UAVs). Drone/UAV detection is an important task for applications including surveillance, defense, and multi-drone self-localization and formation control. In this article, we designed and co-optimized an algorithm and hardware for energy-efficient drone detection on resource-constrained FPGA devices. We trained an SSD object detection algorithm with a custom drone dataset. For inference, we employed low-precision quantization and adapted the width of the SSD CNN model. To improve throughput, we use dual-data rate operations for DSPs to effectively double the throughput with limited DSP counts. For different SSD algorithm models, we analyze accuracy or mean average precision (mAP) and evaluate the corresponding FPGA hardware utilization, DRAM communication, and throughput optimization. We evaluated the FPGA hardware for a custom drone dataset, Pascal VOC, and COCO2017. Our proposed design achieves a high mAP of 88.42% on the multi-drone dataset, with a high energy efficiency of 79 GOPS/W and throughput of 158 GOPS using the Xilinx Zynq ZU3EG FPGA device on the Open Vision Computer version 3 (OVC3) platform. Our design achieves 1.1 to 8.7× higher energy efficiency than prior works that used the same Pascal VOC dataset, using the same FPGA device, but at a low-power consumption of 2.54 W. For the COCO dataset, our MobileNet-V1 implementation achieved an mAP of 16.8, and 4.9 FPS/W for energy-efficiency, which is ∼1.9× higher than prior FPGA works or other commercial hardware platforms.
-
Spintronic terahertz emitters (STEs) generate broadband THz radiation via ultrafast spin–charge conversion in magnetic multilayers, offering spectral coverage beyond that of photoconductive antennas and nonlinear optical crystals. Here, we demonstrate STEs based on a PtxAu100−x alloy that achieve significantly higher THz output power than widely used Pt-based devices. Alloy composition and layer thickness tuning yield Pt75Au25 as the optimal alloy, providing a 30% increase in THz power in CoFeB/Pt75Au25 bilayer STEs compared to the optimized CoFeB/Pt reference STE. In W/CoFeB/Pt75Au25 trilayer STEs, we observe a 10% higher THz power than in the optimized W/CoFeB/Pt trilayer. The STE efficiency is reduced upon annealing for both Pt75Au25- and Pt-based STEs due to the formation of interfacial alloys. Our results establish Pt75Au25 as a promising platform for high-performance STEs, where its giant spin Hall effect significantly enhances efficiency over conventional Pt-based devices.
-
Abstract Cyber-enabled manufacturing systems are becoming increasingly data-rich, generating vast amounts of real-time sensor data for quality control and process optimization. However, this proliferation of data also exposes these systems to significant cyber-physical security threats. For instance, malicious attackers may delete, change, or replace original data, leading to defective products, damaged equipment, or operational safety hazards. False data injection attacks can compromise machine learning models, resulting in erroneous predictions and decisions. To mitigate these risks, it is crucial to employ robust data processing techniques that can adapt to varying process conditions and detect anomalies in real time. In this context, incremental machine learning (IML) approaches can be valuable, allowing models to be updated incrementally with newly collected data without retraining from scratch. Moreover, although recent studies have demonstrated the potential of blockchain in enhancing data security within manufacturing systems, most existing security frameworks are primarily based on cryptography, which does not sufficiently address data quality issues. Thus, this study proposes a gatekeeper mechanism to integrate IML with blockchain and discusses how this integration could potentially increase the data integrity of cyber-enabled manufacturing systems. The proposed IML-integrated blockchain can address data security concerns arising from both intentional alterations (e.g., malicious tampering) and unintentional alterations (e.g., process anomalies and outliers). The real-world case study results show that the proposed gatekeeper integration algorithm can successfully filter out over 80% of malicious data entries while maintaining classification performance comparable to standard IML models. Furthermore, the integration of blockchain enables effective detection of tampering attempts, ensuring the trustworthiness of the stored information.
-
HyperSense is a co‐designed hardware and software system that efficiently controls the data generation rate of analog‐to‐digital converter (ADC) modules based on object presence predictions in sensor data. Addressing challenges posed by escalating sensor quantities and data rates, HyperSense reduces redundant digital data using energy‐efficient low‐precision ADC, diminishing machine learning system costs. Leveraging neurally inspired hyperdimensional computing, HyperSense analyzes real‐time raw low‐precision sensor data, offering advantages in handling noise, memory‐centricity, and real‐time learning. The proposed HyperSense model combines high‐performance software for object detection with real‐time hardware prediction, introducing the novel concept of intelligent sensor control. Comprehensive software and hardware evaluations demonstrate the solution's superior performance, evidenced by the highest area under the curve and sharpest receiver operating characteristic curve among lightweight models. Hardware‐wise, the field programmable gate array‐based domain‐specific accelerator tailored for HyperSense achieves a 5.6× speedup compared to YOLOv4 on NVIDIA Jetson Orin while showing up to 92.1% energy saving compared to the conventional system. These results underscore HyperSense's effectiveness and efficiency, positioning it as a promising solution for intelligent sensing and real‐time data processing across diverse applications.

