Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Free, publicly-accessible full text available December 2, 2025
-
We consider the problem of constructing embeddings of large attributed graphs and supporting multiple downstream learning tasks. We develop a graph embedding method, which is based on extending deep metric and unbiased contrastive learning techniques to 1) work with attributed graphs, 2) enabling a mini-batch based approach, and 3) achieving scalability. Based on a multi-class tuplet loss function, we present two algorithms -- DMT for semi-supervised learning and DMAT-i for the unsupervised case. Analyzing our methods, we provide a generalization bound for the downstream node classification task and for the first time relate tuplet loss to contrastive learning. Through extensive experiments, we show high scalability of representation construction, and in applying the method for three downstream tasks (node clustering, node classification, and link prediction) better consistency over any single existing method.more » « less
-
Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes and even the set of operators used are dependent upon the input and/or execution are becoming common. This paper presents SoD2, a comprehensive framework for optimizing Dynamic DNNs. The basis of our approach is a classification of common operators that form DNNs, and the use of this classification towards a Rank and Dimension Propagation (RDP) method. This framework statically determines the shapes of operators as known constants, symbolic constants, or operations on these. Next, using RDP we enable a series of optimizations, like fused code generation, execution (order) planning, and even runtime memory allocation plan generation. By evaluating the framework on 10 emerging Dynamic DNNs and comparing it against several existing systems, we demonstrate both reductions in execution latency and memory requirements, with RDP-enabled key optimizations responsible for much of the gains.more » « less
-
Storage of sensitive multi-dimensional arrays must be secure and efficient in storage and processing time. Searchable encryption allows one to trade between security and efficiency. Searchable encryption design focuses on building indexes, overlooking the crucial aspect of record retrieval. Gui et al. (PoPETS 2023) showed that understanding the security and efficiency of record retrieval is critical to understand the overall system. A common technique for improving security is partitioning data tuples into parts. When a tuple is requested, the entire relevant part is retrieved, hiding the tuple of interest. This work assesses tuple partitioning strategies in the dense data setting, considering parts that are random, 1-dimensional, and multi-dimensional. We consider synthetic datasets of 2,3 and 4 dimensions, with sizes extending up to 2M tuples. We compare security and efficiency across a variety of record retrieval methods. Our findings are: 1. For most configurations, multi-dimensional partitioning yields better efficiency and less leakage. 2. 1-dimensional partitioning outperforms multi-dimensional partitioning when the first (indexed) dimension is any size as long as the query is large in all other dimensions. 3. The leakage of 1-dimensional partitioning is reduced the most when using a bucketed ORAM (Demertiz et al., USENIX Security 2020).more » « less
-
LU factorization for sparse matrices is an important computing step for many engineering and scientific problems such as circuit simulation. There have been many efforts toward parallelizing and scaling this algorithm, which include the recent efforts targeting the GPUs. However, it is still challenging to deploy a complete sparse LU factorization workflow on a GPU due to high memory requirements and data dependencies. In this paper, we propose the first complete GPU solution for sparse LU factorization. To achieve this goal, we propose an out-of-core implementation of the symbolic execution phase, thus removing the bottleneck due to large intermediate data structures. Next, we propose a dynamic parallelism implementation of Kahn's algorithm for topological sort on the GPUs. Finally, for the numeric factorization phase, we increase the parallelism degree by removing the memory limits for large matrices as compared to the existing implementation approaches. Experimental results show that compared with an implementation modified from GLU 3.0, our out-of-core version achieves speedups of 1.13--32.65X. Further, our out-of-core implementation achieves a speedup of 1.2--2.2 over an optimized unified memory implementation on the GPU. Finally, we show that the optimizations we introduce for numeric factorization turn out to be effective.more » « less
-
Despite the popularity of in-situ analytics in scientific computing, there is only limited work to date on in-situ analytics for simulations running on GPUs. Notably, two unaddressed challenges are 1) performing memory-efficient in-situ analysis on accelerators and 2)automatically choosing the processing resources and suitable data representation for a given query and platform. This paper addresses both problems. First, GAP makes several new contributions toward making bitmap indices suitable, effective, and efficient as a compressed data summary structure for the GPUs - this includes introducing a layout structure, a method for generating multi-attribute bitmaps, and novel techniques for bitmap-based processing of major operators that comprise complex data analytics. Second, this paper presents a performance modeling methodology, aiming to predict the placement (i.e., CPU or GPU) and the data representation choice (summarization or original) that yield the best performance on a given configuration. Our extensive evaluation of complex in-situ queries and real-world simulations shows that with our methods, analytics on GPU using bitmaps almost always outperforms other options, and the GAP performance model predicts the optimal placement and data representation for most scenarios.more » « less