NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

EMT: an OS framework for new memory translation architectures

Chai, Siyuan; Zhang, Jiyuan; Kim, Jongyul; Wang, Alan; Chung, Fan; Stojkovic, Jovan; Jia, Weiwei; Skarlatos, Dimitrios; Torrellas, Josep; Xu, Tianyin (July 2025, USENIX Association, Proceedings of the 19th USENIX Conference on Operating Systems Design and Implementation)

Free, publicly-accessible full text available July 7, 2026
CXLfork: Fast Remote Fork over CXL Fabrics

https://doi.org/10.1145/3676641.3715988

Alverti, Chloe; Psomadakis, Stratos; Ocalan, Burak; Jaiswal, Shashwat; Xu, Tianyin; Torrellas, Josep (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026
Concord: Rethinking Distributed Coherence for Software Caches in Serverless Environments

https://doi.org/10.1109/HPCA61900.2025.00043

Stojkovic, Jovan; Alverti, Chloe; Andrade, Alan; Iliakopoulou, Nikoleta; Franke, Hubertus; Xu, Tianyin; Torrellas, Josep (March 2025, IEEE)

Free, publicly-accessible full text available March 1, 2026
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication

https://doi.org/10.1109/SC41406.2024.00052

Ranawaka, Isuru; Hussain, Md Taufique; Block, Charles; Gerogiannis, Gerasimos; Torrellas, Josep; Azad, Ariful (November 2024, IEEE)

We consider a sparse matrix-matrix multiplication (SpGEMM) setting where one matrix is square and the other is tall and skinny. This special variant, TS-SpGEMM, has important applications in multi-source breadth-first search, influence maximization, sparse graph embedding, and algebraic multigrid solvers. Unfortunately, popular distributed algorithms like sparse SUMMA deliver suboptimal performance for TS-SpGEMM. To address this limitation, we develop a novel distributed-memory algorithm tailored for TS SpGEMM. Our approach employs customized 1D partitioning for all matrices involved and leverages sparsity-aware tiling for efficient data transfers. In addition, it minimizes communication overhead by incorporating both local and remote computations. On average, our TSSpGEMM algorithm attains 5x performance gains over 2D and 3D SUMMA. Furthermore, we use our algorithm to implement multi-source breadth-first search and sparse graph embedding algorithms and demonstrate their scalability up to 512 Nodes (or 65,536 cores) on NERSC Perlmutter.
more » « less
Full Text Available
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication

Ranawaka, Isuru; Hussain, Md Taufique; Block, Charles; Gerogiannis, Gerasimos; Torrellas, Josep; Azad, Ariful (November 2024, International Conference for High Performance Computing, Networking, Storage and Analysis SC)

Full Text Available
HADES: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs

https://doi.org/10.1109/ISCA59077.2024.00062

Kokolis, Apostolos; Psistakis, Antonis; Reidys, Benjamin; Huang, Jian; Torrellas, Josep (June 2024, IEEE)

Full Text Available
EcoFaaS: Rethinking the Design of Serverless Environments for Energy Efficiency

https://doi.org/10.1109/ISCA59077.2024.00042

Stojkovic, Jovan; Iliakopoulou, Nikoleta; Xu, Tianyin; Franke, Hubertus; Torrellas, Josep (June 2024, IEEE)

Full Text Available
MINOS: Distributed Consistency and Persistency Protocol Implementation & Offloading to SmartNICs

https://doi.org/10.1109/HPCA57654.2024.00076

Psistakis, Antonis; Chaix, Fabien; Torrellas, Josep (March 2024, IEEE)

Full Text Available
Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM

https://doi.org/10.1145/3620665.3640427

Block, Charles; Gerogiannis, Gerasimos; Mendis, Charith; Azad, Ariful; Torrellas, Josep (April 2024, ACM)

Sparse matrix dense matrix multiplication (SpMM) is commonly used in applications ranging from scientific computing to graph neural networks. Typically, when SpMM is executed in a distributed platform, communication costs dominate. Such costs depend on how communication is scheduled. If it is scheduled in a sparsity-unaware manner, such as with collectives, execution is often inefficient due to unnecessary data transfers. On the other hand, if communication is scheduled in a fine-grained sparsity-aware manner, communicating only the necessary data, execution can also be inefficient due to high software overhead. We observe that individual sparse matrices often contain regions that are denser and regions that are sparser. Based on this observation, we develop a model that partitions communication into sparsity-unaware and sparsity-aware components. Leveraging the partition, we develop a new algorithm that performs collective communication for the denser regions, and fine-grained, one-sided communication for the sparser regions. We call the algorithm Two-Face. We show that Two-Face attains an average speedup of 2.11x over prior work when evaluated on a 4096-core supercomputer. Additionally, Two-Face scales well with the machine size.
more » « less
HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures

https://doi.org/10.1109/HPCA57654.2024.00081

Gerogiannis, Gerasimos; Aananthakrishnan, Sriram; Torrellas, Josep; Hur, Ibrahim (March 2024, IEEE)

Full Text Available

« Prev Next »

Search for: All records