<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Journal Article</dc:product_type><dc:title>A Modular Static Cost Analysis for GPU Warp-Level Parallelism</dc:title><dc:creator>Blike, Gregory (ORCID:0009000528900822); Zicarelli, Hannah (ORCID:0000000236071746); Sathiyamoorthy, Udaya (ORCID:0009000689093451); Lange, Julien (ORCID:0000000196971378); Cogumbreiro, Tiago (ORCID:0000000232099258)</dc:creator><dc:corporate_author/><dc:editor/><dc:description>Graphics Processing Units (GPUs) are the accelerator of choice for performance-critical applications, yet optimizing for performance requires mastery of the complex interactions between their memory architecture and their execution model. Existing static analysis tools for GPU kernels either identify performance bugs without quantifying costs or cannot handle thread-divergent control flow, leading to significant over-approximations. We present the first static relational-cost analysis for GPU warp-level parallelism that can give exact bounds even in the presence of thread divergence. Our analysis is general and flexible: it is parametric on the resource metric (e.g., uncoalesced accesses, bank conflicts) and on the cost relation (=, ≤, ≥). We establish a soundness theorem for our technique, provide mechanized proofs in Rocq, and implement our theory in a tool called Pico. In a reproducibility experiment, Pico produced the tightest bounds on every input, outperforming the state-of-the-art tool RaCUDA on 10 kernels (1.7× better), while RaCUDA produced 4 incorrect bounds and crashed on 2 kernels. In an experiment to measure the accuracy of Pico, we studied the impact of thread divergence on control flow in a dataset of 226 kernels.
We found that at least 75.3% of conditionals and 85.4% of loops can be captured exactly, without introducing approximation.</dc:description><dc:publisher>ACM</dc:publisher><dc:date>2026-01-08</dc:date><dc:nsf_par_id>10668318</dc:nsf_par_id><dc:journal_name>Proceedings of the ACM on Programming Languages</dc:journal_name><dc:journal_volume>10</dc:journal_volume><dc:journal_issue>POPL</dc:journal_issue><dc:page_range_or_elocation>1471 to 1499</dc:page_range_or_elocation><dc:issn>2475-1421</dc:issn><dc:isbn/><dc:doi>https://doi.org/10.1145/3776693</dc:doi><dcq:identifierAwardId>2204986</dcq:identifierAwardId><dc:subject/><dc:version_number/><dc:location/><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>