Protein sequence landscapes are not so simple: on reference-free versus reference-based inference
- Award ID(s):
- 1914916
- PAR ID:
- 10527324
- Publisher / Repository:
- bioRxiv
- Date Published:
- Format(s):
- Medium: X
- Institution:
- bioRxiv
- Sponsoring Org:
- National Science Foundation
More Like this
-
null (Ed.)Abstract We study super-resolution multi-reference alignment, the problem of estimating a signal from many circularly shifted, down-sampled and noisy observations. We focus on the low SNR regime, and show that a signal in $${\mathbb{R}}^M$$ is uniquely determined when the number $$L$$ of samples per observation is of the order of the square root of the signal’s length ($$L=O(\sqrt{M})$$). Phrased more informally, one can square the resolution. This result holds if the number of observations is proportional to $$1/\textrm{SNR}^3$$. In contrast, with fewer observations recovery is impossible even when the observations are not down-sampled ($L=M$). The analysis combines tools from statistical signal processing and invariant theory. We design an expectation-maximization algorithm and demonstrate that it can super-resolve the signal in challenging SNR regimes.more » « less
-
Data movement is a common performance bottleneck, and its chief remedy is caching. Traditional cache management is transparent to the workload: data that should be kept in cache are determined by the recency information only, while the program information, i.e., future data reuses, is not communicated to the cache. This has changed in a new cache design named Lease Cache . The program control is passed to the lease cache by a compiler technique called Compiler Assigned Reference Lease (CARL). This technique collects the reuse interval distribution for each reference and uses it to compute and assign the lease value to each reference. In this article, we prove that CARL is optimal under certain statistical assumptions. Based on this optimality, we prove miss curve convexity, which is useful for optimizing shared cache, and sub-partitioning monotonicity, which simplifies lease compilation. We evaluate the potential using scientific kernels from PolyBench and show that compiler insertions of up to 34 leases in program code achieve similar or better cache utilization (in variable size cache) than the optimal fixed-size caching policy, which has been unattainable with automatic caching but now within the potential of cache programming for all tested programs and most cache sizes.more » « less
An official website of the United States government

