NSF PAR Search | NSF Public Access Repository

Two-Dimensional Longest Common Extension Queries in Compact Space

https://doi.org/10.4230/LIPICS.STACS.2025.38

Ganguly, Arnab; Gibney, Daniel; Shah, Rahul; Thankachan, Sharma V (January 2025, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Beyersdorff, Olaf; Pilipczuk, Michał; Pimentel, Elaine; Thắng, Nguyễn Kim (Ed.)
For a length n text over an alphabet of size σ, we can encode the suffix tree data structure in 𝒪(n log σ) bits of space. It supports suffix array (SA), inverse suffix array (ISA), and longest common extension (LCE) queries in 𝒪(log^ε_σ n) time, which enables efficient pattern matching; here ε > 0 is an arbitrarily small constant. Further improvements are possible for LCE queries, where 𝒪(1) time queries can be achieved using an index of space 𝒪(n log σ) bits. However, compactly indexing a two-dimensional text (i.e., an n × n matrix) has been a major open problem. We show progress in this direction by first presenting an 𝒪(n² log σ)-bit structure supporting LCE queries in near 𝒪((log_σ n)^{2/3}) time. We then present an 𝒪(n² log σ + n² log log n)-bit structure supporting ISA queries in near 𝒪(log n ⋅ (log_σ n)^{2/3}) time. Within a similar space, achieving SA queries in poly-logarithmic (even strongly sub-linear) time is a significant challenge. However, our 𝒪(n² log σ + n² log log n)-bit structure can support SA queries in 𝒪(n²/(σ log n)^c) time, where c is an arbitrarily large constant, which enables pattern matching in time faster than what is possible without preprocessing. We then design a repetition-aware data structure. The δ_2D compressibility measure for two-dimensional texts was recently introduced by Carfagna and Manzini [SPIRE 2023]. The measure ranges from 1 to n², with smaller δ_2D indicating a highly compressible two-dimensional text. The current data structure utilizing δ_2D allows only element access. We obtain the first structure based on δ_2D for LCE queries. It takes 𝒪̃(n^{5/3} + n^{8/5} δ_2D^{1/5}) space and answers queries in 𝒪(log n) time.
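As a point of reference for the three query types named in this abstract, the following is a minimal brute-force sketch for an ordinary one-dimensional text. It is not the paper's compact index and says nothing about the two-dimensional case; the function names and the 0-based indexing are assumptions made here for illustration only.

```python
def suffix_array(t: str) -> list[int]:
    # SA[r] = start position of the suffix with lexicographic rank r.
    # Naive construction: sort suffix start positions by the suffix text.
    return sorted(range(len(t)), key=lambda i: t[i:])


def inverse_suffix_array(sa: list[int]) -> list[int]:
    # ISA[p] = lexicographic rank of the suffix starting at position p.
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


def lce(t: str, i: int, j: int) -> int:
    # Length of the longest common prefix of the suffixes t[i:] and t[j:],
    # found by direct character comparison.
    k = 0
    while i + k < len(t) and j + k < len(t) and t[i + k] == t[j + k]:
        k += 1
    return k


sa = suffix_array("banana")
print(sa)                        # [5, 3, 1, 0, 4, 2]
print(inverse_suffix_array(sa))  # [3, 2, 5, 1, 4, 0]
print(lce("banana", 1, 3))       # 3
```

The sketch spends time linear in the answer per LCE query and stores the text explicitly; the structures described in the abstract answer the same kinds of queries within the stated space and time bounds.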
Full Text Available
Infinite-dimensional optimization and Bayesian nonparametric learning of stochastic differential equations

Ganguly, Arnab; Mitra, Riten; Zhou, Jinpu (April 2023, Journal of Machine Learning Research)

The paper has two major themes. The first part of the paper establishes certain general results for infinite-dimensional optimization problems on Hilbert spaces. These results cover the classical representer theorem and many of its variants as special cases and offer a wider scope of applications. The second part of the paper then develops a systematic approach for learning the drift function of a stochastic differential equation by integrating the results of the first part with a Bayesian hierarchical framework. Importantly, our Bayesian approach incorporates low-cost sparse learning through proper use of shrinkage priors while allowing proper quantification of uncertainty through posterior distributions. Several examples at the end illustrate the accuracy of our learning scheme.
Full Text Available
Moment stability of stochastic processes with applications to control systems

https://doi.org/10.3934/mcrf.2023008

Ganguly, Arnab; Chatterjee, Debasish (January 2023, Mathematical Control and Related Fields)

Full Text Available
Fully Functional Parameterized Suffix Trees in Compact Space

Ganguly, Arnab; Shah, Rahul; Thankachan, Sharma V. (June 2022, 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022))
Mikolaj Bojanczyk; Emanuela Merelli; David P. Woodruff (Ed.)
Two equal-length strings are a parameterized match (p-match) iff there exists a one-to-one function that renames the symbols in one string to those in the other. The Parameterized Suffix Tree (PST) [Baker, STOC '93] is a fundamental data structure that handles various string matching problems under this setting. The PST of a text T[1,n] over an alphabet Σ of size σ takes O(n log n) bits of space. It can report any entry in the (parameterized) (i) suffix array, (ii) inverse suffix array, and (iii) longest common prefix (LCP) array in O(1) time. Given any pattern P as a query, a position i in T is an occurrence iff T[i,i+|P|-1] and P are a p-match. The PST can count the number of occurrences of P in T in time O(|P| log σ) and then report each occurrence in time proportional to that of accessing a suffix array entry. An important question is: can we obtain a compressed version of the PST that takes space close to the text's size of n log σ bits and still supports all three functionalities mentioned earlier? In SODA '17, Ganguly et al. answered this question partially by presenting an O(n log σ)-bit index that can support (parameterized) suffix array and inverse suffix array operations in O(log n) time. However, the compression of the (parameterized) LCP array and the possibility of faster suffix array and inverse suffix array queries in compact space were left open. In this work, we obtain a compact representation of the (parameterized) LCP array. With this result, in conjunction with three new (parameterized) suffix array representations, we obtain the first set of PST representations in o(n log n) bits (when log σ = o(log n)) as follows. Here ε > 0 is an arbitrarily small constant.
- Space O(n log σ) bits and query time O(log^ε_σ n);
- Space O(n log σ log log_σ n) bits and query time O(log log_σ n); and
- Space O(n log σ log^ε_σ n) bits and query time O(1).
The first trade-off is an improvement over Ganguly et al.'s result, whereas our third trade-off matches the optimal time performance of Baker's PST while squeezing the space by a factor of roughly log_σ n. We highlight that our trade-offs match the space-and-time bounds of the best-known compressed text indexes for exact pattern matching and further improvement is highly unlikely.
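The p-match definition at the start of this abstract can be illustrated with a small brute-force sketch. It follows the abstract's wording literally (every symbol may be renamed, and the renaming must be one-to-one); it is not the PST or any of the compact indexes described above, and the function names are hypothetical.

```python
def is_p_match(s: str, t: str) -> bool:
    # Two strings p-match if they have equal length and some one-to-one
    # renaming of symbols turns s into t.
    if len(s) != len(t):
        return False
    forward: dict[str, str] = {}   # symbol of s -> symbol of t
    backward: dict[str, str] = {}  # symbol of t -> symbol of s
    for a, b in zip(s, t):
        # setdefault records the pairing on first sight and returns the
        # existing pairing afterwards; a mismatch breaks the bijection.
        if forward.setdefault(a, b) != b:
            return False
        if backward.setdefault(b, a) != a:
            return False
    return True


def p_occurrences(text: str, pattern: str) -> list[int]:
    # 1-based positions i such that text[i, i+|P|-1] and pattern p-match,
    # found by checking every window of the text.
    m = len(pattern)
    return [
        i + 1
        for i in range(len(text) - m + 1)
        if is_p_match(text[i:i + m], pattern)
    ]


print(is_p_match("abab", "xyxy"))   # True
print(is_p_match("ab", "xx"))       # False: the renaming is not one-to-one
print(p_occurrences("aabb", "xx"))  # [1, 3]
```

This check costs time proportional to |T|·|P| per pattern, whereas the abstract states that the PST counts occurrences in O(|P| log σ) time.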
Full Text Available
The Heaviest Induced Ancestors Problem: Better Data Structures and Applications

https://doi.org/10.1007/s00453-022-00955-7

Abedin, Paniz; Hooshmand, Sahar; Ganguly, Arnab; Thankachan, Sharma V. (July 2022, Algorithmica)

Full Text Available
Inhomogeneous functionals and approximations of invariant distributions of ergodic diffusions: Central limit theorem and moderate deviation asymptotics

https://doi.org/10.1016/j.spa.2020.10.009

Ganguly, Arnab; Sundar, P. (March 2021, Stochastic Processes and their Applications)
Full Text Available
Efficient Data Structures for Range Shortest Unique Substring Queries

https://doi.org/10.3390/a13110276

Abedin, Paniz; Ganguly, Arnab; Pissis, Solon P.; Thankachan, Sharma V. (November 2020, Algorithms)
Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(√n log^ε n) query time, where ε > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
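The two problems defined in this abstract can be sketched by brute force as follows. This is only an illustration of the problem statements, not the paper's data structures. It assumes one particular reading of "exactly one occurrence in [α,β]", namely that an occurrence must lie entirely inside T[α,β]; the abstract does not spell this out, so that reading, like the function names, is an assumption.

```python
from collections import Counter


def shortest_unique_substring(t: str):
    # Returns 1-based (i, j) of a leftmost shortest substring t[i, j]
    # that occurs exactly once in t, or None for the empty string.
    n = len(t)
    for length in range(1, n + 1):
        # Count every substring of this length, overlaps included.
        counts = Counter(t[i:i + length] for i in range(n - length + 1))
        for i in range(n - length + 1):
            if counts[t[i:i + length]] == 1:
                return (i + 1, i + length)
    return None


def range_sus(t: str, alpha: int, beta: int):
    # ASSUMPTION: occurrences are counted only when fully inside
    # t[alpha, beta], so the query reduces to the plain problem on
    # that window, with positions shifted back into t's coordinates.
    hit = shortest_unique_substring(t[alpha - 1:beta])
    if hit is None:
        return None
    i, j = hit
    return (i + alpha - 1, j + alpha - 1)


print(shortest_unique_substring("abab"))  # (2, 3), the substring "ba"
print(range_sus("abab", 1, 3))            # (2, 2), "b" is unique in "aba"
```

Each query here rescans the whole window, whereas the abstract's structures preprocess T once and then answer queries within the stated time bounds.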
Full Text Available
FM-Index Reveals the Reverse Suffix Array

https://doi.org/10.4230/LIPIcs.CPM.2020.13

Ganguly, Arnab; Gibney, Daniel; Hooshmand, Sahar; Kulekci, M. Oguzhan; Thankachan, Sharma V. (January 2020, 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020)
Full Text Available
FM-Index Reveals the Reverse Suffix Array

Ganguly, Arnab; Gibney, Daniel; Hooshmand, Sahar; Kulekci, M. Oguzhan; Thankachan, Sharma V. (January 2020, 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020, June 17-19, 2020, Copenhagen, Denmark)
Full Text Available
Range Shortest Unique Substring Queries

https://doi.org/10.1007/978-3-030-32686-9_18

Abedin, Paniz; Ganguly, Arnab; Pissis, Solon P.; Thankachan, Sharma V. (January 2019, String Processing and Information Retrieval (SPIRE))

Full Text Available
