<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Conference Paper</dc:product_type><dc:title>An Average-Case Efficient Two-Stage Algorithm for Enumerating All Longest Common Substrings of Minimum Length $k$ Between Genome Pairs</dc:title><dc:creator>Prosperi, Mattia; Marini, Simone; Boucher, Christina</dc:creator><dc:corporate_author/><dc:editor/><dc:description>A problem extension of the longest common sub-string (LCS) between two texts is the enumeration of all LCSs given a minimum length k (ALCS-k), along with their positions in each text. In bioinformatics, an efficient solution to the ALCS- k for very long texts -genomes or metagenomes- can provide useful insights to discover genetic signatures responsible for biological mechanisms. The ALCS-k problem has two additional requirements compared to the LCS problem: one is the minimum length k , and the other is that all common strings longer than k must be reported. We present an efficient, two-stage ALCS-k algorithm exploiting the spectrum of text substrings of length k (k-mers). Our approach yields a worst-case time complexity loglinear in the number of k-mers for the first stage, and an average-case loglinear in the number of common k-mers for the second stage (several orders of magnitudes smaller than the total k-mer spectrum). The space complexity is linear in the first phase (disk-based), and on average linear in the second phase (disk- and memory-based). Tests performed on genomes for different organisms (including viruses, bacteria and animal chromosomes) show that run times are consistent with our theoretical estimates; further, comparisons with MUMmer4 show an asymptotic advantage with divergent genomes.</dc:description><dc:publisher>IEEE</dc:publisher><dc:date>2024-06-03</dc:date><dc:nsf_par_id>10548611</dc:nsf_par_id><dc:journal_name/><dc:journal_volume/><dc:journal_issue/><dc:page_range_or_elocation>93 to 102</dc:page_range_or_elocation><dc:issn/><dc:isbn>979-8-3503-8373-7</dc:isbn><dc:doi>https://doi.org/10.1109/ICHI61247.2024.00020</dc:doi><dcq:identifierAwardId>2013998</dcq:identifierAwardId><dc:subject/><dc:version_number/><dc:location>Orlando, FL, USA</dc:location><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>