On Strings Having the Same Length-k Substrings.
Saved in:
| Title: | On Strings Having the Same Length-k Substrings. |
|---|---|
| Authors: | Bernardini, Giulia1 (AUTHOR) giulia.bernardini@unimi.it, Conte, Alessio2 (AUTHOR) alessio.conte@unipi.it, Gabory, Esteban3 (AUTHOR) esteban.gabory@unipa.it, Grossi, Roberto2 (AUTHOR) roberto.grossi@unipi.it, Loukides, Grigorios4 (AUTHOR) grigorios.loukides@kcl.ac.uk, Pissis, Solon P.5 (AUTHOR) solon.pissis@cwi.nl, Punzi, Giulia2 (AUTHOR) giulia.punzi@unipi.it, Sweering, Michelle5 (AUTHOR) michelle.sweering@gmail.com |
| Source: | Theory of Computing Systems. Jun2026, Vol. 70 Issue 2, p1-31. 31p. |
| Abstract: | Let Substr k (X) denote the set of length- k substrings of a given string X for a given integer k > 0 . We study the following basic string problem, called z -Shortest S k -Equivalent Strings: Given a set S k of n length- k strings and an integer z > 0 , list z shortest distinct strings T 1 , … , T z such that Substr k ( T i ) = S k , for all i ∈ [ 1 , z ] . The z -Shortest S k -Equivalent Strings problem arises naturally as an encoding problem in many real-world applications; e.g. in data privacy, data compression, and bioinformatics. The 1 -Shortest S k -Equivalent Strings, referred to as Shortest S k -Equivalent String, asks for a shortest string X such that Substr k (X) = S k . Our main contributions are as follows. Given a directed graph G = (V , E) , the Directed Chinese Postman (DCP) problem asks for a shortest closed walk that visits every edge of G at least once. DCP can be solved using an algorithm for min-cost flow. We show, via a non-trivial reduction, that if Shortest S k -Equivalent String over a binary alphabet has a near-linear-time solution then so does DCP. Secondly, we show that the length of a shortest string output by Shortest S k -Equivalent String is in O (k + n 2 ) . We generalize this bound by showing that the total length of z shortest strings is in O (zk + zn 2 + z 2 n) . We derive these upper bounds by showing (asymptotically tight) bounds on the total length of z shortest Eulerian walks in general directed graphs. Furthermore, we present an algorithm for solving z -Shortest S k -Equivalent Strings in O (nk + n 2 log 2 n + zn 2 log n + | output |) time. If z = 1 , the time becomes O (nk + n 2 log 2 n) by the fact that the size of the input is Θ (n k) and the size of the output is O (k + n 2 ) . Finally, we also provide a direct technical application of our algorithms on strings in an existing data privacy framework. A preliminary version of this paper was announced at CPM 2022. [ABSTRACT FROM AUTHOR] |
| Copyright of Theory of Computing Systems is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) | |
| Database: | Engineering Source |
Be the first to leave a comment!