Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of the union search. Consequently, we introduce a new notion of unionability that considers relationships between columns, together with the semantics of columns, in a principled way. To do so, we present two new methods to discover the semantic relationships between pairs of columns. The first uses an existing knowledge base (KB), and the second (which we call a "synthesized KB") uses knowledge from the data lake itself. We adopt an existing Table Union Search benchmark and present new (open) benchmarks that represent small and large real data lakes. We show that our new unionability search algorithm, called SANTOS, outperforms a state-of-the-art union search that uses a wide variety of column-based semantics, including word embeddings and regular expressions. We show empirically that our synthesized KB improves the accuracy of union search by representing relationship semantics that may not be contained in an available KB. This result hints at a promising future of creating synthesized KBs from data lakes with limited KB coverage and using them for union search.more » « lessFree, publicly-accessible full text available May 26, 2024
In this paper, we propose an efficient numerical scheme for solving some large‐scale ill‐posed linear inverse problems arising from image restoration. In order to accelerate the computation, two different hidden structures are exploited. First, the coefficient matrix is approximated as the sum of a small number of Kronecker products. This procedure not only introduces one more level of parallelism into the computation but also enables the usage of computationally intensive matrix–matrix multiplications in the subsequent optimization procedure. We then derive the corresponding Tikhonov regularized minimization model and extend the fast iterative shrinkage‐thresholding algorithm (FISTA) to solve the resulting optimization problem. Because the matrices appearing in the Kronecker product approximation are all structured matrices (Toeplitz, Hankel, etc.), we can further exploit their fast matrix–vector multiplication algorithms at each iteration. The proposed algorithm is thus called
structuredFISTA (sFISTA). In particular, we show that the approximation error introduced by sFISTA is well under control and sFISTA can reach the same image restoration accuracy level as FISTA. Finally, both the theoretical complexity analysis and some numerical results are provided to demonstrate the efficiency of sFISTA.