Title: K-SVD based Periodicity Dictionary Learning
It has recently been shown that periodicity in discrete-time data can be analyzed using Ramanujan sums and associated dictionaries. This paper explores the role of dictionary learning methods in the context of period estimation and periodic signal representation using dictionaries. It is shown that a well-known dictionary learning algorithm, namely K-SVD, is able to learn Ramanujan and Farey periodicity dictionaries from the noisy, sparse coefficient data generated from them, without imposing any periodicity structure in the learning stage. This similarity between the learned dictionary and the underlying original periodicity dictionary reaffirms the power of K-SVD in predicting the right dictionary from data without explicit application-specific constraints. The paper also examines how the choice of different parameter values affects the similarity of the learned dictionary to the underlying dictionary. Two versions of K-SVD, along with different initializations, are analyzed for their effect on representation and denoising error for the data.
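The experiment described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact setup: it builds a Ramanujan periodicity dictionary (stacking, for each period q, the phi(q) circular shifts of the Ramanujan sum sequence c_q(n)), generates noisy k-sparse data from it, runs plain K-SVD with no periodicity constraints, and counts recovered atoms. The dictionary size, sparsity, noise level, and the 0.99 similarity threshold are all illustrative assumptions.

```python
import numpy as np
from math import gcd
from sklearn.linear_model import orthogonal_mp

def ramanujan_dictionary(N, q_max):
    """Stack, for each period q <= q_max, the first phi(q) circular shifts
    of the Ramanujan sum sequence c_q(n), with unit-norm columns."""
    n = np.arange(N)
    cols = []
    for q in range(1, q_max + 1):
        coprime = [k for k in range(1, q + 1) if gcd(k, q) == 1]
        c_q = sum(np.cos(2 * np.pi * k * n / q) for k in coprime)
        cols += [np.roll(c_q, s) for s in range(len(coprime))]  # phi(q) shifts
    D = np.array(cols, dtype=float).T
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def ksvd(Y, n_atoms, k, n_iter=50, seed=0):
    """Plain K-SVD: alternate OMP sparse coding and rank-1 SVD atom updates."""
    rng = np.random.default_rng(seed)
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        X = orthogonal_mp(D, Y, n_nonzero_coefs=k)      # sparse coding stage
        for j in range(n_atoms):                        # dictionary update stage
            users = np.flatnonzero(X[j])                # signals using atom j
            if users.size == 0:
                continue
            X[j, users] = 0.0
            E = Y[:, users] - D @ X[:, users]           # residual without atom j
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j], X[j, users] = U[:, 0], s[0] * Vt[0]
    return D

# Generate noisy k-sparse data from the ground-truth dictionary.
N, k, sigma = 32, 3, 0.05
D_true = ramanujan_dictionary(N, q_max=8)               # 22 atoms for q_max = 8
rng = np.random.default_rng(1)
C = np.zeros((D_true.shape[1], 2000))
for col in C.T:
    col[rng.choice(C.shape[0], k, replace=False)] = rng.normal(size=k)
Y = D_true @ C + sigma * rng.normal(size=(N, C.shape[1]))

# Learn with no periodicity constraints, then measure atom recovery.
D_hat = ksvd(Y, D_true.shape[1], k)
recovered = int((np.abs(D_true.T @ D_hat).max(axis=0) > 0.99).sum())
print(f"recovered {recovered}/{D_true.shape[1]} atoms")
```

With enough samples and moderate noise, most learned atoms typically align with columns of the original periodicity dictionary, mirroring the behavior the abstract describes.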
Award ID(s):
1712633
NSF-PAR ID:
10275652
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the Asilomar Conference on Signals, Systems, and Computers
Page Range / eLocation ID:
1333 to 1337
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Detecting and locating damage information from waves reflected off damage is a common practice in non-destructive structural health monitoring systems. Yet, the transmitted ultrasonic guided waves are affected by the physical and material properties of the structure and are often complicated to model mathematically. This calls for data-driven approaches to model the behaviour of waves, where patterns in wave data due to damage can be learned and distinguished from non-damage data. Recent works have used a popular dictionary learning algorithm, K-SVD, to learn an overcomplete dictionary for waves propagating in a metal plate. However, the domain knowledge is not utilized. This may lead to fruitless results in the case where there are strong patterns in the data that are not of interest to the domain. In this work, instead of treating the K-SVD algorithm as a black box, we create a novel modification by enforcing domain knowledge. In particular, we look at how regularizing the K-SVD algorithm with the one-dimensional wave equation affects the dictionary learned in the simple case of a vibrating string. By adding additional non-wave patterns (noise) to the data, we demonstrate that the “wave-informed K-SVD” does not learn patterns that fail to obey the wave equation, and thus learns patterns from the data rather than the noise.
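As a concrete illustration of the modification described above, the sketch below replaces K-SVD's rank-1 SVD update for one atom with a regularized least-squares update that penalizes deviation from the discretized 1-D wave (Helmholtz) equation d'' + kappa^2 d = 0. The grid search over candidate wavenumbers and the Dirichlet boundary handling are illustrative assumptions, not necessarily the authors' exact update rule.

```python
import numpy as np

def second_difference(N):
    """Discrete 1-D Laplacian with fixed (Dirichlet) ends, approximating d''."""
    return -2.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)

def wave_informed_atom_update(E, x, L, gamma, kappas):
    """Replace K-SVD's rank-1 SVD step for one atom d with
        argmin_d ||E - d x^T||_F^2 + gamma ||(L + kappa^2 I) d||^2,
    scanning a grid of candidate wavenumbers kappa.  E is the residual
    restricted to the signals that use this atom; x is its coefficient row."""
    N = E.shape[0]
    b = E @ x                                  # normal-equations right-hand side
    best_cost, best_d = np.inf, None
    for kappa in kappas:
        A = L + kappa**2 * np.eye(N)
        M = (x @ x) * np.eye(N) + gamma * (A.T @ A)
        d = np.linalg.solve(M, b)
        cost = np.linalg.norm(E - np.outer(d, x))**2 \
             + gamma * np.linalg.norm(A @ d)**2
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d / max(np.linalg.norm(best_d), 1e-12)   # unit-norm atom
```

Only the atom is updated here; its coefficient row is refit in the next sparse-coding pass, as in standard K-SVD.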
  2. This work discusses an optimization framework that embeds the wave equation into dictionary learning, as a strategy for incorporating prior scientific knowledge into a machine learning algorithm. We modify dictionary learning to study ultrasonic guided wave-based defect detection for non-destructive structural health monitoring systems. Specifically, this work alters the popular K-SVD algorithm for dictionary learning by enforcing prior knowledge about the ultrasonic guided wave problem through a physics-based regularization derived from the wave equation. We confer it the name “wave-informed K-SVD.” Training dictionaries on noisy data simulated from a fixed string using both K-SVD and wave-informed K-SVD, we show that the columns of the dictionary matrix learned by wave-informed K-SVD are more physically consistent with the known modal behavior of different one-dimensional wave simulations.
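One plausible form of the physics-based regularization described above (the paper's exact objective may differ) augments the usual sparse-coding objective with a per-atom wave-equation penalty:

```latex
\min_{D,\,X,\,\{\kappa_j\}}\;
  \|Y - DX\|_F^2
  \;+\; \gamma \sum_{j} \bigl\|\,(L + \kappa_j^2 I)\, d_j \bigr\|_2^2
\quad \text{s.t.}\quad \|x_i\|_0 \le k \;\; \text{for every signal } i
```

Here L is a discrete second-derivative operator and kappa_j a per-atom wavenumber, so the penalty vanishes exactly when atom d_j satisfies the discretized wave equation d'' + kappa^2 d = 0.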
  3. Similarity search is the basis for many data analytics techniques, including k-nearest neighbor classification and outlier detection. Similarity search over large data sets relies on i) a distance metric learned from input examples and ii) an index to speed up search based on the learned distance metric. In interactive systems, input to guide the learning of the distance metric may be provided over time. As this new input changes the learned distance metric, a naive approach would adopt the costly process of re-indexing all items after each metric change. In this paper, we propose the first solution, called OASIS, to instantaneously adapt the index to conform to a changing distance metric without this prohibitive re-indexing process. To achieve this, we prove that locality-sensitive hashing (LSH) provides an invariance property, meaning that an LSH index built on the original distance metric is equally effective at supporting similarity search using an updated distance metric, as long as the transform matrix learned for the new distance metric satisfies certain properties. This observation allows OASIS to avoid recomputing the index from scratch in most cases. Further, for the rare cases when an adaptation of the LSH index is shown to be necessary, we design an efficient incremental LSH update strategy that re-hashes only a small subset of the items in the index. In addition, we develop an efficient distance metric learning strategy that incrementally learns the new metric as inputs are received. Our experimental study using real-world public datasets confirms the effectiveness of OASIS at improving the accuracy of various similarity search-based data analytics tasks by instantaneously adapting the distance metric and its associated index in tandem, while achieving a speedup of up to three orders of magnitude over state-of-the-art techniques.
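The invariance property at the heart of OASIS can be illustrated with p-stable (Gaussian-projection) LSH for a learned Mahalanobis metric d(x, y) = ||G(x - y)||. The sketch below is illustrative, and the exact condition OASIS checks on the new transform matrix is stated in the paper; but if, for example, the new matrix is an orthonormal rotation of the old one, all pairwise distances (and hence LSH collision probabilities) are unchanged, so re-indexing can be skipped.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_hashes, r = 16, 8, 4.0
W = rng.normal(size=(n_hashes, dim))    # Gaussian (p-stable) projections
b = rng.uniform(0.0, r, size=n_hashes)  # random offsets

def lsh_keys(x, G):
    """E2LSH-style keys for point x under metric d(x, y) = ||G (x - y)||."""
    return np.floor((W @ (G @ x) + b) / r).astype(int)

# If the new transform is an orthonormal rotation of the old one, every
# pairwise distance is unchanged, so the index's guarantees still hold.
G_old = rng.normal(size=(dim, dim))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))   # random orthonormal Q
G_new = Q @ G_old
x, y = rng.normal(size=dim), rng.normal(size=dim)
assert np.isclose(np.linalg.norm(G_new @ (x - y)),
                  np.linalg.norm(G_old @ (x - y)))
```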
  4. It is common practice for data providers to include text descriptions for each column when publishing data sets in the form of data dictionaries. While these documents are useful in helping an end-user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. In this paper, we present our Semantic Data Dictionary work in the context of our work with biomedical data; however, the approach can be, and has been, used in a wide range of domains. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. We evaluate this work in comparison with traditional data dictionaries, mapping languages, and data integration tools.
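As a toy illustration of the idea (the class, predicates, and namespace below are hypothetical stand-ins, not the actual Semantic Data Dictionary vocabulary), a machine-readable annotation of the NHANES body-weight column BMXWT might be materialized as RDF with rdflib:

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/sdd/")        # hypothetical namespace
SIO = Namespace("http://semanticscience.org/resource/")

g = Graph()
col = EX["column/BMXWT"]                         # NHANES body-weight column
g.add((col, RDF.type, EX.ColumnAnnotation))      # hypothetical class
g.add((col, RDFS.label, Literal("BMXWT")))
g.add((col, EX.attribute, SIO["SIO_000123"]))    # placeholder concept IRI
g.add((col, EX.unit, Literal("kg")))
print(g.serialize(format="turtle"))
```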
  5. Dictionaries remain the most well-studied class of data structures. A dictionary supports insertions, deletions, membership queries, and usually successor, predecessor, and extract-min. In a RAM, all such operations take O(log n) time on n elements. Dictionaries are often cross-referenced as follows. Consider a set of tuples {⟨a_i, b_i, c_i, …⟩}. A database might include more than one dictionary on such a set, for example, one indexed on the a's, another on the b's, and so on. Once again, in a RAM, inserting into a set of L cross-referenced dictionaries takes O(L log n) time, as does deleting. The situation is more interesting in external memory. On a Disk Access Machine (DAM), B-trees achieve O(log_B N) I/Os for insertions and deletions on a single dictionary, and K-element range queries take optimal O(log_B N + K/B) I/Os. These bounds are also achievable by a B-tree on cross-referenced dictionaries, with a slowdown of a factor of L on insertions and deletions. In recent years, both the theory and practice of external-memory dictionaries have been revolutionized by write-optimization techniques. A dictionary is write-optimized if it is close to a B-tree for query time while beating B-trees on insertions. The best (and optimal) dictionaries achieve a substantially improved insertion and deletion cost of O((log_{1+B^ε} N)/B^{1-ε}) amortized I/Os on a single dictionary while maintaining optimal O(log_{1+B^ε} N + K/B)-I/O range queries. Although write optimization still helps for insertions into cross-referenced dictionaries, its value for deletions would seem to be greatly reduced. A deletion into a cross-referenced dictionary only specifies a key a. It seems to be necessary to look up the associated values b, c, … in order to delete them from the other dictionaries. This takes Ω(log_B N) I/Os, well above the per-dictionary write-optimization budget of O((log_{1+B^ε} N)/B^{1-ε}), so the total deletion cost is O(log_B N + L (log_{1+B^ε} N)/B^{1-ε}). In short, for deletions, write optimization offers an advantage over B-trees in that L multiplies a lower-order term, but when L = 2, write optimization seems to offer no asymptotic advantage over B-trees. That is, no known query-optimal solution for pairs of cross-referenced dictionaries seems to beat B-trees for deletions. In this paper, we show a lower bound establishing that a pair of cross-referenced dictionaries that is optimal for range queries and supports deletions cannot match the write-optimization bound available to insert-only dictionaries. This result thus establishes a limit to the applicability of write-optimization techniques, on which many new databases and file systems are based. Read More: http://epubs.siam.org/doi/10.1137/1.9781611974782.99
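To make the bounds above concrete, a back-of-the-envelope comparison with illustrative parameter values (B = 1024, N = 10^9, ε = 1/2, L = 2; the numbers are made up) shows why the lookup term dominates cross-referenced deletions:

```python
import math

B, N, eps, L = 1024, 10**9, 0.5, 2
btree = math.log(N, B)                              # O(log_B N) per operation
write_opt = math.log(N, 1 + B**eps) / B**(1 - eps)  # optimal insert cost

print(f"B-tree insert or delete:    ~{btree:.2f} I/Os")
print(f"write-optimized insert:     ~{write_opt:.4f} I/Os")
print(f"L cross-referenced inserts: ~{L * write_opt:.4f} I/Os")
print(f"cross-referenced delete:    ~{btree + L * write_opt:.2f} I/Os")
```

With these values, the write-optimized insert costs a small fraction of an I/O, yet the Ω(log_B N) lookup makes a cross-referenced delete cost essentially the same as a B-tree's, which is exactly the gap the paper's lower bound shows to be unavoidable.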