To fill a gap in online educational tools, we are working to support search in lecture videos using formulas from lecture notes and vice versa. We use an existing system to convert single-shot lecture videos to keyframe images that capture whiteboard contents along with the times they appear. We train classifiers for handwritten symbols using the CROHME dataset, and for LaTeX symbols using generated images. Symbols detected in video keyframes and LaTeX formula images are indexed using Line-of-Sight graphs. For search, we look up pairs of symbols that can 'see' each other, and connected pairs are merged to identify the largest match within each indexed image. We rank matches using symbol class probabilities and angles between symbol pairs. We demonstrate how our method effectively locates formulas between typeset and handwritten images using a set of linear algebra lectures. By combining our search engine (Tangent-V) with temporal keyframe metadata, we are able to navigate to where a query formula in LaTeX is first handwritten in a lecture video. Our system is available as open source. For other domains, only the OCR modules require updating.
                    
                            
                            Tangent-V: Math Formula Image Search Using Line-of-Sight Graphs
                        
                    
    
            We present a visual search engine for graphics such as math, chemical diagrams, and figures. Graphics are represented using Line-of-Sight (LOS) graphs, with symbols connected only when they can ‘see’ each other along an unobstructed line. Symbol identities may be provided (e.g., in PDF) or taken from Optical Character Recognition applied to images. Graphics are indexed by pairs of symbols that ‘see’ each other using their labels, spatial displacement, and size ratio. Retrieval has two layers: the first matches query symbol pairs in an inverted index, while the second aligns candidates with the query and scores the resulting matches using the identity and relative position of symbols. For PDFs, we also introduce a new tool that quickly extracts characters and their locations. We have applied our model to the NTCIR-12 Wikipedia Formula Browsing Task, and found that the method can locate relevant matches without unification of symbols or using a math expression grammar. In the future, one might index LOS graphs for entire pages and search for text and graphics. Our source code has been made publicly available.
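The first retrieval layer described in the abstract (looking up query symbol pairs in an inverted index) can be illustrated with a minimal Python sketch. This is not the Tangent-V implementation: the `Symbol` fields, the left-to-right pair ordering, the angle and size-ratio tolerances, and the function names are all assumptions made for illustration, and the LOS edges are taken as given rather than computed. The second layer (alignment and scoring) is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import atan2, degrees


@dataclass(frozen=True)
class Symbol:
    """Hypothetical symbol record: class label, center position, and size."""
    label: str
    x: float
    y: float
    size: float


def pair_features(a, b):
    """Return (label pair, displacement angle in degrees, size ratio).

    The pair is ordered left-to-right so the same two symbols always
    produce the same key (an assumed convention, not from the paper).
    """
    left, right = (a, b) if (a.x, a.y) <= (b.x, b.y) else (b, a)
    angle = degrees(atan2(right.y - left.y, right.x - left.x))
    ratio = right.size / left.size
    return (left.label, right.label), angle, ratio


def build_index(docs):
    """docs maps doc_id -> (symbols, LOS edges as pairs of symbol indices)."""
    index = defaultdict(list)
    for doc_id, (symbols, edges) in docs.items():
        for i, j in edges:
            key, angle, ratio = pair_features(symbols[i], symbols[j])
            index[key].append((doc_id, angle, ratio))
    return index


def search(index, symbols, edges, max_angle_diff=30.0, max_ratio_factor=2.0):
    """Count, per document, the query pairs with a similar indexed pair."""
    hits = defaultdict(int)
    for i, j in edges:
        key, q_angle, q_ratio = pair_features(symbols[i], symbols[j])
        for doc_id, angle, ratio in index.get(key, []):
            similar_angle = abs(angle - q_angle) <= max_angle_diff
            similar_ratio = (
                max(ratio, q_ratio) / min(ratio, q_ratio) <= max_ratio_factor
            )
            if similar_angle and similar_ratio:
                hits[doc_id] += 1
    # Most matched pairs first; ties broken by document id.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
```

In this sketch, a query for a superscripted "x 2" pair would retrieve a document where the "2" sits above and to the right of the "x", but not one where it sits below, because the displacement angles differ by more than the assumed tolerance.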
        
    
                            - Award ID(s):
- 1717997
- PAR ID:
- 10124341
- Date Published:
- Journal Name:
- Proceedings of the European Conference on Information Retrieval (ECIR)
- Page Range / eLocation ID:
- 681-695
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            We present a model for recognizing typeset math formula images from connected components or symbols. In our approach, connected components are used to construct a line-of-sight (LOS) graph. The graph is used both to reduce the search space for formula structure interpretations, and to guide a classification attention model using separate channels for inputs and their local visual context. For classification, we used visual densities with Random Forests for initial development, and then converted this to a Convolutional Neural Network (CNN) with a second branch to capture context for each input image. Formula structure is extracted as a directed spanning tree from a weighted LOS graph using Edmonds’ algorithm. We obtain strong results for formulas without grids or matrices in the InftyCDB-2 dataset (90.89% from components, 93.5% from symbols). Using tools from the CROHME handwritten formula recognition competitions, we were able to compile all symbol and structure recognition errors for analysis. Our data and source code are publicly available.
- 
            When searching for mathematical content, accurate measures of formula similarity can help with tasks such as document ranking, query recommendation, and result set clustering. While there have been many attempts at embedding words and graphs, formula embedding is in its early stages. We introduce a new formula embedding model that we use with two hierarchical representations, (1) Symbol Layout Trees (SLTs) for appearance, and (2) Operator Trees (OPTs) for mathematical content. Following the approach of graph embeddings such as DeepWalk, we generate tuples representing paths between pairs of symbols depth-first, embed tuples using the fastText n-gram embedding model, and then represent an SLT or OPT by its average tuple embedding vector. We then combine SLT and OPT embeddings, leading to state-of-the-art results for the NTCIR-12 formula retrieval task. Our fine-grained holistic vector representations allow us to retrieve many more partially similar formulas than methods using structural matching in trees. Combining our embedding model with structural matching in the Approach0 formula search engine produces state-of-the-art results for both fully and partially relevant results on the NTCIR-12 benchmark. Source code for our system is publicly available.
- 
            We present a new visual parsing method based on convolutional neural networks for handwritten mathematical formulas. The Query-Driven Global Graph Attention (QD-GGA) parsing model employs multi-task learning, and uses a single feature representation for locating, classifying, and relating symbols. First, a Line-Of-Sight (LOS) graph is computed over the handwritten strokes in a formula. Second, class distributions for LOS nodes and edges are obtained using query-specific feature filters (i.e., attention) in a single feed-forward pass. Finally, a Maximum Spanning Tree (MST) is extracted from the weighted graph. Our preliminary results show that this is a promising new approach for visual parsing of handwritten formulas. Our data and source code are publicly available.
- 
            LoRa has seen widespread adoption as a long-range IoT technology. As the number of LoRa deployments grows, packet collisions undermine its overall network throughput. In this paper, we propose a novel interference cancellation technique, Concurrent Interference Cancellation (CIC), that enables concurrent decoding of multiple collided LoRa packets. CIC fundamentally differs from existing approaches as it demodulates symbols by canceling out all other interfering symbols. It achieves this cancellation by carefully selecting a set of sub-symbols (pieces of the original symbol) such that no interfering symbol is common across all sub-symbols in this set. Thus, after demodulating each sub-symbol, an intersection across their spectra cancels out all the interfering symbols. Through LoRa deployments using COTS devices, we demonstrate that CIC can increase the network capacity of standard LoRa by up to 10x and up to 4x over the state-of-the-art research. While beneficial across all scenarios, CIC has even more significant benefits under low SNR conditions that are common to LoRa deployments, in which prior approaches appear to perform quite poorly.