Tangent-V: Math Formula Image Search Using Line-of-Sight Graphs
We present a visual search engine for graphics such as math, chemical diagrams, and figures. Graphics are represented using Line-of-Sight (LOS) graphs, with symbols connected only when they can ‘see’ each other along an unobstructed line. Symbol identities may be provided (e.g., in PDF) or taken from Optical Character Recognition applied to images. Graphics are indexed by pairs of symbols that ‘see’ each other using their labels, spatial displacement, and size ratio. Retrieval has two layers: the first matches query symbol pairs in an inverted index, while the second aligns candidates with the query and scores the resulting matches using the identity and relative position of symbols. For PDFs, we also introduce a new tool that quickly extracts characters and their locations. We have applied our model to the NTCIR-12 Wikipedia Formula Browsing Task, and found that the method can locate relevant matches without unification of symbols or using a math expression grammar. In the future, one might index LOS graphs for entire pages and search for text and graphics. Our source code has been made publicly available.
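The pair-based indexing and first retrieval layer described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Symbol` class, the function names, and the tolerance values are hypothetical, displacement is reduced to a single angle, and every pair of symbols in a formula is treated as mutually visible instead of running a real line-of-sight test. The second layer (aligning candidates with the query and rescoring them) is omitted.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Symbol:
    """Hypothetical symbol record: label, centre coordinates, and a size measure."""
    label: str
    x: float
    y: float
    size: float


def ordered(a, b):
    """Put a symbol pair in a canonical order so index and query keys agree."""
    return sorted((a, b), key=lambda s: (s.label, s.x, s.y))


def pair_features(a, b):
    """Return the displacement angle in degrees from a to b, and the size ratio b/a."""
    angle = math.degrees(math.atan2(b.y - a.y, b.x - a.x))
    return angle, b.size / a.size


def build_index(formulas):
    """Map (label, label) keys to postings of (formula id, angle, size ratio).

    Assumption: all symbol pairs are indexed; a real LOS graph would keep
    only pairs joined by an unobstructed line.
    """
    index = defaultdict(list)
    for fid, symbols in formulas.items():
        for a, b in combinations(symbols, 2):
            a, b = ordered(a, b)
            index[(a.label, b.label)].append((fid, *pair_features(a, b)))
    return index


def search(index, query, angle_tol=30.0, ratio_tol=0.5):
    """First-layer retrieval: count query pairs matched by each indexed formula."""
    scores = defaultdict(int)
    for a, b in combinations(query, 2):
        a, b = ordered(a, b)
        q_angle, q_ratio = pair_features(a, b)
        for fid, angle, ratio in index.get((a.label, b.label), []):
            # Smallest absolute difference between the two angles, in [0, 180].
            diff = abs((angle - q_angle + 180.0) % 360.0 - 180.0)
            if diff <= angle_tol and abs(ratio - q_ratio) <= ratio_tol:
                scores[fid] += 1
    # Highest pair count first; ties broken by formula id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because the features are an angle and a size ratio, a query drawn at a different scale still matches the same pairs, while a formula with the same symbols in a different layout (for example a subscript in place of a superscript) fails the angle check.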
 Award ID(s):
 1717997
 Publication Date:
 NSF-PAR ID:
 10124341
 Journal Name:
 Proceedings of the European Conference on Information Retrieval (ECIR)
 Page Range or eLocationID:
 681-695
 Sponsoring Org:
 National Science Foundation
