Tangent-V: Math Formula Image Search Using Line-of-Sight Graphs

Davila, Kenny; Joshi, Ritvik; Setlur, Srirangaraj; Govindaraju, Venu; Zanibbi, Richard

doi:10.1007/978-3-030-15712-8_44

We present a visual search engine for graphics such as math, chemical diagrams, and figures. Graphics are represented using Line-of-Sight (LOS) graphs, with symbols connected only when they can ‘see’ each other along an unobstructed line. Symbol identities may be provided (e.g., in PDF) or taken from Optical Character Recognition applied to images. Graphics are indexed by pairs of symbols that ‘see’ each other, using the symbols' labels, their spatial displacement, and their size ratio. Retrieval has two layers: the first matches query symbol pairs in an inverted index, while the second aligns candidates with the query and scores the resulting matches using the identity and relative position of symbols. For PDFs, we also introduce a new tool that quickly extracts characters and their locations. We have applied our model to the NTCIR-12 Wikipedia Formula Browsing Task, and found that the method can locate relevant matches without unification of symbols or using a math expression grammar. In the future, one might index LOS graphs for entire pages and search for text and graphics. Our source code has been made publicly available.
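The indexing scheme and first retrieval layer described in the abstract can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that do not come from the paper. Symbols are axis-aligned bounding boxes. Two symbols 'see' each other when the segment joining their box centers crosses no third box; the paper's LOS construction is more involved. The hypothetical `pair_key` quantizes displacement direction into 8 bins and size ratio into 4 bins. Candidates are ranked only by how many distinct query pair keys they share, and the second layer (alignment and rescoring) is omitted. All names (`Symbol`, `segment_hits_box`, `los_edges`, `pair_key`, `PairIndex`) are illustrative, not the released Tangent-V code.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A labeled symbol with an axis-aligned bounding box (image coords, y grows down)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    @property
    def size(self):
        return max(self.x1 - self.x0, self.y1 - self.y0)


def segment_hits_box(p, q, box):
    """Liang-Barsky clipping: True if segment p->q touches or crosses the box."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    t0, t1 = 0.0, 1.0
    for denom, num in ((-dx, p[0] - box.x0), (dx, box.x1 - p[0]),
                       (-dy, p[1] - box.y0), (dy, box.y1 - p[1])):
        if denom == 0:
            if num < 0:          # parallel to this edge and outside the box
                return False
        else:
            t = num / denom
            if denom < 0:
                t0 = max(t0, t)  # entering the box
            else:
                t1 = min(t1, t)  # leaving the box
            if t0 > t1:
                return False
    return True


def los_edges(symbols):
    """Simplified LOS graph (assumption): link two symbols unless a third box blocks their center line."""
    edges = []
    for i, a in enumerate(symbols):
        for j in range(i + 1, len(symbols)):
            b = symbols[j]
            blocked = any(segment_hits_box(a.center, b.center, c)
                          for k, c in enumerate(symbols) if k not in (i, j))
            if not blocked:
                edges.append((a, b))
    return edges


def pair_key(a, b, angle_bins=8, ratio_cuts=(0.5, 1.0, 2.0)):
    """Hypothetical quantized pair tuple: (label_a, label_b, direction bin, size-ratio bin)."""
    if a.center > b.center:      # canonical order: leftmost (then topmost) symbol first
        a, b = b, a
    (ax, ay), (bx, by) = a.center, b.center
    angle = math.atan2(ay - by, bx - ax) % (2 * math.pi)  # flip y so "up" is positive
    direction = round(angle / (2 * math.pi / angle_bins)) % angle_bins
    ratio = b.size / max(a.size, 1e-9)
    ratio_bin = sum(ratio > cut for cut in ratio_cuts)
    return (a.label, b.label, direction, ratio_bin)


class PairIndex:
    """Inverted index from pair keys to formula ids (first retrieval layer only)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, formula_id, symbols):
        for a, b in los_edges(symbols):
            self.postings[pair_key(a, b)].add(formula_id)

    def search(self, symbols, top_k=10):
        """Rank formulas by how many distinct query pair keys they contain."""
        query_keys = {pair_key(a, b) for a, b in los_edges(symbols)}
        scores = defaultdict(int)
        for key in query_keys:
            for fid in self.postings.get(key, ()):
                scores[fid] += 1
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    index = PairIndex()
    index.add("x^2", [Symbol("x", 0, 10, 10, 20), Symbol("2", 11, 2, 16, 9)])
    index.add("x_2", [Symbol("x", 0, 0, 10, 10), Symbol("2", 11, 8, 16, 15)])
    query = [Symbol("x", 100, 110, 110, 120), Symbol("2", 111, 102, 116, 109)]
    print(index.search(query))  # [('x^2', 1)]
```

With this quantization, a superscript and a subscript produce different pair keys (the '2' lies up-right versus down-right of 'x'). A translated copy of the superscript layout therefore matches only the first formula. In a row such as `a + b`, the `+` box blocks the line of sight between `a` and `b`, which leaves two edges instead of three. Because keys use only relative direction and size ratio, matching is invariant to translation, which is why the shifted query still hits.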

Award ID(s): 1717997

PAR ID: 10124341

Author(s) / Creator(s): Davila, Kenny; Joshi, Ritvik; Setlur, Srirangaraj; Govindaraju, Venu; Zanibbi, Richard

Date Published: 2019-01-01

Journal Name: Proceedings of the European Conference on Information Retrieval (ECIR)

Page Range / eLocation ID: 681-695

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1007/978-3-030-15712-8_44

More Like this