skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Deep Neural Network for Semantic-based Text Recognition in Images
State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) deep learning model that reads text in images with the help of understanding context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order isolated text regions. A text-recognition network interprets isolated words. Benefiting from semantic information, a sequence-to-sequence network model efficiently corrects inaccurate and uncertain phrases produced earlier in the STR pipeline. We present experiments on two new distinct datasets that contain scanned catalog images of interior designs and photographs of protesters with hand-written signs, respectively. Our results show that our STR model outperforms a baseline method that uses state-of-the-art single-word recognition techniques on both datasets. STR yields a high accuracy rate of 90% on the catalog images and 71% on the more difficult protest images, suggesting its generality in recognizing text.  more » « less
Award ID(s):
1838193
PAR ID:
10345897
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ArXivorg
ISSN:
2331-8422
Page Range / eLocation ID:
10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Named entity recognition (NER) is a fundamental task in the natural language processing (NLP) area. Recently, representation learning methods (e.g., character embedding and word embedding) have achieved promising recognition results. However, existing models only consider partial features derived from words or characters while failing to integrate semantic and syntactic information (e.g., capitalization, inter-word relations, keywords, lexical phrases, etc.) from multi-level perspectives. Intuitively, multi-level features can be helpful when recognizing named entities from complex sentences. In this study, we propose a novel framework called attention-based multi-level feature fusion (AMFF), which is used to capture the multi-level features from different perspectives to improve NER. Our model consists of four components to respectively capture the local character-level, global character-level, local word-level, and global word-level features, which are then fed into a BiLSTM-CRF network for the final sequence labeling. Extensive experimental results on four benchmark datasets show that our proposed model outperforms a set of state-of-the-art baselines. 
    more » « less
  2. The lack of adequate training data is one of the major hurdles in WiFi-based activity recognition systems. In this paper, we propose Wi-Fringe, which is a WiFi CSI-based devicefree human gesture recognition system that recognizes named gestures, i.e., activities and gestures that have a semantically meaningful name in English language, as opposed to arbitrary free-form gestures. Given a list of activities (only their names in English text), along with zero or more training examples (WiFi CSI values) per activity, Wi-Fringe is able to detect all activities at runtime. We show for the first time that by utilizing the state-of-the-art semantic representation of English words, which is learned from datasets like the Wikipedia (e.g., Google's word-to-vector [1]) and verb attributes learned from how a word is defined (e.g, American Heritage Dictionary), we can enhance the capability of WiFi-based named gesture recognition systems that lack adequate training examples per class. We propose a novel cross-domain knowledge transfer algorithm between radio frequency (RF) and text to lessen the burden on developers and end-users from the tedious task of data collection for all possible activities. To evaluate Wi-Fringe, we collect data from four volunteers in a multi-person apartment and an office building for a total of 20 activities. We empirically quantify the trade-off between the accuracy and the number of unseen activities. 
    more » « less
  3. null (Ed.)
    This paper studies the task of Relation Extraction (RE) that aims to identify the semantic relations between two entity mentions in text. In the deep learning models for RE, it has been beneficial to incorporate the syntactic structures from the dependency trees of the input sentences. In such models, the dependency trees are often used to directly structure the network architectures or to obtain the dependency relations between the word pairs to inject the syntactic information into the models via multi-task learning. The major problem with these approaches is the lack of generalization beyond the syntactic structures in the training data or the failure to capture the syntactic importance of the words for RE. In order to overcome these issues, we propose a novel deep learning model for RE that uses the dependency trees to extract the syntax-based importance scores for the words, serving as a tree representation to introduce syntactic information into the models with greater generalization. In particular, we leverage Ordered-Neuron Long-Short Term Memory Networks (ON-LSTM) to infer the model-based importance scores for RE for every word in the sentences that are then regulated to be consistent with the syntax-based scores to enable syntactic information injection. We perform extensive experiments to demonstrate the effectiveness of the proposed method, leading to the state-of-the-art performance on three RE benchmark datasets. 
    more » « less
  4. Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora. Since PLMs capture word semantics in different contexts, the quality of word representations highly depends on word frequency, which usually follows a heavy-tailed distributions in the pre-training corpus. Therefore, the embeddings of rare words on the tail are usually poorly optimized. In this work, we focus on enhancing language model pre-training by leveraging definitions of the rare words in dictionaries (e.g., Wiktionary). To incorporate a rare word definition as a part of input, we fetch its definition from the dictionary and append it to the end of the input text sequence. In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word and sentence-level alignment between input text sequence and rare word definitions to enhance language modeling representation with dictionary. We evaluate the proposed Dict-BERT model on the language understanding benchmark GLUE and eight specialized domain benchmark datasets. Extensive experiments demonstrate that Dict-BERT can significantly improve the understanding of rare words and boost model performance on various NLP downstream tasks. 
    more » « less
  5. Taking an answer and its context as input, sequence-to-sequence models have made considerable progress on question generation. However, we observe that these approaches often generate wrong question words or keywords and copy answer-irrelevant words from the input. We believe that lacking global question semantics and exploiting answer position-awareness not well are the key root causes. In this paper, we propose a neural question generation model with two general modules: sentence-level semantic matching and answer position inferring. Further, we enhance the initial state of the decoder by leveraging the answer-aware gated fusion mechanism. Experimental results demonstrate that our model outperforms the state-of-the-art (SOTA) models on SQuAD and MARCO datasets. Owing to its generality, our work also improves the existing models significantly. 
    more » « less