Parsing Digitized Vietnamese Paper Documents

Truong Dieu, Linh; Nguyen, Thuan Trong; Vo, Nguyen D.; Nguyen, Tam V.; Nguyen, Khang

In recent years, the need to exploit digitized document data has been increasing. In this paper, we address the problem of parsing digitized Vietnamese paper documents. Digitized Vietnamese documents are mainly scanned images with diverse layouts and special characters, which introduces many challenges. To this end, we first collect the UIT-DODV dataset, a novel Vietnamese document image dataset that includes scientific papers in Vietnamese drawn from different scientific conferences. We compile both images converted from PDF and images scanned by a smartphone or a physical scanner, which poses many new challenges. Additionally, we leverage a state-of-the-art object detector along with a fused loss function to efficiently parse the Vietnamese paper documents. Extensive experiments conducted on the UIT-DODV dataset provide a comprehensive evaluation and insightful analysis.

Award ID(s): 2025234

PAR ID: 10277210

Author(s) / Creator(s): Truong Dieu, Linh; Nguyen, Thuan Trong; Vo, Nguyen D.; Nguyen, Tam V.; Nguyen, Khang

Date Published: 2021-01-01

Journal Name: International Conference on Computer Analysis of Images and Patterns

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Conference Paper: The DOI is not currently available.
