Capturing document images with hand-held devices in unstructured environments is a common practice nowadays. However, “casual” photos of documents are usually unsuitable for automatic information extraction, mainly due to physical distortion of the document paper, as well as various camera positions and illumination conditions. In this work, we propose DewarpNet, a deep-learning approach for document image unwarping from a single image. Our insight is that the 3D geometry of the document not only determines the warping of its texture but also causes the illumination effects. Therefore, our novelty resides in the explicit modeling of the document paper's 3D shape in an end-to-end pipeline. We also contribute the largest and most comprehensive dataset for document image unwarping to date, Doc3D. This dataset features multiple ground-truth annotations, including 3D shape, surface normals, UV maps, and albedo images. Trained on Doc3D, DewarpNet achieves state-of-the-art performance in extensive qualitative and quantitative evaluations. Our network also significantly improves OCR performance on captured document images, decreasing character error rate by 42% on average. Both the code and the dataset are released.
Surface Snapping Optimization Layer for Single Image Object Shape Reconstruction
Reconstructing the 3D shape of objects observed in a single image is a challenging task. Recent approaches rely on visual cues that a deep net extracts from the given image. In this work, we leverage recent advances in monocular scene understanding to incorporate an additional geometric cue: surface normals. To this end, we propose a novel optimization layer that encourages the face normals of the reconstructed shape to align with the estimated surface normals. We develop a computationally efficient conjugate-gradient-based method that avoids materializing a high-dimensional sparse matrix. We show that this framework achieves compelling shape reconstruction results on the challenging Pix3D and ShapeNet datasets.
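The key computational idea in the abstract, solving a linear system without ever forming the sparse matrix, can be illustrated with a minimal matrix-free conjugate-gradient sketch. This is not the paper's implementation; the function name and interface are hypothetical, and the solver only assumes access to a routine `apply_A` that computes the matrix-vector product for a symmetric positive-definite operator.

```python
import numpy as np

def cg_matrix_free(apply_A, b, x0=None, tol=1e-8, max_iter=200):
    """Conjugate gradient for A x = b, where A is only available as a
    function computing A @ v, so a huge sparse A is never materialized."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - apply_A(x)          # initial residual
    p = r.copy()                # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p  # conjugate direction update
        rs = rs_new
    return x
```

Because only `apply_A` is needed, the per-iteration cost is one operator application plus a few vector operations, which is what makes such a layer practical inside a training pipeline.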
- Award ID(s):
- 2106825
- PAR ID:
- 10441670
- Date Published:
- Journal Name:
- International Conference on Machine Learning
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Previous approaches to 3D shape segmentation mostly rely on heuristic processing and hand-tuned geometric descriptors. In this paper, we propose a novel 3D shape representation learning approach, Directionally Convolutional Network (DCN), to solve the shape segmentation problem. DCN extends convolution operations from images to the surface mesh of 3D shapes. With DCN, we learn effective shape representations from raw geometric features, i.e., face normals and distances, to achieve robust segmentation. More specifically, we propose a two-stream segmentation framework: one stream is the proposed DCN with face normals as input, and the other stream is a neural network with the face distance histogram as input. The learned shape representations from the two streams are fused by an element-wise product. Finally, a Conditional Random Field (CRF) is applied to optimize the segmentation. Through extensive experiments on benchmark datasets, we demonstrate that our approach outperforms the current state of the art (both classic and deep-learning-based) on a large variety of 3D shapes.
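The raw geometric feature that feeds the first stream, per-face normals of a triangle mesh, is straightforward to compute. A minimal sketch (illustrative only; the function name and array layout are assumptions, not the paper's code):

```python
import numpy as np

def face_normals(vertices, faces):
    """Unit face normals for a triangle mesh.

    vertices: (V, 3) float array of 3D positions.
    faces:    (F, 3) int array of vertex indices per triangle.
    Returns an (F, 3) array of unit normals (right-hand winding).
    """
    v0, v1, v2 = (vertices[faces[:, i]] for i in range(3))
    n = np.cross(v1 - v0, v2 - v0)          # un-normalized face normals
    return n / np.linalg.norm(n, axis=1, keepdims=True)
```

The element-wise product fusion of the two streams described in the abstract then amounts to multiplying two per-face feature arrays of equal shape.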
-
Surface reconstruction from points is a fundamental problem in computer graphics. While numerous methods have been proposed, it remains challenging to reconstruct from sparse and non-uniform point distributions, particularly when normals are absent. We present a robust and scalable method for reconstructing an implicit surface from points without normals. By exploiting the locality of natural neighborhoods, we propose local reformulations of a previous global method, known for its robustness on sparse points but also for its high computational cost, thereby significantly improving its scalability while retaining its robustness. Experiments show that our method achieves speed comparable to existing reconstruction methods on large inputs while producing fewer artifacts in under-sampled regions.
-
Polarization imaging is highly sensitive to surface shape but is inherently ambiguous, as measurements depend only on the projected surface normal orientation. We introduce a shape-from-polarization method that recovers unique surface normals from monocular Mueller images. We formulate the inverse problem as the estimation of the scattering geometry, enabling the extraction of unambiguous depth information from otherwise ambiguous normal data. Simulations show that while the initial ambiguous surface normal estimates are robust to noise, the subsequent depth recovery and disambiguation are more noise-sensitive. For simple object shapes, the method resolves ambiguities with mean angular errors below 10° at an SNR of 100. However, complex shapes require an SNR of 1,000 to achieve comparable accuracy. Notably, as the polarimetric capture system is simplified, the disambiguation performance approaches that of random selection for linear Stokes images.
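The ambiguity the abstract refers to can be made concrete: the angle of linear polarization computed from the Stokes parameters S1 and S2 determines the surface-normal azimuth only up to a π offset, so every pixel yields two candidate azimuths. A small illustrative sketch (function name hypothetical; this is the textbook AoLP formula, not the paper's pipeline):

```python
import numpy as np

def candidate_azimuths(s1, s2):
    """Angle of linear polarization from Stokes components S1, S2.

    AoLP = 0.5 * atan2(S2, S1) is defined modulo pi, so each pixel
    admits two candidate surface-normal azimuths differing by pi.
    """
    aolp = 0.5 * np.arctan2(s2, s1)
    return aolp, aolp + np.pi
```

Resolving which of the two candidates is correct at each pixel is exactly the disambiguation step whose noise sensitivity the abstract analyzes.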
-
Helmholtz stereopsis (HS) exploits the reciprocity principle of light propagation (i.e., the Helmholtz reciprocity) for 3D reconstruction of surfaces with arbitrary reflectance. In this paper, we present the polarimetric Helmholtz stereopsis (polar-HS), which extends the classical HS by considering the polarization state of light in the reciprocal paths. With the additional phase information from polarization, polar-HS requires only one reciprocal image pair. We formulate new reciprocity and diffuse/specular polarimetric constraints to recover surface depths and normals using an optimization framework. Using a hardware prototype, we show that our approach produces high-quality 3D reconstruction for different types of surfaces, ranging from diffuse to highly specular.
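The classical HS reciprocity constraint that polar-HS builds on can be sketched as follows: for a reciprocal image pair with swapped camera/light centers, the unknown BRDF cancels, leaving a per-point vector w whose dot product with the true surface normal vanishes. The sketch below states that classical constraint only (from the original Helmholtz stereopsis formulation, not the polar-HS constraints); the function name and argument layout are assumptions.

```python
import numpy as np

def helmholtz_residual(n, p, o_l, o_r, i_l, i_r):
    """Classical Helmholtz stereopsis reciprocity residual.

    n        : candidate unit surface normal at 3D point p
    o_l, o_r : swapped camera/light centers of the reciprocal pair
    i_l, i_r : intensities observed at p in the two images
    Returns w . n, which vanishes for the true normal because the
    BRDF terms cancel under Helmholtz reciprocity.
    """
    d_l, d_r = o_l - p, o_r - p
    r_l, r_r = np.linalg.norm(d_l), np.linalg.norm(d_r)
    # w = i_l * v_l / r_l^2 - i_r * v_r / r_r^2, with v = d / r
    w = i_l * d_l / r_l**3 - i_r * d_r / r_r**3
    return w @ n
```

Stacking this residual over several reciprocal pairs yields a linear system in n, which is how classical HS recovers normals without knowing the reflectance.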