Capturing document images with hand-held devices in unstructured environments is now common practice. However, “casual” photos of documents are usually unsuitable for automatic information extraction, mainly due to physical distortion of the document paper, as well as varying camera positions and illumination conditions. In this work, we propose DewarpNet, a deep-learning approach for document image unwarping from a single image. Our insight is that the 3D geometry of the document not only determines the warping of its texture but also causes the illumination effects. Therefore, our novelty lies in the explicit modeling of the 3D shape of the document paper in an end-to-end pipeline. We also contribute Doc3D, the largest and most comprehensive dataset for document image unwarping to date. This dataset features multiple ground-truth annotations, including 3D shape, surface normals, UV map, albedo image, etc. Training with Doc3D, we demonstrate state-of-the-art performance for DewarpNet with extensive qualitative and quantitative evaluations. Our network also significantly improves OCR performance on captured document images, decreasing character error rate by 42% on average. Both the code and the dataset are released.
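For readers unfamiliar with the OCR metric quoted above, character error rate is commonly computed as the edit (Levenshtein) distance between the recognized text and the reference text, divided by the reference length. The sketch below illustrates that common definition only; the abstract does not state the exact evaluation protocol, and the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from ref
                            curr[j - 1] + 1,      # insert h into ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(character_error_rate("document unwarping", "docunent unwarp1ng"))
```

Under this definition the rate can exceed 1 when the OCR output is much longer than the reference.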
Topology-Aware Single-Image 3D Shape Reconstruction
We address topology-awareness in 3D shape reconstruction. Two types of high-level shape topologies are studied here, namely genus (number of cuttings/holes) and connectivity (number of connected components), which are of great importance in 3D object reconstruction/understanding but have thus far been disjoint from the existing dense voxel-wise prediction literature. We propose a topology-aware shape autoencoder component (TPWCoder) that approximates topology property functions such as genus and connectivity with neural networks from the latent variables. TPWCoder can be directly combined with existing 3D shape reconstruction pipelines for end-to-end training and prediction. On the challenging A Big CAD Model Dataset (ABC), TPWCoder demonstrates a noticeable quantitative and qualitative improvement over the competing methods, and it also shows improved quantitative results on the ShapeNet dataset.
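To make the two topology properties concrete, the following sketch computes them exactly on a small occupancy grid. It is not the TPWCoder method, which approximates these properties with neural networks from latent variables; it only illustrates what the target quantities are. Voxels are treated as closed unit cubes (so cubes sharing a face, edge, or corner are connected), the tunnel count assumes the shape has no enclosed cavities, and all function names are hypothetical.

```python
from collections import deque
from itertools import product

# 26-neighborhood: closed unit cubes touch if they share a face, edge, or corner.
OFFSETS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]


def count_components(voxels):
    """Number of connected components of a set of (x, y, z) voxels."""
    unvisited = set(voxels)
    components = 0
    while unvisited:
        queue = deque([unvisited.pop()])
        components += 1
        while queue:  # breadth-first flood fill of one component
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                n = (x + dx, y + dy, z + dz)
                if n in unvisited:
                    unvisited.remove(n)
                    queue.append(n)
    return components


def euler_characteristic(voxels):
    """V - E + F - C of the cubical complex spanned by the voxels."""
    cubes = set(voxels)
    vertices, edges, faces = set(), set(), set()
    for x, y, z in cubes:
        for i, j, k in product((0, 1), repeat=3):
            vertices.add((x + i, y + j, z + k))
        for a, b in product((0, 1), repeat=2):
            edges.add((0, x, y + a, z + b))  # edge parallel to the x axis
            edges.add((1, x + a, y, z + b))  # edge parallel to the y axis
            edges.add((2, x + a, y + b, z))  # edge parallel to the z axis
        for a in (0, 1):
            faces.add((0, x + a, y, z))  # face normal to the x axis
            faces.add((1, x, y + a, z))  # face normal to the y axis
            faces.add((2, x, y, z + a))  # face normal to the z axis
    return len(vertices) - len(edges) + len(faces) - len(cubes)


def count_tunnels(voxels):
    """Tunnels (handles), ASSUMING no enclosed cavities: chi = b0 - b1."""
    return count_components(voxels) - euler_characteristic(voxels)


if __name__ == "__main__":
    # A one-voxel-thick ring: 3x3 square in the xy plane with the center removed.
    ring = {(x, y, 0) for x in range(3) for y in range(3)} - {(1, 1, 0)}
    print(count_components(ring), count_tunnels(ring))
```

Exact counts like these are discrete and not differentiable, which is one plausible reason to approximate them with a learned function when training end to end.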
- Publication Date:
- NSF-PAR ID: 10166846
- Journal Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops
- ISSN: 2160-7516
- Sponsoring Org: National Science Foundation
More Like this
-
We investigate the problem of learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views. Previous work on learning shape reconstruction from multiple views uses discrete representations such as point clouds or voxels, while continuous surface generation approaches lack multi-view consistency. We address these issues by designing neural networks capable of generating high-quality parametric 3D surfaces which are also consistent between views. Furthermore, the generated 3D surfaces preserve accurate image pixel to 3D surface point correspondences, allowing us to lift texture information to reconstruct shapes with rich geometry and appearance. Our method is supervised and trained on a public dataset of shapes from common object categories. Quantitative results indicate that our method significantly outperforms previous work, while qualitative results demonstrate the high quality of our reconstructions.
-
Dynamic network topology can pose important challenges to communication and control protocols in networks of autonomous vehicles. For instance, maintaining connectivity is a key challenge in unmanned aerial vehicle (UAV) networks. However, the tracking and computational resources of the observer module might not be sufficient for constant monitoring of all surrounding nodes in large-scale networks. In this paper, we propose an optimal measurement policy for network topology monitoring under constrained resources. To this end, we formulate the localization of multiple objects in terms of linear networked systems and solve it using Kalman filtering with intermittent observations. The proposed policy includes two sequential steps. We first find optimal measurement attempt probabilities for each target using numerical optimization methods to assign the limited number of resources among targets. The optimal resource allocation follows a waterfall-like solution that assigns more resources to targets with lower measurement success probability. This provides a 10% to 60% gain in prediction accuracy. The second step is finding optimal on-off patterns for measurement attempts for each target over time. We show that a regular measurement pattern that evenly distributes resources over time outperforms the two extreme cases of using all measurement resources either in the beginning or at the …
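The effect of the on-off measurement pattern can be illustrated with a scalar Kalman filter whose update step is skipped when no measurement is taken. This is a minimal sketch and not the paper's formulation: the random-walk model, the noise values, the horizon, and the budget are all assumed for illustration, every attempted measurement is assumed to succeed, and the function name `mean_error_variance` is hypothetical.

```python
def mean_error_variance(pattern, a=1.0, q=1.0, r=1.0, p0=1.0):
    """Average posterior error variance of a scalar Kalman filter.

    pattern: sequence of booleans, True where a measurement is taken.
    Model (assumed): x[k+1] = a * x[k] + w, w ~ N(0, q); y[k] = x[k] + v, v ~ N(0, r).
    """
    p = p0
    total = 0.0
    for observed in pattern:
        p = a * a * p + q          # prediction step always runs
        if observed:               # update step runs only when a measurement arrives
            gain = p / (p + r)
            p = (1.0 - gain) * p
        total += p
    return total / len(pattern)


# Hypothetical setting: 12 time steps, budget of 4 measurements.
HORIZON, BUDGET = 12, 4
front_loaded = [k < BUDGET for k in range(HORIZON)]
back_loaded = [k >= HORIZON - BUDGET for k in range(HORIZON)]
evenly_spaced = [(k + 1) % (HORIZON // BUDGET) == 0 for k in range(HORIZON)]

if __name__ == "__main__":
    for name, pattern in [("front-loaded", front_loaded),
                          ("back-loaded", back_loaded),
                          ("evenly spaced", evenly_spaced)]:
        print(f"{name:14s} mean variance = {mean_error_variance(pattern):.3f}")
```

With these assumed parameters the evenly spaced pattern yields the lowest average error variance, because the variance grows at every step without a measurement and long unobserved stretches dominate the average.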
-
We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings.
-
Abstract
Abstract from the article associated with the dataset: George, Mullinix, et al., PeerJ 2021. Reef-building corals are ecosystem engineers that compete with other benthic organisms for space and resources. Corals harvest energy through their surface by photosynthesis and heterotrophic feeding, and they divert part of this energy to defend their outer colony perimeter against competitors. Here, we hypothesized that corals with a larger space-filling surface and smaller perimeters increase energy gain while reducing the exposure to competitors. This predicted an association between these two geometric properties of corals and the competitive outcome against other benthic organisms. To test the prediction, fifty coral colonies from the Caribbean island of Curaçao were rendered using digital 3D and 2D reconstructions. The surface areas, perimeters, box-counting dimensions (as a proxy of space-filling property), and other geometric properties were extracted and analyzed with respect to the percentage of the perimeter losing or winning against competitors based on the coral tissue apparent growth or damage. The increase in surface space-filling dimension was the only significant single indicator of coral winning outcomes, but the combination of surface space-filling dimension with perimeter length increased the statistical prediction of coral competition outcomes. Corals with larger surface space-filling