Title: SSN: Soft Shadow Network for Image Compositing
We introduce an interactive Soft Shadow Network (SSN) that generates controllable soft shadows for image compositing. SSN takes a 2D object mask as input and is thus agnostic to image types such as painting and vector art. An environment light map controls the shadow's characteristics, such as angle and softness. SSN employs an Ambient Occlusion Prediction module to predict an intermediate ambient occlusion map, which the user can further refine to provide geometric cues that modulate shadow generation. To train our model, we design an efficient pipeline that produces diverse soft shadow training data from 3D object models. In addition, we propose an inverse shadow map representation to improve model training. We demonstrate that our model produces realistic soft shadows in real time. Our user studies show that the generated shadows are often indistinguishable from shadows computed by a physics-based renderer, and that users can easily produce specific shadow effects in minutes through an interactive SSN application.
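One plausible reading of the inverse shadow map representation, offered purely as an illustrative assumption (the paper's exact definition may differ), is to invert shadow intensity so that the sparse shadowed pixels carry most of the training signal:

```python
import numpy as np

def to_inverse_shadow(shadow):
    # Hypothetical inversion: dark (shadowed) pixels get large values,
    # so small shadow regions contribute more energy to a training loss.
    return 1.0 - shadow

# Toy 4x4 shadow map: 1.0 = fully lit, 0.0 = fully shadowed.
shadow = np.ones((4, 4))
shadow[1:3, 1:3] = 0.2          # a small soft-shadow region
inv = to_inverse_shadow(shadow)

# The 12 lit pixels contribute 0; the 4 shadowed pixels contribute 0.8 each.
print(float(inv.sum()))         # → 3.2
```

Under this reading, a plain L2 loss on the inverse map automatically emphasizes the shadow region without any explicit pixel weighting.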
Award ID(s):
1816514
NSF-PAR ID:
10286980
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Conference on Computer Vision and Pattern Recognition
ISSN:
2163-6648
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper proposes a phase-to-depth deep learning model to repair shadow-induced errors in fringe projection profilometry (FPP). The model comprises two hourglass branches that extract information from texture images and phase maps, and it fuses the two branches' features by concatenation and weighting. The model's input consists of texture images, masks, and unwrapped phase maps; the ground truth is the depth map from CAD models. The loss function accounts for both image details and structural similarity. The training data contain 1200 samples generated with a verified virtual FPP system. After training, we conduct experiments on virtual and real-world scanning data, and the results support the model's effectiveness: the mean absolute error and the root mean squared error are 1.0279 mm and 1.1898 mm on the validation dataset. We also analyze the influence of ambient light intensity on performance: low ambient light limits the model, since it cannot extract valid information from completely dark shadow regions in the texture images. The contribution of each branch network is also investigated: features from the texture-dominant branch serve as guidance to remedy shadow-induced errors, while information from the phase-dominant branch yields accurate predictions for the whole object. Our model provides a good reference for repairing shadow-induced errors in FPP systems.
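The two error metrics reported in item 1 (MAE = 1.0279 mm, RMSE = 1.1898 mm) are standard and easy to reproduce; the depth arrays below are toy data, not the paper's:

```python
import numpy as np

def mae(pred, gt):
    # Mean absolute error over all depth samples.
    return float(np.mean(np.abs(pred - gt)))

def rmse(pred, gt):
    # Root mean squared error; always >= MAE for the same data.
    return float(np.sqrt(np.mean((pred - gt) ** 2)))

gt = np.array([10.0, 12.0, 11.0, 9.0])     # ground-truth depth in mm
pred = np.array([11.0, 11.0, 12.0, 10.0])  # every prediction off by 1 mm

print(mae(pred, gt), rmse(pred, gt))       # → 1.0 1.0
```

The two values coincide here only because every error has the same magnitude; in general RMSE exceeds MAE, and the gap (as in the reported 1.0279 mm vs. 1.1898 mm) indicates how uneven the error distribution is.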
  2. Monocular 3D object parsing is highly desirable in various scenarios, including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture that, given a single RGB image, localizes semantic parts in the 2D image and in 3D space while inferring their visibility states. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, so that it sequentially infers intermediate concepts associated with the final task. To acquire training data in the desired quantities with ground-truth 3D shape and the relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performance on real-image benchmarks, including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA, for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.
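The deep supervision scheme in item 2 can be sketched minimally; the head structure and weighting below are illustrative assumptions, not the paper's actual configuration:

```python
# Hedged sketch of deep supervision: each hidden-layer head predicting an
# intermediate concept (e.g. 2D keypoints before 3D pose) contributes an
# auxiliary loss term added to the final-task loss. The 0.5 weight is an
# arbitrary illustration.
def deeply_supervised_loss(final_loss, aux_losses, aux_weight=0.5):
    return final_loss + aux_weight * sum(aux_losses)

# Final-task loss plus two intermediate-concept losses.
total = deeply_supervised_loss(1.0, [0.4, 0.2])
print(total)  # → 1.3
```

The design point is that gradients from the auxiliary terms reach early layers directly, which is what regularizes the hidden representations toward the intermediate concepts.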
  3. We present a novel technique for transcribing crowds in video scenes that allows extracting the positions of moving objects in video frames. The technique can be used as a more precise alternative to image processing methods such as background removal or automated pedestrian detection based on feature extraction and classification. By manually projecting pedestrian actors onto a two-dimensional plane and translating screen coordinates to absolute real-world positions using the cross ratio, we provide highly accurate and complete results at the cost of increased processing time. We are able to completely avoid most errors found in other automated annotation techniques, which stem from sources such as noise, occlusion, shadows, view angle, or pedestrian density. It is further possible to process scenes that are difficult or impossible to transcribe with automated image processing, such as low-contrast or low-light environments. We validate our approach by comparing it to the results of both background removal and feature extraction and classification in a variety of scenes.
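The cross-ratio step in item 3 rests on a classical invariance: the cross ratio of four collinear points is preserved by any projective map, which is what allows screen measurements to be converted to real-world positions from a few reference points. A minimal sketch (the coordinates and map coefficients are arbitrary illustrations, not values from the paper):

```python
from fractions import Fraction as F

def cross_ratio(a, b, c, d):
    # Cross ratio (A, B; C, D) of four collinear points, as 1-D coordinates.
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

def project(x, m=(F(2), F(1), F(1), F(3))):
    # A 1-D projective map x -> (ax + b) / (cx + d); coefficients arbitrary.
    a, b, c, d = m
    return (a * x + b) / (c * x + d)

pts = [F(0), F(1), F(2), F(4)]       # e.g. marks along a line on screen
img = [project(x) for x in pts]      # the same marks after projection

print(cross_ratio(*pts), cross_ratio(*img))  # identical values: 3/2 and 3/2
```

In practice three of the four points are reference marks with known real-world positions, and the invariance is solved for the unknown fourth position.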
  4. The image of a supermassive black hole surrounded by an optically thin, radiatively inefficient accretion flow, like that observed with the Event Horizon Telescope, is characterized by a bright ring of emission surrounding the black-hole shadow. In the Kerr spacetime this bright ring, when narrow, closely traces the boundary of the shadow and can, with appropriate calibration, serve as its proxy. The present paper expands the validity of this statement by considering two particular spacetime geometries: a solution to the field equations of a modified gravity theory, and another that parametrically deviates from Kerr but recovers the Kerr spacetime when its deviation parameters vanish. A covariant, axisymmetric analytic model of the accretion flow, based on conservation laws and spanning a broad range of plasma conditions, is utilized to calculate synthetic non-Kerr black-hole images, which are then analysed and characterized. We find that in all spacetimes: (i) it is the gravitationally lensed unstable photon orbit that plays the critical role in establishing the diameter of the rings observed in black-hole images, not the event horizon or the innermost stable circular orbit; (ii) bright rings in these images scale in size with, and encompass, the boundaries of the black-hole shadows, even when deviating significantly from Kerr; and (iii) uncertainties in the physical properties of the accreting plasma introduce subdominant corrections to the relation between the diameter of the image and the diameter of the black-hole shadow. These results provide theoretical justification for using black-hole images to probe and test the spacetimes of supermassive black holes.
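Point (i) in item 4 can be grounded in the textbook non-rotating (Schwarzschild) case, which the abstract does not spell out: the shadow boundary sits at the critical impact parameter of the lensed photon orbit, √27 GM/c², well away from both the event horizon (2 GM/c²) and the ISCO (6 GM/c²). A quick check in geometrized units:

```python
import math

# Schwarzschild (non-spinning) black hole in units G = c = M = 1.
# These standard radii illustrate point (i): the shadow boundary is set by
# the lensed (critical) impact parameter of the unstable photon orbit,
# not by the horizon or the ISCO.
r_horizon = 2.0                 # event-horizon radius
r_photon = 3.0                  # unstable circular photon orbit radius
r_isco = 6.0                    # innermost stable circular orbit radius
b_shadow = math.sqrt(27.0)      # critical impact parameter ≈ 5.196

print(f"shadow diameter = {2 * b_shadow:.3f} M")  # ≈ 10.392 M
```

Note that the shadow radius (≈ 5.196 M) coincides with neither the photon-orbit radius (3 M) nor the ISCO (6 M); lensing is what maps the 3 M orbit out to the larger observed ring.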
  5. In this paper, we present a technique for estimating the geometry and reflectance of objects using only a camera, a flashlight, and optionally a tripod. We propose a simple data capture technique in which the user goes around the object, illuminating it with the flashlight and capturing only a few images. Our main technical contribution is the introduction of a recursive neural architecture, which predicts geometry and reflectance at 2^k × 2^k resolution given an input image at 2^k × 2^k and the geometry and reflectance estimated at 2^(k−1) × 2^(k−1) in the previous step. This recursive architecture, termed RecNet, is trained at 256×256 resolution but can easily operate on 1024×1024 images during inference. We show that, given three or fewer input images, our method produces more accurate surface normals and albedo than previous approaches, especially in regions with specular highlights and cast shadows.
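The recursion in item 5 doubles the working resolution at each step, which is why a model trained at 256×256 can run out to 1024×1024 at inference. A minimal sketch of the control flow (function and variable names are hypothetical placeholders, not RecNet's API):

```python
# Hedged sketch of the recursive scheme: each step consumes the image at
# 2^k x 2^k plus the previous step's 2^(k-1) x 2^(k-1) estimates, and emits
# new estimates at 2^k x 2^k. The network call is stubbed out.
def refine_step(image_res, prev_res):
    assert image_res == 2 * prev_res, "inputs must differ by one octave"
    return image_res                   # placeholder for the network's output

res = 32                               # coarsest initial estimate, e.g. 32x32
for k in range(6, 11):                 # image resolutions 64, 128, ..., 1024
    res = refine_step(2 ** k, res)

print(res)                             # → 1024
```

Because each step only ever relates one octave to the next, nothing in the step depends on the absolute resolution, so the same weights apply at 1024×1024 even though training stopped at 256×256.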