

Title: Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation
Fashion attribute editing aims to manipulate fashion images based on a user-specified attribute while keeping the details of the original image as intact as possible. Recent works in this domain have mainly focused on direct manipulation of the raw RGB pixels, which only allows edits involving relatively small shape changes (e.g., sleeves). The goal of our Virtual Personal Tailoring Network (VPTNet) is to extend the editing capabilities to much larger shape changes of fashion items, such as cloth length. To achieve this goal, we decouple the fashion attribute editing task into two conditional stages: shape-then-appearance editing. First, we propose a shape editing network that employs a semantic parsing of the fashion image as an interface for manipulation. Compared to operating on the raw RGB image, our parsing map editing enables more complex shape editing operations. Second, we introduce an appearance completion network that takes the previous stage results and completes the shape difference regions to produce the final RGB image. Qualitative and quantitative experiments on the DeepFashion-Synthesis dataset confirm that VPTNet outperforms state-of-the-art methods for both small and large shape attribute editing.
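The shape-then-appearance decoupling described in the abstract can be illustrated with a toy sketch. This is not the VPTNet implementation: both stages in the paper are learned networks, whereas the functions below are hand-written stand-ins operating on a tiny NumPy parsing map. The label ids, the function names (`edit_shape`, `shape_difference`, `complete_appearance`), and the mean-colour fill are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical label ids for a toy parsing map (not taken from the paper).
BACKGROUND, SKIN, GARMENT = 0, 1, 2


def edit_shape(parsing, extra_rows):
    """Stage 1 stand-in: lengthen the garment by relabeling rows below its hem."""
    edited = parsing.copy()
    rows, cols = np.nonzero(parsing == GARMENT)
    if rows.size == 0:
        return edited  # no garment to edit
    hem = rows.max()                          # lowest garment row
    hem_cols = np.unique(cols[rows == hem])   # columns the hem occupies
    stop = min(hem + 1 + extra_rows, parsing.shape[0])
    edited[hem + 1:stop, hem_cols] = GARMENT
    return edited


def shape_difference(original, edited):
    """Pixels whose label changed; the appearance stage has to fill these."""
    return original != edited


def complete_appearance(image, edited, diff_mask):
    """Stage 2 stand-in: paint changed garment pixels with the mean colour
    of the garment pixels that were already present."""
    out = image.copy()
    known = (edited == GARMENT) & ~diff_mask
    fill = (edited == GARMENT) & diff_mask
    if known.any():
        out[fill] = image[known].mean(axis=0)
    return out


# Toy 6x4 scene: a 2x2 garment with 2 rows of skin directly below it.
parsing = np.zeros((6, 4), dtype=int)
parsing[1:3, 1:3] = GARMENT
parsing[3:5, 1:3] = SKIN

image = np.zeros((6, 4, 3), dtype=float)
image[parsing == GARMENT] = [200.0, 30.0, 30.0]
image[parsing == SKIN] = [220.0, 180.0, 150.0]

edited = edit_shape(parsing, extra_rows=2)       # edit the parsing map first
diff = shape_difference(parsing, edited)         # region that changed shape
result = complete_appearance(image, edited, diff)  # then complete the RGB image
```

The point of the sketch is the ordering: the shape edit happens on the label map, and only the pixels whose label changed are handed to the appearance step, so the rest of the image is left untouched.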
Award ID(s):
1840131
PAR ID:
10390737
Journal Name:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022
Page Range / eLocation ID:
3831-3840
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Shape servoing, a robotic task dedicated to controlling objects to desired goal shapes, is a promising approach to deformable object manipulation. An issue arises, however, with the reliance on the specification of a goal shape. This goal has been obtained either by a laborious domain knowledge engineering process or by manually manipulating the object into the desired shape and capturing the goal shape at that specific moment, both of which are impractical in various robotic applications. In this paper, we solve this problem by developing a novel neural network DefGoalNet, which learns deformable object goal shapes directly from a small number of human demonstrations. We demonstrate our method’s effectiveness on various robotic tasks, both in simulation and on a physical robot. Notably, in the surgical retraction task, even when trained with as few as 10 demonstrations, our method achieves a median success percentage of nearly 90%. These results mark a substantial advancement in enabling shape servoing methods to bring deformable object manipulation closer to practical real-world applications. 
  2. Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in the 2D image and in 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performance on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.
  3. We present a polarization-based approach to perform diffuse-specular separation from a single polarimetric image, acquired using a flexible, practical capture setup. Our key technical insight is that, unlike previous polarization-based separation methods that assume completely unpolarized diffuse reflectance, we use a more general polarimetric model that accounts for partially polarized diffuse reflections. We capture the scene with a polarimetric sensor and produce an initial analytical diffuse-specular separation that we further pass into a deep network trained to refine the separation. We demonstrate that our combination of analytical separation and deep network refinement produces state-of-the-art diffuse-specular separation, which enables image-based appearance editing of dynamic scenes and enhanced appearance estimation.

     
  4. Deep neural networks excel at finding hierarchical representations that solve complex tasks over large datasets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. We find evidence that the network has learned many object classes that play crucial roles in classifying scene classes. Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes. By analyzing changes made when small sets of units are activated or deactivated, we find that objects can be added and removed from the output scenes while adapting to the context. Finally, we apply our analytic framework to understanding adversarial attacks and to semantic image editing.

     
  5. Bayer pattern is a widely used Color Filter Array (CFA) for digital image sensors, efficiently capturing different light wavelengths on different pixels without the need for a costly ISP pipeline. The resulting single-channel raw Bayer images offer benefits such as spectral wavelength sensitivity and low time latency. However, object detection based on Bayer images has been underexplored due to challenges in human observation and algorithm design caused by the discontinuous color channels in adjacent pixels. To address this issue, we propose the BayerDetect network, an end-to-end deep object detection framework that aims to achieve fast, accurate, and memory-efficient object detection. Unlike RGB color images, where each pixel encodes spectral context from adjacent pixels during ISP color interpolation, raw Bayer images lack spectral context. To enhance the spectral context, the BayerDetect network introduces a spectral frequency attention block, transforming the raw Bayer image pattern to the frequency domain. In object detection, clear object boundaries are essential for accurate bounding box predictions. To handle the challenges posed by alternating spectral channels and mitigate the influence of discontinuous boundaries, the BayerDetect network incorporates a spatial attention scheme that utilizes deformable convolutional kernels in multiple scales to explore spatial context effectively. The extracted convolutional features are then passed through a sparse set of proposal boxes for detection and classification. We conducted experiments on both public and self-collected raw Bayer images, and the results demonstrate the superb performance of the BayerDetect network in object detection tasks. 