

Title: Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation
Fashion attribute editing aims to manipulate fashion images based on a user-specified attribute, while preserving the details of the original image as intact as possible. Recent works in this domain have mainly focused on direct manipulation of the raw RGB pixels, which only allows edits involving relatively small shape changes (e.g., sleeves). The goal of our Virtual Personal Tailoring Network (VPTNet) is to extend the editing capabilities to much larger shape changes of fashion items, such as cloth length. To achieve this goal, we decouple the fashion attribute editing task into two conditional stages: shape-then-appearance editing. First, we propose a shape editing network that employs a semantic parsing of the fashion image as an interface for manipulation. Compared to operating on the raw RGB image, editing the parsing map enables more complex shape editing operations. Second, we introduce an appearance completion network that takes the previous stage's result and completes the shape-difference regions to produce the final RGB image. Qualitative and quantitative experiments on the DeepFashion-Synthesis dataset confirm that VPTNet outperforms state-of-the-art methods for both small and large shape attribute editing.
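
As a rough illustration of the shape-then-appearance decoupling described above, the sketch below wires two stand-in PyTorch modules together. The module names, layer choices, and attribute encoding are hypothetical placeholders, not the actual VPTNet architecture.

```python
import torch
import torch.nn as nn

class ShapeEditor(nn.Module):
    """Stage 1 (sketch): edit a semantic parsing map conditioned on a target attribute."""
    def __init__(self, num_classes=20, attr_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes + attr_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes, 3, padding=1),
        )

    def forward(self, parsing, attr):
        # Broadcast the attribute vector spatially, then concatenate with the parsing map.
        b, _, h, w = parsing.shape
        attr_map = attr.view(b, -1, 1, 1).expand(b, attr.size(1), h, w)
        return self.net(torch.cat([parsing, attr_map], dim=1))  # edited parsing logits

class AppearanceCompleter(nn.Module):
    """Stage 2 (sketch): complete the shape-difference regions of the RGB image."""
    def __init__(self, num_classes=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_classes + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, rgb, edited_parsing, diff_mask):
        return self.net(torch.cat([rgb, edited_parsing, diff_mask], dim=1))

# Usage: the difference mask marks pixels whose parsing label changed in stage 1.
parsing = torch.randn(1, 20, 128, 128)   # stand-in for a one-hot parsing map
attr = torch.randn(1, 8)                 # stand-in for the target attribute code
rgb = torch.randn(1, 3, 128, 128)
edited = ShapeEditor()(parsing, attr)
mask = (edited.argmax(1) != parsing.argmax(1)).float().unsqueeze(1)
out = AppearanceCompleter()(rgb, edited, mask)  # final RGB, shape (1, 3, 128, 128)
```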
Award ID(s):
1840131
NSF-PAR ID:
10390737
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022
Page Range / eLocation ID:
3831-3840
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D images and 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performance on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.
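
The deep supervision scheme described above can be illustrated with a minimal PyTorch sketch: auxiliary heads on hidden layers are trained against intermediate concepts alongside the final task. The two-stage network, concept choice, and losses here are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DeeplySupervisedNet(nn.Module):
    """Sketch: each stage emits a prediction for an intermediate concept
    (e.g., 2D part heatmaps, then refined heatmaps), all supervised in training."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.head1 = nn.Conv2d(32, 14, 1)   # auxiliary head: intermediate concept
        self.head2 = nn.Conv2d(64, 14, 1)   # final head: final task

    def forward(self, x):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        return self.head1(h1), self.head2(h2)

# Training step: sum a loss at every supervised depth, not just the output.
model = DeeplySupervisedNet()
x = torch.randn(2, 3, 64, 64)
t1 = torch.randn(2, 14, 64, 64)   # ground truth for the intermediate concept
t2 = torch.randn(2, 14, 64, 64)   # ground truth for the final task
p1, p2 = model(x)
loss = nn.functional.mse_loss(p1, t1) + nn.functional.mse_loss(p2, t2)
loss.backward()   # gradients regularize the hidden layers directly
```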
  2. In computer vision, tracking humans across camera views remains challenging, especially for complex scenarios with frequent occlusions, significant lighting changes and other difficulties. Under such conditions, most existing appearance and geometric cues are not reliable enough to distinguish humans across camera views. To address these challenges, this paper presents a stochastic attribute grammar model for leveraging complementary and discriminative human attributes for enhancing cross-view tracking. The key idea of our method is to introduce a hierarchical representation, a parse graph, to describe a subject and its movement trajectory in both space and time domains. This results in a hierarchical compositional representation, comprising trajectory entities of varying levels, including human boxes, 3D human boxes, tracklets and trajectories. We use a set of grammar rules to decompose a graph node (e.g. tracklet) into a set of children nodes (e.g. 3D human boxes), and augment each node with a set of attributes, including geometry (e.g., moving speed, direction), accessories (e.g., bags), and/or activities (e.g., walking, running). These attributes serve as valuable cues, in addition to appearance features (e.g., colors), in determining the associations of human detection boxes across cameras. In particular, the attributes of a parent node are inherited by its children nodes, resulting in consistency constraints over the feasible parse graph. Thus, we cast cross-view human tracking as finding the most discriminative parse graph for each subject in videos. We develop a learning method to train this attribute grammar model from weakly supervised training data. To infer the optimal parse graph and its attributes, we develop an alternative parsing method that employs both top-down and bottom-up computations to search for the optimal solution. We also explicitly reason about the occlusion status of each entity in order to deal with significant changes of camera viewpoints. We evaluate the proposed method over public video benchmarks and demonstrate with extensive experiments that our method clearly outperforms state-of-the-art tracking methods.
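
A minimal Python sketch of the parse-graph data structure and its attribute-inheritance consistency constraint follows. The node fields and attribute names are hypothetical, and the actual model additionally scores appearance and geometry cues during parsing.

```python
from dataclasses import dataclass, field

@dataclass
class ParseNode:
    """One node of the hierarchical parse graph: a trajectory, tracklet,
    3D human box, or detection box, augmented with attributes."""
    level: str                                       # e.g. "trajectory", "tracklet", "box"
    attributes: dict = field(default_factory=dict)   # e.g. {"bag": True, "activity": "walking"}
    children: list = field(default_factory=list)

def consistent(node: ParseNode) -> bool:
    """Inheritance constraint: every attribute of a parent must be carried,
    unchanged, by each of its children, recursively."""
    for child in node.children:
        if any(child.attributes.get(k, v) != v for k, v in node.attributes.items()):
            return False
        if not consistent(child):
            return False
    return True

# Usage: a tracklet that drops its parent's "bag" attribute breaks consistency.
traj = ParseNode("trajectory", {"bag": True},
                 [ParseNode("tracklet", {"bag": True, "activity": "walking"})])
assert consistent(traj)
traj.children[0].attributes["bag"] = False
assert not consistent(traj)
```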
  3. Raja, Gulistan (Ed.)
    Our computational developments and analyses of experimental images are designed to evaluate the effectiveness of chemical spraying via unmanned aerial vehicle (UAV). Our evaluations follow two perspectives of color-complexity: color variety within a color system and color distributional geometry on an image. First, working within the RGB and HSV color systems, we develop a new color-identification algorithm that relies on highly associative relations among the three color-coordinates to exhaustively identify all targeted color-pixels. A color-dot is then identified as one isolated network of connected color-pixels. The identified color-dots vary in shape and size within each image. Such a pixel-based computing algorithm is shown to robustly and efficiently accommodate heterogeneity due to shaded regions and lighting conditions. Second, all color-dots with varying sizes are categorized into three categories. Since the number of small color-dots is rather large, we spatially divide the entire image into a 2D lattice of rectangles. Each rectangle then becomes a collective of color-dots of various sizes and is classified with respect to its color-dot intensity. We progressively construct a series of minimum spanning trees (MSTs) as multiscale 2D distributional spatial geometries in a decreasing-intensity fashion. We extract the distributions of distances among connected rectangle-nodes in the observed MST and in simulated MSTs generated under a spatial-uniformness assumption. We devise a new algorithm for testing 2D spatial uniformness based on a hierarchical clustering tree over all involved MSTs. This new tree-based p-value evaluation has the capacity to become exact.
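
The two computational steps above (connected-component colour-dot identification, and MST construction over lattice nodes) can be sketched with standard SciPy routines. The colour band, lattice handling, and returned statistic here are simplified assumptions, not the authors' exact algorithm.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def color_dots(img_rgb, lo, hi):
    """Mark pixels whose three RGB coordinates jointly fall inside a target band,
    then take each connected component of marked pixels as one colour-dot."""
    mask = np.all((img_rgb >= lo) & (img_rgb <= hi), axis=-1)
    labels, n_dots = ndimage.label(mask)   # 4-connected components by default
    return labels, n_dots

def mst_edge_lengths(points):
    """Build the MST over node coordinates (e.g. centres of high-intensity
    rectangles) and return its edge-length distribution."""
    dist = squareform(pdist(points))       # dense pairwise distances
    mst = minimum_spanning_tree(dist)      # sparse MST over the complete graph
    return mst.data                        # the n-1 edge weights

img = np.random.randint(0, 256, (64, 64, 3))
labels, n = color_dots(img, lo=np.array([150, 0, 0]), hi=np.array([255, 80, 80]))
edges = mst_edge_lengths(np.random.rand(30, 2))  # compare against simulated-uniform MSTs
```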
  4. null (Ed.)
    Heliconius butterflies have bright patterns on their wings that tell potential predators that they are toxic. As a result, predators learn to avoid eating them. Over time, unrelated species of butterflies have evolved similar patterns to avoid predation through a process known as Müllerian mimicry. Worldwide, there are over 180,000 species of butterflies and moths, most of which have different wing patterns. How do genes create this pattern diversity? And do butterflies use similar genes to create similar wing patterns? One of the genes involved in creating wing patterns is called cortex. This gene has a large region of DNA around it that does not code for proteins, but instead, controls whether cortex is on or off in different parts of the wing. Changes in this non-coding region can act like switches, turning regions of the wing into different colours and creating complex patterns, but it is unclear how these switches have evolved. Butterfly wings get their colour from tiny structures called scales, which each have their own unique set of pigments. In Heliconius butterflies, there are three types of scales: yellow/white scales, black scales, and red/orange/brown scales. Livraghi et al. used a DNA editing technique called CRISPR to find out whether the cortex gene affects scale type. First, Livraghi et al. confirmed that deleting cortex turned black and red scales yellow. Next, they used the same technique to manipulate the non-coding DNA around the cortex gene to see the effect on the wing pattern. This manipulation turned a black-winged butterfly into a butterfly with a yellow wing band, a pattern that occurs naturally in Heliconius butterflies. The next step was to find the mutation responsible for the appearance of yellow wing bands in nature. It turns out that a bit of extra genetic code, derived from so-called ‘jumping genes’, had inserted itself into the non-coding DNA around the cortex gene, ‘flipping’ the switch and leading to the appearance of the yellow scales. Genetic information contains the instructions to generate shape and form in most organisms. These instructions evolve over millions of years, creating everything from bacteria to blue whales. Butterfly wings are visual evidence of evolution, but the way their genes create new patterns isn't specific to butterflies. Understanding wing patterns can help researchers to learn how genetic switches control diversity across other species too.
  5. Abstract

    Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. Specifically, our emphasis is on the type of control, i.e., how the control is provided, which parts of the pipeline are learned, explicit vs. implicit control, generalization, and stochastic vs. deterministic synthesis. The second half of this state-of-the-art report focuses on the many important use cases for the described algorithms, such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
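
As a toy illustration of "integrating differentiable rendering into network training" mentioned above, the sketch below recovers a single scene parameter by backpropagating through a trivially differentiable image-formation step. Real neural rendering systems use far richer differentiable renderers; every name and quantity here is a placeholder.

```python
import torch

# Toy differentiable "renderer": alpha-composite a flat colour over a background.
def render(color, alpha, background):
    return alpha * color.view(3, 1, 1) + (1 - alpha) * background

alpha = torch.rand(1, 8, 8)                   # fixed coverage mask (stand-in for geometry)
background = torch.zeros(3, 8, 8)
target = render(torch.tensor([0.9, 0.2, 0.1]), alpha, background)  # "observed" image

color = torch.zeros(3, requires_grad=True)    # scene parameter to recover
opt = torch.optim.Adam([color], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = ((render(color, alpha, background) - target) ** 2).mean()
    loss.backward()                            # gradients flow through the renderer
    opt.step()
# After optimization, `color` approximates the target colour [0.9, 0.2, 0.1].
```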

     