diffuse-to-choose.github.io - Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All

Description: Virtual Try-All with image conditioned diffusion

diffusion (239) virtual try-on (12) diffuse to choose (2) virtual try-all (2)

Example domain paragraphs

As online shopping grows, the ability for buyers to virtually visualize products in their own settings, a phenomenon we define as "Virtual Try-All", has become crucial. Recent diffusion models inherently contain a world model, which makes them suitable for this task in an inpainting context. However, traditional image-conditioned diffusion models often fail to capture the fine-grained details of products. In contrast, personalization-driven models such as DreamPaint are good at preserving the item's detail

We present Diffuse to Choose, a novel diffusion-based image-conditioned inpainting model that efficiently balances fast inference with the retention of high-fidelity details in a given reference item while ensuring accurate semantic manipulations in the given scene content. Our approach incorporates fine-grained features from the reference image directly into the latent feature maps of the main diffusion model, along with a perceptual loss to further preserve the reference item's details.
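The paragraph above names a perceptual loss but does not spell out its form here. As a rough illustration only, a perceptual loss is commonly computed by comparing feature activations of two images, taken from a pretrained network, rather than comparing raw pixels. The sketch below assumes the features have already been extracted and flattened into plain lists, one per layer; the function name and the choice of mean squared difference per layer are assumptions for illustration, not the authors' formulation.

```python
def perceptual_loss(feats_pred, feats_ref):
    """Sum over layers of the mean squared difference between feature vectors.

    feats_pred, feats_ref: lists with one flat list of floats per layer.
    Both arguments are hypothetical stand-ins for activations that a
    pretrained feature extractor would produce for the generated image
    and for the reference item.
    """
    if len(feats_pred) != len(feats_ref):
        raise ValueError("both inputs need the same number of layers")
    total = 0.0
    for layer_pred, layer_ref in zip(feats_pred, feats_ref):
        if len(layer_pred) != len(layer_ref):
            raise ValueError("feature vectors in a layer must have equal length")
        squared = sum((a - b) ** 2 for a, b in zip(layer_pred, layer_ref))
        total += squared / len(layer_pred)  # mean over this layer's features
    return total
```

Identical feature sets give a loss of zero, and the loss grows as the generated item's features drift away from those of the reference.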

We utilize a secondary U-Net Encoder to inject fine-grained details into the diffusion process. This begins with masking the source image and then inserting the reference image within the masked area. The resulting pixel-level 'hint' is then adapted by a shallow CNN, which aligns it with the VAE output dimensions of the source image, before being added element-wise to it. Following this, a U-Net Encoder processes the adapted hint, where at each scale of the U-Net, a FiLM module affinely aligns the skip-connec
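Three of the steps described above (building the pixel-level hint, element-wise addition, and a FiLM-style affine transform) can be sketched on toy data. This is a minimal illustration in plain Python with nested lists, not the authors' implementation: the function names, the single-channel image, the rectangular `box` standing in for the mask, and the assumption that the reference already matches the box size are all hypothetical. The text here does not say how the FiLM scale and shift are produced, so they are passed in as plain numbers.

```python
def build_hint(source, reference, box):
    """Mask a rectangular region of `source`, then paste `reference` into it.

    source: H x W nested list (single channel, for brevity).
    reference: h x w nested list, assumed already resized to the box.
    box: (top, left, height, width) of the masked area (hypothetical format).
    """
    top, left, h, w = box
    if len(reference) != h or any(len(row) != w for row in reference):
        raise ValueError("reference must match the box size")
    hint = [row[:] for row in source]  # copy so the source stays untouched
    for i in range(h):
        for j in range(w):
            hint[top + i][left + j] = 0  # mask the region
    for i in range(h):
        for j in range(w):
            hint[top + i][left + j] = reference[i][j]  # insert the reference
    return hint


def add_elementwise(a, b):
    """Element-wise sum of two equally sized 2D maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def film(features, gamma, beta):
    """FiLM-style affine transform: gamma[c] * x + beta[c] for each channel c.

    features: C x H x W nested list; gamma, beta: length-C lists.
    """
    return [
        [[gamma[c] * v + beta[c] for v in row] for row in channel]
        for c, channel in enumerate(features)
    ]
```

For example, `build_hint([[1, 1], [1, 1]], [[9]], (0, 1, 1, 1))` returns `[[1, 9], [1, 1]]`. In the described model, a shallow CNN would adapt this hint before the addition, and the transform would be applied at each scale of the U-Net; both are omitted here.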

Links to diffuse-to-choose.github.io (1)

quilt-llava.github.io Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos