Lazy Diffusion Transformer for Interactive Image Editing




We introduce a novel diffusion transformer, LazyDiffusion, that generates partial image updates efficiently. Our approach targets interactive image editing applications in which, starting from a blank canvas or an image, a user specifies a sequence of localized image modifications using binary masks and text prompts. Our generator operates in two phases. First, a context encoder processes the current canvas and user mask to produce a compact global context tailored to the region to generate. Second, conditioned on this context, a diffusion-based transformer decoder synthesizes the masked pixels in a "lazy" fashion, i.e., it generates only the masked region, so its runtime scales with the mask size rather than with the full canvas.
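To make the two-phase design concrete, here is a minimal PyTorch sketch. The module names, layer counts, token widths, and the abbreviated sampling update are all illustrative assumptions, not the paper's implementation; the point is only the division of labor: the encoder sees the full canvas once, while the decoder denoises just the masked patch tokens.

```python
import torch
import torch.nn as nn

PATCH = 16   # patch size (assumed)
DIM = 256    # token width (assumed)

def patchify(x, patch=PATCH):
    """Split a (B, C, H, W) image into (B, N, C*patch*patch) patch tokens."""
    B, C, H, W = x.shape
    x = x.unfold(2, patch, patch).unfold(3, patch, patch)   # B, C, h, w, p, p
    return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)

class ContextEncoder(nn.Module):
    """Phase 1: process the full canvas + mask once, emit a few compact
    context tokens via learned queries (sizes are illustrative)."""
    def __init__(self, n_ctx=16, dim=DIM):
        super().__init__()
        self.embed = nn.Linear(4 * PATCH * PATCH, dim)      # RGB + mask channel
        self.ctx = nn.Parameter(torch.randn(1, n_ctx, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, canvas, mask):
        tokens = self.embed(patchify(torch.cat([canvas, mask], dim=1)))
        ctx = self.ctx.expand(tokens.size(0), -1, -1)
        out = self.enc(torch.cat([ctx, tokens], dim=1))
        return out[:, : self.ctx.size(1)]                   # keep only context tokens

class LazyDecoder(nn.Module):
    """Phase 2: a diffusion transformer that denoises ONLY the masked
    patch tokens, cross-attending to the compact context."""
    def __init__(self, dim=DIM):
        super().__init__()
        self.in_proj = nn.Linear(3 * PATCH * PATCH, dim)
        self.t_embed = nn.Embedding(1000, dim)              # diffusion timestep
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.dec = nn.TransformerDecoder(layer, num_layers=2)
        self.out_proj = nn.Linear(dim, 3 * PATCH * PATCH)

    def forward(self, noisy_masked_tokens, t, ctx):
        h = self.in_proj(noisy_masked_tokens) + self.t_embed(t)[:, None]
        return self.out_proj(self.dec(h, memory=ctx))       # predicted noise

# The encoder runs once per edit; every denoising step touches only masked tokens.
canvas = torch.randn(1, 3, 256, 256)
mask = torch.zeros(1, 1, 256, 256)
mask[..., :64, :64] = 1.0                                   # small user mask

ctx = ContextEncoder()(canvas, mask)                        # phase 1 (once)
decoder = LazyDecoder()
n_masked = 16                                               # 64x64 region -> 16 patches
x = torch.randn(1, n_masked, 3 * PATCH * PATCH)             # noisy masked patches
for t in reversed(range(0, 1000, 250)):                     # abbreviated sampling loop
    eps = decoder(x, torch.tensor([t]), ctx)
    x = x - 0.1 * eps                                       # stand-in for a real DDPM/DDIM update
```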

We compare LazyDiffusion to the two existing inpainting approaches -- regenerating a smaller crop around the mask, or regenerating the entire image. All methods use a PixArt-based architecture. LazyDiffusion is consistently faster than regenerating the entire image, especially for the small mask ratios typical of interactive edits, reaching a speedup of 10x. Similarly, LazyDiffusion is faster than regenerating a crop when the mask is smaller than the crop. For masks larger than that (dashed), regenerating the crop is technically faster, but it ignores the global context outside the crop and can produce inconsistent results.
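A back-of-the-envelope cost model shows where the speedup comes from. The canvas size, step count, and linear per-token cost below are assumptions for illustration; the 10x figure above is the measured result, not the output of this sketch.

```python
# Assumed cost model: per-step decoder cost scales with the number of tokens
# it processes, plus one full-canvas encoder pass per edit. Illustrative only.
PATCH, STEPS = 16, 50
canvas_tokens = (1024 // PATCH) ** 2                 # 4096 tokens on a 1024x1024 canvas

def edit_cost(decoder_tokens, encoder_tokens=0, steps=STEPS):
    return encoder_tokens + steps * decoder_tokens   # encoder once, decoder every step

full = edit_cost(canvas_tokens)                      # baseline: regenerate the entire image
for mask_ratio in (0.01, 0.05, 0.10, 0.25):
    lazy = edit_cost(int(canvas_tokens * mask_ratio), encoder_tokens=canvas_tokens)
    print(f"mask {mask_ratio:>4.0%}: ~{full / lazy:4.1f}x faster than full regeneration")
```

Under this model the encoder pass is a fixed overhead amortized over all denoising steps, which is why the advantage grows as the mask shrinks and tapers off as the mask approaches the full canvas.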

Each panel illustrates a generation step applied to the preceding canvas state shown to its left.