1 The Chinese University of Hong Kong; 2 Massachusetts Institute of Technology; 3 Centre for Artificial Intelligence and Robotics of Hong Kong; 4 Shanghai Artificial Intelligence Laboratory
Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for machine learning. Despite progress in generating 2D medical images, the complex domain of clinical video generation has remained largely untapped. This paper introduces Endora, an innovative approach to generating medical videos that simulate clinical endoscopy scenes. We present a novel generative model design that integrates a meticulously crafted spatial-temporal video transformer with adv
We train a Gaussian Splatting representation on videos sampled from Endora and observe multi-view consistent geometry (shown by rendered RGB and depth maps), as if in the real 3D world.