adapt-image-models.github.io - AIM: Adapting Image Models for Efficient Video Action Recognition



Taojiannan Yang ♠,♣, Yi Zhu ♠, Yusheng Xie ♠, Aston Zhang ♠, Chen Chen ♣, Mu Li ♠

♠ Amazon Web Services, ♣ University of Central Florida

Multimodal models are sensitive to image/text perturbations (original image-text pairs are shown in blue boxes, perturbed ones in red). Image captioning (top): adding image perturbations can result in incorrect captions, e.g., the tabby kitten is mistakenly described as a woman or a dog. Text-to-image generation (bottom): applying text perturbations can result in generated images that contain incomplete visual information, e.g., the tree is missing in the example above.
