react-vl.github.io - Learning Customized Visual Models with Retrieval-Augmented Knowledge




REACT customizes foundation models to downstream tasks without the need for any labeled data.

Image-text contrastive learning models such as CLIP and OpenCLIP have demonstrated strong task transfer ability. The high generality and usability of these visual models are achieved via a web-scale data collection process to ensure broad concept coverage, followed by expensive pre-training to feed all the knowledge into model weights. Alternatively, we propose REACT (REtrieval-Augmented CusTomization), a framework that acquires relevant web knowledge to build customized visual models for target domains.
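To make the contrastive pre-training setup concrete, below is a minimal PyTorch sketch of the symmetric image-text contrastive (InfoNCE) objective used by CLIP-style models. The encoder outputs are stood in by random tensors; `clip_contrastive_loss` and the temperature value are illustrative names and defaults, not code from CLIP, OpenCLIP, or REACT.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # Normalize embeddings so dot products are cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity logits between every image and every text in the batch.
    logits = image_features @ text_features.t() / temperature

    # The matching image-text pair for each row sits on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy: image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings standing in for encoder outputs.
loss = clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```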

Given a downstream visual task, REACT follows a retrieval-then-customization procedure:

- Retrieval: the task instruction is augmented with free knowledge from the web (e.g., LAION), without any downstream labelled data.
- Customization: a lightweight training process builds customized models from a foundation model.
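The sketch below illustrates both stages under simplifying assumptions: a pre-computed text-embedding matrix stands in for an indexed web corpus such as LAION, and `model` is any CLIP-like dual encoder exposing `encode_text` / `encode_image` (hypothetical method names). The customization step shown is generic contrastive tuning on the retrieved pairs; REACT's actual training recipe may differ in its details.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve(model, class_names, pool_text_embeddings, k=100):
    """Stage 1 (Retrieval): rank web image-text pairs by similarity between
    their text embeddings and prompts built from the task's class names."""
    prompts = [f"a photo of a {name}" for name in class_names]   # task instruction only
    query = F.normalize(model.encode_text(prompts), dim=-1)
    pool = F.normalize(pool_text_embeddings, dim=-1)
    scores = query @ pool.t()                                    # (num_classes, pool_size)
    topk = scores.topk(k, dim=-1).indices                        # per-class nearest pairs
    return topk.flatten().unique()                               # indices into the web pool

def customize(model, retrieved_loader, optimizer, epochs=1):
    """Stage 2 (Customization): lightweight contrastive tuning of the
    foundation model on the retrieved image-text pairs (no downstream labels)."""
    for _ in range(epochs):
        for images, texts in retrieved_loader:
            img = F.normalize(model.encode_image(images), dim=-1)
            txt = F.normalize(model.encode_text(texts), dim=-1)
            logits = img @ txt.t() / 0.07
            targets = torch.arange(logits.size(0), device=logits.device)
            loss = (F.cross_entropy(logits, targets) +
                    F.cross_entropy(logits.t(), targets)) / 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Only the class names enter the retrieval query, which is why no downstream labelled images are needed before customization begins.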
