tifa-benchmark.github.io - TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Description: TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering


Example domain paragraphs

Experiments show that TIFA is much more accurate than CLIP at evaluating generated images, while also being fine-grained and interpretable. This makes it well suited to fine-grained automatic evaluation of image generation.

TIFA works better because it uses LLMs to decompose the text input into fine-grained probes (questions), which lets a VQA model check more nuanced aspects of the text input against the generated image. CLIP, by contrast, summarizes the image as an embedding, which makes it inaccurate and unable to capture fine-grained details of an image.
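The question-answering idea can be illustrated with a minimal sketch: score an image by the fraction of probe questions a VQA model answers correctly. This is not the TIFA codebase. The names `Probe`, `tifa_style_score`, and `toy_vqa` are hypothetical, and the "VQA model" here is a lookup-table stand-in so the example runs without any model download.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Probe:
    """One question derived from the text prompt (hypothetical structure)."""
    question: str
    choices: List[str]
    answer: str  # the answer implied by the text prompt


def tifa_style_score(
    image: Any,
    probes: List[Probe],
    vqa: Callable[[Any, str, List[str]], str],
) -> float:
    """Return the fraction of probes the VQA callable answers correctly."""
    if not probes:
        raise ValueError("at least one probe is required")
    correct = sum(
        1 for p in probes if vqa(image, p.question, p.choices) == p.answer
    )
    return correct / len(probes)


def toy_vqa(image: dict, question: str, choices: List[str]) -> str:
    """Stand-in for a real VQA model: the 'image' is a dict of
    question -> answer; unknown questions fall back to the first choice."""
    return image.get(question, choices[0])


# Probes for the prompt "a dog wearing a red hat on a beach" (assumed example).
probes = [
    Probe("What animal is in the image?", ["dog", "cat", "bird"], "dog"),
    Probe("Is the dog wearing a hat?", ["yes", "no"], "yes"),
    Probe("What color is the hat?", ["red", "blue", "green"], "red"),
    Probe("Where is the dog?", ["beach", "park", "street"], "beach"),
]

# A pretend generated image in which the hat came out blue.
toy_image = {
    "What animal is in the image?": "dog",
    "Is the dog wearing a hat?": "yes",
    "What color is the hat?": "blue",
    "Where is the dog?": "beach",
}

score = tifa_style_score(toy_image, probes, toy_vqa)
print(score)  # 0.75
```

Because each probe is scored separately, the result is interpretable: here the single failed question points directly at the wrong hat color, which a single similarity number would not reveal.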

Do I need the OpenAI API to run TIFA? No, you don't. We have pre-generated the questions for you in the TIFA v1.0 benchmark. We also provide tools to generate your own questions with GPT-3.5.
