wysiwyr-itm.github.io - WYSIWYR

Focusing on image-text alignment, we introduce SeeTRUE, a comprehensive benchmark, and two effective methods: a zero-shot VQA-based approach and a model fine-tuned on synthetic data. Both methods improve performance on alignment tasks and on text-to-image reordering.

Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks. In this work, we study methods for automatic text-image alignment evaluation. We first introduce SeeTRUE: a comprehensive evaluation set, spanning multiple datasets from both text-to-image and image-to-text generation tasks, with human judgements for whether a given text-image pair is semantic

A comprehensive benchmark constructed using text-to-image (t2i) and image-to-text (i2t) models, large language models (LLMs), and natural language inference (NLI), comprising a mix of natural and synthetic images, captions, and prompts.
