quilt1m.github.io - QUILT-1M


To address the need for a large-scale vision-language dataset in histopathology, we introduce Quilt: a dataset of 419,780 images aligned with 768,826 text pairs. We draw on the insight that publicly available educational histopathology content on YouTube represents an untapped resource. We curate Quilt from 1,087 hours of valuable educational histopathology videos from expert pathologists on YouTube. To extract aligned image and text pairs from the videos, we utilize a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition.
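To make the extraction step more concrete, below is a minimal sketch of an ASR-to-caption-to-keyframe pipeline of the kind described above. It is illustrative only and not the released Quilt code: every helper here (transcribe_audio, clean_caption_with_llm, extract_keyframes) is a hypothetical stand-in for the real ASR, LLM-denoising, and frame-selection components.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # raw ASR transcript for this span

def transcribe_audio(video_path: str) -> list[Segment]:
    # Placeholder for automatic speech recognition over the video's audio track.
    return [Segment(12.0, 19.5, "this field shows atypical squamous cells")]

def clean_caption_with_llm(raw_text: str) -> Optional[str]:
    # Placeholder for LLM-based cleanup; returns None for non-medical narration.
    return raw_text if "cells" in raw_text else None

def extract_keyframes(video_path: str, start: float, end: float) -> list[str]:
    # Placeholder for selecting histopathology keyframes within a time span.
    return [f"{video_path}:{start:.1f}s"]

def curate_video(video_path: str) -> list[dict]:
    """Pair keyframes with time-aligned, cleaned caption text."""
    pairs = []
    for seg in transcribe_audio(video_path):
        caption = clean_caption_with_llm(seg.text)
        if caption is None:  # discard segments with no medical content
            continue
        for frame in extract_keyframes(video_path, seg.start, seg.end):
            pairs.append({"image": frame, "caption": caption})
    return pairs

print(curate_video("lecture_001.mp4"))
```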

We collected Quilt from 4,504 narrative videos spanning over 1,087 hours, yielding over 438K unique images with 768K associated text pairs. The mean length of the medical text captions is 22.76 words (8.68 words for ROI text), with an average of 1.74 medical sentences per image (max = 5.33, min = 1.0). The text mentions a total of 1.469M UMLS entities (28.5K unique). The images span varying microscopic magnification scales (0-10x, 10-20x, 20-40x), with (280K, 75K, 107K) images at each scale, respectively.
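As a rough illustration of how statistics like these could be recomputed from released metadata, the sketch below derives mean caption length, mean ROI text length, and per-magnification image counts. The file name quilt_metadata.csv and the columns caption, roi_text, and magnification are assumptions for the example; the actual release may use different names and fields.

```python
import pandas as pd

# Hypothetical schema: one row per image, with its caption, optional ROI text,
# and an estimated magnification bucket ("0-10x", "10-20x", "20-40x").
df = pd.read_csv("quilt_metadata.csv")

# Mean caption and ROI text lengths in words, as quoted above.
caption_words = df["caption"].str.split().str.len()
roi_words = df["roi_text"].dropna().str.split().str.len()
print(f"mean caption length: {caption_words.mean():.2f} words")
print(f"mean ROI text length: {roi_words.mean():.2f} words")

# Image counts per magnification bucket.
print(df["magnification"].value_counts())
```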

The following is a collection of sample images from our dataset, accompanied by the corresponding medical text, ROI text, and top three sub-pathology classifications:
