mmmu-benchmark.github.io - MMMU

Description: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Tags: ai, artificial intelligence, agi, large language model, lmm, artificial general intelligence, vision language model, lmm evaluation, large multimodal model, mmmu

Example domain paragraphs

Overview of the MMMU dataset. MMMU presents four challenges: 1) comprehensiveness: 11.5K college-level problems across six broad disciplines and 30 college subjects; 2) highly heterogeneous image types; 3) interleaved text and images; 4) expert-level perception and reasoning rooted in deep subject knowledge.

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types.
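The dataset structure described above (questions grouped by discipline, subject, and subfield, with multiple-choice options and interleaved images) can be sketched as a simple record type. This is an illustrative schema only, assuming hypothetical field names; it is not the official MMMU data format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MMMUQuestion:
    """Hypothetical layout of one MMMU-style question record.

    Field names are illustrative, not the benchmark's actual schema.
    """
    discipline: str                 # one of the six broad disciplines
    subject: str                    # one of the 30 college subjects
    question: str                   # text with interleaved image placeholders
    options: List[str]              # multiple-choice answer options
    answer: str                     # gold option label, e.g. "A"
    image_types: List[str] = field(default_factory=list)  # e.g. charts, diagrams

# Example record, with invented content for illustration.
sample = MMMUQuestion(
    discipline="Art & Design",
    subject="Art History",
    question="Which movement does the painting <image 1> belong to?",
    options=["Impressionism", "Cubism", "Baroque", "Dada"],
    answer="A",
    image_types=["painting"],
)

def accuracy(predictions: List[str], golds: List[str]) -> float:
    """Exact-match accuracy, the standard metric for multiple-choice QA."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)
```

A model's predictions over such records would then be scored by exact match against the gold option labels, e.g. `accuracy(["A", "B"], ["A", "C"])` gives `0.5`.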

Overview: We introduce the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark, a novel benchmark meticulously curated to assess the expert-level multimodal understanding capability of foundation models across a broad scope of tasks. It covers 30 subjects across six disciplines, including Art & Design, Business, Health & Medicine, Science, Humanities & Social Science, and Tech & Engineering, and 183 subfields. The detailed subject coverage and statistics are presented in the figure.
