temptabqa.github.io - TempTabQA


Semi-structured data such as Infobox tables often contains temporal information about entities, which current NLP systems still struggle to handle effectively. TempTabQA addresses this with a dataset of 11,454 question-answer pairs drawn from 1,208 Wikipedia Infobox tables spanning more than 90 distinct domains. Evaluating leading models on this task reveals a significant gap, with even the best-performing LLMs trailing human performance by more t

TL;DR: TempTabQA, featuring 11,000+ Q&A pairs sourced from varied Wikipedia Infobox tables, evaluates NLP models' understanding of temporal data. Results indicate top models lag over 13.5 F1 points behind human performance, highlighting the potential to improve models' temporal reasoning.

TempTabQA dataset creation procedure

We use Amazon Mechanical Turk (MTurk) for data collection and validation. Annotators were presented with a tabular premise (infobox tables) and instructed to write three temporal quest

Below is an example from the TempTabQA dataset.
