gittables.github.io - Home | GitTables


GitTables: a large-scale corpus of relational tables

Figure 1: high-level overview of how GitTables is constructed.

dataset download | paper | github repository | video presentation

GitTables is a large-scale corpus of relational tables extracted from CSV files on GitHub. It facilitates learning table representation models and supports applications in areas such as data management and data analysis. We keep expanding GitTables, with a target of at least 10M tables (ETA: early 2023).
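To make the idea of a relational table extracted from a CSV file concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the GitTables extraction pipeline: the function name `csv_to_table`, the sample data, and the rule of dropping rows whose width differs from the header are all assumptions made for this example.

```python
import csv
import io


def csv_to_table(text):
    """Parse CSV text into (header, rows), keeping only well-formed rows.

    Hypothetical helper for illustration; not part of GitTables.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        # Empty input: no header and no rows.
        return [], []
    # Keep rows whose width matches the header (an assumed, simple check).
    rows = [row for row in reader if len(row) == len(header)]
    return header, rows


sample = "id,name,price\n1,apple,0.5\n2,pear,0.75\nmalformed line\n"
header, rows = csv_to_table(sample)
print(header)
print(rows)
```

All values stay as strings here; inferring column types would be a separate step.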
