conda-workshop.github.io - CONDA 2024 | The 1st Workshop on Data Contamination

Description: Evaluation data has been compromised! A workshop on detecting, preventing, and addressing data contamination.

Example domain paragraphs

Workshop @ ACL 2024

Data contamination, where evaluation data is inadvertently included in the pre-training corpora of large-scale models, and language models (LMs) in particular, has become a concern in recent times (Sainz et al., 2023; Jacovi et al., 2023). The growing scale of both models and data, coupled with massive web crawling, has led to the inclusion of segments from evaluation benchmarks in the pre-training data of LMs (Dodge et al., 2021; OpenAI, 2023; Google, 2023; Elazar et al., 2023). The scale of internet dat

Links to conda-workshop.github.io (2)

yanaiela.github.io Yanai Elazar
eagirre.github.io Eneko Agirre