newgeneralization.github.io - Workshop on New Forms of Generalization in Deep Learning and Natural Language Processing

TL;DR: We build models that work well on our datasets, but when we play with them we are surprised to find that they are brittle and break. Let’s analyze their failings and propose new evaluations and models.

Deep learning has brought a wealth of state-of-the-art results and new capabilities. Although methods have achieved near human-level performance on many benchmarks, numerous recent studies suggest that these benchmarks only weakly test what they are intended to test, and that simple examples, produced by either humans or machines, cause systems to fail spectacularly [1] [2] [3] [4] [5] [6] [7]. For example, a recently released textual entailment demo was criticized on social media for predicting: “John killed Mary” → “

This workshop provides a venue for exploring new approaches for measuring and enforcing generalization in models. We are soliciting work in the following areas: