referit3d.github.io - ReferIt3D


In this work we introduce the problem of using referential language to identify common objects in real-world 3D scenes. We focus on a challenging setup where the referred object belongs to a fine-grained object class and the underlying scene contains multiple object instances of that class. Due to the scarcity and unsuitability of existent 3D-oriented linguistic resources for this task, we first develop two large-scale and complementary visio-linguistic datasets: i) Sr3D, which contains 83.5K template-based

Because this enables the human (or neural) speakers to use minimal details to disambiguate the “target” object, fostering the production of efficient and fine-grained references. Put simply, if you contrast an armchair with a target office chair, you can trivially utter: “the office chair”. Furthermore, the inclusion of explicit bounding boxes that surround the contrasting objects helps our annotators focus on the task, especially since the ScanNet 3D reconstructions are far from noise-free.

Because, similarly to the above, this forces the reference to go beyond fine-grained or simple object classification. That is, if the target is the only refrigerator in the scene, the reference “the refrigerator” is good enough, no?
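
As a rough illustration of the point above, the sketch below keeps only those objects that share their class label with at least one other object in the same scene, so that the class name alone cannot identify them. The function name `candidate_targets`, the `(object_id, label)` tuple layout, and the toy scene are assumptions made for this example; they are not taken from the ReferIt3D code or data format.

```python
from collections import Counter


def candidate_targets(scene_objects):
    """Return the objects whose class label occurs more than once in the scene.

    `scene_objects` is a list of (object_id, label) pairs; this layout is a
    hypothetical stand-in for whatever annotation format a real scene uses.
    """
    # Count how many objects carry each class label.
    counts = Counter(label for _, label in scene_objects)
    # An object qualifies only if at least one same-class distractor exists.
    return [(obj_id, label) for obj_id, label in scene_objects if counts[label] > 1]


# Toy scene (made up for illustration): two office chairs, one armchair,
# and a single refrigerator.
scene = [
    (0, "office chair"),
    (1, "office chair"),
    (2, "armchair"),
    (3, "refrigerator"),
]

targets = candidate_targets(scene)
print(targets)
```

In this toy scene only the two office chairs qualify: the lone refrigerator and the lone armchair could each be picked out by class name alone, so a reference to them would not need any further detail.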
