mugen-org.github.io - MUGEN

Description: A Playground for Video-Audio-Text Multimodal Understanding and GENeration


Example domain paragraphs

An overview of MUGEN.

Multimodal video-audio-text understanding and generation can benefit from datasets that are narrow but rich. The narrowness allows bite-sized challenges that the research community can make progress on. The richness ensures we are making progress along the core challenges. To this end, we present MUGEN, a large-scale video-audio-text dataset collected using the open-source platform game CoinRun. We made substantial modifications to make the game richer by introducing audio and enabling new interactions.
