cmu-mmml.github.io - 11-777 MMML

Description: 11-777 - Multimodal Machine Learning - Carnegie Mellon University


Multimodal machine learning (MMML) is a vibrant multi-disciplinary research field that addresses some of the original goals of artificial intelligence by integrating and modeling multiple communicative modalities, including linguistic, acoustic, and visual messages. With the initial research on audio-visual speech recognition and, more recently, with language & vision projects such as image and video captioning, this research field brings some unique challenges for multimod…

The course will present the fundamental mathematical concepts in machine learning and deep learning relevant to the six main challenges in multimodal machine learning: (1) representation, (2) alignment, (3) reasoning, (4) generation, (5) transference, and (6) quantification. These concepts include, but are not limited to, multimodal transformers, neuro-symbolic models, multimodal tensor fusion, mutual information, and multimodal graph networks. The course will also discuss many of the recent applications of MMML including…

This course is offered every semester:
