
Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation

Abstract Multimodal Representation Learning is a multi-disciplinary research field that aims to integrate information from multiple communicative modalities in a meaningful manner to help solve a downstream task. These modalities can be visual, acoustic, linguistic, haptic, etc. The interpretation of "meaningful integration of information from different modalities" remains modality- and task-dependent. The downstream task can range from understanding one modality in the presence of information from other modalities to translating input from one modality to another. In this thesis, the utility of multimodal representation learning for understanding one modality, vis-à-vis Image Understanding for Visual Reasoning given corresponding informati...
Created Date 2018
Contributor Saha, Rudra (Author) / Yang, Yezhou (Advisor) / Singh, Maneesh Kumar (Committee member) / Baral, Chitta (Committee member) / Arizona State University (Publisher)
Subject Artificial intelligence / Multimodal Learning / Text-to-Image Translation / Visual Reasoning
Type Masters Thesis
Extent 78 pages
Language English
Note Masters Thesis, Computer Engineering, 2018
Collaborating Institutions Graduate College / ASU Library

Full Text 1.9 MB, application/pdf

Description Dissertation/Thesis