Video2Vec: Learning Semantic Spatio-Temporal Embedding for Video Representations

Abstract High-level inference tasks in video applications, such as action recognition, video retrieval, and zero-shot classification, have become an active research area in recent years. One fundamental requirement for such applications is extracting high-quality features that preserve the high-level information in the videos.

Many video feature extraction algorithms have been proposed, such as STIP, HOG3D, and Dense Trajectories. These algorithms are often referred to as "handcrafted" features because they were deliberately designed around domain-specific considerations. However, they may fail on high-level tasks or videos with complex scenes. Due to the success of using deep convolutional neural networks (CNNs) to extract global representations ...
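The abstract contrasts handcrafted descriptors with learned CNN representations. As a minimal, hypothetical sketch of the general idea (not the thesis's Video2Vec model): a common baseline builds a video-level embedding by mean-pooling per-frame CNN feature vectors and L2-normalizing the result. The function name `video_embedding` and the 16-frame, 512-dimensional feature matrix below are illustrative assumptions, not details from the thesis.

```python
import numpy as np

def video_embedding(frame_features: np.ndarray) -> np.ndarray:
    """Mean-pool per-frame feature vectors (shape T x D) into a single
    D-dimensional video vector, then L2-normalize it so that cosine
    similarity between videos reduces to a dot product."""
    v = frame_features.mean(axis=0)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Hypothetical example: 16 frames, each with a 512-dim CNN feature vector.
rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 512))
emb = video_embedding(feats)
print(emb.shape)  # (512,)
```

Such pooled embeddings are what retrieval and zero-shot tasks compare; more sophisticated models replace the mean with a learned temporal aggregation.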
Created Date 2016
Contributor Hu, Sheng-Hung (Author) / Li, Baoxin (Advisor) / Turaga, Pavan (Committee member) / Liang, Jianming (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)
Subject Computer engineering / Action Recognition / Deep Learning / Semantic Embedding / Semantic Retrieval / Video Representation / Zero-shot Learning
Type Masters Thesis
Extent 62 pages
Language English
Reuse Permissions All Rights Reserved
Note Masters Thesis Computer Science 2016
Collaborating Institutions Graduate College / ASU Library
Additional Formats MODS / OAI Dublin Core / RIS

Full Text 9.1 MB application/pdf

Description Dissertation/Thesis