Mining semantics from low-level features in multimedia computing

Wang, Zhesheng

Bridging the semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signals with their high-level semantic interpretation arises mainly because semantics are often conveyed implicitly in a context, relying on interactions among multiple levels of concepts or low-level data entities. In addition, domain knowledge may often be indispensable for uncovering the underlying semantics, but in most cases such knowledge is not readily available from the acquired media streams. Making use of various types of contextual information and leveraging the corresponding domain knowledge are therefore vital for associating high-level semantics with low-level signals more accurately in multimedia computing problems. In this work, novel computational methods are explored and developed for incorporating contextual information and domain knowledge in different forms into multimedia computing and pattern recognition problems. Specifically, a novel Bayesian approach with statistical-sampling-based inference is proposed for incorporating a special type of domain knowledge, namely a spatial prior on the underlying shapes; cross-modality correlations are explored via Kernel Canonical Correlation Analysis (KCCA), and the learnt space is then used for associating multimedia content in different forms; and contextual information modeled as a graph is leveraged for regulating interactions among high-level semantic concepts (e.g., category labels) and low-level input signals (e.g., spatial/temporal structure). Four real-world applications, namely visual-to-tactile face conversion, photo tag recommendation, wild web video classification, and unconstrained consumer video summarization, are selected to demonstrate the effectiveness of the approaches. These applications range from classic research challenges to emerging tasks in multimedia computing.
Results from experiments on large-scale real-world data with comparisons to other state-of-the-art methods and subjective evaluations with end users confirmed that the developed approaches exhibit salient advantages, suggesting that they are promising for leveraging contextual information/domain knowledge for a wide range of multimedia computing and pattern recognition problems.
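
The abstract names Kernel Canonical Correlation Analysis as the tool for learning a shared space across modalities (for example, photos and their tags). As a rough illustration only, the sketch below implements one common regularized form of KCCA in NumPy, assuming the variance terms α^T K^2 α are replaced by α^T (K + κI)^2 α. Under that assumption the canonical correlations are the singular values of (Kx + κI)^{-1} Kx (Ky + κI)^{-1} Ky. The RBF kernel, the parameters `kappa` and `gamma`, the function names, and the synthetic two-view data are all hypothetical choices for illustration and are not taken from the dissertation.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def center_kernel(K):
    """Center a kernel matrix in feature space: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Kx, Ky, kappa=0.1, n_components=2):
    """Regularized kernel CCA on two centered n x n kernel matrices.

    Assumed regularization: alpha^T Kx^2 alpha -> alpha^T (Kx + kappa I)^2 alpha
    (and likewise for Ky). Substituting u = (Kx + kappa I) alpha and
    v = (Ky + kappa I) beta turns the problem into an SVD of Tx @ Ty,
    where T = (K + kappa I)^{-1} K.
    """
    n = Kx.shape[0]
    Rx = Kx + kappa * np.eye(n)
    Ry = Ky + kappa * np.eye(n)
    Tx = np.linalg.solve(Rx, Kx)
    Ty = np.linalg.solve(Ry, Ky)
    U, s, Vt = np.linalg.svd(Tx @ Ty)       # singular values, descending
    k = n_components
    alpha = np.linalg.solve(Rx, U[:, :k])   # dual weights for view X
    beta = np.linalg.solve(Ry, Vt[:k].T)    # dual weights for view Y
    return s[:k], alpha, beta

# Hypothetical two-view data sharing one latent factor z.
rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=(n, 1))
X = np.hstack([np.sin(z), rng.normal(size=(n, 4))])  # stand-in "visual" view
Y = np.hstack([z**2, rng.normal(size=(n, 3))])       # stand-in "textual" view

Kx = center_kernel(rbf_kernel(X, X, gamma=0.5))
Ky = center_kernel(rbf_kernel(Y, Y, gamma=0.5))
corrs, alpha, beta = kcca(Kx, Ky, kappa=0.1, n_components=2)

# Project the training samples of each view into the shared space.
px, py = Kx @ alpha, Ky @ beta
```

A flexible kernel with a small `kappa` can fit almost any pairing of n training samples, so training-set correlations close to 1 are not by themselves evidence of a real cross-modal link. In practice `kappa` and `gamma` would be chosen by the correlation achieved on held-out pairs.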

Details

Title

Mining semantics from low-level features in multimedia computing

Contributors

Wang, Zhesheng (Author)
Li, Baoxin (Thesis advisor)
Sundaram, Hari (Committee member)
Qian, Gang (Committee member)
Ye, Jieping (Committee member)
Arizona State University (Publisher)

Date Created

2011

Resource Type

Text

Collections this item is in

ASU Electronic Theses and Dissertations

Note

Partial requirement for: Ph.D., Arizona State University, 2011

Note type

thesis

Note

Includes bibliographical references (p. 113-122)

Note type

bibliography

Note

Field of study: Computer science