Description

Owing to the suprasegmental behavior of emotional speech, turn-level features have demonstrated greater success than frame-level features for recognition-related tasks. Conventionally, such features are obtained via a brute-force collection of statistics over frames, losing important local information in the process and hurting performance. To overcome these limitations, a novel feature extraction approach using latent topic models (LTMs) is presented in this study. Speech is assumed to comprise a mixture of emotion-specific topics, where the latter capture emotionally salient information from the co-occurrences of frame-level acoustic features and yield better descriptors. Specifically, a supervised replicated softmax model (sRSM), based on restricted Boltzmann machines and distributed representations, is proposed to learn naturally discriminative topics. The proposed features are evaluated for the recognition of categorical or continuous emotional attributes via within-corpus and cross-corpus experiments conducted on acted and spontaneous expressions. In the within-corpus scenario, sRSM outperforms competing LTMs while obtaining a significant improvement of 16.75% over popular statistics-based turn-level features for valence-based classification, which is considered a difficult task using speech alone. Further analysis with respect to turn duration shows that the improvement is even larger, 35%, on longer turns (>6 s), which is highly desirable for current turn-based practices. In the cross-corpus scenario, two novel adaptation-based approaches, instance selection and weight regularization, are proposed to reduce the inherent bias due to varying annotation procedures and cultural perceptions across databases. Experimental results indicate a natural, yet less severe, deterioration in performance of only 2.6% and 2.7%, respectively, highlighting the generalization ability of the proposed features.
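
As a concrete illustration of the feature-extraction idea, the sketch below implements the unsupervised building block named in the abstract, a replicated softmax model (an RBM over count data), trained with one step of contrastive divergence (CD-1). It is a minimal sketch, not the authors' implementation: the vocabulary size K, number of topics H, learning rate lr, and the bag-of-acoustic-words input (counts of vector-quantized frame-level features per turn) are all illustrative assumptions, and the supervised extension and cross-corpus adaptation schemes are omitted.

    # Minimal sketch of a replicated softmax model (RSM): an RBM whose
    # visible units are counts of "acoustic words" (vector-quantized
    # frame-level features) per turn, trained with one step of
    # contrastive divergence (CD-1). Hyperparameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    K, H, lr = 200, 50, 0.01              # acoustic vocabulary, topics, learning rate
    W = 0.01 * rng.standard_normal((K, H))
    b, a = np.zeros(K), np.zeros(H)       # visible (word) and hidden (topic) biases

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def hidden_probs(v):
        # Hidden biases are scaled by the turn length D = sum_k v_k;
        # this is the "replicated" part of the RSM.
        D = v.sum(axis=1, keepdims=True)
        return sigmoid(v @ W + D * a)

    def cd1_update(v):
        """One CD-1 step on a batch of count vectors v (batch x K)."""
        global W, b, a
        n = v.shape[0]
        D = v.sum(axis=1, keepdims=True)
        h0 = hidden_probs(v)
        h_s = (rng.random(h0.shape) < h0).astype(float)
        # Reconstruct each turn by drawing its D words from the softmax
        # over the K visible units.
        p_v = softmax(h_s @ W.T + b)
        v1 = np.stack([rng.multinomial(int(d), p)
                       for d, p in zip(D.ravel(), p_v)]).astype(float)
        h1 = hidden_probs(v1)
        W += lr * (v.T @ h0 - v1.T @ h1) / n
        b += lr * (v - v1).mean(axis=0)
        a += lr * (D * (h0 - h1)).mean(axis=0)

    # Toy usage: each row is one turn's bag-of-acoustic-words counts.
    counts = rng.integers(1, 5, size=(8, K)).astype(float)
    for _ in range(50):
        cd1_update(counts)
    features = hidden_probs(counts)       # one H-dim turn-level descriptor per turn

The turn-level feature vector is then simply the posterior topic activations, hidden_probs(counts), in place of brute-force statistics over frames; the sRSM described in the abstract additionally uses emotion labels during training so that the learned topics are discriminative.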


    Details

    Title
    • Within and Cross-Corpus Speech Emotion Recognition Using Latent Topic Model-Based Features
    Contributors
    • Shah, Mohit
    • Chakrabarti, Chaitali
    • Spanias, Andreas
    Date Created
    • 2015-01-25
    Resource Type
    • Text
    Identifiers
    • Digital object identifier: 10.1186/s13636-014-0049-y
    • International standard serial number: 1687-4714
    • International standard serial number: 1687-4722

    Citation and reuse

    Cite this item

    This is a suggested citation. Consult the appropriate style guide for specific citation guidelines.

    Shah, Mohit, Chakrabarti, Chaitali, & Spanias, Andreas (2015). Within and cross-corpus speech emotion recognition using latent topic model-based features. EURASIP Journal on Audio, Speech, and Music Processing, 2015:4. http://dx.doi.org/10.1186/s13636-014-0049-y
