Skip to main content

ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Date Range
2013 2019


Much evidence has shown that first language (L1) plays an important role in the formation of L2 phonological system during second language (L2) learning process. This combines with the fact that different L1s have distinct phonological patterns to indicate the diverse L2 speech learning outcomes for speakers from different L1 backgrounds. This dissertation hypothesizes that phonological distances between accented speech and speakers' L1 speech are also correlated with perceived accentedness, and the correlations are negative for some phonological properties. Moreover, contrastive phonological distinctions between L1s and L2 will manifest themselves in the accented speech produced by speaker from these L1s. …

Contributors
Tu, Ming, Berisha, Visar, Liss, Julie M, et al.
Created Date
2018

Speech intelligibility measures how much a speaker can be understood by a listener. Traditional measures of intelligibility, such as word accuracy, are not sufficient to reveal the reasons of intelligibility degradation. This dissertation investigates the underlying sources of intelligibility degradations from both perspectives of the speaker and the listener. Segmental phoneme errors and suprasegmental lexical boundary errors are developed to reveal the perceptual strategies of the listener. A comprehensive set of automated acoustic measures are developed to quantify variations in the acoustic signal from three perceptual aspects, including articulation, prosody, and vocal quality. The developed measures have been validated on …

Contributors
Jiao, Yishan, Berisha, Visar, Liss, Julie, et al.
Created Date
2019

Dementia is a syndrome resulting from an acquired brain disease that affects many domains of cognitive impairment. The progressive disorder generally affects memory, attention, executive functions, communication, and other cognitive domains that significantly alter everyday function (Quinn, 2014). The purpose of this research was to gather a systematic review of cognitive-communication assessments and screeners used in assessing dementia to assist in early prognosis. From this review, there is potential in developing a new test to address the areas that people with dementia often have deficits in 1) Memory, 2) Attention, 3) Executive Functions, 4) Language, and 5) Visuospatial Skills. In …

Contributors
Miller, Marissa, Liss, Julie M, Berisha, Visar, et al.
Created Date
2019

Motion estimation is a core task in computer vision and many applications utilize optical flow methods as fundamental tools to analyze motion in images and videos. Optical flow is the apparent motion of objects in image sequences that results from relative motion between the objects and the imaging perspective. Today, optical flow fields are utilized to solve problems in various areas such as object detection and tracking, interpolation, visual odometry, etc. In this dissertation, three problems from different areas of computer vision and the solutions that make use of modified optical flow methods are explained. The contributions of this dissertation …

Contributors
Kanberoglu, Berkay, Frakes, David, Turaga, Pavan, et al.
Created Date
2018

The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been …

Contributors
Kalyanasundaram, Girish, Spanias, Andreas S, Tepedelenlioglu, Cihan, et al.
Created Date
2013

Everyday speech communication typically takes place face-to-face. Accordingly, the task of perceiving speech is a multisensory phenomenon involving both auditory and visual information. The current investigation examines how visual information influences recognition of dysarthric speech. It also explores where the influence of visual information is dependent upon age. Forty adults participated in the study that measured intelligibility (percent words correct) of dysarthric speech in auditory versus audiovisual conditions. Participants were then separated into two groups: older adults (age range 47 to 68) and young adults (age range 19 to 36) to examine the influence of age. Findings revealed that all …

Contributors
Fall, Elizabeth, Liss, Julie, Berisha, Visar, et al.
Created Date
2014

The present study describes audiovisual sentence recognition in normal hearing listeners, bimodal cochlear implant (CI) listeners and bilateral CI listeners. This study explores a new set of sentences (the AzAV sentences) that were created to have equal auditory intelligibility and equal gain from visual information. The aims of Experiment I were to (i) compare the lip reading difficulty of the AzAV sentences to that of other sentence materials, (ii) compare the speech-reading ability of CI listeners to that of normal-hearing listeners and (iii) assess the gain in speech understanding when listeners have both auditory and visual information from easy-to-lip-read and …

Contributors
Wang, Shuai, Dorman, Michael, Berisha, Visar, et al.
Created Date
2015

Many neurological disorders, especially those that result in dementia, impact speech and language production. A number of studies have shown that there exist subtle changes in linguistic complexity in these individuals that precede disease onset. However, these studies are conducted on controlled speech samples from a specific task. This thesis explores the possibility of using natural language processing in order to detect declining linguistic complexity from more natural discourse. We use existing data from public figures suspected (or at risk) of suffering from cognitive-linguistic decline, downloaded from the Internet, to detect changes in linguistic complexity. In particular, we focus on …

Contributors
Wang, Shuai, Berisha, Visar, LaCross, Amy, et al.
Created Date
2016

Audio signals, such as speech and ambient sounds convey rich information pertaining to a user’s activity, mood or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging due to its subjective nature, hence, requiring sophisticated techniques. This dissertation presents a set of computational methods, that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and ambient sounds-based applications such as lifelogging. The expression and perception of emotions varies across speakers and cultures, thus, determining features and classification methods that generalize well to different conditions …

Contributors
Shah, Mohit, Spanias, Andreas, Chakrabarti, Chaitali, et al.
Created Date
2015

Modern machine learning systems leverage data and features from multiple modalities to gain more predictive power. In most scenarios, the modalities are vastly different and the acquired data are heterogeneous in nature. Consequently, building highly effective fusion algorithms is at the core to achieve improved model robustness and inferencing performance. This dissertation focuses on the representation learning approaches as the fusion strategy. Specifically, the objective is to learn the shared latent representation which jointly exploit the structural information encoded in all modalities, such that a straightforward learning model can be adopted to obtain the prediction. We first consider sensor fusion, …

Contributors
Song, Huan, Spanias, Andreas, Thiagarajan, Jayaraman, et al.
Created Date
2018

The purpose of this study was to identify acoustic markers that correlate with accurate and inaccurate /r/ production in children ages 5-8 using signal processing. In addition, the researcher aimed to identify predictive acoustic markers that relate to changes in /r/ accuracy. A total of 35 children (23 accurate, 12 inaccurate, 8 longitudinal) were recorded. Computerized stimuli were presented on a PC laptop computer and the children were asked to do five tasks to elicit spontaneous and imitated /r/ production in all positions. Files were edited and analyzed using a filter bank approach centered at 40 frequencies based on the …

Contributors
Becvar, Brittany Patricia, Azuma, Tamiko, Weinhold, Juliet, et al.
Created Date
2017

Through decades of clinical progress, cochlear implants have brought the world of speech and language to thousands of profoundly deaf patients. However, the technology has many possible areas for improvement, including providing information of non-linguistic cues, also called indexical properties of speech. The field of sensory substitution, providing information relating one sense to another, offers a potential avenue to further assist those with cochlear implants, in addition to the promise they hold for those without existing aids. A user study with a vibrotactile device is evaluated to exhibit the effectiveness of this approach in an auditory gender discrimination task. Additionally, …

Contributors
Butts, Austin McRae, Helms Tillery, Stephen, Berisha, Visar, et al.
Created Date
2015

The ability to identify unoccupied resources in the radio spectrum is a key capability for opportunistic users in a cognitive radio environment. This paper draws upon and extends geometrically based ideas in statistical signal processing to develop estimators for the rank and the occupied subspace in a multi-user environment from multiple temporal samples of the signal received at a single antenna. These estimators enable identification of resources, such as the orthogonal complement of the occupied subspace, that may be exploitable by an opportunistic user. This concept is supported by simulations showing the estimation of the number of users in a …

Contributors
Beaudet, Kaitlyn, Cochran, Douglas, Turaga, Pavan, et al.
Created Date
2014

Glottal fry is a vocal register characterized by low frequency and increased signal perturbation, and is perceptually identified by its popping, creaky quality. Recently, the use of the glottal fry vocal register has received growing awareness and attention in popular culture and media in the United States. The creaky quality that was originally associated with vocal pathologies is indeed becoming “trendy,” particularly among young women across the United States. But while existing studies have defined, quantified, and attempted to explain the use of glottal fry in conversational speech, there is currently no explanation for the increasing prevalence of the use …

Contributors
Delfino, Christine R., Liss, Julie M, Borrie, Stephanie A, et al.
Created Date
2015

Information divergence functions, such as the Kullback-Leibler divergence or the Hellinger distance, play a critical role in statistical signal processing and information theory; however estimating them can be challenge. Most often, parametric assumptions are made about the two distributions to estimate the divergence of interest. In cases where no parametric model fits the data, non-parametric density estimation is used. In statistical signal processing applications, Gaussianity is usually assumed since closed-form expressions for common divergence measures have been derived for this family of distributions. Parametric assumptions are preferred when it is known that the data follows the model, however this is …

Contributors
Wisler, Alan, Berisha, Visar, Spanias, Andreas, et al.
Created Date
2017

Head movement is known to have the benefit of improving the accuracy of sound localization for humans and animals. Marmoset is a small bodied New World monkey species and it has become an emerging model for studying the auditory functions. This thesis aims to detect the horizontal and vertical rotation of head movement in marmoset monkeys. Experiments were conducted in a sound-attenuated acoustic chamber. Head movement of marmoset monkey was studied under various auditory and visual stimulation conditions. With increasing complexity, these conditions are (1) idle, (2) sound-alone, (3) sound and visual signals, and (4) alert signal by opening and …

Contributors
Simhadri, Sravanthi, Zhou, Yi, Turaga, Pavan, et al.
Created Date
2014

Compressed sensing (CS) is a novel approach to collecting and analyzing data of all types. By exploiting prior knowledge of the compressibility of many naturally-occurring signals, specially designed sensors can dramatically undersample the data of interest and still achieve high performance. However, the generated data are pseudorandomly mixed and must be processed before use. In this work, a model of a single-pixel compressive video camera is used to explore the problems of performing inference based on these undersampled measurements. Three broad types of inference from CS measurements are considered: recovery of video frames, target tracking, and object classification/detection. Potential applications …

Contributors
Braun, Henry Carlton, Turaga, Pavan K, Spanias, Andreas S, et al.
Created Date
2016

A human communications research project at Arizona State University aurally recorded the daily interactions of aware and consenting employees and their visiting clients at the Software Factory, a software engineering consulting team, over a three year period. The resulting dataset contains valuable insights on the communication networks that the participants formed however it is far too vast to be processed manually by researchers. In this work, digital signal processing techniques are employed to develop a software toolkit that can aid in estimating the observable networks contained in the Software Factory recordings. A four-step process is employed that starts with parsing …

Contributors
Pressler, Daniel, Bliss, Daniel W, Berisha, Visar, et al.
Created Date
2018

Deep neural networks (DNN) have shown tremendous success in various cognitive tasks, such as image classification, speech recognition, etc. However, their usage on resource-constrained edge devices has been limited due to high computation and large memory requirement. To overcome these challenges, recent works have extensively investigated model compression techniques such as element-wise sparsity, structured sparsity and quantization. While most of these works have applied these compression techniques in isolation, there have been very few studies on application of quantization and structured sparsity together on a DNN model. This thesis co-optimizes structured sparsity and quantization constraints on DNN models during training. …

Contributors
Srivastava, Gaurav, Seo, Jae-Sun, Chakrabarti, Chaitali, et al.
Created Date
2018

The problem of cooperative radar and communications signaling is investigated. Each system typically considers the other system a source of interference. Consequently, the tradition is to have them operate in orthogonal frequency bands. By considering the radar and communications operations to be a single joint system, performance bounds on a receiver that observes communications and radar return in the same frequency allocation are derived. Bounds in performance of the joint system is measured in terms of data information rate for communications and radar estimation information rate for the radar. Inner bounds on performance are constructed. Dissertation/Thesis

Contributors
Chiriyath, Alex Rajan, Bliss, Daniel W, Kosut, Oliver, et al.
Created Date
2014