Skip to main content

ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Language
  • English
Subject
Date Range
2011 2019


Many longitudinal studies, especially in clinical trials, suffer from missing data issues. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption leads to unrealistic simplification and is implausible for many cases. For example, an investigator is examining the effect of treatment on depression. Subjects are scheduled with doctors on a regular basis and asked questions about recent emotional situations. Patients who are experiencing severe depression are more likely to miss an appointment and leave the data missing for that particular visit. Data that are not missing at random may produce bias …

Contributors
Zhang, Jun, Reiser, Mark, Barber, Jarrett, et al.
Created Date
2013

The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data are from a table formed by cross-classification of a large number of variables, the common statistics may have low power and inaccurate Type I error level due to sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. It is proposed to assess model fit by using a new version of GFfit statistic based on orthogonal components of Pearson chi-square as a diagnostic to examine the fit on …

Contributors
Zhu, Junfei, Reiser, Mark, Stufken, John, et al.
Created Date
2017

The Pearson and likelihood ratio statistics are well-known in goodness-of-fit testing and are commonly used for models applied to multinomial count data. When data are from a table formed by the cross-classification of a large number of variables, these goodness-of-fit statistics may have lower power and inaccurate Type I error rate due to sparseness. Pearson's statistic can be decomposed into orthogonal components associated with the marginal distributions of observed variables, and an omnibus fit statistic can be obtained as a sum of these components. When the statistic is a sum of components for lower-order marginals, it has good performance for …

Contributors
Dassanayake, Mudiyanselage Maduranga Kasun, Reiser, Mark, Kao, Ming-Hung, et al.
Created Date
2018

Quadratic growth curves of 2nd degree polynomial are widely used in longitudinal studies. For a 2nd degree polynomial, the vertex represents the location of the curve in the XY plane. For a quadratic growth curve, we propose an approximate confidence region as well as the confidence interval for x and y-coordinates of the vertex using two methods, the gradient method and the delta method. Under some models, an indirect test on the location of the curve can be based on the intercept and slope parameters, but in other models, a direct test on the vertex is required. We present a …

Contributors
Yu, Wanchunzi, Reiser, Mark, Barber, Jarrett, et al.
Created Date
2015

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAMs teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect …

Contributors
Valdivia, Arturo, Eubank, Randall, Young, Dennis, et al.
Created Date
2013

This article proposes a new information-based subdata selection (IBOSS) algorithm, Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains nice asymptotic properties of IBOSS and gives a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm. However, it exploits the advantages of vectorized calculation avoiding for loops and is approximately 6 times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA …

Contributors
Zheng, Yi, Stufken, John, Reiser, Mark, et al.
Created Date
2017

It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well-known in goodness-of-fit testing, but it is sometimes considered to produce an omnibus test as it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components …

Contributors
Milovanovic, Jelena, Young, Dennis, Reiser, Mark, et al.
Created Date
2011

When analyzing longitudinal data it is essential to account both for the correlation inherent from the repeated measures of the responses as well as the correlation realized on account of the feedback created between the responses at a particular time and the predictors at other times. A generalized method of moments (GMM) for estimating the coefficients in longitudinal data is presented. The appropriate and valid estimating equations associated with the time-dependent covariates are identified, thus providing substantial gains in efficiency over generalized estimating equations (GEE) with the independent working correlation. Identifying the estimating equations for computation is of utmost importance. …

Contributors
Yin, Jianqiong, Wilson, Jeffrey Wilson, Reiser, Mark, et al.
Created Date
2012

In the presence of correlation, generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extravariation due to correlation, methods to estimate and model the additional variation are investigated. A general form of the mean-variance relationship is proposed which incorporates the canonical parameter. The two variance parameters are estimated using generalized method of moments, negating the need for a distributional assumption. The mean-variance relation estimates are applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchical structured data, …

Contributors
Irimata, Katherine, Wilson, Jeffrey R, Kamarianakis, Ioannis, et al.
Created Date
2018

Functional brain imaging experiments are widely conducted in many fields for study- ing the underlying brain activity in response to mental stimuli. For such experiments, it is crucial to select a good sequence of mental stimuli that allow researchers to collect informative data for making precise and valid statistical inferences at minimum cost. In contrast to most existing studies, the aim of this study is to obtain optimal designs for brain mapping technology with an ultra-high temporal resolution with respect to some common statistical optimality criteria. The first topic of this work is on finding optimal designs when the primary …

Contributors
Alghamdi, Reem, Kao, Ming-Hung, Fricks, John, et al.
Created Date
2019

In this era of fast computational machines and new optimization algorithms, there have been great advances in Experimental Designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging(fMRI). The first part of our research is on tackling the challenging problem of constructing exact designs for GLMs, that are robust against parameter, link and model uncertainties by improving an existing algorithm and providing a new one, based on using a continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accomodate most popular design selection criteria, and we …

Contributors
Temkit, M'Hamed, Kao, Jason, Reiser, Mark, et al.
Created Date
2014

Generalized Linear Models (GLMs) are widely used for modeling responses with non-normal error distributions. When the values of the covariates in such models are controllable, finding an optimal (or at least efficient) design could greatly facilitate the work of collecting and analyzing data. In fact, many theoretical results are obtained on a case-by-case basis, while in other situations, researchers also rely heavily on computational tools for design selection. Three topics are investigated in this dissertation with each one focusing on one type of GLMs. Topic I considers GLMs with factorial effects and one continuous covariate. Factors can have interactions among …

Contributors
Wang, Zhongshen, Stufken, John, Kamarianakis, Ioannis, et al.
Created Date
2018

A least total area of triangle method was proposed by Teissier (1948) for fitting a straight line to data from a pair of variables without treating either variable as the dependent variable while allowing each of the variables to have measurement errors. This method is commonly called Reduced Major Axis (RMA) regression and is often used instead of Ordinary Least Squares (OLS) regression. Results for confidence intervals, hypothesis testing and asymptotic distributions of coefficient estimates in the bivariate case are reviewed. A generalization of RMA to more than two variables for fitting a plane to data is obtained by minimizing …

Contributors
Li, Jingjin, Young, Dennis, Eubank, Randall, et al.
Created Date
2012

This study concerns optimal designs for experiments where responses consist of both binary and continuous variables. Many experiments in engineering, medical studies, and other fields have such mixed responses. Although in recent decades several statistical methods have been developed for jointly modeling both types of response variables, an effective way to design such experiments remains unclear. To address this void, some useful results are developed to guide the selection of optimal experimental designs in such studies. The results are mainly built upon a powerful tool called the complete class approach and a nonlinear optimization algorithm. The complete class approach was …

Contributors
Kim, Soohyun, Kao, Ming-Hung, Dueck, Amylou, et al.
Created Date
2017

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) that are used as outcome variables often violate the assumptions of linear regression as well as models designed for categorical outcomes; there is no analytic model that is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, …

Contributors
Coxe, Stefany Jean, Aiken, Leona S, West, Stephen G, et al.
Created Date
2012

Obtaining high-quality experimental designs to optimize statistical efficiency and data quality is quite challenging for functional magnetic resonance imaging (fMRI). The primary fMRI design issue is on the selection of the best sequence of stimuli based on a statistically meaningful optimality criterion. Some previous studies have provided some guidance and powerful computational tools for obtaining good fMRI designs. However, these results are mainly for basic experimental settings with simple statistical models. In this work, a type of modern fMRI experiments is considered, in which the design matrix of the statistical model depends not only on the selected design, but also …

Contributors
Zhou, Lin, Kao, Ming-hung, Reiser, Mark, et al.
Created Date
2014

By the von Neumann min-max theorem, a two person zero sum game with finitely many pure strategies has a unique value for each player (summing to zero) and each player has a non-empty set of optimal mixed strategies. If the payoffs are independent, identically distributed (iid) uniform (0,1) random variables, then with probability one, both players have unique optimal mixed strategies utilizing the same number of pure strategies with positive probability (Jonasson 2004). The pure strategies with positive probability in the unique optimal mixed strategies are called saddle squares. In 1957, Goldman evaluated the probability of a saddle point (a …

Contributors
Manley, Michael, Kadell, Kevin W. J., Kao, Ming-Hung, et al.
Created Date
2011

Large-scale cultivation of perennial bioenergy crops (e.g., miscanthus and switch- grass) offers unique opportunities to mitigate climate change through avoided fossil fuel use and associated greenhouse gas reduction. Although conversion of existing agriculturally intensive lands (e.g., maize and soy) to perennial bioenergy cropping systems has been shown to reduce near-surface temperatures, unintended consequences on natural water resources via depletion of soil moisture may offset these benefits. In the effort of the cross-fertilization across the disciplines of physics-based modeling and spatio-temporal statistics, three topics are investigated in this dissertation aiming to provide a novel quantification and robust justifications of the hydroclimate …

Contributors
Wang, Meng, Kamarianakis, Yiannis, Georgescu, Matei, et al.
Created Date
2018

The concept of distribution is one of the core ideas of probability theory and inferential statistics, if not the core idea. Many introductory statistics textbooks pay lip service to stochastic/random processes but how do students think about these processes? This study sought to explore what understandings of stochastic process students develop as they work through materials intended to support them in constructing the long-run behavior meaning for distribution. I collected data in three phases. First, I conducted a set of task-based clinical interviews that allowed me to build initial models for the students’ meanings for randomness and probability. Second, I …

Contributors
Hatfield, Neil, Thompson, Patrick, Carlson, Marilyn, et al.
Created Date
2019

This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional nonparametric classifiers form a methodological basis, which is used herein to develop a) the family of ESFuNC segment-wise curve classification algorithms and b) per-pixel ensembles based on logistic regression and fused-LASSO. The proposed methods achieve test set accuracy rates as high as 94.3%, while returning information about regions of the temperature domain …

Contributors
Buscaglia, Robert, Kamarianakis, Yiannis, Armbruster, Dieter, et al.
Created Date
2018

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of …

Contributors
Ismay, Chester Ivan, Eubank, Randall, Young, Dennis, et al.
Created Date
2013

Though the likelihood is a useful tool for obtaining estimates of regression parameters, it is not readily available in the fit of hierarchical binary data models. The correlated observations negate the opportunity to have a joint likelihood when fitting hierarchical logistic regression models. Through conditional likelihood, inferences for the regression and covariance parameters as well as the intraclass correlation coefficients are usually obtained. In those cases, I have resorted to use of Laplace approximation and large sample theory approach for point and interval estimates such as Wald-type confidence intervals and profile likelihood confidence intervals. These methods rely on distributional assumptions …

Contributors
Wang, Bei, Wilson, Jeffrey R, Kamarianakis, Ioannis, et al.
Created Date
2017

Correlation is common in many types of data, including those collected through longitudinal studies or in a hierarchical structure. In the case of clustering, or repeated measurements, there is inherent correlation between observations within the same group, or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing appropriate methods to properly consider different types …

Contributors
Irimata, Kyle, Wilson, Jeffrey R, Broatch, Jennifer, et al.
Created Date
2018

The primary objective in time series analysis is forecasting. Raw data often exhibits nonstationary behavior: trends, seasonal cycles, and heteroskedasticity. After data is transformed to a weakly stationary process, autoregressive moving average (ARMA) models may capture the remaining temporal dynamics to improve forecasting. Estimation of ARMA can be performed through regressing current values on previous realizations and proxy innovations. The classic paradigm fails when dynamics are nonlinear; in this case, parametric, regime-switching specifications model changes in level, ARMA dynamics, and volatility, using a finite number of latent states. If the states can be identified using past endogenous or exogenous information, …

Contributors
Giacomazzo, Mario, Kamarianakis, Yiannis, Reiser, Mark, et al.
Created Date
2018

Threshold regression is used to model regime switching dynamics where the effects of the explanatory variables in predicting the response variable depend on whether a certain threshold has been crossed. When regime-switching dynamics are present, new estimation problems arise related to estimating the value of the threshold. Conventional methods utilize an iterative search procedure, seeking to minimize the sum of squares criterion. However, when unnecessary variables are included in the model or certain variables drop out of the model depending on the regime, this method may have high variability. This paper proposes Lasso-type methods as an alternative to ordinary least …

Contributors
van Schaijik, Maria, Kamarianakis, Yiannis, Kamarianakis, Yiannis, et al.
Created Date
2015