ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about each dissertation/thesis includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading algorithms, including BART. The results show that XBART maintains BART’s predictive power while reducing its computation time. The thesis also describes the development of a Python package implementing XBART.
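
As a rough illustration of the out-of-sample comparison the abstract describes, the sketch below assumes a generic scikit-learn-style fit/predict interface for the fitted BART and XBART models; this is an assumption for illustration, not the XBART package's documented API, and the model objects in the usage comment are hypothetical placeholders.

```python
# Illustrative out-of-sample comparison harness. Assumes each candidate model
# exposes scikit-learn-style fit/predict methods (an assumption, not the
# XBART package's actual interface).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def compare_oos_rmse(models, X, y, test_size=0.25, seed=0):
    """Fit each named model on a training split and report test RMSE."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed)
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        results[name] = mean_squared_error(y_te, pred) ** 0.5
    return results

# Hypothetical usage: compare_oos_rmse({"BART": bart_model, "XBART": xbart_model}, X, y)
```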

Contributors
Yalov, Saar, Hahn, P. Richard, McCulloch, Robert, et al.
Created Date
2019

This thesis presents a family of adaptive curvature methods for gradient-based stochastic optimization. In particular, a general algorithmic framework is introduced along with a practical implementation that yields an efficient, adaptive curvature gradient descent algorithm. To this end, a theoretical and practical link between curvature matrix estimation and shrinkage methods for covariance matrices is established. The use of shrinkage improves estimation accuracy of the curvature matrix when data samples are scarce. This thesis also introduces several insights that result in data- and computation-efficient update equations. Empirical results suggest that the proposed method compares favorably with existing second-order techniques based on …
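
To make the shrinkage idea concrete, here is a minimal sketch that shrinks a noisy curvature estimate toward a scaled identity before preconditioning a gradient step; the Ledoit-Wolf-style target and the fixed shrinkage weight are generic assumptions, not the thesis's specific update equations.

```python
# Minimal sketch: shrink a noisy curvature estimate toward a scaled identity,
# then use it to precondition a gradient step. The target and weight here are
# generic (Ledoit-Wolf style), not the thesis's exact choices.
import numpy as np

def shrink_curvature(C_hat, lam):
    """Convex combination of the sample curvature and a scaled identity."""
    d = C_hat.shape[0]
    target = (np.trace(C_hat) / d) * np.eye(d)
    return (1.0 - lam) * C_hat + lam * target

def curvature_step(theta, grad, C_hat, lam=0.3, lr=0.1, eps=1e-6):
    """Preconditioned gradient step using the shrunken curvature estimate."""
    C = shrink_curvature(C_hat, lam)
    direction = np.linalg.solve(C + eps * np.eye(len(theta)), grad)
    return theta - lr * direction
```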

Contributors
Barron, Trevor Paul, Ben Amor, Heni, He, Jingrui, et al.
Created Date
2019

In this work, I present a Bayesian inference computational framework for the analysis of widefield microscopy data that addresses three challenges: (1) counting and localizing stationary fluorescent molecules; (2) inferring a spatially-dependent effective fluorescence profile that describes the spatially-varying rate at which fluorescent molecules emit subsequently-detected photons (due to different illumination intensities or different local environments); and (3) inferring the camera gain. My general theoretical framework utilizes the Bayesian nonparametric Gaussian and beta-Bernoulli processes with a Markov chain Monte Carlo sampling scheme, which I further specify and implement for Total Internal Reflection Fluorescence (TIRF) microscopy data, benchmarking the method on …
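
As a small illustration of the beta-Bernoulli ingredient mentioned above, the sketch below draws molecule "load" indicators from a standard finite approximation to the beta-Bernoulli process; the Gaussian process fluorescence profile, camera-gain inference, and MCMC sampler of the thesis are not reproduced here, and the parameter values are illustrative.

```python
# Minimal sketch of a finite approximation to the beta-Bernoulli process prior
# used for counting: each of M candidate molecules is "on" with probability
# q_m ~ Beta(alpha/M, 1), so only a handful are active a priori.
import numpy as np

def sample_loads(M=100, alpha=5.0, rng=None):
    rng = rng or np.random.default_rng(0)
    q = rng.beta(alpha / M, 1.0, size=M)   # per-candidate activation probabilities
    b = rng.binomial(1, q)                 # load indicators: 1 = molecule present
    return b, q
```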

Contributors
Wallgren, Ross Tod, Presse, Steve, Armbruster, Hans, et al.
Created Date
2019

Due to the large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning in a variety of ways: student visualization, recommendations for students, student modeling, student grouping, etc. Many programming assignments have features such as automated submission and test-case checking to verify correctness, but few studies have compared different statistical techniques against the latest frameworks or interpreted the resulting models in a unified approach. In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore and predict students’ …

Contributors
Tian, Wenbo, Hsiao, Ihan, Bazzi, Rida, et al.
Created Date
2019

Statistical model selection using the Akaike Information Criterion (AIC) and similar criteria is a useful tool for comparing multiple and non-nested models without the specification of a null model, which has made it increasingly popular in the natural and social sciences. Despite their common usage, model selection methods are not driven by a notion of statistical confidence, so their results entail an unknown degree of uncertainty. This paper introduces a general framework which extends notions of Type-I and Type-II error to model selection. A theoretical method for controlling Type-I error using Difference of Goodness of Fit (DGOF) …
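
For orientation, here is a minimal sketch of the underlying criterion (AIC = 2k - 2 log L) and of ranking candidate models by their AIC differences; the DGOF-based control of Type-I error developed in the paper is not reproduced here.

```python
# Minimal sketch of AIC-based model comparison, AIC = 2k - 2*log L.
# Only the raw criterion is shown; the paper's DGOF error-control procedure
# is not reproduced.

def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

def rank_models(fits):
    """fits: dict of name -> (log_likelihood, n_params). Lower AIC is better."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    # Report each model's AIC and its difference from the best-scoring model.
    return {name: (score, score - best)
            for name, score in sorted(scores.items(), key=lambda kv: kv[1])}
```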

Contributors
Cullan, Michael, Sterner, Beckett, Fricks, John, et al.
Created Date
2018

Understanding customer preference is crucial for new product planning and marketing decisions. This thesis explores how historical data can be leveraged to understand and predict customer preference. This thesis presents a decision support framework that provides a holistic view of customer preference by following a two-phase procedure. Phase-1 uses cluster analysis to create product profiles, from which customer profiles are derived. Phase-2 then delves deep into each of the customer profiles and investigates the causality behind their preference using Bayesian networks. This thesis illustrates the working of the framework using the case of Intel Corporation, the world’s largest semiconductor manufacturer. …

Contributors
Ram, Sudarshan Venkat, Kempf, Karl G, Wu, Teresa, et al.
Created Date
2017

This article proposes a new information-based subdata selection (IBOSS) algorithm, the Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains the desirable asymptotic properties of IBOSS and gives a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm. However, it exploits vectorized calculation, avoiding for-loops, and is approximately 6 times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA …
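
The exact SSDA rule is not spelled out in this excerpt, so the sketch below shows one plausible, fully vectorized reading of "squared scaled distance" subdata selection: standardize each covariate, then keep the k rows farthest from the center. It is an illustration only, not the thesis's exact algorithm.

```python
# One plausible vectorized reading of squared-scaled-distance subdata selection.
import numpy as np

def ssd_subdata(X, k):
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # scale each covariate
    d2 = np.einsum("ij,ij->i", Z, Z)           # squared scaled distances, no loops
    idx = np.argpartition(d2, -k)[-k:]         # indices of the k largest distances
    return idx
```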

Contributors
Zheng, Yi, Stufken, John, Reiser, Mark, et al.
Created Date
2017

Distributed renewable energy generators now contribute a significant amount of energy to the grid. Consequently, the reliability adequacy of such generators depends on making accurate forecasts of the energy they produce. Power outputs of solar PV systems depend on the stochastic variation of environmental factors (solar irradiance, ambient temperature, and wind speed) and on random mechanical failures/repairs. Monte Carlo simulation, which is typically used to model such problems, becomes too computationally intensive, leading to simplifying state-space assumptions. Multi-state models for power system reliability offer higher flexibility in describing system state evolution and an accurate …
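
As a minimal baseline for the Monte Carlo approach mentioned above, the sketch below simulates a single two-state (up/down) generator with exponential failure and repair times to estimate availability; the multi-state and weather-driven detail of the thesis is not reproduced, and the parameter values are illustrative.

```python
# Minimal two-state (up/down) availability simulation with exponential
# failure (MTBF) and repair (MTTR) times, in hours.
import numpy as np

def simulate_availability(mtbf=1000.0, mttr=24.0, horizon=8760.0, n_runs=2000, seed=0):
    rng = np.random.default_rng(seed)
    up_time = 0.0
    for _ in range(n_runs):
        t, up = 0.0, True
        while t < horizon:
            dwell = min(rng.exponential(mtbf if up else mttr), horizon - t)
            if up:
                up_time += dwell
            t += dwell
            up = not up
    return up_time / (n_runs * horizon)   # estimated availability
```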

Contributors
Kadloor, Nikhil, Kuitche, Joseph, Pan, Rong, et al.
Created Date
2017

An anomaly is a deviation from the normal behavior of a system, and anomaly detection techniques try to identify unusual instances based on their deviation from the normal data. In this work, I propose a machine-learning algorithm, referred to as Artificial Contrasts, for anomaly detection in categorical data in which neither the dimension, the specific attributes involved, nor the form of the pattern is known a priori. I use the Random Forest (RF) technique as an effective learner for artificial contrasts. RF is a powerful algorithm that can handle relations among attributes in high-dimensional data and detect anomalies while providing probability estimates for …
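
A minimal sketch of the artificial-contrast idea as it is commonly described: permute each column independently to build structureless "contrast" data, train a Random Forest to separate real rows from contrast rows, and treat real rows that resemble contrasts as anomalous. The encoding, scoring, and probability-estimation details of the thesis may differ.

```python
# Anomaly scoring by artificial contrasts (generic illustration).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def artificial_contrast_scores(X, n_estimators=200, seed=0):
    """X: categorical features already integer- or one-hot-encoded."""
    rng = np.random.default_rng(seed)
    # Contrast data: each column permuted independently, destroying joint structure.
    X_contrast = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])])
    X_all = np.vstack([X, X_contrast])
    y_all = np.r_[np.ones(len(X)), np.zeros(len(X_contrast))]  # 1 = real, 0 = contrast
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    rf.fit(X_all, y_all)
    # Low probability of being "real" means the row looks like structureless contrast data.
    return 1.0 - rf.predict_proba(X)[:, 1]
```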

Contributors
Mousavi, Seyyedehnasim, Runger, George, Wu, Teresa, et al.
Created Date
2016

The operating temperature of photovoltaic (PV) modules is affected by external factors such as irradiance, wind speed and ambient temperature as well as internal factors like material properties and design properties. These factors can make a difference in the operating temperatures between cells within a module and between modules within a plant. This is a three-part thesis. Part 1 investigates the behavior of temperature distribution of PV cells within a module through outdoor temperature monitoring under various operating conditions (Pmax, Voc and Isc) and examines deviation in the temperature coefficient values pertaining to this temperature variation. ANOVA, a statistical tool, …

Contributors
Pavgi, Ashwini, Tamizhmani, Govindasamy, Phelan, Patrick, et al.
Created Date
2016

The inherent intermittency in solar energy resources poses challenges to scheduling generation, transmission, and distribution systems. Energy storage devices are often used to mitigate variability in renewable asset generation and provide a mechanism to shift renewable power between periods of the day. In the absence of storage, however, time series forecasting techniques can be used to estimate future solar resource availability to improve the accuracy of solar generator scheduling. The knowledge of future solar availability helps scheduling solar generation at high-penetration levels, and assists with the selection and scheduling of spinning reserves. This study employs statistical techniques to improve the …

Contributors
Soundiah Regunathan Rajasekaran, Dhiwaakar Purusothaman, Johnson, Nathan G, Karady, George G, et al.
Created Date
2016

A simulation study was conducted to explore the influence of partial loading invariance and partial intercept invariance on the latent mean comparison of the second-order factor within a higher-order confirmatory factor analysis (CFA) model. Noninvariant loadings or intercepts were generated to be at one of the two levels or both levels for a second-order CFA model. The numbers and directions of differences in noninvariant loadings or intercepts were also manipulated, along with total sample size and effect size of the second-order factor mean difference. Data were analyzed using correct and incorrect specifications of noninvariant loadings and intercepts. Results summarized across …

Contributors
Liu, Yixing, Thompson, Marilyn, Green, Samuel, et al.
Created Date
2016

The Partition of Variance (POV) method is a simple way to identify large sources of variation in manufacturing systems. The method identifies the variance components by estimating the variance of the means (between variance) and the mean of the variances (within variance). The project shows that the method correctly identifies the variance source when compared to the ANOVA method. The variance estimators deteriorate when varying degrees of non-normality are introduced through simulation; however, the POV method is shown to be a more stable measure of variance in the aggregate. The POV method also provides non-negative, stable estimates for interaction when …
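
A minimal numeric sketch of the two POV components described above, assuming the data have already been split into groups by the suspected variance source:

```python
# Between variance = variance of the group means; within variance = mean of
# the group variances.
import numpy as np

def partition_of_variance(groups):
    """groups: list of 1-D arrays, one per source level (e.g., machine, lot, shift)."""
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    between = means.var(ddof=1)   # variance of the means
    within = variances.mean()     # mean of the variances
    return between, within
```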

Contributors
Little, David John, Borror, Connie, Montgomery, Douglas, et al.
Created Date
2015

Given the importance of buildings as major consumers of resources worldwide, several organizations are working avidly to ensure the negative impacts of buildings are minimized. The U.S. Green Building Council's (USGBC) Leadership in Energy and Environmental Design (LEED) rating system is one such effort to recognize buildings that are designed to achieve a superior performance in several areas including energy consumption and indoor environmental quality (IEQ). The primary objectives of this study are to investigate the performance of LEED certified facilities in terms of energy consumption and occupant satisfaction with IEQ, and introduce a framework to assess the performance of …

Contributors
Chokor, Abbas, El Asmar, Mounir, Chong, Oswald, et al.
Created Date
2015

Researchers are often interested in estimating interactions in multilevel models, but many researchers assume that the same procedures and interpretations for interactions in single-level models apply to multilevel models. However, estimating interactions in multilevel models is much more complex than in single-level models. Because uncentered (RAS) or grand mean centered (CGM) level-1 predictors in two-level models contain two sources of variability (i.e., within-cluster variability and between-cluster variability), interactions involving RAS or CGM level-1 predictors also contain more than one source of variability. In this Master’s thesis, I use simulations to demonstrate that ignoring the four sources of variability in a …
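
To make the centering distinction concrete, here is a minimal pandas sketch contrasting grand-mean centering (CGM) with centering within cluster (CWC) for a level-1 predictor; the column names are illustrative, and CWC isolates the within-cluster part while the cluster means carry the between-cluster part.

```python
# Grand-mean centering (CGM) vs. centering within cluster (CWC) for a
# level-1 predictor x nested in clusters.
import pandas as pd

def add_centered_predictors(df, x="x", cluster="cluster"):
    out = df.copy()
    out[f"{x}_cgm"] = out[x] - out[x].mean()              # grand-mean centered
    cluster_means = out.groupby(cluster)[x].transform("mean")
    out[f"{x}_cwc"] = out[x] - cluster_means              # centered within cluster
    out[f"{x}_cluster_mean"] = cluster_means              # between-cluster component
    return out
```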

Contributors
Mazza, Gina Lynn, Enders, Craig K., Aiken, Leona S., et al.
Created Date
2015

Currently, there is a clear gap in the missing data literature for three-level models. To date, the literature has only focused on the theoretical and algorithmic work required to implement three-level imputation using the joint model (JM) method of imputation, leaving virtually no work done on the fully conditional specification (FCS) method. Moreover, the literature lacks any methodological evaluation of three-level imputation. Thus, this thesis serves two purposes: (1) to develop an algorithm in order to implement FCS in the context of a three-level model and (2) to evaluate both imputation methods. The simulation investigated a random intercept model under both …

Contributors
Keller, Brian Tinnell, Enders, Craig K, Grimm, Kevin J, et al.
Created Date
2015

The present thesis explores how statistical methods are conceptualized, used, and interpreted in quantitative Hispanic sociolinguistics in light of the group of statistical methods espoused by Kline (2013) and named by Cumming (2012) the “new statistics.” The new statistics, as a conceptual framework, repudiates null hypothesis statistical testing (NHST) and replaces it with the ESCI method (Effect Sizes and Confidence Intervals) as well as meta-analytic thinking. In a descriptive review of 44 studies found in three academic journals over the last decade (2005–2015), NHST was found to have a tight grip on most researchers. …
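
As a small illustration of ESCI-style reporting, the sketch below computes a standardized mean difference (Cohen's d) with an approximate large-sample confidence interval rather than a p-value; the meta-analytic component is not shown, and the standard-error formula is a common approximation rather than anything specific to the thesis.

```python
# Cohen's d with an approximate normal-theory confidence interval.
import numpy as np
from scipy import stats

def cohens_d_with_ci(a, b, conf=0.95):
    n1, n2 = len(a), len(b)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(a, ddof=1) +
                         (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2))
    d = (np.mean(a) - np.mean(b)) / pooled_sd
    # Approximate large-sample standard error of d.
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(0.5 + conf / 2)
    return d, (d - z * se, d + z * se)
```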

Contributors
Kidhardt, Paul Adrian, Cerron-Palomino, Alvaro, Gonzalez-Lopez, Veronica, et al.
Created Date
2015

Threshold regression is used to model regime switching dynamics where the effects of the explanatory variables in predicting the response variable depend on whether a certain threshold has been crossed. When regime-switching dynamics are present, new estimation problems arise related to estimating the value of the threshold. Conventional methods utilize an iterative search procedure, seeking to minimize the sum of squares criterion. However, when unnecessary variables are included in the model or certain variables drop out of the model depending on the regime, this method may have high variability. This paper proposes Lasso-type methods as an alternative to ordinary least …
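
A minimal sketch of the two-regime setup: grid-search the threshold on a single switching variable and, for each candidate, fit a Lasso to a design with regime-specific coefficients. The penalty level and threshold grid here are illustrative, not the paper's exact estimator.

```python
# Lasso-based threshold regression via grid search on the threshold.
import numpy as np
from sklearn.linear_model import Lasso

def fit_threshold_lasso(X, y, q, alpha=0.1, n_grid=50):
    """q: switching variable; coefficients may differ across the two regimes."""
    best = None
    for tau in np.quantile(q, np.linspace(0.15, 0.85, n_grid)):
        below = (q <= tau).astype(float)[:, None]
        X_regime = np.hstack([X * below, X * (1 - below)])   # regime-specific design
        model = Lasso(alpha=alpha).fit(X_regime, y)
        sse = np.sum((y - model.predict(X_regime)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, tau, model)
    return best   # (sum of squares, estimated threshold, fitted Lasso)
```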

Contributors
van Schaijik, Maria, Kamarianakis, Yiannis, et al.
Created Date
2015

This is a two-part thesis: Part 1 characterizes soiling losses using various techniques to understand the effect of soiling on photovoltaic modules. The higher the angle of incidence (AOI), the lower the photovoltaic (PV) module performance. Our research group has already reported the AOI investigation for cleaned modules of five different technologies with an air/glass interface. However, modules installed in the field invariably develop a soil layer of varying thickness depending on the site conditions, rainfall, and tilt angle. The soiled module will have an air/soil/glass interface rather than an air/glass interface. This study investigates the …

Contributors
Boppana, Sravanthi, Tamizhmani, Govindasamy, Srinivasan, Devarajan, et al.
Created Date
2015

Tracking targets in the presence of clutter is inevitable, and presents many challenges. Additionally, rapid, drastic changes in clutter density between different environments or scenarios can make it even more difficult for tracking algorithms to adapt. A novel approach to target tracking in such dynamic clutter environments is proposed using a particle filter (PF) integrated with Interacting Multiple Models (IMMs) to compensate and adapt to the transition between different clutter densities. This model was implemented for the case of a monostatic sensor tracking a single target moving with constant velocity along a two-dimensional trajectory, which crossed between regions of drastically …
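
As a compact illustration of the particle-filtering ingredient, the sketch below performs one bootstrap particle filter update for a 2-D constant-velocity target, with a measurement likelihood floored by a uniform clutter term; the IMM layer that adapts to changing clutter density in the thesis is not reproduced, and the noise parameters are illustrative.

```python
# One bootstrap particle filter update with a clutter-floored likelihood.
import numpy as np

def pf_step(particles, weights, z, dt=1.0, q=0.1, r=1.0,
            clutter_level=1e-3, rng=None):
    """particles: (N, 4) array of [x, y, vx, vy]; z: observed (x, y) position."""
    rng = rng or np.random.default_rng(0)
    # Propagate constant-velocity dynamics plus process noise.
    particles = particles.copy()
    particles[:, :2] += dt * particles[:, 2:]
    particles += q * rng.standard_normal(particles.shape)
    # Weight update: Gaussian likelihood around the predicted position,
    # floored by a uniform clutter term so clutter-like returns are not fatal.
    d2 = np.sum((particles[:, :2] - np.asarray(z)) ** 2, axis=1)
    lik = np.exp(-0.5 * d2 / r**2) / (2 * np.pi * r**2) + clutter_level
    weights = weights * lik
    weights = weights / weights.sum()
    # Multinomial resampling when the effective sample size collapses.
    if 1.0 / np.sum(weights**2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```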

Contributors
Dutson, Karl J, Papandreou-Suppappola, Antonia, Kovvali, Narayan, et al.
Created Date
2015