
ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.




Dimensionality assessment is an important component of evaluating item response data. Existing approaches to evaluating common assumptions of unidimensionality, such as DIMTEST (Nandakumar & Stout, 1993; Stout, 1987; Stout, Froelich, & Gao, 2001), have been shown to work well under large-scale assessment conditions (e.g., large sample sizes and item pools; see e.g., Froelich & Habing, 2007). It remains to be seen how such procedures perform in the context of small-scale assessments characterized by relatively small sample sizes and/or short tests. The fact that some procedures come with minimum allowable values for characteristics of the data, such as the number of …

Contributors
Reichenberg, Ray E., Levy, Roy, Thompson, Marilyn S., et al.
Created Date
2013

Many longitudinal studies, especially in clinical trials, suffer from missing data issues. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption leads to unrealistic simplification and is implausible for many cases. For example, an investigator is examining the effect of treatment on depression. Subjects are scheduled with doctors on a regular basis and asked questions about recent emotional situations. Patients who are experiencing severe depression are more likely to miss an appointment and leave the data missing for that particular visit. Data that are not missing at random may produce bias …
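
A minimal simulation sketch of the bias this abstract describes, under assumed numbers (a hypothetical depression score scale; nothing here is from the dissertation): when the probability of a missed visit increases with the score itself, the observed-case mean understates the truth.

    import numpy as np

    rng = np.random.default_rng(0)
    true_scores = rng.normal(loc=20, scale=5, size=100_000)   # latent depression scores

    # MNAR mechanism: more severe depression -> more likely to miss the visit
    p_miss = 1 / (1 + np.exp(-(true_scores - 25)))            # logistic in the score itself
    observed = true_scores[rng.random(true_scores.size) > p_miss]

    print(f"true mean:          {true_scores.mean():.2f}")
    print(f"observed-case mean: {observed.mean():.2f}")       # biased downward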

Contributors
Zhang, Jun, Reiser, Mark, Barber, Jarrett, et al.
Created Date
2013

Value-added models (VAMs) are used by many states to assess contributions of individual teachers and schools to students' academic growth. The generalized persistence VAM, one of the most flexible in the literature, estimates the "value added" by individual teachers to their students' current and future test scores by employing a mixed model with a longitudinal database of test scores. There is concern, however, that missing values that are common in the longitudinal student scores can bias value-added assessments, especially when the models serve as a basis for personnel decisions -- such as promoting or dismissing teachers -- as they are …

Contributors
Karl, Andrew Thomas, Lohr, Sharon L, Yang, Yan, et al.
Created Date
2012

Understanding customer preference is crucial for new product planning and marketing decisions. This thesis explores how historical data can be leveraged to understand and predict customer preference. This thesis presents a decision support framework that provides a holistic view on customer preference by following a two-phase procedure. Phase-1 uses cluster analysis to create product profiles based on which customer profiles are derived. Phase-2 then delves deep into each of the customer profiles and investigates causality behind their preference using Bayesian networks. This thesis illustrates the working of the framework using the case of Intel Corporation, the world’s largest semiconductor manufacturer. …

Contributors
Ram, Sudarshan Venkat, Kempf, Karl G, Wu, Teresa, et al.
Created Date
2017

Complex systems are pervasive in science and engineering. Some examples include complex engineered networks such as the internet, the power grid, and transportation networks. The complexity of such systems arises not just from their size, but also from their structure, operation (including control and management), evolution over time, and the fact that people are involved in their design and operation. Our understanding of such systems is limited because their behaviour cannot be characterized using traditional techniques of modelling and analysis. As a step in model development, statistically designed screening experiments may be used to identify the main effects and interactions most significant …

Contributors
Aldaco-Gastelum, Abraham Netzahualcoyotl, Syrotiuk, Violet R., Colbourn, Charles J., et al.
Created Date
2015

The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data are from a table formed by cross-classification of a large number of variables, the common statistics may have low power and inaccurate Type I error level due to sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. It is proposed to assess model fit by using a new version of GFfit statistic based on orthogonal components of Pearson chi-square as a diagnostic to examine the fit on …
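
For reference, the two statistics named here are the Pearson statistic X^2 = sum_j (O_j - E_j)^2 / E_j and the likelihood-ratio statistic G^2 = 2 sum_j O_j ln(O_j / E_j), summed over cells. A minimal sketch with an illustrative sparse table (this is not the GFfit statistic itself, just the omnibus quantities it refines):

    import numpy as np

    observed = np.array([18, 4, 3, 0, 2, 1, 1, 1])     # sparse 2x2x2 table, flattened
    expected = np.full(8, observed.sum() / 8.0)        # fitted model: equiprobable cells

    pearson = ((observed - expected) ** 2 / expected).sum()
    nz = observed > 0                                  # empty cells contribute 0 to G^2
    lr = 2 * (observed[nz] * np.log(observed[nz] / expected[nz])).sum()

    print(f"X^2 = {pearson:.2f}, G^2 = {lr:.2f}")      # sparseness degrades both tests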

Contributors
Zhu, Junfei, Reiser, Mark, Stufken, John, et al.
Created Date
2017

Urban growth, from regional sprawl to global urbanization, is the most rapid, drastic, and irreversible form of human modification to the natural environment. Extensive land cover modifications during urban growth have altered the local energy balance, making the city warmer than its surrounding rural environment, a phenomenon known as an urban heat island (UHI). How are the seasonal and diurnal surface temperatures related to the land surface characteristics, and what land cover types and/or patterns are desirable for ameliorating climate in a fast-growing desert city? This dissertation scrutinizes these questions and seeks to address them using a combination of …

Contributors
Fan, Chao, Myint, Soe W, Li, Wenwen, et al.
Created Date
2016

The main objective of this research is to develop an approach to PV module lifetime prediction. In doing so, the aim is to move from empirical generalizations to a formal predictive science based on data-driven case studies of crystalline silicon PV systems. The evaluation of PV systems aged 5 to 30 years yields a systematic predictive capability that is absent today. The warranty periods provided by manufacturers typically range from 20 to 25 years for crystalline silicon modules. The end of lifetime (for example, the time-to-degrade by 20% from rated power) of PV modules is usually …

Contributors
Kuitche, Joseph Mathurin, Pan, Rong, TamizhMani, Govindasamy, et al.
Created Date
2014

Statistical process control (SPC) and predictive analytics have been used in industrial manufacturing and design, but up until now have not been applied to threshold data of vital sign monitoring in remote care settings. In this study of 20 elders with COPD and/or CHF, extended months of peak flow monitoring (FEV1) using telemedicine are examined to determine when an earlier or later clinical intervention may have been advised. This study demonstrated that SPC may bring less than a 2.0% increase in clinician workload while providing more robust statistically-derived thresholds than clinician-derived thresholds. Using a random K-fold model, FEV1 output was …

Contributors
Fralick, Celeste Rachelle, Muthuswamy, Jitendran, O'Shea, Terrance, et al.
Created Date
2013

Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading algorithms, including BART. The results show that XBART maintains BART’s predictive power while reducing its computation time. The thesis also describes the development of a Python package implementing XBART.

Contributors
Yalov, Saar, Hahn, P. Richard, McCulloch, Robert, et al.
Created Date
2019

Real-world environments are characterized by non-stationary and continuously evolving data. Learning a classification model on this data would require a framework that is able to adapt itself to newer circumstances. Under such circumstances, transfer learning has come to be a dependable methodology for improving classification performance with reduced training costs and without the need for explicit relearning from scratch. In this thesis, a novel instance transfer technique that adapts a "Cost-sensitive" variation of AdaBoost is presented. The method capitalizes on the theoretical and functional properties of AdaBoost to selectively reuse outdated training instances obtained from a "source" domain to effectively …

Contributors
Venkatesan, Ashok, Panchanathan, Sethuraman, Li, Baoxin, et al.
Created Date
2011

The Pearson and likelihood ratio statistics are well-known in goodness-of-fit testing and are commonly used for models applied to multinomial count data. When data are from a table formed by the cross-classification of a large number of variables, these goodness-of-fit statistics may have lower power and inaccurate Type I error rate due to sparseness. Pearson's statistic can be decomposed into orthogonal components associated with the marginal distributions of observed variables, and an omnibus fit statistic can be obtained as a sum of these components. When the statistic is a sum of components for lower-order marginals, it has good performance for …

Contributors
Dassanayake, Mudiyanselage Maduranga Kasun, Reiser, Mark, Kao, Ming-Hung, et al.
Created Date
2018

Quadratic growth curves, i.e., 2nd-degree polynomials, are widely used in longitudinal studies. For a 2nd-degree polynomial, the vertex represents the location of the curve in the XY plane. For a quadratic growth curve, we propose an approximate confidence region as well as confidence intervals for the x- and y-coordinates of the vertex using two methods, the gradient method and the delta method. Under some models, an indirect test on the location of the curve can be based on the intercept and slope parameters, but in other models, a direct test on the vertex is required. We present a …
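
For y = b0 + b1*x + b2*x^2, the vertex sits at x* = -b1/(2*b2) and y* = b0 - b1^2/(4*b2). A sketch of the delta-method standard error for x*, using assumed coefficient values and an assumed coefficient covariance matrix (illustrative only, not results from this work):

    import numpy as np

    b0, b1, b2 = 2.0, 3.0, -0.5           # assumed fitted coefficients
    cov = np.diag([0.10, 0.04, 0.01])     # assumed covariance of (b0, b1, b2)

    x_star = -b1 / (2 * b2)               # vertex x-coordinate
    y_star = b0 - b1**2 / (4 * b2)        # vertex y-coordinate

    # delta method: gradient of x* with respect to (b0, b1, b2)
    grad = np.array([0.0, -1 / (2 * b2), b1 / (2 * b2**2)])
    se_x = np.sqrt(grad @ cov @ grad)

    print(f"vertex = ({x_star:.2f}, {y_star:.2f}); 95% CI for x*: "
          f"({x_star - 1.96 * se_x:.2f}, {x_star + 1.96 * se_x:.2f})")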

Contributors
Yu, Wanchunzi, Reiser, Mark, Barber, Jarrett, et al.
Created Date
2015

This dissertation presents methods for addressing research problems that currently can only adequately be solved using Quality Reliability Engineering (QRE) approaches, especially accelerated life testing (ALT) of electronic printed wiring boards, with applications to avionics circuit boards. The methods presented in this research are generally applicable to circuit boards, but the data generated and their analysis are for high-performance avionics. Aircraft equipment manufacturers typically require avionics equipment to have a 20-year expected life, and therefore ALT is the only practical way of performing life test estimates. Both thermal and vibration ALTs are performed, and the induced failures are analyzed, to resolve industry …

Contributors
Juarez, Joseph Moses, Montgomery, Douglas C., Borror, Connie M., et al.
Created Date
2012

This thesis presents a family of adaptive curvature methods for gradient-based stochastic optimization. In particular, a general algorithmic framework is introduced along with a practical implementation that yields an efficient, adaptive curvature gradient descent algorithm. To this end, a theoretical and practical link between curvature matrix estimation and shrinkage methods for covariance matrices is established. The use of shrinkage improves estimation accuracy of the curvature matrix when data samples are scarce. This thesis also introduces several insights that result in data- and computation-efficient update equations. Empirical results suggest that the proposed method compares favorably with existing second-order techniques based on …
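
A minimal numpy sketch of the shrinkage idea applied to a curvature estimate: shrink a noisy sample curvature matrix toward a scaled identity before inverting, which stabilizes the preconditioned step when samples are scarce. The fixed shrinkage weight below is an assumption for illustration, not the adaptive rule developed in the thesis.

    import numpy as np

    rng = np.random.default_rng(1)
    d, n = 10, 12                            # dimension close to sample count: scarce data
    grads = rng.normal(size=(n, d))          # stand-ins for per-sample gradients

    S = grads.T @ grads / n                  # raw curvature estimate (ill-conditioned)
    lam = 0.3                                # assumed shrinkage intensity in [0, 1]
    target = np.trace(S) / d * np.eye(d)     # scaled-identity shrinkage target
    C = (1 - lam) * S + lam * target         # shrunk curvature estimate

    step = np.linalg.solve(C, grads.mean(axis=0))   # preconditioned gradient step
    print(f"cond(S) = {np.linalg.cond(S):.1e}, cond(C) = {np.linalg.cond(C):.1e}")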

Contributors
Barron, Trevor Paul, Ben Amor, Heni, He, Jingrui, et al.
Created Date
2019

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAM teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect …

Contributors
Valdivia, Arturo, Eubank, Randall, Young, Dennis, et al.
Created Date
2013

This article proposes a new information-based subdata selection (IBOSS) algorithm, Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains nice asymptotic properties of IBOSS and gives a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm. However, it exploits the advantages of vectorized calculation avoiding for loops and is approximately 6 times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA …
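
For orientation, a sketch of the D-optimal IBOSS rule that this algorithm builds on: for each covariate, keep the rows with the r smallest and r largest values, which inflates the determinant of the subdata information matrix. This simplified version collects indices in a set rather than removing rows between covariates, and it omits the rotation-based refinement that distinguishes SSDA.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p, k = 100_000, 5, 1_000              # full-data size, covariates, subdata size
    X = rng.normal(size=(n, p))

    r = k // (2 * p)                         # rows per extreme per covariate
    idx = set()
    for j in range(p):
        order = np.argsort(X[:, j])
        idx.update(order[:r])                # r smallest values of covariate j
        idx.update(order[-r:])               # r largest values of covariate j
    sub = X[sorted(idx)]

    design = lambda M: np.column_stack([np.ones(len(M)), M])
    full_logdet = np.linalg.slogdet(design(X).T @ design(X))[1]
    sub_logdet = np.linalg.slogdet(design(sub).T @ design(sub))[1]
    print(f"log-det, full data: {full_logdet:.1f}; subdata ({len(sub)} rows): {sub_logdet:.1f}")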

Contributors
Zheng, Yi, Stufken, John, Reiser, Mark, et al.
Created Date
2017

Designing studies that use latent growth modeling to investigate change over time calls for optimal approaches for conducting power analysis for a priori determination of required sample size. This investigation (1) studied the impacts of variations in specified parameters, design features, and model misspecification in simulation-based power analyses and (2) compared power estimates across three common power analysis techniques: the Monte Carlo method; the Satorra-Saris method; and the method developed by MacCallum, Browne, and Cai (MBC). Choice of sample size, effect size, and slope variance parameters markedly influenced power estimates; however, level-1 error variance and number of repeated measures (3 …
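
The Monte Carlo method mentioned here reduces to a simple loop: simulate many datasets under assumed population values, fit the model to each, and report the proportion of replications in which the focal parameter is significant. A stripped-down sketch with an ordinary regression slope standing in for the latent growth parameters (all values assumed for illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, slope, sigma, reps = 100, 1.0, 1.0, 2_000   # assumed design and effect size

    x = np.arange(n) / n                           # fixed time-like predictor
    rejections = 0
    for _ in range(reps):
        y = slope * x + rng.normal(scale=sigma, size=n)
        rejections += stats.linregress(x, y).pvalue < 0.05

    print(f"estimated power: {rejections / reps:.3f}")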

Contributors
Van Vleet, Bethany L., Thompson, Marilyn S., Green, Samuel B., et al.
Created Date
2011

This dissertation proposes a new set of analytical methods for high dimensional physiological sensors. The methodologies developed in this work were motivated by problems in learning science, but also apply to numerous disciplines where high dimensional signals are present. In the education field, more data is now available from traditional sources and there is an important need for analytical methods to translate this data into improved learning. Affective Computing, the study of new techniques that develop systems to recognize and model human emotions, integrates different physiological signals such as the electroencephalogram (EEG) and electromyogram (EMG) to detect and …

Contributors
Lujan Moreno, Gustavo A., Runger, George C, Atkinson, Robert K, et al.
Created Date
2017

An anomaly is a deviation from the normal behavior of the system, and anomaly detection techniques try to identify unusual instances based on deviation from the normal data. In this work, I propose a machine-learning algorithm, referred to as Artificial Contrasts, for anomaly detection in categorical data in which neither the dimension, the specific attributes involved, nor the form of the pattern is known a priori. I use the random forest (RF) technique as an effective learner for artificial contrasts. RF is a powerful algorithm that can handle relations of attributes in high dimensional data and detect anomalies while providing probability estimates for …
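
A sketch of the artificial-contrasts construction on synthetic stand-in data: permute each column independently to destroy the relations among attributes, label real rows 0 and permuted rows 1, train a random forest on the combined data, and read the forest's predicted probability of "contrast" as an anomaly score.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(4)
    n = 2_000
    a = rng.integers(0, 3, size=n)
    b = a.copy()                               # normal pattern: b tracks a
    b[:20] = rng.integers(0, 3, size=20)       # a few rows break the pattern
    X = np.column_stack([a, b])

    # artificial contrasts: permute each column independently
    X_art = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
    data = np.vstack([X, X_art])
    labels = np.r_[np.zeros(n), np.ones(n)]

    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(data, labels)
    score = rf.predict_proba(X)[:, 1]          # P(contrast-like) = anomaly score
    print("highest-scoring rows:", np.argsort(score)[-5:])   # should fall among rows 0-19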

Contributors
Mousavi, Seyyedehnasim, Runger, George, Wu, Teresa, et al.
Created Date
2016

In the field of infectious disease epidemiology, the assessment of model robustness outcomes plays a significant role in the identification, reformulation, and evaluation of preparedness strategies aimed at limiting the impact of catastrophic events (pandemics or the deliberate release of biological agents) or used in the management of disease prevention strategies, or employed in the identification and evaluation of control or mitigation measures. The research work in this dissertation focuses on: the comparison and assessment of the role of exponentially distributed waiting times versus the use of generalized non-exponential parametric distributed waiting times of infectious periods on the quantitative and …

Contributors
Morale Butler, Emmanuel Jesús, Castillo-Chavez, Carlos, Aparicio, Juan P, et al.
Created Date
2014

This thesis presents a meta-analysis of lead-free solder reliability. The qualitative analyses of the failure modes of lead-free solder under different stress tests including drop test, bend test, thermal test and vibration test are discussed. The main cause of failure of lead-free solder is fatigue crack, and the speed of propagation of the initial crack can differ across test conditions and solder materials. A quantitative analysis of the fatigue behavior of SAC lead-free solder under a thermal preconditioning process is conducted. This thesis presents a method for predicting the failure life of a solder alloy by building …

Contributors
Xu, Xinyue, Pan, Rong, Montgomery, Douglas, et al.
Created Date
2014

Many methodological approaches have been utilized to predict student retention and persistence over the years, yet few have utilized a Bayesian framework. It is believed this is due in part to the absence of an established process for guiding educational researchers reared in a frequentist perspective into the realms of Bayesian analysis and educational data mining. The current study aimed to address this by providing a model-building process for developing a Bayesian network (BN) that leveraged educational data mining, Bayesian analysis, and traditional iterative model-building techniques in order to predict whether community college students will stop out at the completion …

Contributors
Arcuria, Phil, Levy, Roy, Green, Samuel B, et al.
Created Date
2015

Urban scaling analysis has introduced a new scientific paradigm to the study of cities. With it, the notions of size, heterogeneity and structure have taken a leading role. These notions are assumed to underlie why cities differ from one another, sometimes wildly. However, the mechanisms by which size, heterogeneity and structure shape the general statistical patterns that describe urban economic output are still unclear. Given the rapid rate of urbanization around the globe, we need precise and formal mathematical understandings of these matters. In this context, I perform in this dissertation probabilistic, distributional and computational explorations …

Contributors
Gomez-Lievano, Andres, Lobo, José, Muneepeerakul, Rachata, et al.
Created Date
2014

Investigation of measurement invariance (MI) commonly assumes correct specification of dimensionality across multiple groups. Although research shows that violation of the dimensionality assumption can cause bias in model parameter estimation for single-group analyses, little research on this issue has been conducted for multiple-group analyses. This study explored the effects of mismatch in dimensionality between data and analysis models with multiple-group analyses at the population and sample levels. Datasets were generated using a bifactor model with different factor structures and were analyzed with bifactor and single-factor models to assess misspecification effects on assessments of MI and latent mean differences. As baseline …

Contributors
Xu, Yuning, Green, Samuel, Levy, Roy, et al.
Created Date
2018

Optimal experimental design for generalized linear models is often done using a pseudo-Bayesian approach that integrates the design criterion across a prior distribution on the parameter values. This approach ignores the lack of utility of certain models contained in the prior, and a case is demonstrated where the heavy focus on such hopeless models results in a design with poor performance and with wild swings in coverage probabilities for Wald-type confidence intervals. Design construction using a utility-based approach is shown to result in much more stable coverage probabilities in the area of greatest concern. The pseudo-Bayesian approach can be applied …

Contributors
Hassler, Edgar, Montgomery, Douglas C, Silvestrini, Rachel T, et al.
Created Date
2015

In this work, I present a Bayesian inference computational framework for the analysis of widefield microscopy data that addresses three challenges: (1) counting and localizing stationary fluorescent molecules; (2) inferring a spatially-dependent effective fluorescence profile that describes the spatially-varying rate at which fluorescent molecules emit subsequently-detected photons (due to different illumination intensities or different local environments); and (3) inferring the camera gain. My general theoretical framework utilizes the Bayesian nonparametric Gaussian and beta-Bernoulli processes with a Markov chain Monte Carlo sampling scheme, which I further specify and implement for Total Internal Reflection Fluorescence (TIRF) microscopy data, benchmarking the method on …

Contributors
Wallgren, Ross Tod, Presse, Steve, Armbruster, Hans, et al.
Created Date
2019

This thesis examines the application of statistical signal processing approaches to data arising from surveys intended to measure psychological and sociological phenomena underpinning human social dynamics. The use of signal processing methods for analysis of signals arising from measurement of social, biological, and other non-traditional phenomena has been an important and growing area of signal processing research over the past decade. Here, we explore the application of statistical modeling and signal processing concepts to data obtained from the Global Group Relations Project, specifically to understand and quantify the effects and interactions of social psychological factors related to intergroup conflicts. We …

Contributors
Liu, Hui, Taylor, Thomas, Cochran, Douglas, et al.
Created Date
2012

Statistical model selection using the Akaike Information Criterion (AIC) and similar criteria is a useful tool for comparing multiple and non-nested models without the specification of a null model, which has made it increasingly popular in the natural and social sciences. Despite their common usage, model selection methods are not driven by a notion of statistical confidence, so their results entail an unknown degree of uncertainty. This paper introduces a general framework which extends notions of Type-I and Type-II error to model selection. A theoretical method for controlling Type-I error using Difference of Goodness of Fit (DGOF) …
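
For orientation, AIC = 2k - 2 ln L-hat, and selection picks the smaller value; the framework here asks how confident that pick should be. A crude parametric-bootstrap sketch of selection uncertainty between two nested normal models (a stand-in for intuition only, not the paper's DGOF procedure):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    x = rng.normal(loc=0.2, scale=1.0, size=50)

    def aic_pair(sample):
        # model 1: N(0, s^2); model 2: N(mu, s^2); plug-in scale for simplicity
        s = sample.std()
        ll1 = stats.norm.logpdf(sample, 0.0, s).sum()
        ll2 = stats.norm.logpdf(sample, sample.mean(), s).sum()
        return 2 * 1 - 2 * ll1, 2 * 2 - 2 * ll2

    a1, a2 = aic_pair(x)
    wins = sum(np.subtract(*aic_pair(rng.choice(x, size=x.size))) < 0
               for _ in range(1_000))
    print(f"AIC1 = {a1:.1f}, AIC2 = {a2:.1f}; "
          f"model 1 selected in {wins / 10:.1f}% of bootstrap resamples")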

Contributors
Cullan, Michael, Sterner, Beckett, Fricks, John, et al.
Created Date
2018

Mixture experiments are useful when the interest is in determining how changes in the proportion of an experimental component affect the response. This research focuses on the modeling and design of mixture experiments when the response is categorical, namely binary and ordinal. Data from mixture experiments are characterized by the perfect collinearity of the experimental components, resulting in model matrices that are singular and inestimable under likelihood estimation procedures. To alleviate problems with estimation, this research proposes the reparameterization of two nonlinear models for ordinal data -- the proportional-odds model with a logistic link and the stereotype model. A study …

Contributors
Mancenido, Michelle V., Montgomery, Douglas C, Pan, Rong, et al.
Created Date
2016

It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well-known in goodness-of-fit testing, but it is sometimes considered to produce an omnibus test as it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components …

Contributors
Milovanovic, Jelena, Young, Dennis, Reiser, Mark, et al.
Created Date
2011

Mediation analysis is used to investigate how an independent variable, X, is related to an outcome variable, Y, through a mediator variable, M (MacKinnon, 2008). If X represents a randomized intervention, it is difficult to make a cause and effect inference regarding indirect effects without making no-unmeasured-confounding assumptions using the potential outcomes framework (Holland, 1988; MacKinnon, 2008; Robins & Greenland, 1992; VanderWeele, 2015), using longitudinal data to determine the temporal order of M and Y (MacKinnon, 2008), or both. The goals of this dissertation were to (1) define all indirect and direct effects in a three-wave longitudinal mediation …

Contributors
Valente, Matthew John, MacKinnon, David P, West, Stephen G, et al.
Created Date
2018

The living world we inhabit and observe is extraordinarily complex. From the perspective of a person analyzing data about the living world, complexity is most commonly encountered in two forms: 1) in the sheer size of the datasets that must be analyzed and the physical number of mathematical computations necessary to obtain an answer and 2) in the underlying structure of the data, which does not conform to classical normal theory statistical assumptions and includes clustering and unobserved latent constructs. Until recently, the methods and tools necessary to effectively address the complexity of biomedical data were not ordinarily available. The …

Contributors
Brown, Justin Reed, Dinu, Valentin, Johnson, William, et al.
Created Date
2012

When analyzing longitudinal data it is essential to account both for the correlation inherent from the repeated measures of the responses as well as the correlation realized on account of the feedback created between the responses at a particular time and the predictors at other times. A generalized method of moments (GMM) for estimating the coefficients in longitudinal data is presented. The appropriate and valid estimating equations associated with the time-dependent covariates are identified, thus providing substantial gains in efficiency over generalized estimating equations (GEE) with the independent working correlation. Identifying the estimating equations for computation is of utmost importance. …
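
A toy sketch of the moment-condition idea, under strong simplifying assumptions: a linear marginal model, a scalar coefficient, and simulated exogenous covariates so that every pair (s, t) with s <= t yields a valid estimating equation E[x_is (y_it - beta * x_it)] = 0. Identifying which pairs are valid for real time-dependent covariates is exactly the problem this work addresses.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(6)
    n, T, beta_true = 500, 3, 1.5
    x = rng.normal(size=(n, T))
    y = beta_true * x + rng.normal(size=(n, T))

    pairs = [(s, t) for s in range(T) for t in range(T) if s <= t]

    def gbar(beta):
        # sample moment conditions, one per valid (s, t) pair
        resid = y - beta * x
        return np.array([(x[:, s] * resid[:, t]).mean() for s, t in pairs])

    objective = lambda b: gbar(b[0]) @ gbar(b[0])    # identity-weighted one-step GMM
    beta_hat = minimize(objective, x0=[0.0]).x[0]
    print(f"GMM estimate: {beta_hat:.3f} (true value {beta_true})")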

Contributors
Yin, Jianqiong, Wilson, Jeffrey Wilson, Reiser, Mark, et al.
Created Date
2012

The dawn of the Internet of Things (IoT) has opened the opportunity for mainstream adoption of machine learning analytics. However, most research in machine learning has focused on discovery of new algorithms or fine-tuning the performance of existing algorithms. Little exists on the process of taking an algorithm from the lab environment into the real world, culminating in sustained value. Real-world applications are typically characterized by dynamic non-stationary systems with requirements around feasibility, stability and maintainability. Not much has been done to establish standards around the unique analytics demands of real-world scenarios. This research explores the problem of why so few of …

Contributors
Shahapurkar, Som, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2016

In accelerated life tests (ALTs), complete randomization is hardly achievable because of economic and engineering constraints. Typical experimental protocols such as subsampling or random blocks in ALTs result in a grouped structure, which leads to correlated lifetime observations. In this dissertation, a generalized linear mixed model (GLMM) approach is proposed to analyze ALT data and find the optimal ALT design with the consideration of heterogeneous group effects. Two types of ALTs are demonstrated for data analysis. First, constant-stress ALT (CSALT) data with a Weibull failure time distribution are modeled by GLMM. The marginal likelihood of observations is approximated by the quadrature rule; …

Contributors
Seo, Kangwon, Pan, Rong, Montgomery, Douglas C, et al.
Created Date
2017

This is a two-part thesis: Part 1 of this thesis determines the most dominant failure modes of field-aged photovoltaic (PV) modules using experimental data and statistical analysis, FMECA (Failure Mode, Effect, and Criticality Analysis). The failure and degradation modes of about 5900 crystalline-Si glass/polymer modules fielded for 6 to 16 years in three different photovoltaic (PV) power plants with different mounting systems under the hot-dry desert climate of Arizona are evaluated. A statistical reliability tool, FMECA, which uses the Risk Priority Number (RPN), is performed for each PV power plant to determine the dominant failure modes in the modules …

Contributors
Shrestha, Sanjay Mohan, Tamizhmani, Govindsamy, Srinivasan, Devrajan, et al.
Created Date
2014

Through a two study simulation design with different design conditions (sample size at level 1 (L1) was set to 3, level 2 (L2) sample size ranged from 10 to 75, level 3 (L3) sample size ranged from 30 to 150, intraclass correlation (ICC) ranging from 0.10 to 0.50, model complexity ranging from one predictor to three predictors), this study intends to provide general guidelines about adequate sample sizes at three levels under varying ICC conditions for a viable three level HLM analysis (e.g., reasonably unbiased and accurate parameter estimates). In this study, the data generating parameters for the were obtained …

Contributors
Yel, Nedim, Levy, Roy, Elliott, Stephen N, et al.
Created Date
2016

Smoking remains the leading cause of preventable death in the United States, and early initiation is associated with greater difficulty quitting. Among adolescent smokers, those with attention-deficit hyperactivity disorder (ADHD), characterized by difficulties associated with impulsivity, hyperactivity, and inattention, smoke at nearly twice the rate of their peers. Although cigarette smoking is highly addictive, nicotine is a relatively weak primary reinforcer, spurring research on other potential targets that may maintain smoking, including the potential benefits of nicotine on attention, inhibition, and reinforcer efficacy. The present study employs the most prevalent rodent model of ADHD, the spontaneously hypertensive rat (SHR) and …

Contributors
Mazur, Gabriel Joseph, Sanabria, Federico, Killeen, Peter R, et al.
Created Date
2014

Electricity infrastructure vulnerabilities were assessed for future heat waves due to climate change. Critical processes and component relationships were identified and characterized with consideration for the terminal event of service outages, including cascading failures in transmission-level components that can result in blackouts. The most critical dependency identified was the increase in peak electricity demand with higher air temperatures. Historical and future air temperatures were characterized within and across Los Angeles County, California (LAC) and Maricopa County (Phoenix), Arizona. LAC was identified as more vulnerable to heat waves than Phoenix due to a wider distribution of historical temperatures. Two approaches were …

Contributors
Burillo, Daniel, Chester, Mikhail V, Ruddell, Benjamin, et al.
Created Date
2018

The objective of this thesis is to investigate the various types of energy end-uses to be expected in future high efficiency single family residences. For this purpose, this study has analyzed monitored data from 14 houses in the 2013 Solar Decathlon competition, and segregates the energy consumption patterns in various residential end-uses (such as lights, refrigerators, washing machines, ...). The analysis was not straightforward since these homes were operated according to schedules previously determined by the contest rules. The analysis approach allowed the isolation of the comfort energy use by the Heating, Ventilation and Air Conditioning (HVAC) systems. HVAC are the …

Contributors
Garkhail, Rahul, Reddy, T Agami, Bryan, Harvey, et al.
Created Date
2014

As the world embraces a sustainable energy future, alternative energy resources, such as wind power, are increasingly being seen as an integral part of the future electric energy grid. Ultimately, integrating such a dynamic and variable mix of generation requires a better understanding of renewable generation output, in addition to power grid systems that improve power system operational performance in the presence of anticipated events such as wind power ramps. Because of the stochastic, uncontrollable nature of renewable resources, a thorough and accurate characterization of wind activity is necessary to maintain grid stability and reliability. Wind power ramps from an …

Contributors
Ganger, David Wu, Vittal, Vijay, Zhang, Junshan, et al.
Created Date
2016

In two independent and thematically connected chapters, I investigate consumers' willingness to pay a price premium in response to product development that entails prosocial attributes (PATs), those that allude to the reduction of negative externalities to benefit society, and to an innovative participatory pricing design called 'Pay-What-You-Want' (PWYW) pricing, a mechanism that relinquishes the determination of payments in exchange for private goods to the consumers themselves, partly relying on their prosocial preferences to drive positive payments. First, I propose a novel statistical approach built on the choice-based contingent valuation technique to estimate incremental willingness to pay (IWTP) for PATs …

Contributors
Christopher, Ranjit M., Wiles, Michael, Ketcham, Jonathan, et al.
Created Date
2016

In the presence of correlation, generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extravariation due to correlation, methods to estimate and model the additional variation are investigated. A general form of the mean-variance relationship is proposed which incorporates the canonical parameter. The two variance parameters are estimated using generalized method of moments, negating the need for a distributional assumption. The mean-variance relation estimates are applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchical structured data, …

Contributors
Irimata, Katherine, Wilson, Jeffrey R, Kamarianakis, Ioannis, et al.
Created Date
2018

Estimating cointegrating relationships requires specific techniques. Canonical correlations are used to determine the rank and space of the cointegrating matrix. The vectors used to transform the data into canonical variables have an eigenvector representation, and the associated canonical correlations have an eigenvalue representation. The number of cointegrating relations is chosen based upon a theoretical difference in the convergence rates of the eigenvalues. The number of cointegrating relations is consistently estimated using a threshold function which places a lower bound on the eigenvalues associated with cointegrating relations and an upper bound on the eigenvalues not associated with cointegrating …

Contributors
Nowak, Adam Daniel, Ahn, Seung C, Liu, Crocker, et al.
Created Date
2012

Given the importance of buildings as major consumers of resources worldwide, several organizations are working avidly to ensure the negative impacts of buildings are minimized. The U.S. Green Building Council's (USGBC) Leadership in Energy and Environmental Design (LEED) rating system is one such effort to recognize buildings that are designed to achieve a superior performance in several areas including energy consumption and indoor environmental quality (IEQ). The primary objectives of this study are to investigate the performance of LEED certified facilities in terms of energy consumption and occupant satisfaction with IEQ, and introduce a framework to assess the performance of …

Contributors
Chokor, Abbas, El Asmar, Mounir, Chong, Oswald, et al.
Created Date
2015

Understanding how adherence affects outcomes is crucial when developing and assigning interventions. However, interventions are often evaluated by conducting randomized experiments and estimating intent-to-treat effects, which ignore actual treatment received. Dose-response effects can supplement intent-to-treat effects when participants are offered the full dose but many only receive a partial dose due to nonadherence. Using these data, we can estimate the magnitude of the treatment effect at different levels of adherence, which serve as a proxy for different levels of treatment. In this dissertation, I conducted Monte Carlo simulations to evaluate when linear dose-response effects can be accurately and precisely estimated …

Contributors
Mazza, Gina Lynn, Grimm, Kevin J, West, Stephen G, et al.
Created Date
2018

In this era of fast computational machines and new optimization algorithms, there have been great advances in experimental designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging (fMRI). The first part of our research is on tackling the challenging problem of constructing exact designs for GLMs that are robust against parameter, link and model uncertainties, by improving an existing algorithm and providing a new one based on continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accommodate most popular design selection criteria, and we …

Contributors
Temkit, M'Hamed, Kao, Jason, Reiser, Mark, et al.
Created Date
2014

Statistics is taught at every level of education, yet teachers often have to assume their students have no knowledge of statistics and start from scratch each time they set out to teach statistics. The motivation for this experimental study comes from interest in exploring educational applications of augmented reality (AR) delivered via mobile technology that could potentially provide rich, contextualized learning for understanding concepts related to statistics education. This study examined the effects of AR experiences for learning basic statistical concepts. Using a 3 x 2 research design, this study compared learning gains of 252 undergraduate and graduate students from …

Contributors
Conley, Quincy, Atkinson, Robert K, Nguyen, Frank, et al.
Created Date
2013

Manufacturing tolerance charts are mostly used these days for manufacturing tolerance transfer, but they have the limitation of being only one-dimensional. Some research has been undertaken for three-dimensional geometric tolerances, but it is too theoretical and not yet ready for operator-level usage. In this research, a new three-dimensional model for tolerance transfer in manufacturing process planning is presented that is user friendly in the sense that it is built upon the Coordinate Measuring Machine (CMM) readings that are readily available in any decent manufacturing facility. This model can take care of datum reference change …

Contributors
Khan, M Nadeem Shafi, Phelan, Patrick E, Montgomery, Douglas, et al.
Created Date
2011

Information divergence functions, such as the Kullback-Leibler divergence or the Hellinger distance, play a critical role in statistical signal processing and information theory; however, estimating them can be a challenge. Most often, parametric assumptions are made about the two distributions to estimate the divergence of interest. In cases where no parametric model fits the data, non-parametric density estimation is used. In statistical signal processing applications, Gaussianity is usually assumed since closed-form expressions for common divergence measures have been derived for this family of distributions. Parametric assumptions are preferred when it is known that the data follow the model; however, this is …
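
As a concrete instance of the parametric-versus-nonparametric tradeoff: the KL divergence between two univariate Gaussians has the closed form KL(p||q) = ln(sigma_q/sigma_p) + (sigma_p^2 + (mu_p - mu_q)^2) / (2 sigma_q^2) - 1/2, while a plug-in histogram estimate needs no model. A sketch comparing the two on Gaussian samples (the bin choices are arbitrary assumptions):

    import numpy as np

    rng = np.random.default_rng(7)
    mu_p, s_p, mu_q, s_q = 0.0, 1.0, 1.0, 1.5

    closed = np.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q)**2) / (2 * s_q**2) - 0.5

    xp = rng.normal(mu_p, s_p, 100_000)
    xq = rng.normal(mu_q, s_q, 100_000)
    bins = np.linspace(-6, 8, 80)
    p, _ = np.histogram(xp, bins=bins, density=True)
    q, _ = np.histogram(xq, bins=bins, density=True)
    w = np.diff(bins)
    ok = (p > 0) & (q > 0)                       # drop empty bins from the plug-in sum
    estimate = np.sum(w[ok] * p[ok] * np.log(p[ok] / q[ok]))

    print(f"closed form: {closed:.3f}, histogram estimate: {estimate:.3f}")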

Contributors
Wisler, Alan, Berisha, Visar, Spanias, Andreas, et al.
Created Date
2017

Technological advances have enabled the generation and collection of various data from complex systems, thus, creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view of the problem allows for richer learning from data and, thereby, improves decision making. The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm …

Contributors
Azarnoush, Bahareh, Runger, George C, Bekki, Jennifer, et al.
Created Date
2014

The inherent intermittency in solar energy resources poses challenges to scheduling generation, transmission, and distribution systems. Energy storage devices are often used to mitigate variability in renewable asset generation and provide a mechanism to shift renewable power between periods of the day. In the absence of storage, however, time series forecasting techniques can be used to estimate future solar resource availability to improve the accuracy of solar generator scheduling. The knowledge of future solar availability helps scheduling solar generation at high-penetration levels, and assists with the selection and scheduling of spinning reserves. This study employs statistical techniques to improve the …

Contributors
Soundiah Regunathan Rajasekaran, Dhiwaakar Purusothaman, Johnson, Nathan G, Karady, George G, et al.
Created Date
2016

The use of bias indicators in psychological measurement has been contentious, with some researchers questioning whether they actually suppress or moderate the ability of substantive psychological indicators to discriminate (McGrath, Mitchell, Kim, & Hough, 2010). Bias indicators on the MMPI-2-RF (F-r, Fs, FBS-r, K-r, and L-r) were tested for suppression or moderation of the ability of the RC1 and NUC scales to discriminate between Epileptic Seizures (ES) and Non-epileptic Seizures (NES, a conversion disorder that is often misdiagnosed as ES). RC1 and NUC had previously been found to be the best scales on the MMPI-2-RF to differentiate between ES and …

Contributors
Wershba, Rebecca Eve, Lanyon, Richard I, Barrera, Manuel, et al.
Created Date
2013

Researchers are often interested in estimating interactions in multilevel models, but many researchers assume that the same procedures and interpretations for interactions in single-level models apply to multilevel models. However, estimating interactions in multilevel models is much more complex than in single-level models. Because uncentered (RAS) or grand mean centered (CGM) level-1 predictors in two-level models contain two sources of variability (i.e., within-cluster variability and between-cluster variability), interactions involving RAS or CGM level-1 predictors also contain more than one source of variability. In this Master’s thesis, I use simulations to demonstrate that ignoring the four sources of variability in a …
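
The decomposition at issue can be made concrete in a few lines: a raw (RAS) or grand-mean-centered (CGM) level-1 predictor splits into a cluster mean (the between part) plus a deviation from that mean (the within part, i.e., centering within cluster), and the two parts are orthogonal by construction. A numpy sketch on synthetic clustered data:

    import numpy as np

    rng = np.random.default_rng(8)
    clusters = np.repeat(np.arange(20), 10)            # 20 clusters of 10 observations
    x = rng.normal(size=200) + 0.5 * clusters / 20     # predictor with between-cluster spread

    cluster_means = np.array([x[clusters == j].mean() for j in range(20)])
    x_between = cluster_means[clusters]                # between-cluster component
    x_within = x - x_between                           # within-cluster (CWC) component

    print(f"corr(within, between) = {np.corrcoef(x_within, x_between)[0, 1]:.1e}")
    print(f"max |x - (within + between)| = {np.abs(x - (x_within + x_between)).max():.1e}")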

Contributors
Mazza, Gina Lynn, Enders, Craig K., Aiken, Leona S., et al.
Created Date
2015

In the study of regional economic growth and convergence, the distribution dynamics approach, which interrogates the evolution of the cross-sectional distribution as a whole and is concerned with both the external and internal dynamics of the distribution, has received wide usage. However, many methodological issues remain to be resolved before valid inferences and conclusions can be drawn from empirical research. Among them, spatial effects, including spatial heterogeneity and spatial dependence, invalidate the assumption of independent and identical distributions underlying the conventional maximum likelihood techniques, while the availability of small samples in regional settings questions the usage of the asymptotic properties. …

Contributors
Kang, Wei, Rey, Sergio, Fotheringham, Stewart, et al.
Created Date
2018

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also studied …

Contributors
Koh, Derek, Runger, George, Wu, Tong, et al.
Created Date
2013

Generalized Linear Models (GLMs) are widely used for modeling responses with non-normal error distributions. When the values of the covariates in such models are controllable, finding an optimal (or at least efficient) design could greatly facilitate the work of collecting and analyzing data. In fact, many theoretical results are obtained on a case-by-case basis, while in other situations, researchers also rely heavily on computational tools for design selection. Three topics are investigated in this dissertation with each one focusing on one type of GLMs. Topic I considers GLMs with factorial effects and one continuous covariate. Factors can have interactions among …

Contributors
Wang, Zhongshen, Stufken, John, Kamarianakis, Ioannis, et al.
Created Date
2018

The recent technological advances enable the collection of various complex, heterogeneous and high-dimensional data in biomedical domains. The increasing availability of the high-dimensional biomedical data creates the needs of new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover the patterns and improve the decision making. All the proposed methods can generalize to other industrial fields. The first topic of this dissertation focuses on the data clustering. Data clustering is often the first step for analyzing a dataset without the label information. Clustering high-dimensional data …

Contributors
Lin, Sangdi, Runger, George C, Kocher, Jean-Pierre A, et al.
Created Date
2018

Predicting resistant prostate cancer is critical for lowering medical costs and improving the quality of life of advanced prostate cancer patients. I formulate, compare, and analyze two mathematical models that aim to forecast future levels of prostate-specific antigen (PSA). I accomplish these tasks by employing clinical data of locally advanced prostate cancer patients undergoing androgen deprivation therapy (ADT). I demonstrate that the inverse problem of parameter estimation might be too complicated and simply relying on data fitting can give incorrect conclusions, since there is a large error in parameter values estimated and parameters might be unidentifiable. I provide confidence intervals …

Contributors
Baez, Javier, Kuang, Yang, Kostelich, Eric, et al.
Created Date
2017

The Partition of Variance (POV) method is a simple way to identify large sources of variation in manufacturing systems. This method identifies the variance by estimating the variance of the means (between variance) and the means of the variance (within variance). The project shows that the method correctly identifies the variance source when compared to the ANOVA method. The variance estimators deteriorate when varying degrees of non-normality are introduced through simulation; however, the POV method is shown to be a more stable measure of variance in the aggregate. The POV method also provides non-negative, stable estimates for interaction when …
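
The two quantities named in this abstract are easy to compute directly: the between component is the variance of the group means, and the within component is the mean of the group variances. A sketch on synthetic lot data (the plain estimators below ignore any small-sample correction the full method may apply):

    import numpy as np

    rng = np.random.default_rng(9)
    lot_shifts = rng.normal(0.0, 2.0, size=8)                 # lot-to-lot variation
    groups = rng.normal(loc=lot_shifts[:, None], scale=0.5,   # within-lot noise
                        size=(8, 50))

    between = groups.mean(axis=1).var(ddof=1)    # variance of the means
    within = groups.var(axis=1, ddof=1).mean()   # mean of the variances

    print(f"between-lot variance: {between:.2f}")   # dominated by the lot shifts (~4)
    print(f"within-lot variance:  {within:.2f}")    # ~0.25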

Contributors
Little, David John, Borror, Connie, Montgomery, Douglas, et al.
Created Date
2015

Time-to-event analysis or equivalently, survival analysis deals with two variables simultaneously: when (time information) an event occurs and whether an event occurrence is observed or not during the observation period (censoring information). In behavioral and social sciences, the event of interest usually does not lead to a terminal state such as death. Other outcomes after the event can be collected and thus, the survival variable can be considered as a predictor as well as an outcome in a study. One example of a case where the survival variable serves as a predictor as well as an outcome is a survival-mediator …

Contributors
Kim, Han Joe, MacKinnon, David P., Tein, Jenn-Yun, et al.
Created Date
2017

In mixture-process variable experiments, it is common that the number of runs is greater than in mixture-only or process-variable experiments. These experiments have to estimate the parameters from the mixture components, process variables, and interactions of both variables. In some of these experiments there are variables that are hard to change or cannot be controlled under normal operating conditions. These situations often prohibit a complete randomization for the experimental runs due to practical and economical considerations. Furthermore, the process variables can be categorized into two types: variables that are controllable and directly affect the response, and variables that are uncontrollable …

Contributors
Cho, Tae-Yeon, Montgomery, Douglas C, Borror, Connie M, et al.
Created Date
2010

Although models for describing longitudinal data have become increasingly sophisticated, the criticism of even foundational growth curve models remains challenging. The challenge arises from the need to disentangle data-model misfit at multiple and interrelated levels of analysis. Using posterior predictive model checking (PPMC)—a popular Bayesian framework for model criticism—the performance of several discrepancy functions was investigated in a Monte Carlo simulation study. The discrepancy functions of interest included two types of conditional concordance correlation (CCC) functions, two types of R2 functions, two types of standardized generalized dimensionality discrepancy (SGDDM) functions, the likelihood ratio (LR), and the likelihood ratio difference test …

Contributors
Fay, Derek M., Levy, Roy, Thompson, Marilyn, et al.
Created Date
2015

Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provides a high-dimensional data vector that challenges the learning of the relevant patterns. This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning. This provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a …

Contributors
Baydogan, Mustafa Gokce, Runger, George C, Atkinson, Robert, et al.
Created Date
2012

Missing data are common in psychology research and can lead to bias and reduced power if not properly handled. Multiple imputation is a state-of-the-art missing data method recommended by methodologists. Multiple imputation methods can generally be divided into two broad categories: joint model (JM) imputation and fully conditional specification (FCS) imputation. JM draws missing values simultaneously for all incomplete variables using a multivariate distribution (e.g., multivariate normal). FCS, on the other hand, imputes variables one at a time, drawing missing values from a series of univariate distributions. In the single-level context, these two approaches have been shown to be equivalent …

Contributors
Mistler, Stephen Andrew, Enders, Craig K, Aiken, Leona, et al.
Created Date
2015

Accurate data analysis and interpretation of results may be influenced by many potential factors. The factors of interest in the current work are the chosen analysis model(s), the presence of missing data, and the type(s) of data collected. If analysis models are used which a) do not accurately capture the structure of relationships in the data such as clustered/hierarchical data, b) do not allow or control for missing values present in the data, or c) do not accurately compensate for different data types such as categorical data, then the assumptions associated with the model have not been met and the …

Contributors
Kunze, Katie Lynn, Levy, Roy, Enders, Craig K, et al.
Created Date
2016

A least total area of triangle method was proposed by Teissier (1948) for fitting a straight line to data from a pair of variables without treating either variable as the dependent variable while allowing each of the variables to have measurement errors. This method is commonly called Reduced Major Axis (RMA) regression and is often used instead of Ordinary Least Squares (OLS) regression. Results for confidence intervals, hypothesis testing and asymptotic distributions of coefficient estimates in the bivariate case are reviewed. A generalization of RMA to more than two variables for fitting a plane to data is obtained by minimizing …
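
In the bivariate case the RMA fit has a simple closed form: slope = sign(r) * (s_y / s_x) and intercept = y-bar - slope * x-bar. A sketch on simulated data where both variables carry measurement error, showing the familiar contrast with OLS (whose slope is attenuated toward zero):

    import numpy as np

    rng = np.random.default_rng(10)
    x_true = rng.normal(size=300)
    x = x_true + rng.normal(scale=0.3, size=300)        # predictor measured with error
    y = 2.0 * x_true + rng.normal(scale=0.3, size=300)  # true slope is 2.0

    r = np.corrcoef(x, y)[0, 1]
    slope_rma = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept_rma = y.mean() - slope_rma * x.mean()

    slope_ols = r * y.std(ddof=1) / x.std(ddof=1)       # OLS attenuates the slope
    print(f"RMA slope: {slope_rma:.2f}, OLS slope: {slope_ols:.2f}")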

Contributors
Li, Jingjin, Young, Dennis, Eubank, Randall, et al.
Created Date
2012

The present thesis explores how statistical methods are conceptualized, used, and interpreted in quantitative Hispanic sociolinguistics in light of the group of statistical methods espoused by Kline (2013) and named by Cumming (2012) as the “new statistics.” The new statistics, as a conceptual framework, repudiates null hypothesis statistical testing (NHST) and replaces it with the ESCI method, or Effect Sizes and Confidence Intervals, as well as meta-analytic thinking. In this thesis, a descriptive review of 44 studies found in three academic journals over the last decade (2005–2015) showed that NHST retains a tight grip on most researchers. …

Contributors
Kidhardt, Paul Adrian, Cerron-Palomino, Alvaro, Gonzalez-Lopez, Veronica, et al.
Created Date
2015

Transfer learning is a sub-field of statistical modeling and machine learning. It refers to methods that integrate the knowledge of other domains (called source domains) and the data of the target domain in a mathematically rigorous and intelligent way, to develop a better model for the target domain than a model using the data of the target domain alone. While transfer learning is a promising approach in various application domains, my dissertation research focuses on the particular application in health care, including telemonitoring of Parkinson’s Disease (PD) and radiomics for glioblastoma. The first topic is a Mixed Effects Transfer Learning …

Contributors
Yoon, Hyunsoo, Li, Jing, Wu, Teresa, et al.
Created Date
2018

Extraordinary medical advances have led to significant reductions in the burden of infectious diseases in humans. However, infectious diseases still account for more than 13 million annual deaths. This large burden is partly due to some pathogens having found suitable conditions to emerge and spread in denser and more connected host populations, and others having evolved to escape the pressures imposed by the rampant use of antimicrobials. It is then critical to improve our understanding of how diseases spread in these modern landscapes, characterized by new host population structures and socio-economic environments, as well as containment measures such as the …

Contributors
Patterson-Lomba, Oscar, Castillo-Chavez, Carlos, Towers, Sherry, et al.
Created Date
2014

The majority of research in experimental design has, to date, been focused on designs when there is only one type of response variable under consideration. In a decision-making process, however, relying on only one objective or criterion can lead to oversimplified, sub-optimal decisions that ignore important considerations. Incorporating multiple, and likely competing, objectives is critical during the decision-making process in order to balance the tradeoffs of all potential solutions. Consequently, the problem of constructing a design for an experiment when multiple types of responses are of interest does not have a clear answer, particularly when the response variables have different …

Contributors
Burke, Sarah Ellen, Montgomery, Douglas C, Borror, Connie M, et al.
Created Date
2016

Functional or dynamic responses are prevalent in experiments in the fields of engineering, medicine, and the sciences, but proposals for optimal designs are still sparse for this type of response. Experiments with dynamic responses yield multiple responses taken over a spectrum variable, so the design matrix for a dynamic response has a more complicated structure. In the literature, the optimal design problem for some functional responses has been solved using genetic algorithms (GA) and approximate design methods. The goal of this dissertation is to develop fast computer algorithms for calculating exact D-optimal designs. First, we demonstrated how the traditional exchange …
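The classic point-exchange idea that exact D-optimal algorithms build on can be sketched as follows: starting from a random design, swap design points for candidate points whenever the swap increases det(X'X). This toy version, with an assumed quadratic model and candidate grid, is not the dissertation's algorithm for dynamic responses.

```python
# Toy point-exchange search for an exact D-optimal design: swap design
# points for candidate points whenever the swap increases det(X'X).
import numpy as np

def d_optimal(candidates, n, seed=0):  # greedy sketch, not the thesis code
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(candidates), n, replace=False)

    def logdet(ix):
        sign, val = np.linalg.slogdet(candidates[ix].T @ candidates[ix])
        return val if sign > 0 else -np.inf

    best, improved = logdet(idx), True
    while improved:
        improved = False
        for i in range(n):
            for j in range(len(candidates)):
                trial = idx.copy(); trial[i] = j
                if logdet(trial) > best:
                    idx, best, improved = trial, logdet(trial), True
    return candidates[idx]

x = np.linspace(-1, 1, 21)                          # assumed candidate grid
cand = np.column_stack([np.ones_like(x), x, x**2])  # quadratic model
print(np.sort(d_optimal(cand, n=6)[:, 1]))          # mass at -1, 0, and 1
```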

Contributors
Saleh, Moein, Pan, Rong, Montgomery, Douglas C, et al.
Created Date
2015

Product reliability is now a top concern for manufacturers, and customers prefer products that perform well over long periods. Because most products can last for years or even decades, accelerated life testing (ALT) is used to estimate product lifetimes. Much research has been done in the ALT area, and optimal design for ALT is a major topic. This dissertation consists of three main studies. First, a methodology for finding optimal ALT designs with right censoring and interval censoring has been developed; it employs the proportional hazard (PH) model and generalized …

Contributors
Yang, Tao, Pan, Rong, Montgomery, Douglas, et al.
Created Date
2013

This study concerns optimal designs for experiments where responses consist of both binary and continuous variables. Many experiments in engineering, medical studies, and other fields have such mixed responses. Although in recent decades several statistical methods have been developed for jointly modeling both types of response variables, an effective way to design such experiments remains unclear. To address this void, some useful results are developed to guide the selection of optimal experimental designs in such studies. The results are mainly built upon a powerful tool called the complete class approach and a nonlinear optimization algorithm. The complete class approach was …

Contributors
Kim, Soohyun, Kao, Ming-Hung, Dueck, Amylou, et al.
Created Date
2017

This is a two-part thesis: Part 1 characterizes soiling losses using various techniques to understand the effect of soiling on photovoltaic modules. The higher the angle of incidence (AOI), the lower the photovoltaic (PV) module performance. Our research group has already reported the AOI investigation for cleaned modules of five different technologies with an air/glass interface. However, modules installed in the field invariably develop a soil layer of varying thickness depending on site conditions, rainfall and tilt angle. The soiled module will have an air/soil/glass interface rather than an air/glass interface. This study investigates the …

Contributors
Boppana, Sravanthi, Tamizhmani, Govindasamy, Srinivasan, Devarajan, et al.
Created Date
2015

Visceral Leishmaniasis (VL) is primarily endemic in five countries, with India and Sudan having the highest burden. The risk factors associated with VL are either unknown in some regions or vary drastically among empirical studies. Here, a dynamical model, motivated and informed by field data from the literature, is analyzed and employed to identify and quantify the impact of region-dependent risks on VL transmission dynamics. Parameter estimation procedures were developed using model-derived quantities and empirical data from multiple resources. The dynamics of VL depend on the estimates of the control reproductive number, RC, interpreted as the average …
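A control reproductive number of this kind is typically computed as the spectral radius of a next-generation matrix. The sketch below does this for a toy host-vector model; all parameter names and values are assumed for illustration and are not those of the thesis's VL model.

```python
# Illustrative next-generation-matrix computation of a reproductive
# number for a toy host-vector model (not the thesis's VL model).
import numpy as np

beta_hv = 0.30   # vector-to-host transmission rate (assumed)
beta_vh = 0.25   # host-to-vector transmission rate (assumed)
gamma_h = 0.10   # host recovery rate (assumed)
mu_v    = 0.20   # vector mortality rate (assumed)

# New-infection matrix F and transition matrix V, linearized at the
# disease-free equilibrium; R_C is the spectral radius of F V^{-1}.
F = np.array([[0.0, beta_hv],
              [beta_vh, 0.0]])
V = np.diag([gamma_h, mu_v])
R_C = max(abs(np.linalg.eigvals(F @ np.linalg.inv(V))))
print(R_C)   # sqrt(beta_hv*beta_vh / (gamma_h*mu_v)) ~= 1.94
```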

Contributors
Barley, Kamal Kevin, Castillo-Chavez, Carlos, Mubayi, Anuj, et al.
Created Date
2016

The comparison of between- versus within-person relations addresses a central issue in psychological research regarding whether group-level relations among variables generalize to individual group members. Between- and within-person effects may differ in magnitude as well as direction, and contextual multilevel models can accommodate this difference. Contextual multilevel models have been explicated mostly for cross-sectional data, but they can also be applied to longitudinal data where level-1 effects represent within-person relations and level-2 effects represent between-person relations. With longitudinal data, estimating the contextual effect allows direct evaluation of whether between-person and within-person effects differ. Furthermore, these models, unlike single-level models, permit …
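One hedged way to see the contextual-effect logic in code: split a predictor into within-person and between-person components and fit a mixed model, so the gap between the two coefficients directly estimates the contextual effect. The simulated data, effect sizes, and variable names below are purely illustrative.

```python
# Sketch: split a predictor into within- and between-person parts and
# fit a mixed model; the gap between the two coefficients estimates the
# contextual effect. Data and effect sizes are simulated/assumed.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_persons, n_obs = 100, 10
pid = np.repeat(np.arange(n_persons), n_obs)
pmean = np.repeat(rng.normal(size=n_persons), n_obs)   # person means
x = pmean + rng.normal(size=n_persons * n_obs)         # within variation
y = 0.5 * (x - pmean) + 1.2 * pmean + rng.normal(size=len(x))

df = pd.DataFrame({"y": y, "x_within": x - pmean,
                   "x_between": pmean, "pid": pid})
fit = smf.mixedlm("y ~ x_within + x_between", df, groups=df["pid"]).fit()
print(fit.params)   # x_between - x_within ~= 0.7, the contextual effect
```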

Contributors
Wurpts, Ingrid Carlson, MacKinnon, David P, West, Stephen G, et al.
Created Date
2016

Distributed renewable energy generators now contribute a significant amount of energy to the grid. Consequently, the reliability adequacy of such generators depends on accurate forecasts of the energy they produce. Power outputs of solar PV systems depend on the stochastic variation of environmental factors (solar irradiance, ambient temperature and wind speed) and on random mechanical failures/repairs. Monte Carlo simulation, which is typically used to model such problems, becomes too computationally intensive, forcing simplifying state-space assumptions. Multi-state models for power system reliability offer a higher flexibility in providing a description of system state evolution and an accurate …
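As a rough sketch of the multi-state idea, a discrete-time Markov chain over a few power-output states yields a stationary distribution summarizing long-run state occupancy; the states and transition probabilities below are invented for illustration.

```python
# Sketch of a multi-state reliability model: a discrete-time Markov
# chain over power-output states; its stationary distribution gives
# long-run state occupancy. States and probabilities are invented.
import numpy as np

# states: 0 = full output, 1 = derated, 2 = failed
P = np.array([[0.95, 0.04, 0.01],
              [0.30, 0.65, 0.05],
              [0.50, 0.00, 0.50]])   # repair returns to full output

# stationary distribution: left eigenvector of P for eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
print(pi)   # long-run fraction of time in each state
```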

Contributors
Kadloor, Nikhil, Kuitche, Joseph, Pan, Rong, et al.
Created Date
2017

A major challenge in health-related policy and program evaluation research is attributing underlying causal relationships where complicated processes may exist in natural or quasi-experimental settings. Spatial interaction and heterogeneity between units at individual or group levels can violate both components of the Stable-Unit-Treatment-Value-Assumption (SUTVA) that are core to the counterfactual framework, making treatment effects difficult to assess. New approaches are needed in health studies to develop spatially dynamic causal modeling methods to both derive insights from data that are sensitive to spatial differences and dependencies, and also be able to rely on a more robust, dynamic technical infrastructure needed for …

Contributors
Kolak, Marynia Aniela, Anselin, Luc, Rey, Sergio, et al.
Created Date
2017

This simulation study compared the utility of various discrepancy measures within a posterior predictive model checking (PPMC) framework for detecting different types of data-model misfit in multidimensional Bayesian network (BN) models. The investigated conditions were motivated by an applied research program utilizing an operational complex performance assessment within a digital-simulation educational context grounded in theories of cognition and learning. BN models were manipulated along two factors: latent variable dependency structure and number of latent classes. Distributions of posterior predicted p-values (PPP-values) served as the primary outcome measure and were summarized in graphical presentations, by median values across replications, and by …
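A PPP-value in this framework is the posterior predictive probability that a discrepancy measure computed on replicated data meets or exceeds its value on the observed data. The sketch below illustrates this with a toy normal model and sample variance as the discrepancy; a real application would draw the parameters from the fitted Bayesian network's posterior.

```python
# Sketch of a posterior predictive p-value (PPP-value): the share of
# posterior draws whose replicated-data discrepancy meets or exceeds
# the observed-data discrepancy. Toy normal model, variance discrepancy.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.5, size=100)              # toy "observed" data

# stand-in posterior draws for (mu, sigma); a real application would
# take these from the fitted Bayesian network model
mu_draws = rng.normal(y.mean(), 0.1, size=2000)
sd_draws = np.abs(rng.normal(y.std(), 0.1, size=2000))

disc = np.var                                    # discrepancy measure
exceed = sum(disc(rng.normal(m, s, size=y.size)) >= disc(y)
             for m, s in zip(mu_draws, sd_draws))
print(exceed / mu_draws.size)   # PPP near 0.5 suggests adequate fit
```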

Contributors
Crawford, Aaron Vaughn, Levy, Roy, Green, Samuel, et al.
Created Date
2014

Photovoltaic (PV) modules are typically rated at three test conditions: STC (standard test conditions), NOCT (nominal operating cell temperature) and Low E (low irradiance). The current thesis deals with the power rating of PV modules at twenty-three test conditions as per the recent International Electrotechnical Commission (IEC) standard IEC 61853-1. In the current research, an automation software tool developed by a previous researcher of ASU-PRL (ASU Photovoltaic Reliability Laboratory) is validated at various stages. Also in the current research, the power rating of PV modules for four different manufacturers is carried out according to IEC …

Contributors
Vemula, Meena Gupta, Tamizhmani, Govindasamy, Macia, Narcio F., et al.
Created Date
2012

Drawing on the large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning in several ways: student visualization, recommendations for students, student modeling, grouping students, and more. Many programming assignments offer features such as automated submission and test-case checking to verify correctness, but few studies have compared different statistical techniques with the latest frameworks and interpreted the models in a unified approach. In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore and predict students’ …

Contributors
Tian, Wenbo, Hsiao, Ihan, Bazzi, Rida, et al.
Created Date
2019

Public health surveillance is a special case of the general problem where counts (or rates) of events are monitored for changes. Modern data complements event counts with many additional measurements (such as geographic, demographic, and others) that comprise high-dimensional covariates. This leads to an important challenge to detect a change that only occurs within a region, initially unspecified, defined by these covariates. Current methods are typically limited to spatial and/or temporal covariate information and often fail to use all the information available in modern data that can be paramount in unveiling these subtle changes. Additional complexities associated with modern health …

Contributors
Davila, Saylisse, Runger, George C, Montgomery, Douglas C, et al.
Created Date
2010

Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) variables used as outcomes often violate the assumptions of linear regression as well as of models designed for categorical outcomes; no analytic model is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, …
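Three of the four candidate models can be sketched with statsmodels as below (beta regression is omitted here, as it requires a separate or experimental implementation); the simulated grouped-count outcome and coefficients are illustrative only.

```python
# Sketch: three of the four candidate models fit to a simulated
# grouped-count outcome (beta regression omitted; it needs a separate
# or experimental implementation). Coefficients are illustrative.
import numpy as np
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.minimum(rng.poisson(np.exp(0.3 + 0.5 * x)), 4)  # top category 4+

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                                 # linear
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # Poisson
ordl = OrderedModel(y, x.reshape(-1, 1),
                    distr="logit").fit(method="bfgs", disp=False)
print(ols.params, pois.params, ordl.params, sep="\n")
```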

Contributors
Coxe, Stefany Jean, Aiken, Leona S, West, Stephen G, et al.
Created Date
2012

Obtaining high-quality experimental designs to optimize statistical efficiency and data quality is quite challenging for functional magnetic resonance imaging (fMRI). The primary fMRI design issue is the selection of the best sequence of stimuli based on a statistically meaningful optimality criterion. Some previous studies have provided guidance and powerful computational tools for obtaining good fMRI designs. However, these results are mainly for basic experimental settings with simple statistical models. In this work, a type of modern fMRI experiment is considered, in which the design matrix of the statistical model depends not only on the selected design, but also …

Contributors
Zhou, Lin, Kao, Ming-hung, Reiser, Mark, et al.
Created Date
2014

In many classification problems, data samples cannot be collected easily, for example in drug trials, biological experiments and studies of cancer patients. In many situations the data set size is small and there are many outliers. When classifying such data, for example cancer versus normal patients, the consequences of misclassification are probably more important than for any other data type, because the data point could be a cancer patient, or the classification decision could help determine what gene might be over-expressed and perhaps a cause of cancer. These misclassifications are typically higher in the presence of outlier data points. The aim of …

Contributors
Gupta, Sidharth, Kim, Seungchan, Welfert, Bruno, et al.
Created Date
2011

A simulation study was conducted to explore the robustness of general factor mean difference estimation in bifactor ordered-categorical data. In the No Differential Item Functioning (DIF) conditions, the data generation conditions varied were sample size, the number of categories per item, effect size of the general factor mean difference, and the size of specific factor loadings; in data analysis, misspecification conditions were introduced in which the generated bifactor data were fit using a unidimensional model, and/or ordered-categorical data were treated as continuous data. In the DIF conditions, the data generation conditions varied were sample size, the number of categories per …
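A hedged sketch of how bifactor ordered-categorical data of this kind can be generated: continuous latent responses load on a general factor plus one specific factor each, then are cut into ordered categories. The loadings, thresholds, and dimensions below are assumed, not the study's actual conditions.

```python
# Toy generator for bifactor ordered-categorical data: latent responses
# load on a general factor plus one of three specific factors, then are
# cut into four ordered categories. All values are assumed.
import numpy as np

rng = np.random.default_rng(0)
n, items = 500, 12
g = rng.normal(size=n)                      # general factor
s = rng.normal(size=(n, 3))                 # specific factors
spec = np.repeat(np.arange(3), 4)           # 4 items per specific factor
lam_g, lam_s = 0.7, 0.4                     # assumed loadings
ystar = lam_g * g[:, None] + lam_s * s[:, spec] + rng.normal(size=(n, items))
y = np.digitize(ystar, [-1.0, 0.0, 1.0])    # thresholds -> categories 0..3
print(np.bincount(y[:, 0]))                 # category counts, item 1
```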

Contributors
Liu, Yixing, Thompson, Marilyn, Levy, Roy, et al.
Created Date
2019

By the von Neumann min-max theorem, a two person zero sum game with finitely many pure strategies has a unique value for each player (summing to zero) and each player has a non-empty set of optimal mixed strategies. If the payoffs are independent, identically distributed (iid) uniform (0,1) random variables, then with probability one, both players have unique optimal mixed strategies utilizing the same number of pure strategies with positive probability (Jonasson 2004). The pure strategies with positive probability in the unique optimal mixed strategies are called saddle squares. In 1957, Goldman evaluated the probability of a saddle point (a …
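Goldman's (1957) result can be checked by simulation: for an n x n matrix of iid continuous payoffs, the probability of a pure saddle point is n!·n!/(2n-1)!. The Monte Carlo sketch below, with an assumed helper has_saddle, verifies this for n = 3.

```python
# Monte Carlo check of Goldman's (1957) saddle-point probability: for
# an n x n matrix of iid continuous payoffs it equals n!*n!/(2n-1)!.
import numpy as np
from math import factorial

def has_saddle(A):  # entry that is its row's min and its column's max
    return np.any((A == A.min(axis=1, keepdims=True)) &
                  (A == A.max(axis=0, keepdims=True)))

rng = np.random.default_rng(0)
n, trials = 3, 100_000
hits = sum(has_saddle(rng.random((n, n))) for _ in range(trials))
print(hits / trials, factorial(n)**2 / factorial(2 * n - 1))  # both ~0.3
```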

Contributors
Manley, Michael, Kadell, Kevin W. J., Kao, Ming-Hung, et al.
Created Date
2011

Yield is a key process performance characteristic in the capital-intensive semiconductor fabrication process. In an industry where machines cost millions of dollars and cycle times are a number of months, predicting and optimizing yield are critical to process improvement, customer satisfaction, and financial success. Semiconductor yield modeling is essential to identifying processing issues, improving quality, and meeting customer demand in the industry. However, the complicated fabrication process, the massive amount of data collected, and the number of models available make yield modeling a complex and challenging task. This work presents modeling strategies to forecast yield using generalized linear models (GLMs) …

Contributors
Krueger, Dana Cheree, Montgomery, Douglas C., Fowler, John, et al.
Created Date
2011

This dissertation involves three problems that are all related by the use of the singular value decomposition (SVD) or generalized singular value decomposition (GSVD). The specific problems are (i) derivation of a generalized singular value expansion (GSVE), (ii) analysis of the properties of the chi-squared method for regularization parameter selection in the case of nonnormal data and (iii) formulation of a partial canonical correlation concept for continuous time stochastic processes. The finite dimensional SVD has an infinite dimensional generalization to compact operators. However, the form of the finite dimensional GSVD developed in, e.g., Van Loan does not extend directly to …

Contributors
Huang, Qing, Eubank, Randall, Renaut, Rosemary, et al.
Created Date
2012

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction that finds a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from irrelevant information has been a topic of focus. In supervised learning such as regression, the data consist of many features, and only a subset of the features may be responsible for the result. Also, the features might carry structural requirements, which introduces additional complexity into feature selection. The sparse learning package provides a set of algorithms for …
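A minimal sketch of the core sparse-learning mechanism, using scikit-learn's LASSO rather than the package described in the thesis: the L1 penalty drives most coefficients exactly to zero, recovering a small relevant feature subset. The data and penalty value are illustrative.

```python
# Minimal sparse-learning sketch with scikit-learn's LASSO: the L1
# penalty zeroes out most coefficients, selecting a small feature set.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
beta = np.zeros(50)
beta[:3] = [2.0, -1.5, 1.0]                 # only 3 relevant features
y = X @ beta + rng.normal(scale=0.5, size=100)

model = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(model.coef_))          # indices of selected features
```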

Contributors
Thulasiram, Ramesh L., Ye, Jieping, Xue, Guoliang, et al.
Created Date
2011

Gerrymandering is a central problem for many representative democracies. Formally, gerrymandering is the manipulation of spatial boundaries to provide political advantage to a particular group (Warf, 2006). The term often refers to political district design, where the boundaries of political districts are “unnaturally” manipulated by redistricting officials to generate durable advantages for one group or party. Since free and fair elections are arguably the most critical part of representative democracy, it is important for this cresting tide of reform to have scientifically validated tools. This dissertation supports the current wave of reform by developing a general inferential technique to “localize” inferential bias measures, …

Contributors
Wolf, Levi John, Rey, Sergio J, Anselin, Luc, et al.
Created Date
2017

Large-scale cultivation of perennial bioenergy crops (e.g., miscanthus and switchgrass) offers unique opportunities to mitigate climate change through avoided fossil fuel use and associated greenhouse gas reduction. Although conversion of existing agriculturally intensive lands (e.g., maize and soy) to perennial bioenergy cropping systems has been shown to reduce near-surface temperatures, unintended consequences for natural water resources via depletion of soil moisture may offset these benefits. In an effort at cross-fertilization between the disciplines of physics-based modeling and spatio-temporal statistics, three topics are investigated in this dissertation, aiming to provide a novel quantification and robust justification of the hydroclimate …

Contributors
Wang, Meng, Kamarianakis, Yiannis, Georgescu, Matei, et al.
Created Date
2018

The concept of distribution is one of the core ideas of probability theory and inferential statistics, if not the core idea. Many introductory statistics textbooks pay lip service to stochastic/random processes, but how do students think about these processes? This study sought to explore what understandings of stochastic process students develop as they work through materials intended to support them in constructing the long-run behavior meaning for distribution. I collected data in three phases. First, I conducted a set of task-based clinical interviews that allowed me to build initial models for the students’ meanings for randomness and probability. Second, I …

Contributors
Hatfield, Neil, Thompson, Patrick, Carlson, Marilyn, et al.
Created Date
2019

This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve classification methodologies with wide ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional nonparametric classifiers form a methodological basis, which is used herein to develop a) the family of ESFuNC segment-wise curve classification algorithms and b) per-pixel ensembles based on logistic regression and fused-LASSO. The proposed methods achieve test set accuracy rates as high as 94.3%, while returning information about regions of the temperature domain …

Contributors
Buscaglia, Robert, Kamarianakis, Yiannis, Armbruster, Dieter, et al.
Created Date
2018

Clutter is inevitable when tracking targets, and it presents many challenges. Additionally, rapid, drastic changes in clutter density between different environments or scenarios can make it even more difficult for tracking algorithms to adapt. A novel approach to target tracking in such dynamic clutter environments is proposed using a particle filter (PF) integrated with Interacting Multiple Models (IMMs) to compensate and adapt to the transition between different clutter densities. This model was implemented for the case of a monostatic sensor tracking a single target moving with constant velocity along a two-dimensional trajectory, which crossed between regions of drastically …
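For orientation, here is a minimal bootstrap particle filter for a one-dimensional constant-velocity target; the IMM layer and the clutter model that are central to the thesis are deliberately omitted, and all noise levels are assumed.

```python
# Minimal bootstrap particle filter for a 1-D constant-velocity target;
# the IMM layer and clutter model are omitted, noise levels assumed.
import numpy as np

rng = np.random.default_rng(0)
T, N, dt = 50, 500, 1.0
F = np.array([[1, dt], [0, 1]])             # constant-velocity dynamics

x_true = np.zeros((T, 2)); x_true[0] = [0.0, 1.0]
for t in range(1, T):                       # simulate the target
    x_true[t] = F @ x_true[t - 1] + rng.normal(0, [0.05, 0.02])
z = x_true[:, 0] + rng.normal(0, 0.5, size=T)   # noisy position data

parts = rng.normal([0.0, 1.0], [1.0, 0.5], size=(N, 2))  # init particles
for t in range(T):
    parts = parts @ F.T + rng.normal(0, [0.05, 0.02], size=(N, 2))
    w = np.exp(-0.5 * ((z[t] - parts[:, 0]) / 0.5) ** 2)  # likelihoods
    parts = parts[rng.choice(N, N, p=w / w.sum())]        # resample
print(parts[:, 0].mean(), x_true[-1, 0])    # estimate vs. truth
```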

Contributors
Dutson, Karl J, Papandreou-Suppappola, Antonia, Kovvali, Narayan, et al.
Created Date
2015

The operating temperature of photovoltaic (PV) modules is affected by external factors such as irradiance, wind speed and ambient temperature as well as internal factors like material properties and design properties. These factors can make a difference in the operating temperatures between cells within a module and between modules within a plant. This is a three-part thesis. Part 1 investigates the behavior of temperature distribution of PV cells within a module through outdoor temperature monitoring under various operating conditions (Pmax, Voc and Isc) and examines deviation in the temperature coefficient values pertaining to this temperature variation. ANOVA, a statistical tool, …

Contributors
Pavgi, Ashwini, Tamizhmani, Govindasamy, Phelan, Patrick, et al.
Created Date
2016

Parallel Monte Carlo applications require the pseudorandom numbers used on each processor to be independent in a probabilistic sense. The TestU01 software package is the standard testing suite for detecting stream dependence and other properties that make certain pseudorandom generators ineffective in parallel (as well as serial) settings. TestU01 employs two basic schemes for testing parallel generated streams. The first applies serial tests to the individual streams and then tests the resulting P-values for uniformity. The second turns all the parallel generated streams into one long vector and then applies serial tests to the resulting concatenated stream. Various forms of …
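The first scheme can be sketched in Python (as an analogue, not TestU01 itself): apply a serial test, here a Kolmogorov-Smirnov test, to each stream, then test the resulting P-values for uniformity. Note that this second-level test flags marginal non-uniformity rather than every form of cross-stream dependence.

```python
# Python analogue of the first scheme: run a serial test (here KS) on
# each parallel stream, then test the per-stream P-values themselves
# for uniformity. This is an illustration, not TestU01 itself.
import numpy as np
from scipy import stats

streams = [np.random.default_rng(seed).random(10_000) for seed in range(64)]

# level 1: one serial test per stream
pvals = [stats.kstest(s, "uniform").pvalue for s in streams]

# level 2: the P-values should themselves look uniform(0,1)
print(stats.kstest(pvals, "uniform"))   # small p-value would flag trouble
```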

Contributors
Ismay, Chester Ivan, Eubank, Randall, Young, Dennis, et al.
Created Date
2013

The purpose of this study was to examine under which conditions "good" data characteristics can compensate for "poor" characteristics in Latent Class Analysis (LCA), as well as to set forth guidelines regarding the minimum sample size and the ideal number and quality of indicators. In particular, we studied the extent to which including a larger number of high-quality indicators can compensate for a small sample size in LCA. The results suggest that in general, larger sample size, more indicators, higher quality of indicators, and a larger covariate effect correspond to more converged and proper replications, as well as fewer boundary estimates …

Contributors
Wurpts, Ingrid Carlson, Geiser, Christian, Aiken, Leona, et al.
Created Date
2012