
ASU Electronic Theses and Dissertations


This collection includes most ASU theses and dissertations from 2011 to the present. They are available in downloadable PDF format, although a small percentage of items are under embargo. Each record includes degree information, committee members, an abstract, and any supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.




Many longitudinal studies, especially in clinical trials, suffer from missing data issues. Most estimation procedures assume that the missing values are ignorable or missing at random (MAR). However, this assumption leads to unrealistic simplification and is implausible for many cases. For example, an investigator is examining the effect of treatment on depression. Subjects are scheduled with doctors on a regular basis and asked questions about recent emotional situations. Patients who are experiencing severe depression are more likely to miss an appointment and leave the data missing for that particular visit. Data that are not missing at random may produce bias …
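The bias from non-ignorable missingness described above can be illustrated with a short simulation (hypothetical, not from the dissertation): when the probability of a missed visit depends on the unobserved severity itself, the complete-case mean is biased downward.

```python
import random

random.seed(42)

# Hypothetical depression scores (higher = more severe) for 10,000 subjects.
scores = [random.gauss(50, 10) for _ in range(10_000)]

# MNAR mechanism: probability of missing a visit grows with severity.
def p_missing(score):
    return min(1.0, max(0.0, (score - 50) / 30))

observed = [s for s in scores if random.random() >= p_missing(s)]

true_mean = sum(scores) / len(scores)
observed_mean = sum(observed) / len(observed)

# Complete-case analysis underestimates severity: the sickest subjects drop out.
bias = observed_mean - true_mean
```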

Contributors
Zhang, Jun, Reiser, Mark, Barber, Jarrett, et al.
Created Date
2013

Value-added models (VAMs) are used by many states to assess contributions of individual teachers and schools to students' academic growth. The generalized persistence VAM, one of the most flexible in the literature, estimates the "value added" by individual teachers to their students' current and future test scores by employing a mixed model with a longitudinal database of test scores. There is concern, however, that missing values that are common in the longitudinal student scores can bias value-added assessments, especially when the models serve as a basis for personnel decisions -- such as promoting or dismissing teachers -- as they are …

Contributors
Karl, Andrew Thomas, Lohr, Sharon L, Yang, Yan, et al.
Created Date
2012

Complex systems are pervasive in science and engineering. Some examples include complex engineered networks such as the internet, the power grid, and transportation networks. The complexity of such systems arises not just from their size, but also from their structure, operation (including control and management), evolution over time, and that people are involved in their design and operation. Our understanding of such systems is limited because their behaviour cannot be characterized using traditional techniques of modelling and analysis. As a step in model development, statistically designed screening experiments may be used to identify the main effects and interactions most significant …

Contributors
Aldaco-Gastelum, Abraham Netzahualcoyotl, Syrotiuk, Violet R., Colbourn, Charles J., et al.
Created Date
2015

The Pearson and likelihood ratio statistics are commonly used to test goodness-of-fit for models applied to data from a multinomial distribution. When data are from a table formed by cross-classification of a large number of variables, the common statistics may have low power and an inaccurate Type I error level due to sparseness in the cells of the table. The GFfit statistic can be used to examine model fit in subtables. It is proposed to assess model fit by using a new version of the GFfit statistic based on orthogonal components of Pearson chi-square as a diagnostic to examine the fit on …
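The omnibus statistic this work decomposes is the familiar Pearson chi-square. A minimal sketch with hypothetical counts (cross-classifying many variables shrinks expected counts toward zero, which is the sparseness problem motivating component-based diagnostics like GFfit):

```python
# Pearson's chi-square statistic for a multinomial goodness-of-fit test.
def pearson_chi2(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 28, 32]            # hypothetical cell counts, n = 100
probs = [0.25, 0.25, 0.25, 0.25]       # null hypothesis: equiprobable cells
expected = [sum(observed) * p for p in probs]

x2 = pearson_chi2(observed, expected)  # compare to a chi-square with 3 df
```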

Contributors
Zhu, Junfei, Reiser, Mark, Stufken, John, et al.
Created Date
2017

Urban growth, from regional sprawl to global urbanization, is the most rapid, drastic, and irreversible form of human modification to the natural environment. Extensive land cover modifications during urban growth have altered the local energy balance, making the city warmer than its surrounding rural environment, a phenomenon known as an urban heat island (UHI). How are the seasonal and diurnal surface temperatures related to the land surface characteristics, and what land cover types and/or patterns are desirable for ameliorating climate in a fast growing desert city? This dissertation scrutinizes these questions and seeks to address them using a combination of …

Contributors
Fan, Chao, Myint, Soe W, Li, Wenwen, et al.
Created Date
2016

The main objective of this research is to develop an approach to PV module lifetime prediction. In doing so, the aim is to move from empirical generalizations to a formal predictive science based on data-driven case studies of crystalline silicon PV systems. Evaluating PV systems aged 5 to 30 years yields a systematic predictive capability that is absent today. The warranty period provided by manufacturers typically ranges from 20 to 25 years for crystalline silicon modules. The end of lifetime (for example, the time-to-degrade by 20% from rated power) of PV modules is usually …

Contributors
Kuitche, Joseph Mathurin, Pan, Rong, TamizhMani, Govindasamy, et al.
Created Date
2014

Statistical process control (SPC) and predictive analytics have been used in industrial manufacturing and design, but up until now have not been applied to threshold data of vital sign monitoring in remote care settings. In this study of 20 elders with COPD and/or CHF, extended months of peak flow monitoring (FEV1) using telemedicine are examined to determine when an earlier or later clinical intervention may have been advised. This study demonstrated that SPC may bring less than a 2.0% increase in clinician workload while providing more robust statistically-derived thresholds than clinician-derived thresholds. Using a random K-fold model, FEV1 output was …
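A statistically derived threshold of the kind SPC provides can be sketched with Shewhart-style three-sigma control limits; the FEV1 readings below are hypothetical, not from the study:

```python
import statistics

# Hypothetical peak-flow (FEV1) readings in liters for one patient.
fev1 = [2.1, 2.0, 2.2, 1.9, 2.1, 2.0, 2.3, 2.1, 1.8, 2.0]

mean = statistics.mean(fev1)
sd = statistics.stdev(fev1)

# Shewhart-style control limits: mean +/- 3 standard deviations.
lower_limit = mean - 3 * sd
upper_limit = mean + 3 * sd

# A new reading outside the limits would flag a possible clinical intervention.
def alarm(x):
    return x < lower_limit or x > upper_limit
```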

Contributors
Fralick, Celeste Rachelle, Muthuswamy, Jitendran, O'Shea, Terrance, et al.
Created Date
2013

The Pearson and likelihood ratio statistics are well-known in goodness-of-fit testing and are commonly used for models applied to multinomial count data. When data are from a table formed by the cross-classification of a large number of variables, these goodness-of-fit statistics may have lower power and inaccurate Type I error rate due to sparseness. Pearson's statistic can be decomposed into orthogonal components associated with the marginal distributions of observed variables, and an omnibus fit statistic can be obtained as a sum of these components. When the statistic is a sum of components for lower-order marginals, it has good performance for …

Contributors
Dassanayake, Mudiyanselage Maduranga Kasun, Reiser, Mark, Kao, Ming-Hung, et al.
Created Date
2018

Quadratic growth curves based on 2nd-degree polynomials are widely used in longitudinal studies. For a 2nd-degree polynomial, the vertex represents the location of the curve in the XY plane. For a quadratic growth curve, we propose an approximate confidence region as well as confidence intervals for the x- and y-coordinates of the vertex using two methods, the gradient method and the delta method. Under some models, an indirect test on the location of the curve can be based on the intercept and slope parameters, but in other models, a direct test on the vertex is required. We present a …
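For a fitted quadratic y = b0 + b1*x + b2*x^2, the vertex sits at x = -b1/(2*b2), and the delta method propagates the coefficient covariance through the gradient of that expression. A sketch with hypothetical estimates (not values from the dissertation):

```python
import math

# Hypothetical fitted coefficients and covariance of (b1, b2).
b0, b1, b2 = 1.0, 4.0, -0.5
cov = [[0.04, 0.01],
       [0.01, 0.02]]

x_vertex = -b1 / (2 * b2)               # x-coordinate of the vertex
y_vertex = b0 - b1 ** 2 / (4 * b2)      # y-coordinate of the vertex

# Delta method: gradient of x_vertex with respect to (b1, b2).
grad = [-1 / (2 * b2), b1 / (2 * b2 ** 2)]
var_x = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
se_x = math.sqrt(var_x)

# Approximate 95% confidence interval for the x-coordinate.
ci_x = (x_vertex - 1.96 * se_x, x_vertex + 1.96 * se_x)
```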

Contributors
Yu, Wanchunzi, Reiser, Mark, Barber, Jarrett, et al.
Created Date
2015

This dissertation presents methods for addressing research problems that currently can only adequately be solved using Quality Reliability Engineering (QRE) approaches, especially accelerated life testing (ALT) of electronic printed wiring boards, with applications to avionics circuit boards. The methods presented in this research are generally applicable to circuit boards, but the data generated and their analysis are for high-performance avionics. Aircraft equipment manufacturers typically require a 20-year expected life for avionics equipment, so ALT is the only practical way of performing life test estimates. Both thermal and vibration ALT-induced failures are produced and analyzed to resolve industry …

Contributors
Juarez, Joseph Moses, Montgomery, Douglas C., Borror, Connie M., et al.
Created Date
2012

This work presents two complementary studies that propose heuristic methods to capture characteristics of data using the ensemble learning method of random forest. The first study is motivated by the problem in education of determining teacher effectiveness in student achievement. Value-added models (VAMs), constructed as linear mixed models, use students’ test scores as outcome variables and teachers’ contributions as random effects to ascribe changes in student performance to the teachers who have taught them. The VAM teacher score is the empirical best linear unbiased predictor (EBLUP). This approach is limited by the adequacy of the assumed model specification with respect …

Contributors
Valdivia, Arturo, Eubank, Randall, Young, Dennis, et al.
Created Date
2013

Designing studies that use latent growth modeling to investigate change over time calls for optimal approaches for conducting power analysis for a priori determination of required sample size. This investigation (1) studied the impacts of variations in specified parameters, design features, and model misspecification in simulation-based power analyses and (2) compared power estimates across three common power analysis techniques: the Monte Carlo method; the Satorra-Saris method; and the method developed by MacCallum, Browne, and Cai (MBC). Choice of sample size, effect size, and slope variance parameters markedly influenced power estimates; however, level-1 error variance and number of repeated measures (3 …
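The Monte Carlo method compared above estimates power by simulating data under the alternative, testing each replicate, and recording the rejection rate. A minimal sketch for a one-sample z-test (far simpler than the latent growth models studied here; effect size and replicate count are arbitrary):

```python
import random

random.seed(0)

def power(n, d, reps=2000, z_crit=1.96):
    """Monte Carlo power: fraction of simulated samples rejecting H0: mu = 0."""
    rejections = 0
    for _ in range(reps):
        # Simulate under the alternative: mean = d, known sd = 1.
        sample = [random.gauss(d, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * n ** 0.5  # z = mean / (sd / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

power_small = power(10, 0.5)  # small sample: modest power
power_large = power(50, 0.5)  # larger sample: high power
```

As in the simulation studies compared in this dissertation, sample size and effect size drive the estimates: the rejection rate climbs steeply as n grows.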

Contributors
Van Vleet, Bethany L., Thompson, Marilyn S., Green, Samuel B., et al.
Created Date
2011

This dissertation proposes a new set of analytical methods for high-dimensional physiological sensors. The methodologies developed in this work were motivated by problems in learning science but also apply to numerous disciplines where high-dimensional signals are present. In the education field, more data is now available from traditional sources, and there is an important need for analytical methods to translate this data into improved learning. Affective Computing, the study of new techniques for developing systems that recognize and model human emotions, integrates different physiological signals such as electroencephalogram (EEG) and electromyogram (EMG) to detect and …

Contributors
Lujan Moreno, Gustavo A., Runger, George C, Atkinson, Robert K, et al.
Created Date
2017

In the field of infectious disease epidemiology, the assessment of model robustness outcomes plays a significant role in the identification, reformulation, and evaluation of preparedness strategies aimed at limiting the impact of catastrophic events (pandemics or the deliberate release of biological agents), in the management of disease prevention strategies, and in the identification and evaluation of control or mitigation measures. The research work in this dissertation focuses on: the comparison and assessment of the role of exponentially distributed waiting times versus the use of generalized non-exponential parametric distributed waiting times of infectious periods on the quantitative and …

Contributors
Morale Butler, Emmanuel Jesús, Castillo-Chavez, Carlos, Aparicio, Juan P, et al.
Created Date
2014

Many methodological approaches have been utilized to predict student retention and persistence over the years, yet few have utilized a Bayesian framework. It is believed this is due in part to the absence of an established process for guiding educational researchers reared in a frequentist perspective into the realms of Bayesian analysis and educational data mining. The current study aimed to address this by providing a model-building process for developing a Bayesian network (BN) that leveraged educational data mining, Bayesian analysis, and traditional iterative model-building techniques in order to predict whether community college students will stop out at the completion …

Contributors
Arcuria, Phil, Levy, Roy, Green, Samuel B, et al.
Created Date
2015

Urban scaling analysis has introduced a new scientific paradigm to the study of cities. With it, the notions of size, heterogeneity and structure have taken a leading role. These notions are assumed to be behind the causes for why cities differ from one another, sometimes wildly. However, the mechanisms by which size, heterogeneity and structure shape the general statistical patterns that describe urban economic output are still unclear. Given the rapid rate of urbanization around the globe, we need precise and formal mathematical understandings of these matters. In this context, I perform in this dissertation probabilistic, distributional and computational explorations …

Contributors
Gomez-Lievano, Andres, Lobo, José, Muneepeerakul, Rachata, et al.
Created Date
2014

Investigation of measurement invariance (MI) commonly assumes correct specification of dimensionality across multiple groups. Although research shows that violation of the dimensionality assumption can cause bias in model parameter estimation for single-group analyses, little research on this issue has been conducted for multiple-group analyses. This study explored the effects of mismatch in dimensionality between data and analysis models with multiple-group analyses at the population and sample levels. Datasets were generated using a bifactor model with different factor structures and were analyzed with bifactor and single-factor models to assess misspecification effects on assessments of MI and latent mean differences. As baseline …

Contributors
Xu, Yuning, Green, Samuel, Levy, Roy, et al.
Created Date
2018

Optimal experimental design for generalized linear models is often done using a pseudo-Bayesian approach that integrates the design criterion across a prior distribution on the parameter values. This approach ignores the lack of utility of certain models contained in the prior, and a case is demonstrated where the heavy focus on such hopeless models results in a design with poor performance and with wild swings in coverage probabilities for Wald-type confidence intervals. Design construction using a utility-based approach is shown to result in much more stable coverage probabilities in the area of greatest concern. The pseudo-Bayesian approach can be applied …

Contributors
Hassler, Edgar, Montgomery, Douglas C, Silvestrini, Rachel T, et al.
Created Date
2015

The problem of multiple object tracking seeks to jointly estimate the time-varying cardinality and trajectory of each object. There are numerous challenges that are encountered in tracking multiple objects including a time-varying number of measurements, under varying constraints, and environmental conditions. In this thesis, the proposed statistical methods integrate the use of physical-based models with Bayesian nonparametric methods to address the main challenges in a tracking problem. In particular, Bayesian nonparametric methods are exploited to efficiently and robustly infer object identity and learn time-dependent cardinality; together with Bayesian inference methods, they are also used to associate measurements to objects and …

Contributors
Moraffah, Bahman, Papandreou-Suppappola, Antonia, Bliss, Daniel W., et al.
Created Date
2019

Mixture experiments are useful when the interest is in determining how changes in the proportion of an experimental component affect the response. This research focuses on the modeling and design of mixture experiments when the response is categorical, namely binary or ordinal. Data from mixture experiments are characterized by the perfect collinearity of the experimental components, resulting in model matrices that are singular and inestimable under likelihood estimation procedures. To alleviate problems with estimation, this research proposes the reparameterization of two nonlinear models for ordinal data -- the proportional-odds model with a logistic link and the stereotype model. A study …

Contributors
Mancenido, Michelle V., Montgomery, Douglas C, Pan, Rong, et al.
Created Date
2016

It is common in the analysis of data to provide a goodness-of-fit test to assess the performance of a model. In the analysis of contingency tables, goodness-of-fit statistics are frequently employed when modeling social science, educational or psychological data where the interest is often directed at investigating the association among multi-categorical variables. Pearson's chi-squared statistic is well-known in goodness-of-fit testing, but it is sometimes considered to produce an omnibus test as it gives little guidance to the source of poor fit once the null hypothesis is rejected. However, its components can provide powerful directional tests. In this dissertation, orthogonal components …

Contributors
Milovanovic, Jelena, Young, Dennis, Reiser, Mark, et al.
Created Date
2011

Mediation analysis is used to investigate how an independent variable, X, is related to an outcome variable, Y, through a mediator variable, M (MacKinnon, 2008). If X represents a randomized intervention, it is difficult to make a cause-and-effect inference regarding indirect effects without making no-unmeasured-confounding assumptions using the potential outcomes framework (Holland, 1988; MacKinnon, 2008; Robins & Greenland, 1992; VanderWeele, 2015), using longitudinal data to determine the temporal order of M and Y (MacKinnon, 2008), or both. The goals of this dissertation were to (1) define all indirect and direct effects in a three-wave longitudinal mediation …
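As background to the indirect effects discussed above, the classic cross-sectional product-of-coefficients estimator (MacKinnon, 2008) multiplies the X-to-M path by the M-to-Y path; the longitudinal, causally defined effects studied here generalize this. Path values below are hypothetical:

```python
# Single-mediator model, product-of-coefficients decomposition:
#   a       = slope of X -> M
#   b       = slope of M -> Y, adjusting for X
#   c_prime = direct effect of X on Y, adjusting for M
a, b, c_prime = 0.4, 0.5, 0.1

indirect_effect = a * b                      # effect of X on Y through M
total_effect = indirect_effect + c_prime     # indirect plus direct
```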

Contributors
Valente, Matthew John, MacKinnon, David P, West, Stephen G, et al.
Created Date
2018

The living world we inhabit and observe is extraordinarily complex. From the perspective of a person analyzing data about the living world, complexity is most commonly encountered in two forms: 1) in the sheer size of the datasets that must be analyzed and the physical number of mathematical computations necessary to obtain an answer and 2) in the underlying structure of the data, which does not conform to classical normal theory statistical assumptions and includes clustering and unobserved latent constructs. Until recently, the methods and tools necessary to effectively address the complexity of biomedical data were not ordinarily available. The …

Contributors
Brown, Justin Reed, Dinu, Valentin, Johnson, William, et al.
Created Date
2012

The dawn of the Internet of Things (IoT) has opened the opportunity for mainstream adoption of machine learning analytics. However, most research in machine learning has focused on discovery of new algorithms or fine-tuning the performance of existing algorithms. Little exists on the process of taking an algorithm from the lab environment into the real world, culminating in sustained value. Real-world applications are typically characterized by dynamic non-stationary systems with requirements around feasibility, stability and maintainability. Not much has been done to establish standards around the unique analytics demands of real-world scenarios. This research explores the problem of why so few of …

Contributors
Shahapurkar, Som, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2016

In accelerated life tests (ALTs), complete randomization is hardly achievable because of economic and engineering constraints. Typical experimental protocols such as subsampling or random blocks in ALTs result in a grouped structure, which leads to correlated lifetime observations. In this dissertation, a generalized linear mixed model (GLMM) approach is proposed to analyze ALT data and find the optimal ALT design with consideration of heterogeneous group effects. Two types of ALTs are demonstrated for data analysis. First, constant-stress ALT (CSALT) data with a Weibull failure time distribution are modeled by the GLMM. The marginal likelihood of observations is approximated by the quadrature rule; …

Contributors
Seo, Kangwon, Pan, Rong, Montgomery, Douglas C, et al.
Created Date
2017

Through a two study simulation design with different design conditions (sample size at level 1 (L1) was set to 3, level 2 (L2) sample size ranged from 10 to 75, level 3 (L3) sample size ranged from 30 to 150, intraclass correlation (ICC) ranging from 0.10 to 0.50, model complexity ranging from one predictor to three predictors), this study intends to provide general guidelines about adequate sample sizes at three levels under varying ICC conditions for a viable three level HLM analysis (e.g., reasonably unbiased and accurate parameter estimates). In this study, the data generating parameters for the were obtained …
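The ICC conditions above can be computed directly from variance components. For the simplest two-variance decomposition (the three-level model partitions variance further, across L2 and L3), ICC is the share of total variance between clusters:

```python
# Intraclass correlation: proportion of total variance at the cluster level.
def icc(between_var, within_var):
    return between_var / (between_var + within_var)

icc_low = icc(0.10, 0.90)    # the ICC = 0.10 end of the simulated range
icc_high = icc(0.50, 0.50)   # the ICC = 0.50 end of the simulated range
```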

Contributors
Yel, Nedim, Levy, Roy, Elliott, Stephen N, et al.
Created Date
2016

Smoking remains the leading cause of preventable death in the United States, and early initiation is associated with greater difficulty quitting. Among adolescent smokers, those with attention-deficit hyperactivity disorder (ADHD), characterized by difficulties associated with impulsivity, hyperactivity, and inattention, smoke at nearly twice the rate of their peers. Although cigarette smoking is highly addictive, nicotine is a relatively weak primary reinforcer, spurring research on other potential targets that may maintain smoking, including the potential benefits of nicotine on attention, inhibition, and reinforcer efficacy. The present study employs the most prevalent rodent model of ADHD, the spontaneously hypertensive rat (SHR) and …

Contributors
Mazur, Gabriel Joseph, Sanabria, Federico, Killeen, Peter R, et al.
Created Date
2014

Electricity infrastructure vulnerabilities were assessed for future heat waves due to climate change. Critical processes and component relationships were identified and characterized with consideration for the terminal event of service outages, including cascading failures in transmission-level components that can result in blackouts. The most critical dependency identified was the increase in peak electricity demand with higher air temperatures. Historical and future air temperatures were characterized within and across Los Angeles County, California (LAC) and Maricopa County (Phoenix), Arizona. LAC was identified as more vulnerable to heat waves than Phoenix due to a wider distribution of historical temperatures. Two approaches were …

Contributors
Burillo, Daniel, Chester, Mikhail V, Ruddell, Benjamin, et al.
Created Date
2018

As the world embraces a sustainable energy future, alternative energy resources, such as wind power, are increasingly being seen as an integral part of the future electric energy grid. Ultimately, integrating such a dynamic and variable mix of generation requires a better understanding of renewable generation output, in addition to power grid systems that improve power system operational performance in the presence of anticipated events such as wind power ramps. Because of the stochastic, uncontrollable nature of renewable resources, a thorough and accurate characterization of wind activity is necessary to maintain grid stability and reliability. Wind power ramps from an …

Contributors
Ganger, David Wu, Vittal, Vijay, Zhang, Junshan, et al.
Created Date
2016

In two independent and thematically connected chapters, I investigate consumers' willingness to pay a price premium in response to product development that entails prosocial attributes (PATs), those that allude to the reduction of negative externalities to benefit society, and in response to an innovative participatory pricing design called 'Pay-What-You-Want' (PWYW) pricing, a mechanism that relinquishes the determination of payments for private goods to the consumers themselves, relying partly on their prosocial preferences to drive positive payments. First, I propose a novel statistical approach built on the choice-based contingent valuation technique to estimate incremental willingness to pay (IWTP) for PATs …

Contributors
Christopher, Ranjit M., Wiles, Michael, Ketcham, Jonathan, et al.
Created Date
2016

In the presence of correlation, generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extravariation due to correlation, methods to estimate and model the additional variation are investigated. A general form of the mean-variance relationship is proposed which incorporates the canonical parameter. The two variance parameters are estimated using generalized method of moments, negating the need for a distributional assumption. The mean-variance relation estimates are applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchical structured data, …

Contributors
Irimata, Katherine, Wilson, Jeffrey R, Kamarianakis, Ioannis, et al.
Created Date
2018

Estimating cointegrating relationships requires specific techniques. Canonical correlations are used to determine the rank and space of the cointegrating matrix. The vectors used to transform the data into canonical variables have an eigenvector representation, and the associated canonical correlations have an eigenvalue representation. The number of cointegrating relations is chosen based upon a theoretical difference in the convergence rates of the eigenvalues. The number of cointegrating relations is consistently estimated using a threshold function which places a lower bound on the eigenvalues associated with cointegrating relations and an upper bound on the eigenvalues not associated with cointegrating …
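The threshold rule described above can be sketched as follows: eigenvalues tied to cointegrating relations stay bounded away from zero while the rest shrink with the sample size, so counting those above a bound estimates the rank. The values and bound below are hypothetical stand-ins, not results from the dissertation:

```python
# Hypothetical squared canonical correlations, sorted in decreasing order.
eigenvalues = sorted([0.92, 0.85, 0.07, 0.03, 0.01], reverse=True)

# Illustrative threshold separating the two convergence-rate regimes.
threshold = 0.50

# Estimated cointegrating rank: eigenvalues above the threshold.
rank = sum(1 for ev in eigenvalues if ev > threshold)
```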

Contributors
Nowak, Adam Daniel, Ahn, Seung C, Liu, Crocker, et al.
Created Date
2012

Understanding how adherence affects outcomes is crucial when developing and assigning interventions. However, interventions are often evaluated by conducting randomized experiments and estimating intent-to-treat effects, which ignore actual treatment received. Dose-response effects can supplement intent-to-treat effects when participants are offered the full dose but many only receive a partial dose due to nonadherence. Using these data, we can estimate the magnitude of the treatment effect at different levels of adherence, which serve as a proxy for different levels of treatment. In this dissertation, I conducted Monte Carlo simulations to evaluate when linear dose-response effects can be accurately and precisely estimated …

Contributors
Mazza, Gina Lynn, Grimm, Kevin J, West, Stephen G, et al.
Created Date
2018

Functional brain imaging experiments are widely conducted in many fields for studying the underlying brain activity in response to mental stimuli. For such experiments, it is crucial to select a good sequence of mental stimuli that allow researchers to collect informative data for making precise and valid statistical inferences at minimum cost. In contrast to most existing studies, the aim of this study is to obtain optimal designs for brain mapping technology with an ultra-high temporal resolution with respect to some common statistical optimality criteria. The first topic of this work is on finding optimal designs when the primary …

Contributors
Alghamdi, Reem, Kao, Ming-Hung, Fricks, John, et al.
Created Date
2019

In this era of fast computational machines and new optimization algorithms, there have been great advances in experimental design. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging (fMRI). The first part of our research tackles the challenging problem of constructing exact designs for GLMs that are robust against parameter, link and model uncertainties, by improving an existing algorithm and providing a new one based on continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accommodate most popular design selection criteria, and we …

Contributors
Temkit, M'Hamed, Kao, Jason, Reiser, Mark, et al.
Created Date
2014

Statistics is taught at every level of education, yet teachers often have to assume their students have no knowledge of statistics and start from scratch each time they set out to teach statistics. The motivation for this experimental study comes from interest in exploring educational applications of augmented reality (AR) delivered via mobile technology that could potentially provide rich, contextualized learning for understanding concepts related to statistics education. This study examined the effects of AR experiences for learning basic statistical concepts. Using a 3 x 2 research design, this study compared learning gains of 252 undergraduate and graduate students from …

Contributors
Conley, Quincy, Atkinson, Robert K, Nguyen, Frank, et al.
Created Date
2013

Network analysis is a key conceptual orientation and analytical tool in the social sciences that emphasizes the embeddedness of individual behavior within a larger web of social relations. The network approach is used to better understand the cause and consequence of social interactions which cannot be treated as independent. The relational nature of network data and models, however, amplify the methodological concerns associated with inaccurate or missing data. This dissertation addresses such concerns via three projects. As a motivating substantive example, Project 1 examines factors associated with the selection of interaction partners by students at a large urban high school …

Contributors
Bates, Jordan Taylor, Maroulis, Spiro J, Kang, Yun, et al.
Created Date
2019

Manufacturing tolerance charts are mostly used these days for manufacturing tolerance transfer, but they have the limitation of being one-dimensional only. Some research has been undertaken on three-dimensional geometric tolerances, but it is too theoretical and not yet ready for operator-level usage. In this research, a new three-dimensional model for tolerance transfer in manufacturing process planning is presented that is user friendly in the sense that it is built upon Coordinate Measuring Machine (CMM) readings that are readily available in any decent manufacturing facility. This model can take care of datum reference change …

Contributors
Khan, M Nadeem Shafi, Phelan, Patrick E, Montgomery, Douglas, et al.
Created Date
2011

Information divergence functions, such as the Kullback-Leibler divergence or the Hellinger distance, play a critical role in statistical signal processing and information theory; however, estimating them can be challenging. Most often, parametric assumptions are made about the two distributions to estimate the divergence of interest. In cases where no parametric model fits the data, non-parametric density estimation is used. In statistical signal processing applications, Gaussianity is usually assumed since closed-form expressions for common divergence measures have been derived for this family of distributions. Parametric assumptions are preferred when it is known that the data follow the model; however, this is …
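As an example of the closed-form expressions available for the Gaussian family, the KL divergence between two univariate normals N(mu1, s1^2) and N(mu2, s2^2) is log(s2/s1) + (s1^2 + (mu1 - mu2)^2) / (2*s2^2) - 1/2. A short sketch (the parameter values are arbitrary):

```python
import math

def kl_gaussian(mu1, s1, mu2, s2):
    """Closed-form KL divergence KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2 * s2 ** 2) - 0.5

kl_same = kl_gaussian(0.0, 1.0, 0.0, 1.0)   # identical distributions: divergence 0
kl_diff = kl_gaussian(0.0, 1.0, 1.0, 1.0)   # unit mean shift, unit variance: 0.5
```

When the data are not well modeled as Gaussian, no such closed form applies, which is what motivates the non-parametric estimators studied here.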

Contributors
Wisler, Alan, Berisha, Visar, Spanias, Andreas, et al.
Created Date
2017

Technological advances have enabled the generation and collection of various data from complex systems, thus, creating ample opportunity to integrate knowledge in many decision making applications. This dissertation introduces holistic learning as the integration of a comprehensive set of relationships that are used towards the learning objective. The holistic view of the problem allows for richer learning from data and, thereby, improves decision making. The first topic of this dissertation is the prediction of several target attributes using a common set of predictor attributes. In a holistic learning approach, the relationships between target attributes are embedded into the learning algorithm …

Contributors
Azarnoush, Bahareh, Runger, George C, Bekki, Jennifer, et al.
Created Date
2014

The use of bias indicators in psychological measurement has been contentious, with some researchers questioning whether they actually suppress or moderate the ability of substantive psychological indicators to discriminate (McGrath, Mitchell, Kim, & Hough, 2010). Bias indicators on the MMPI-2-RF (F-r, Fs, FBS-r, K-r, and L-r) were tested for suppression or moderation of the ability of the RC1 and NUC scales to discriminate between Epileptic Seizures (ES) and Non-epileptic Seizures (NES, a conversion disorder that is often misdiagnosed as ES). RC1 and NUC had previously been found to be the best scales on the MMPI-2-RF to differentiate between ES and …

Contributors
Wershba, Rebecca Eve, Lanyon, Richard I, Barrera, Manuel, et al.
Created Date
2013

In the study of regional economic growth and convergence, the distribution dynamics approach, which interrogates the evolution of the cross-sectional distribution as a whole and is concerned with both the external and internal dynamics of the distribution, has been widely used. However, many methodological issues remain to be resolved before valid inferences and conclusions can be drawn from empirical research. Among them, spatial effects, including spatial heterogeneity and spatial dependence, invalidate the assumption of independent and identical distributions underlying conventional maximum likelihood techniques, while the small samples available in regional settings call into question reliance on asymptotic properties. …
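A minimal sketch of the internal-dynamics side of this approach: with regional incomes discretized into classes, the distribution's internal movement can be summarized by a first-order Markov transition matrix estimated from observed class sequences. The function and toy data below are hypothetical illustrations, not the dissertation's method.

```python
def transition_matrix(sequences, n_states):
    """Estimate a first-order Markov transition matrix from observed
    state sequences, with states coded 0..n_states-1."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):  # consecutive-period transitions
            counts[a][b] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs

# Two hypothetical regions moving between a "low" (0) and "high" (1) class.
print(transition_matrix([[0, 0, 1], [1, 1, 0]], 2))
```

Spatial dependence breaks the independence assumption behind counting transitions this way, which is precisely the kind of methodological issue the abstract raises.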

Contributors
KANG, WEI, Rey, Sergio, Fotheringham, Stewart, et al.
Created Date
2018

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within it. Thus, knowledge discovery via machine learning techniques is necessary if we are to better understand the information contained in data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also study …
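To make the notion of asymmetric loss concrete, one standard example (chosen here for illustration; the dissertation's own loss functions are not shown in this excerpt) is the quantile, or pinball, loss, which charges under- and over-prediction at different rates.

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss: an asymmetric loss that charges
    under-prediction at rate tau and over-prediction at rate 1 - tau."""
    err = y_true - y_pred
    return tau * err if err >= 0 else (tau - 1) * err

# With tau = 0.9, under-predicting by 2 costs 1.8 while
# over-predicting by 2 costs only about 0.2.
print(pinball_loss(10.0, 8.0, 0.9), pinball_loss(8.0, 10.0, 0.9))
```

Minimizing this loss yields the tau-th conditional quantile rather than the mean, which is why asymmetric losses matter when the two error directions carry different real-world costs.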

Contributors
Koh, Derek, Runger, George, Wu, Tong, et al.
Created Date
2013

Generalized Linear Models (GLMs) are widely used for modeling responses with non-normal error distributions. When the values of the covariates in such models are controllable, finding an optimal (or at least efficient) design can greatly facilitate the work of collecting and analyzing data. In practice, many theoretical results are obtained on a case-by-case basis, while in other situations researchers rely heavily on computational tools for design selection. Three topics are investigated in this dissertation, each focusing on one type of GLM. Topic I considers GLMs with factorial effects and one continuous covariate. Factors can have interactions among …
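A small sketch of why GLM design is harder than linear-model design (an illustrative example, not taken from the dissertation): for a logistic model, the Fisher information contributed by a design point x carries the weight p(x)(1 - p(x)), which depends on the unknown parameters themselves.

```python
import math

def logistic_weight(x, beta0, beta1):
    """Fisher information weight w(x) = p(x)(1 - p(x)) for a
    single-covariate logistic GLM with linear predictor beta0 + beta1*x."""
    p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
    return p * (1.0 - p)

# The information a design point carries depends on the unknown betas:
# x = 0 (where p = 0.5) is maximally informative here, x = 4 far less so.
print(logistic_weight(0.0, 0.0, 1.0))  # 0.25
print(logistic_weight(4.0, 0.0, 1.0))
```

Because an efficient design depends on parameters one has not yet estimated, results tend to be case-by-case or computational, as the abstract notes.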

Contributors
Wang, Zhongshen, Stufken, John, Kamarianakis, Ioannis, et al.
Created Date
2018

Recent technological advances enable the collection of complex, heterogeneous, and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates the need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns, and improve decision making. All of the proposed methods can generalize to other industrial fields. The first topic of this dissertation focuses on data clustering. Data clustering is often the first step in analyzing a dataset without label information. Clustering high-dimensional data …

Contributors
Lin, Sangdi, Runger, George C, Kocher, Jean-Pierre A, et al.
Created Date
2018

Predicting resistant prostate cancer is critical for lowering medical costs and improving the quality of life of advanced prostate cancer patients. I formulate, compare, and analyze two mathematical models that aim to forecast future levels of prostate-specific antigen (PSA). I accomplish these tasks by employing clinical data from locally advanced prostate cancer patients undergoing androgen deprivation therapy (ADT). I demonstrate that the inverse problem of parameter estimation may be too ill-posed for simple data fitting to be trusted, and that relying on data fitting alone can yield incorrect conclusions, since the estimated parameter values carry large errors and some parameters may be unidentifiable. I provide confidence intervals …

Contributors
Baez, Javier, Kuang, Yang, Kostelich, Eric, et al.
Created Date
2017

Time-to-event analysis, or equivalently survival analysis, deals with two variables simultaneously: when an event occurs (time information) and whether the event's occurrence is observed during the observation period (censoring information). In behavioral and social sciences, the event of interest usually does not lead to a terminal state such as death. Other outcomes after the event can be collected, and thus the survival variable can be considered a predictor as well as an outcome in a study. One example of a case where the survival variable serves as a predictor as well as an outcome is a survival-mediator …
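The (time, censoring) pairing the abstract describes can be made concrete with the standard Kaplan-Meier estimator, shown below as a minimal sketch that assumes distinct event times; it is a textbook illustration, not the dissertation's model.

```python
def kaplan_meier(records):
    """Kaplan-Meier survival curve from (time, observed) pairs, where
    observed=False marks a right-censored record. Assumes distinct times."""
    records = sorted(records)
    at_risk, surv, curve = len(records), 1.0, []
    for time, observed in records:
        if observed:
            # Step down by the fraction surviving this event time.
            surv *= (at_risk - 1) / at_risk
            curve.append((time, surv))
        at_risk -= 1  # censored subjects leave the risk set silently
    return curve

# One censored subject at t=2 does not produce a step, but its removal
# from the risk set changes the step size at t=3.
print(kaplan_meier([(1, True), (2, False), (3, True)]))
```

The censoring indicator is what distinguishes this from ordinary regression on time: a censored record still contributes information by remaining in the risk set until it drops out.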

Contributors
Kim, Han Joe, MacKinnon, David P., Tein, Jenn-Yun, et al.
Created Date
2017

In mixture-process variable experiments, the number of runs is commonly greater than in mixture-only or process-variable experiments, because the parameters for the mixture components, the process variables, and the interactions between them must all be estimated. In some of these experiments there are variables that are hard to change or cannot be controlled under normal operating conditions. These situations often prohibit complete randomization of the experimental runs for practical and economic reasons. Furthermore, the process variables can be categorized into two types: variables that are controllable and directly affect the response, and variables that are uncontrollable …

Contributors
Cho, Tae-Yeon, Montgomery, Douglas C, Borror, Connie M, et al.
Created Date
2010

Although models for describing longitudinal data have become increasingly sophisticated, criticism of even foundational growth curve models remains challenging. The challenge arises from the need to disentangle data-model misfit at multiple, interrelated levels of analysis. Using posterior predictive model checking (PPMC), a popular Bayesian framework for model criticism, the performance of several discrepancy functions was investigated in a Monte Carlo simulation study. The discrepancy functions of interest included two types of conditional concordance correlation (CCC) functions, two types of R2 functions, two types of standardized generalized dimensionality discrepancy (SGDDM) functions, the likelihood ratio (LR), and the likelihood ratio difference test …
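The PPMC logic behind any such discrepancy function can be sketched generically: draw replicated datasets from the posterior predictive distribution and record how often the replicated discrepancy is at least as extreme as the observed one. The names, toy normal model, and statistic below are hypothetical placeholders for whichever discrepancy function is under study.

```python
import random

def posterior_predictive_pvalue(observed_stat, posterior_draws, n_obs,
                                simulate, stat, seed=0):
    """Posterior predictive p-value: the share of replicated datasets whose
    discrepancy statistic meets or exceeds the observed one."""
    rng = random.Random(seed)
    extreme = 0
    for theta in posterior_draws:
        replicated = [simulate(theta, rng) for _ in range(n_obs)]
        if stat(replicated) >= observed_stat:
            extreme += 1
    return extreme / len(posterior_draws)

# Toy check with a normal model: when the model fits, the p-value
# hovers near 0.5 rather than piling up at 0 or 1.
draws = [0.0] * 200  # pretend posterior draws of the mean
p = posterior_predictive_pvalue(
    observed_stat=0.0, posterior_draws=draws, n_obs=50,
    simulate=lambda theta, rng: rng.gauss(theta, 1.0),
    stat=lambda xs: sum(xs) / len(xs))
print(p)
```

The simulation study the abstract describes compares how well different choices of `stat` (CCC, R2, SGDDM, LR, and so on) detect specific kinds of growth-curve misfit within this framework.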

Contributors
Fay, Derek M., Levy, Roy, Thompson, Marilyn, et al.
Created Date
2015

Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science, and multimedia naturally generate TS data. Each series provides a high-dimensional data vector that challenges the learning of the relevant patterns. This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning. This provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a …
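The bag-of-features idea the abstract mentions can be sketched in its simplest form: cut a series into intervals and replace the raw high-dimensional vector with a short summary per interval. This toy version (equal intervals, mean and range as summaries) is only illustrative of the strategy, not the dissertation's representation.

```python
def interval_features(series, n_intervals):
    """A minimal bag-of-features style representation: cut the series into
    equal intervals and summarize each by its mean and range."""
    width = len(series) // n_intervals
    feats = []
    for i in range(n_intervals):
        window = series[i * width:(i + 1) * width]
        feats.append(sum(window) / len(window))   # local level
        feats.append(max(window) - min(window))   # local spread
    return feats

# An 8-point series reduced to a (mean, range) summary per half.
print(interval_features([1, 2, 3, 4, 4, 3, 2, 1], 2))
```

Summaries computed over local windows are what buy robustness to translations and dilations: a pattern that shifts slightly in time still lands in, and is summarized by, a nearby interval, and the resulting fixed-length feature vectors feed directly into tree-based ensembles.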

Contributors
Baydogan, Mustafa Gokce, Runger, George C, Atkinson, Robert, et al.
Created Date
2012