Skip to main content

ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Contributor
Date Range
2010 2019


The National Basketball Association (NBA) is the most popular basketball league in the world. The world-wide mighty high popularity to the league leads to large amount of interesting and challenging research problems. Among them, predicting the outcome of an upcoming NBA match between two specific teams according to their historical data is especially attractive. With rapid development of machine learning techniques, it opens the door to examine the correlation between statistical data and outcome of matches. However, existing methods typically make predictions before game starts. In-game prediction, or real-time prediction, has not yet been sufficiently studied. During a match, data …

Contributors
Lin, Rongyu, Tong, Hanghang, He, Jingrui, et al.
Created Date
2017

Predictive analytics embraces an extensive area of techniques from statistical modeling to machine learning to data mining and is applied in business intelligence, public health, disaster management and response, and many other fields. To date, visualization has been broadly used to support tasks in the predictive analytics pipeline under the underlying assumption that a human-in-the-loop can aid the analysis by integrating domain knowledge that might not be broadly captured by the system. Primary uses of visualization in the predictive analytics pipeline have focused on data cleaning, exploratory analysis, and diagnostics. More recently, numerous visual analytics systems for feature selection, incremental …

Contributors
Lu, Yafeng, Maciejewski, Ross, Cooke, Nancy, et al.
Created Date
2017

Social networking services have emerged as an important platform for large-scale information sharing and communication. With the growing popularity of social media, spamming has become rampant in the platforms. Complex network interactions and evolving content present great challenges for social spammer detection. Different from some existing well-studied platforms, distinct characteristics of newly emerged social media data present new challenges for social spammer detection. First, texts in social media are short and potentially linked with each other via user connections. Second, it is observed that abundant contextual information may play an important role in distinguishing social spammers and normal users. Third, …

Contributors
Hu, Xia, Liu, Huan, Kambhampati, Subbarao, et al.
Created Date
2015

Models using feature interactions have been applied successfully in many areas such as biomedical analysis, recommender systems. The popularity of using feature interactions mainly lies in (1) they are able to capture the nonlinearity of the data compared with linear effects and (2) they enjoy great interpretability. In this thesis, I propose a series of formulations using feature interactions for real world problems and develop efficient algorithms for solving them. Specifically, I first propose to directly solve the non-convex formulation of the weak hierarchical Lasso which imposes weak hierarchy on individual features and interactions but can only be approximately solved …

Contributors
Liu, Yashu, Ye, Jieping, Xue, Guoliang, et al.
Created Date
2018

Due to vast resources brought by social media services, social data mining has received increasing attention in recent years. The availability of sheer amounts of user-generated data presents data scientists both opportunities and challenges. Opportunities are presented with additional data sources. The abundant link information in social networks could provide another rich source in deriving implicit information for social data mining. However, the vast majority of existing studies overwhelmingly focus on positive links between users while negative links are also prevailing in real- world social networks such as distrust relations in Epinions and foe links in Slashdot. Though recent studies …

Contributors
Cheng, Kewei, Liu, Huan, Tong, Hanghang, et al.
Created Date
2017

The pervasive use of social media gives it a crucial role in helping the public perceive reliable information. Meanwhile, the openness and timeliness of social networking sites also allow for the rapid creation and dissemination of misinformation. It becomes increasingly difficult for online users to find accurate and trustworthy information. As witnessed in recent incidents of misinformation, it escalates quickly and can impact social media users with undesirable consequences and wreak havoc instantaneously. Different from some existing research in psychology and social sciences about misinformation, social media platforms pose unprecedented challenges for misinformation detection. First, intentional spreaders of misinformation will …

Contributors
Wu, Liang, Liu, Huan, Tong, Hanghang, et al.
Created Date
2019

Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this …

Contributors
Sun, Liang, Ye, Jieping, Li, Baoxin, et al.
Created Date
2011

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of the MTL algorithms. In this dissertation, I propose …

Contributors
Chen, Jianhui, Ye, Jieping, Kumar, Sudhir, et al.
Created Date
2011

The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture certain properties of the networks. With the learned node representations, machine learning and data mining algorithms can be applied for network mining tasks such as link prediction and node classification. Because of its ability to learn good node representations, network representation learning is attracting increasing attention and various network embedding …

Contributors
Wang, Suhang, Liu, Huan, Aggarwal, Charu, et al.
Created Date
2018

The rapid growth in the high-throughput technologies last few decades makes the manual processing of the generated data to be impracticable. Even worse, the machine learning and data mining techniques seemed to be paralyzed against these massive datasets. High-dimensionality is one of the most common challenges for machine learning and data mining tasks. Feature selection aims to reduce dimensionality by selecting a small subset of the features that perform at least as good as the full feature set. Generally, the learning performance, e.g. classification accuracy, and algorithm complexity are used to measure the quality of the algorithm. Recently, the stability …

Contributors
Alelyani, Salem, Liu, Huan, Xue, Guoliang, et al.
Created Date
2013