Skip to main content

ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Contributor
Date Range
2010 2020


Computer science education is an increasingly vital area of study with various challenges that increase the difficulty level for new students resulting in higher attrition rates. As part of an effort to resolve this issue, a new visual programming language environment was developed for this research, the Visual IoT and Robotics Programming Language Environment (VIPLE). VIPLE is based on computational thinking and flowchart, which reduces the needs of memorization of detailed syntax in text-based programming languages. VIPLE has been used at Arizona State University (ASU) in multiple years and sections of FSE100 as well as in universities worldwide. Another major …

Contributors
De Luca, Gennaro, Chen, Yinong, Liu, Huan, et al.
Created Date
2020

The pervasive use of social media gives it a crucial role in helping the public perceive reliable information. Meanwhile, the openness and timeliness of social networking sites also allow for the rapid creation and dissemination of misinformation. It becomes increasingly difficult for online users to find accurate and trustworthy information. As witnessed in recent incidents of misinformation, it escalates quickly and can impact social media users with undesirable consequences and wreak havoc instantaneously. Different from some existing research in psychology and social sciences about misinformation, social media platforms pose unprecedented challenges for misinformation detection. First, intentional spreaders of misinformation will …

Contributors
Wu, Liang, Liu, Huan, Tong, Hanghang, et al.
Created Date
2019

Social media has become the norm of everyone for communication. The usage of social media has increased exponentially in the last decade. The myriads of Social media services such as Facebook, Twitter, Snapchat, and Instagram etc allow people to connect with their friends, and followers freely. The attackers who try to take advantage of this situation has also increased at an exponential rate. Every social media service has its own recommender systems and user profiling algorithms. These algorithms use users current information to make different recommendations. Often the data that is formed from social media services is Linked data as …

Contributors
Magham, Venkatesh, Liu, Huan, Wu, Liang, et al.
Created Date
2019

The rapid advancements of technology have greatly extended the ubiquitous nature of smartphones acting as a gateway to numerous social media applications. This brings an immense convenience to the users of these applications wishing to stay connected to other individuals through sharing their statuses, posting their opinions, experiences, suggestions, etc on online social networks (OSNs). Exploring and analyzing this data has a great potential to enable deep and fine-grained insights into the behavior, emotions, and language of individuals in a society. This proposed dissertation focuses on utilizing these online social footprints to research two main threads – 1) Analysis: to …

Contributors
Manikonda, Lydia, Kambhampati, Subbarao, Liu, Huan, et al.
Created Date
2019

Attributes - that delineating the properties of data, and connections - that describing the dependencies of data, are two essential components to characterize most real-world phenomena. The synergy between these two principal elements renders a unique data representation - the attributed networks. In many cases, people are inundated with vast amounts of data that can be structured into attributed networks, and their use has been attractive to researchers and practitioners in different disciplines. For example, in social media, users interact with each other and also post personalized content; in scientific collaboration, researchers cooperate and are distinct from peers by their …

Contributors
Li, Jundong, Liu, Huan, Faloutsos, Christos, et al.
Created Date
2019

Social media has become a primary platform for real-time information sharing among users. News on social media spreads faster than traditional outlets and millions of users turn to this platform to receive the latest updates on major events especially disasters. Social media bridges the gap between the people who are affected by disasters, volunteers who offer contributions, and first responders. On the other hand, social media is a fertile ground for malicious users who purposefully disturb the relief processes facilitated on social media. These malicious users take advantage of social bots to overrun social media posts with fake images, rumors, …

Contributors
Hossein Nazer, Tahora, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2019

Live streaming has risen to significant popularity in the recent past and largely this live streaming is a feature of existing social networks like Facebook, Instagram, and Snapchat. However, there does exist at least one social network entirely devoted to live streaming, and specifically the live streaming of video games, Twitch. This social network is unique for a number of reasons, not least because of its hyper-focus on live content and this uniqueness has challenges for social media researchers. Despite this uniqueness, almost no scientific work has been performed on this public social network. Thus, it is unclear what user …

Contributors
Jones, Isaac, Liu, Huan, Maciejewski, Ross, et al.
Created Date
2019

Social media bot detection has been a signature challenge in recent years in online social networks. Many scholars agree that the bot detection problem has become an "arms race" between malicious actors, who seek to create bots to influence opinion on these networks, and the social media platforms to remove these accounts. Despite this acknowledged issue, bot presence continues to remain on social media networks. So, it has now become necessary to monitor different bots over time to identify changes in their activities or domain. Since monitoring individual accounts is not feasible, because the bots may get suspended or deleted, …

Contributors
Davis, Matthew William, Liu, Huan, Xue, Guoliang, et al.
Created Date
2019

Graph is a ubiquitous data structure, which appears in a broad range of real-world scenarios. Accordingly, there has been a surge of research to represent and learn from graphs in order to accomplish various machine learning and graph analysis tasks. However, most of these efforts only utilize the graph structure while nodes in real-world graphs usually come with a rich set of attributes. Typical examples of such nodes and their attributes are users and their profiles in social networks, scientific articles and their content in citation networks, protein molecules and their gene sets in biological networks as well as web …

Contributors
Salehi, Amin, Davulcu, Hasan, Liu, Huan, et al.
Created Date
2019

Social media is a medium that contains rich information which has been shared by many users every second every day. This information can be utilized for various outcomes such as understanding user behaviors, learning the effect of social media on a community, and developing a decision-making system based on the information available. With the growing popularity of social networking sites, people can freely express their opinions and feelings which results in a tremendous amount of user-generated data. The rich amount of social media data has opened the path for researchers to study and understand the users’ behaviors and mental health …

Contributors
Kamarudin, Nur Shazwani, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2019

Millions of users leave digital traces of their political engagements on social media platforms every day. Users form networks of interactions, produce textual content, like and share each others' content. This creates an invaluable opportunity to better understand the political engagements of internet users. In this proposal, I present three algorithmic solutions to three facets of online political networks; namely, detection of communities, antagonisms and the impact of certain types of accounts on political polarization. First, I develop a multi-view community detection algorithm to find politically pure communities. I find that word usage among other content types (i.e. hashtags, URLs) …

Contributors
Ozer, Mert, Davulcu, Hasan, Liu, Huan, et al.
Created Date
2019

The popularity of social media has generated abundant large-scale social networks, which advances research on network analytics. Good representations of nodes in a network can facilitate many network mining tasks. The goal of network representation learning (network embedding) is to learn low-dimensional vector representations of social network nodes that capture certain properties of the networks. With the learned node representations, machine learning and data mining algorithms can be applied for network mining tasks such as link prediction and node classification. Because of its ability to learn good node representations, network representation learning is attracting increasing attention and various network embedding …

Contributors
Wang, Suhang, Liu, Huan, Aggarwal, Charu, et al.
Created Date
2018

Social media refers computer-based technology that allows the sharing of information and building the virtual networks and communities. With the development of internet based services and applications, user can engage with social media via computer and smart mobile devices. In recent years, social media has taken the form of different activities such as social network, business network, text sharing, photo sharing, blogging, etc. With the increasing popularity of social media, it has accumulated a large amount of data which enables understanding the human behavior possible. Compared with traditional survey based methods, the analysis of social media provides us a golden …

Contributors
Wang, Yilin, Li, Baoxin, Liu, Huan, et al.
Created Date
2018

Teams are increasingly indispensable to achievements in any organizations. Despite the organizations' substantial dependency on teams, fundamental knowledge about the conduct of team-enabled operations is lacking, especially at the {\it social, cognitive} and {\it information} level in relation to team performance and network dynamics. The goal of this dissertation is to create new instruments to {\it predict}, {\it optimize} and {\it explain} teams' performance in the context of composite networks (i.e., social-cognitive-information networks). Understanding the dynamic mechanisms that drive the success of high-performing teams can provide the key insights into building the best teams and hence lift the productivity and …

Contributors
Li, Liangyue, Tong, Hanghang, Baral, Chitta, et al.
Created Date
2018

Computer vision technology automatically extracts high level, meaningful information from visual data such as images or videos, and the object recognition and detection algorithms are essential in most computer vision applications. In this dissertation, we focus on developing algorithms used for real life computer vision applications, presenting innovative algorithms for object segmentation and feature extraction for objects and actions recognition in video data, and sparse feature selection algorithms for medical image analysis, as well as automated feature extraction using convolutional neural network for blood cancer grading. To detect and classify objects in video, the objects have to be separated from …

Contributors
Cao, Jun, Li, Baoxin, Liu, Huan, et al.
Created Date
2018

Models using feature interactions have been applied successfully in many areas such as biomedical analysis, recommender systems. The popularity of using feature interactions mainly lies in (1) they are able to capture the nonlinearity of the data compared with linear effects and (2) they enjoy great interpretability. In this thesis, I propose a series of formulations using feature interactions for real world problems and develop efficient algorithms for solving them. Specifically, I first propose to directly solve the non-convex formulation of the weak hierarchical Lasso which imposes weak hierarchy on individual features and interactions but can only be approximately solved …

Contributors
Liu, Yashu, Ye, Jieping, Xue, Guoliang, et al.
Created Date
2018

Predictive analytics embraces an extensive area of techniques from statistical modeling to machine learning to data mining and is applied in business intelligence, public health, disaster management and response, and many other fields. To date, visualization has been broadly used to support tasks in the predictive analytics pipeline under the underlying assumption that a human-in-the-loop can aid the analysis by integrating domain knowledge that might not be broadly captured by the system. Primary uses of visualization in the predictive analytics pipeline have focused on data cleaning, exploratory analysis, and diagnostics. More recently, numerous visual analytics systems for feature selection, incremental …

Contributors
Lu, Yafeng, Maciejewski, Ross, Cooke, Nancy, et al.
Created Date
2017

The National Basketball Association (NBA) is the most popular basketball league in the world. The world-wide mighty high popularity to the league leads to large amount of interesting and challenging research problems. Among them, predicting the outcome of an upcoming NBA match between two specific teams according to their historical data is especially attractive. With rapid development of machine learning techniques, it opens the door to examine the correlation between statistical data and outcome of matches. However, existing methods typically make predictions before game starts. In-game prediction, or real-time prediction, has not yet been sufficiently studied. During a match, data …

Contributors
Lin, Rongyu, Tong, Hanghang, He, Jingrui, et al.
Created Date
2017

Cyberbullying is a phenomenon which negatively affects individuals. Victims of the cyberbullying suffer from a range of mental issues, ranging from depression to low self-esteem. Due to the advent of the social media platforms, cyberbullying is becoming more and more prevalent. Traditional mechanisms to fight against cyberbullying include use of standards and guidelines, human moderators, use of blacklists based on profane words, and regular expressions to manually detect cyberbullying. However, these mechanisms fall short in social media and do not scale well. Users in social media use intentional evasive expressions like, obfuscation of abusive words, which necessitates the development of …

Contributors
Dani, Harsh, Liu, Huan, Tong, Hanghang, et al.
Created Date
2017

Due to vast resources brought by social media services, social data mining has received increasing attention in recent years. The availability of sheer amounts of user-generated data presents data scientists both opportunities and challenges. Opportunities are presented with additional data sources. The abundant link information in social networks could provide another rich source in deriving implicit information for social data mining. However, the vast majority of existing studies overwhelmingly focus on positive links between users while negative links are also prevailing in real- world social networks such as distrust relations in Epinions and foe links in Slashdot. Though recent studies …

Contributors
Cheng, Kewei, Liu, Huan, Tong, Hanghang, et al.
Created Date
2017

Phishing is a form of online fraud where a spoofed website tries to gain access to user's sensitive information by tricking the user into believing that it is a benign website. There are several solutions to detect phishing attacks such as educating users, using blacklists or extracting phishing characteristics found to exist in phishing attacks. In this thesis, we analyze approaches that extract features from phishing websites and train classification models with extracted feature set to classify phishing websites. We create an exhaustive list of all features used in these approaches and categorize them into 6 broader categories and 33 …

Contributors
Namasivayam, Bhuvana Lalitha, Bazzi, Rida, Zhao, Ziming, et al.
Created Date
2017

Exabytes of data are created online every day. This deluge of data is no more apparent than it is on social media. Naturally, finding ways to leverage this unprecedented source of human information is an active area of research. Social media platforms have become laboratories for conducting experiments about people at scales thought unimaginable only a few years ago. Researchers and practitioners use social media to extract actionable patterns such as where aid should be distributed in a crisis. However, the validity of these patterns relies on having a representative dataset. As this dissertation shows, the data collected from social …

Contributors
Morstatter, Fred, Liu, Huan, Kambhampati, Subbarao, et al.
Created Date
2017

Online social media is popular due to its real-time nature, extensive connectivity and a large user base. This motivates users to employ social media for seeking information by reaching out to their large number of social connections. Information seeking can manifest in the form of requests for personal and time-critical information or gathering perspectives on important issues. Social media platforms are not designed for resource seeking and experience large volumes of messages, leading to requests not being fulfilled satisfactorily. Designing frameworks to facilitate efficient information seeking in social media will help users to obtain appropriate assistance for their needs and …

Contributors
Ranganath, Suhas, Liu, Huan, Lai, Ying-Cheng, et al.
Created Date
2017

While techniques for reading DNA in some capacity has been possible for decades, the ability to accurately edit genomes at scale has remained elusive. Novel techniques have been introduced recently to aid in the writing of DNA sequences. While writing DNA is more accessible, it still remains expensive, justifying the increased interest in in silico predictions of cell behavior. In order to accurately predict the behavior of cells it is necessary to extensively model the cell environment, including gene-to-gene interactions as completely as possible. Significant algorithmic advances have been made for identifying these interactions, but despite these improvements current techniques …

Contributors
Faucon, Philippe Christophe, Liu, Huan, Wang, Xiao, et al.
Created Date
2017

Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyse complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed by using a biased topic modeling method. This framework improves and extends existing topic modelling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, with …

Contributors
Yang, Jian, Gonzalez, Graciela, Davulcu, Hasan, et al.
Created Date
2017

The rapid growth of social media in recent years provides a large amount of user-generated visual objects, e.g., images and videos. Advanced semantic understanding approaches on such visual objects are desired to better serve applications such as human-machine interaction, image retrieval, etc. Semantic visual attributes have been proposed and utilized in multiple visual computing tasks to bridge the so-called "semantic gap" between extractable low-level feature representations and high-level semantic understanding of the visual objects. Despite years of research, there are still some unsolved problems on semantic attribute learning. First, real-world applications usually involve hundreds of attributes which requires great effort …

Contributors
Chen, Lin, Li, Baoxin, Turaga, Pavan, et al.
Created Date
2016

Traditionally, visualization is one of the most important and commonly used methods of generating insight into large scale data. Particularly for spatiotemporal data, the translation of such data into a visual form allows users to quickly see patterns, explore summaries and relate domain knowledge about underlying geographical phenomena that would not be apparent in tabular form. However, several critical challenges arise when visualizing and exploring these large spatiotemporal datasets. While, the underlying geographical component of the data lends itself well to univariate visualization in the form of traditional cartographic representations (e.g., choropleth, isopleth, dasymetric maps), as the data becomes multivariate, …

Contributors
Zhang, Yifan, Maciejewski, Ross, Mack, Elizabeth, et al.
Created Date
2016

Identifying chemical compounds that inhibit bacterial infection has recently gained a considerable amount of attention given the increased number of highly resistant bacteria and the serious health threat it poses around the world. With the development of automated microscopy and image analysis systems, the process of identifying novel therapeutic drugs can generate an immense amount of data - easily reaching terabytes worth of information. Despite increasing the vast amount of data that is currently generated, traditional analytical methods have not increased the overall success rate of identifying active chemical compounds that eventually become novel therapeutic drugs. Moreover, multispectral imaging has …

Contributors
Trevino, Robert, Liu, Huan, Lamkin, Thomas J, et al.
Created Date
2016

Keyword search provides a simple and user-friendly mechanism for information search, and has become increasingly popular for accessing structured or semi-structured data. However, there are two open issues of keyword search on semi/structured data which are not well addressed by existing work yet. First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency instead of search quality and may fail to deliver high quality results from semantic perspectives. Majority of the existing work generates minimal sub-graph results that are oblivious to the entity and relationship semantics embedded in the data …

Contributors
Shan, Yi, Chen, Yi, Bansal, Srividya, et al.
Created Date
2016

The dawn of Internet of Things (IoT) has opened the opportunity for mainstream adoption of machine learning analytics. However, most research in machine learning has focused on discovery of new algorithms or fine-tuning the performance of existing algorithms. Little exists on the process of taking an algorithm from the lab-environment into the real-world, culminating in sustained value. Real-world applications are typically characterized by dynamic non-stationary systems with requirements around feasibility, stability and maintainability. Not much has been done to establish standards around the unique analytics demands of real-world scenarios. This research explores the problem of the why so few of …

Contributors
Shahapurkar, Som, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2016

Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast growing content in health forums provides a large repository for people to seek valuable information. A forum user can issue a keyword query to search health forums regarding to some specific questions, e.g., what treatments are effective for a disease symptom? A medical researcher can discover medical knowledge in a timely and large-scale fashion by automatically aggregating the latest evidences emerging in health forums. This dissertation studies how to effectively discover information …

Contributors
Liu, Yunzhong, Chen, Yi, Liu, Huan, et al.
Created Date
2016

Internet and social media devices created a new public space for debate on political and social topics (Papacharissi 2002; Himelboim 2010). Hotly debated issues span all spheres of human activity; from liberal vs. conservative politics, to radical vs. counter-radical religious debate, to climate change debate in scientific community, to globalization debate in economics, and to nuclear disarmament debate in security. Many prominent ’camps’ have emerged within Internet debate rhetoric and practice (Dahlberg, n.d.). In this research I utilized feature extraction and model fitting techniques to process the rhetoric found in the web sites of 23 Indonesian Islamic religious organizations, later …

Contributors
Tikves, Sukru, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2016

The purpose of information source detection problem (or called rumor source detection) is to identify the source of information diffusion in networks based on available observations like the states of the nodes and the timestamps at which nodes adopted the information (or called infected). The solution of the problem can be used to answer a wide range of important questions in epidemiology, computer network security, etc. This dissertation studies the fundamental theory and the design of efficient and robust algorithms for the information source detection problem. For tree networks, the maximum a posterior (MAP) estimator of the information source is …

Contributors
Zhu, Kai, Ying, Lei, Lai, Ying-Cheng, et al.
Created Date
2015

Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often times we have very few or no labeled data from the test or target distribution, but we may have plenty of labeled data from one or multiple related sources with different distributions. Due to its capability of migrating knowledge from related domains, transfer learning has shown to be effective for cross-domain learning problems. In this dissertation, I carry out research along this direction with a particular focus on designing efficient and effective algorithms for BioImaging and Bilingual applications. Specifically, I propose deep …

Contributors
Sun, Qian, Ye, Jieping, Ye, Jieping, et al.
Created Date
2015

A myriad of social media services are emerging in recent years that allow people to communicate and express themselves conveniently and easily. The pervasive use of social media generates massive data at an unprecedented rate. It becomes increasingly difficult for online users to find relevant information or, in other words, exacerbates the information overload problem. Meanwhile, users in social media can be both passive content consumers and active content producers, causing the quality of user-generated content can vary dramatically from excellence to abuse or spam, which results in a problem of information credibility. Trust, providing evidence about with whom users …

Contributors
Tang, Jiliang, Liu, Huan, Xue, Guoliang, et al.
Created Date
2015

A community in a social network can be viewed as a structure formed by individuals who share similar interests. Not all communities are explicit; some may be hidden in a large network. Therefore, discovering these hidden communities becomes an interesting problem. Researchers from a number of fields have developed algorithms to tackle this problem. Besides the common feature above, communities within a social network have two unique characteristics: communities are mostly small and overlapping. Unfortunately, many traditional algorithms have difficulty recognizing these small communities (often called the resolution limit problem) as well as overlapping communities. In this work, two enhanced …

Contributors
Wang, Ran, Liu, Huan, Sen, Arunabha, et al.
Created Date
2015

US Senate is the venue of political debates where the federal bills are formed and voted. Senators show their support/opposition along the bills with their votes. This information makes it possible to extract the polarity of the senators. Similarly, blogosphere plays an increasingly important role as a forum for public debate. Authors display sentiment toward issues, organizations or people using a natural language. In this research, given a mixed set of senators/blogs debating on a set of political issues from opposing camps, I use signed bipartite graphs for modeling debates, and I propose an algorithm for partitioning both the opinion …

Contributors
Gokalp, Sedat, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2015

Social networking services have emerged as an important platform for large-scale information sharing and communication. With the growing popularity of social media, spamming has become rampant in the platforms. Complex network interactions and evolving content present great challenges for social spammer detection. Different from some existing well-studied platforms, distinct characteristics of newly emerged social media data present new challenges for social spammer detection. First, texts in social media are short and potentially linked with each other via user connections. Second, it is observed that abundant contextual information may play an important role in distinguishing social spammers and normal users. Third, …

Contributors
Hu, Xia, Liu, Huan, Kambhampati, Subbarao, et al.
Created Date
2015

Crises or large-scale emergencies such as earthquakes and hurricanes cause massive damage to lives and property. Crisis response is an essential task to mitigate the impact of a crisis. An effective response to a crisis necessitates information gathering and analysis. Traditionally, this process has been restricted to the information collected by first responders on the ground in the affected region or by official agencies such as local governments involved in the response. However, the ubiquity of mobile devices has empowered people to publish information during a crisis through social media, such as the damage reports from a hurricane. Social media …

Contributors
Kumar, Shamanth, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2015

Users often join an online social networking (OSN) site, like Facebook, to remain social, by either staying connected with friends or expanding social networks. On an OSN site, users generally share variety of personal information which is often expected to be visible to their friends, but sometimes vulnerable to unwarranted access from others. The recent study suggests that many personal attributes, including religious and political affiliations, sexual orientation, relationship status, age, and gender, are predictable using users' personal data from an OSN site. The majority of users want to remain socially active, and protect their personal data at the same …

Contributors
Gundecha, Pritam Sureshlal, Liu, Huan, Ahn, Gail-Joon, et al.
Created Date
2015

One of the most remarkable outcomes resulting from the evolution of the web into Web 2.0, has been the propelling of blogging into a widely adopted and globally accepted phenomenon. While the unprecedented growth of the Blogosphere has added diversity and enriched the media, it has also added complexity. To cope with the relentless expansion, many enthusiastic bloggers have embarked on voluntarily writing, tagging, labeling, and cataloguing their posts in hopes of reaching the widest possible audience. Unbeknown to them, this reaching-for-others process triggers the generation of a new kind of collective wisdom, a result of shared collaboration, and the …

Contributors
Galan, Magdiel Francisco, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2015

With the rise of social media, user-generated content has become available at an unprecedented scale. On Twitter, 1 billion tweets are posted every 5 days and on Facebook, 20 million links are shared every 20 minutes. These massive collections of user-generated content have introduced the human behavior's big-data. This big data has brought about countless opportunities for analyzing human behavior at scale. However, is this data enough? Unfortunately, the data available at the individual-level is limited for most users. This limited individual-level data is often referred to as thin data. Hence, researchers face a big-data paradox, where this big-data is …

Contributors
Zafarani, Reza, Liu, Huan, Kambhampati, Subbarao, et al.
Created Date
2015

Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser who has chosen to subscribe to the tweets from the originator by following the originator. Although, hashtags are used to tag tweets in an effort to attach context to the tweets, many tweets do not have a hashtag. Such tweets are called orphan tweets and they adversely affect the experience of …

Contributors
Mallapura Umamaheshwar, Tejas, Kambhampati, Subbarao, Liu, Huan, et al.
Created Date
2015

Social media platforms such as Twitter, Facebook, and blogs have emerged as valuable - in fact, the de facto - virtual town halls for people to discover, report, share and communicate with others about various types of events. These events range from widely-known events such as the U.S Presidential debate to smaller scale, local events such as a local Halloween block party. During these events, we often witness a large amount of commentary contributed by crowds on social media. This burst of social media responses surges with the "second-screen" behavior and greatly enriches the user experience when interacting with the …

Contributors
Hu, Yuheng, Kambhampati, Subbarao, Horvitz, Eric, et al.
Created Date
2014

With the rise of social media, hundreds of millions of people spend countless hours all over the globe on social media to connect, interact, share, and create user-generated data. This rich environment provides tremendous opportunities for many different players to easily and effectively reach out to people, interact with them, influence them, or get their opinions. There are two pieces of information that attract most attention on social media sites, including user preferences and interactions. Businesses and organizations use this information to better understand and therefore provide customized services to social media users. This data can be used for different …

Contributors
Abbasi, Mohammad Ali, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2014

Recent efforts in data cleaning have focused mostly on problems like data deduplication, record matching, and data standardization; few of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this thesis, I provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. I thus …

Contributors
De, Sushovan, Kambhampati, Subbarao, Chen, Yi, et al.
Created Date
2014

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call it an orphaned tweet. The user will be more interested to find more "context" of an orphaned tweet presumably to engage with his/her friend on that topic. Finding context for an Orphaned tweet manually is challenging because of larger social graph of a user , the enormous volume of tweets generated per second, topic diversity, and limited information from tweet length of 140 characters. To help the user to get the context …

Contributors
Vijayakumar, Manikandan, Kambhampati, Subbarao, Liu, Huan, et al.
Created Date
2014

Contemporary online social platforms present individuals with social signals in the form of news feed on their peers' activities. On networks such as Facebook, Quora, network operator decides how that information is shown to an individual. Then the user, with her own interests and resource constraints selectively acts on a subset of items presented to her. The network operator again, shows that activity to a selection of peers, and thus creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions such as: can network operator design social signals to promote a particular activity like …

Contributors
Le, Tien Dinh, Sundaram, Hari, Davulcu, Hasan, et al.
Created Date
2014

As the size and scope of valuable datasets has exploded across many industries and fields of research in recent years, an increasingly diverse audience has sought out effective tools for their large-scale data analytics needs. Over this period, machine learning researchers have also been very prolific in designing improved algorithms which are capable of finding the hidden structure within these datasets. As consumers of popular Big Data frameworks have sought to apply and benefit from these improved learning algorithms, the problems encountered with the frameworks have motivated a new generation of Big Data tools to address the shortcomings of the …

Contributors
Krouse, Brian Richard, Ye, Jieping, Liu, Huan, et al.
Created Date
2014

The rapid urban expansion has greatly extended the physical boundary of our living area, along with a large number of POIs (points of interest) being developed. A POI is a specific location (e.g., hotel, restaurant, theater, mall) that a user may find useful or interesting. When exploring the city and neighborhood, the increasing number of POIs could enrich people's daily life, providing them with more choices of life experience than before, while at the same time also brings the problem of "curse of choices", resulting in the difficulty for a user to make a satisfied decision on "where to go" …

Contributors
Gao, Huiji, Liu, Huan, Xue, Guoliang, et al.
Created Date
2014