ASU Electronic Theses and Dissertations

Date Range: 2010 to 2017

In supervised learning, machine learning techniques can be applied to learn a model on a small set of labeled documents, which can then be used to classify a larger set of unknown documents. Machine learning techniques can also be used to analyze a political scenario in a given society. Considerable research in this field aims to understand how people in a society interact in response to actions taken by their organizations. This paper examines the Russian influence on people in Latvia. This is done by building an effective model learnt on initial set ...

Contributors
Bollapragada, Lakshmi Gayatri Niharika, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2016

Online health forums provide a convenient channel for patients, caregivers, and medical professionals to share their experience, support and encourage each other, and form health communities. The fast growing content in health forums provides a large repository for people to seek valuable information. A forum user can issue a keyword query to search health forums about specific questions, e.g., what treatments are effective for a disease symptom? A medical researcher can discover medical knowledge in a timely and large-scale fashion by automatically aggregating the latest evidence emerging in health forums. This dissertation studies how to effectively discover information ...

Contributors
Liu, Yunzhong, Chen, Yi, Liu, Huan, et al.
Created Date
2016

While discrete emotions like joy, anger, disgust etc. are quite popular, continuous emotion dimensions like arousal and valence are gaining popularity within the research community due to an increase in the availability of datasets annotated with these emotions. Unlike the discrete emotions, continuous emotions allow modeling of subtle and complex affect dimensions but are difficult to predict. Dimension reduction techniques form the core of emotion recognition systems and help create a new feature space that is more helpful in predicting emotions. But these techniques do not necessarily guarantee a better predictive capability as most of them are unsupervised, especially in ...

Contributors
Lade, Prasanth, Panchanathan, Sethuraman, Davulcu, Hasan, et al.
Created Date
2015

Proliferation of social media websites and discussion forums in the last decade has resulted in social media mining emerging as an effective mechanism to extract consumer patterns. Most research on social media and pharmacovigilance has concentrated on Adverse Drug Reaction (ADR) identification. Such methods employ a step of drug search followed by classification of the associated text as containing an ADR or not. Although this method works efficiently for ADR classification, if ADR evidence is present in users' posts over time, drug mentions fail to capture such ADRs. It also fails to record additional user information which may provide an ...

Contributors
Chandrashekar, Pramod Bharadwaj Chandrashekar, Davulcu, Hasan, Gonzalez, Graciela, et al.
Created Date
2016

Cyber systems, including IoT (Internet of Things), are increasingly being used ubiquitously to vastly improve the efficiency and reduce the cost of critical application areas, such as finance, transportation, defense, and healthcare. Over the past two decades, computing efficiency and hardware cost have improved dramatically. These improvements have made cyber systems omnipotent, controlling many aspects of human lives. Emerging trends in successful cyber system breaches have shown increasing sophistication in attacks and that attackers are no longer limited by resources, including human and computing power. Most existing cyber defense systems for IoT systems have two major issues: (1) ...

Contributors
Buduru, Arun Balaji, Yau, Sik-Sang, Ahn, Gail-Joon, et al.
Created Date
2016

The amount of time series data generated is increasing due to the integration of sensor technologies with everyday applications, such as gesture recognition, energy optimization, health care, and video surveillance. The simultaneous use of multiple sensors for capturing different aspects of real-world attributes has also led to an increase in dimensionality from uni-variate to multi-variate time series. This has facilitated richer data representation but has also necessitated algorithms for determining similarity between two multi-variate time series for search and analysis. Various algorithms have been extended from the uni-variate to the multi-variate case, such as multi-variate versions of Euclidean distance, edit distance, dynamic ...

Contributors
Garg, Yash, Candan, Kasim Selcuk, Chowell-Punete, Gerardo, et al.
Created Date
2015
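
The abstract above mentions multi-variate versions of Euclidean distance. As a rough illustration only, one common extension sums squared differences over every time step and every variate. The function name and the list-of-rows input layout below are assumptions made for this sketch, not the thesis's own formulation.

```python
import math


def multivariate_euclidean(series_a, series_b):
    """Euclidean distance between two equal-length multi-variate series.

    Each series is a list of time steps; each time step is a list holding
    one value per variate (an assumed layout for this sketch).
    """
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same number of time steps")
    total = 0.0
    for step_a, step_b in zip(series_a, series_b):
        if len(step_a) != len(step_b):
            raise ValueError("time steps must have the same number of variates")
        # Accumulate squared differences across all variates at this step.
        total += sum((x - y) ** 2 for x, y in zip(step_a, step_b))
    return math.sqrt(total)
```

This simple form requires both series to be aligned and of equal length.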

A new algebraic system, Test Algebra (TA), is proposed for identifying faults in combinatorial testing for SaaS (Software-as-a-Service) applications. In the context of cloud computing, SaaS is a new software delivery model, in which mission-critical applications are composed, deployed, and executed on cloud platforms. Testing SaaS applications is challenging because new applications need to be tested once they are composed, and prior to their deployment. A composition of components providing services yields a configuration providing a SaaS application. While individual components in the configuration may have been thoroughly tested, faults still arise due to interactions among the components composed, making ...

Contributors
Qi, Guanqiu, Tsai, Wei-Tek, Davulcu, Hasan, et al.
Created Date
2014

Learning from high dimensional biomedical data has attracted considerable attention recently. High dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features of biomedical data, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect the model performance. In this thesis, I focus on developing learning methods for high-dimensional imbalanced biomedical data. In the first part, a sparse canonical correlation analysis (CCA) method is presented. Penalty terms are used to control the sparsity of the projection matrices of CCA. The sparse CCA method ...

Contributors
Yang, Tao, Ye, Jieping, Wang, Yalin, et al.
Created Date
2013

In recent years, an increasing number of applications have used multi-variate time series data in which multiple uni-variate time series coexist. However, there is a lack of systematic analysis of multi-variate time series. This thesis focuses on (a) defining a simplified inter-related multi-variate time series (IMTS) model and (b) developing a robust multi-variate temporal (RMT) feature extraction algorithm that can be used for locating, filtering, and describing salient features in multi-variate time series data sets. The proposed RMT feature can also be used for supporting multiple analysis tasks, such as visualization, segmentation, and searching / retrieving based on multi-variate time series similarities. ...

Contributors
Wang, Xiaolan, Candan, Kasim Selcuk, Sapino, Maria Luisa, et al.
Created Date
2013

The purpose of this research is to efficiently analyze certain data provided and to see if a useful trend can be observed as a result. This trend can be used to analyze certain probabilities. There are three main pieces of data which are being analyzed in this research: The value for δ of the call and put option, the %B value of the stock, and the amount of time until expiration of the stock option. The %B value is the most important. The purpose of analyzing the data is to see the relationship between the variables and, given certain values, ...

Contributors
Reeves, Michael Thomas, Richa, Andrea, McCarville, Daniel, et al.
Created Date
2015
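
The abstract above does not define the %B value. Assuming it refers to the standard Bollinger %B indicator (an assumption, since the source does not say so), %B locates the latest price relative to the lower and upper Bollinger Bands: (price - lower) / (upper - lower). The sketch below uses the conventional 20-period window and two standard deviations as defaults; both are illustrative choices, not values taken from the thesis.

```python
from statistics import mean, pstdev


def percent_b(prices, window=20, num_std=2.0):
    """Return Bollinger %B for the most recent price in `prices`.

    0.0 means the price sits on the lower band, 1.0 on the upper band,
    and 0.5 on the moving average.
    """
    if len(prices) < window:
        raise ValueError("need at least `window` prices")
    recent = prices[-window:]
    middle = mean(recent)               # simple moving average
    spread = num_std * pstdev(recent)   # band half-width
    upper, lower = middle + spread, middle - spread
    if upper == lower:
        raise ValueError("flat price window: bands have zero width")
    return (prices[-1] - lower) / (upper - lower)
```

Values above 1 or below 0 indicate a price outside the bands.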

In visualizing information hierarchies, icicle plots are efficient diagrams in that they provide the user a straightforward layout for different levels of data in a hierarchy and enable the user to compare items based on the item width. However, as the size of the hierarchy grows large, the items in an icicle plot end up being small and indistinguishable. In this thesis, by maintaining the positive characteristics of traditional icicle plots and incorporating new features such as dynamic diagram and active layer, we developed an interactive visualization that allows the user to selectively drill down or roll up to review ...

Contributors
Wu, Bi, Maciejewski, Ross, Runger, George, et al.
Created Date
2014

With the advent of the Internet, the data being added online is increasing at an enormous rate. Though search engines use IR techniques to serve search requests from users, the results often do not answer the user's search query effectively. The search engine user has to go through several webpages before reaching the webpage he/she wanted. This problem of Information Overload can be solved using Automatic Text Summarization. Summarization is a process of obtaining an abridged version of documents so that the user can get a quick view of what exactly the document is about. Email threads from ...

Contributors
Nadella, Sravan, Davulcu, Hasan, Li, Baoxin, et al.
Created Date
2015

In this thesis, multiple approaches are explored to enhance sentiment analysis of tweets. A standard sentiment analysis model with customized features is first trained and tested to establish a baseline. This is compared to an existing topic based mixture model and a newly proposed topic based vector model, both of which use Latent Dirichlet Allocation (LDA) for topic modeling. The proposed topic based vector model has higher accuracies in terms of averaged F scores than the other two models.

Contributors
Baskaran, Swetha, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2016

Multi-tenancy architecture (MTA) is often used in Software-as-a-Service (SaaS) and the central idea is that multiple tenant applications can be developed using components stored in the SaaS infrastructure. Recently, MTA has been extended so that a tenant application can have its own sub-tenants, with the tenant application acting like a SaaS infrastructure. In other words, MTA is extended to STA (Sub-Tenancy Architecture). In STA, each tenant application not only needs to develop its own functionalities, but also needs to prepare an infrastructure to allow its sub-tenants to develop customized applications. This dissertation formulates eight models for STA, and proposes ...

Contributors
Zhong, Peide, Davulcu, Hasan, Sarjoughian, Hessam, et al.
Created Date
2017

Similarity search in high-dimensional spaces is popular for applications like image processing, time series, and genome data. In higher dimensions, the curse of dimensionality kills the effectiveness of most index structures, giving way to approximate methods like Locality Sensitive Hashing (LSH) to answer similarity searches. In addition to range searches and k-nearest neighbor searches, there is a need to answer negative queries formed by excluded regions in high-dimensional data. Though there have been a slew of variants of LSH to improve efficiency, reduce storage, and provide better accuracies, none of the techniques are capable of answering ...

Contributors
Bhat, Aneesha, Candan, Kasim Selcuk, Davulcu, Hasan, et al.
Created Date
2016
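
For readers unfamiliar with Locality Sensitive Hashing, the sketch below shows one well-known LSH family, random-hyperplane hashing for cosine similarity. It only illustrates plain approximate similarity lookup; it does not implement the negative-query support that the abstract above is concerned with, and all function names here are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group vector indices into buckets keyed by their bit signature."""
    buckets = {}
    for idx, vector in enumerate(vectors):
        buckets.setdefault(signature(vector, hyperplanes), []).append(idx)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return indices of stored vectors that share the query's bucket."""
    return buckets.get(signature(query, hyperplanes), [])
```

Vectors separated by a small angle are likely to share a signature, so only one bucket needs to be scanned instead of the whole data set. Practical systems typically use several hash tables to trade storage for recall.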

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such ...

Contributors
Rihan, Preet Inder Singh, Kambhampati, Subbarao, Liu, Huan, et al.
Created Date
2013

Measuring node centrality is a critical common denominator behind many important graph mining tasks. While the existing literature offers a wealth of different node centrality measures, it remains a daunting task to intervene on node centrality in a desired way. In this thesis, we study the problem of minimizing the centrality of one or more target nodes by edge operations. The heart of the proposed method is an accurate and efficient algorithm to estimate the impact of edge deletion on the spectrum of the underlying network, based on the observation that the edge deletion is essentially a local, ...

Contributors
Peng, Ruiyue, Tong, Hanghang, He, Jingrui, et al.
Created Date
2016

With the advent of social media (such as Twitter and Facebook), people are sharing their opinions and sentiments and pressing their ideologies on others more easily than ever before. Even people who are otherwise socially inactive like to share their thoughts on current affairs by tweeting and sharing news feeds with their friends and acquaintances. In this thesis study, we chose Twitter as our main data platform to analyze shifts and movements of 27 political organizations in Indonesia. So far, we have collected over 30 million tweets and 150,000 news articles from RSS feeds of the corresponding organizations for our analysis. For ...

Contributors
Poornachandran, Sathishkumar, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2013

Crises or large-scale emergencies such as earthquakes and hurricanes cause massive damage to lives and property. Crisis response is an essential task to mitigate the impact of a crisis. An effective response to a crisis necessitates information gathering and analysis. Traditionally, this process has been restricted to the information collected by first responders on the ground in the affected region or by official agencies such as local governments involved in the response. However, the ubiquity of mobile devices has empowered people to publish information during a crisis through social media, such as the damage reports from a hurricane. Social media ...

Contributors
Kumar, Shamanth, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2015

A story is defined as "an actor(s) taking action(s) that culminates in a resolution(s)". I present novel sets of features to facilitate story detection in text via supervised classification and further reveal different forms within stories via unsupervised clustering. First, I investigate the utility of a new set of semantic features compared to standard keyword features combined with statistical features, such as density of part-of-speech (POS) tags and named entities, to develop a story classifier. The proposed semantic features are based on <Subject, Verb, Object> triplets that can be extracted using a shallow parser. Experimental results show that a model ...

Contributors
Ceran, Saadet Betul, Davulcu, Hasan, Corman, Steven R, et al.
Created Date
2016

Emerging trends in cyber system security breaches in critical cloud infrastructures show that attackers have abundant resources (human and computing power), expertise, and the support of large organizations and possibly foreign governments. To greatly improve the protection of critical cloud infrastructures, human behavior needs to be incorporated into the prediction of potential security breaches. To achieve such prediction, it is envisioned to develop a probabilistic modeling approach with the capability of accurately capturing system-wide causal relationships among the observed operational behaviors in the critical cloud infrastructure and accurately capturing probabilistic human (users’) behaviors on subsystems as the ...

Contributors
Nagaraja, Vinjith, Yau, Stephen S, Ahn, Gail-Joon, et al.
Created Date
2015

One of the most remarkable outcomes of the evolution of the web into Web 2.0 has been the propelling of blogging into a widely adopted and globally accepted phenomenon. While the unprecedented growth of the Blogosphere has added diversity and enriched the media, it has also added complexity. To cope with the relentless expansion, many enthusiastic bloggers have embarked on voluntarily writing, tagging, labeling, and cataloguing their posts in hopes of reaching the widest possible audience. Unbeknown to them, this reaching-for-others process triggers the generation of a new kind of collective wisdom, a result of shared collaboration, and the ...

Contributors
Galan, Magdiel Francisco, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2015

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the popular public opinion about these events. So it would be useful to annotate the transcript with tweets. The technical challenge is to align the tweets with the correct segment of the transcript. ET-LDA by Hu et al [9] addresses this issue by modeling the whole process with an LDA-based graphical ...

Contributors
Acharya, Anirudh, Kambhampati, Subbarao, Davulcu, Hasan, et al.
Created Date
2015

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary for the creation of large scale models of the relationships among biomedical entities as well as hypothesis generation to guide biomedical research. To reduce the effort and time spent in performing these activities, an intelligent search system is required. Even though many ...

Contributors
Kanwar, Pradeep, Davulcu, Hasan, Dinu, Valentin, et al.
Created Date
2010

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call such a tweet an orphaned tweet. The user will be interested in finding more "context" for an orphaned tweet, presumably to engage with his/her friend on that topic. Finding context for an orphaned tweet manually is challenging because of the large social graph of a user, the enormous volume of tweets generated per second, topic diversity, and the limited information in a tweet of 140 characters. To help the user to get the context ...

Contributors
Vijayakumar, Manikandan, Kambhampati, Subbarao, Liu, Huan, et al.
Created Date
2014

Contemporary online social platforms present individuals with social signals in the form of news feeds on their peers' activities. On networks such as Facebook and Quora, the network operator decides how that information is shown to an individual. Then the user, with her own interests and resource constraints, selectively acts on a subset of items presented to her. The network operator, in turn, shows that activity to a selection of peers, thus creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions such as: can the network operator design social signals to promote a particular activity like ...

Contributors
Le, Tien Dinh, Sundaram, Hari, Davulcu, Hasan, et al.
Created Date
2014

As the size and scope of valuable datasets has exploded across many industries and fields of research in recent years, an increasingly diverse audience has sought out effective tools for their large-scale data analytics needs. Over this period, machine learning researchers have also been very prolific in designing improved algorithms which are capable of finding the hidden structure within these datasets. As consumers of popular Big Data frameworks have sought to apply and benefit from these improved learning algorithms, the problems encountered with the frameworks have motivated a new generation of Big Data tools to address the shortcomings of the ...

Contributors
Krouse, Brian Richard, Ye, Jieping, Liu, Huan, et al.
Created Date
2014

Biological organisms are made up of cells containing numerous interconnected biochemical processes. Diseases occur when normal functionality of these processes is disrupted, manifesting as disease symptoms. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for activities including diagnosing diseases and drug development. Scientists studying these interconnected processes have identified various pathways involved in drug metabolism, diseases, and signal transduction, etc. High-throughput technologies, new algorithms and speed improvements over the last decade have resulted in deeper knowledge about biological systems, leading to more refined pathways. Such pathways tend to be large ...

Contributors
Anwar, Saadat, Baral, Chitta, Inoue, Katsumi, et al.
Created Date
2014

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities; however, the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing his ability to see the role that domain level concepts play in the system. In this ...

Contributors
Carey, Maurice, Colbourn, Charles, Collofello, James, et al.
Created Date
2013

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is ...

Contributors
Ravikumar, Srijith, Kambhampati, Subbarao, Davulcu, Hasan, et al.
Created Date
2013

Text classification, in the artificial intelligence domain, is an activity in which text documents are automatically classified into predefined categories using machine learning techniques. An example of this is classifying uncategorized news articles into different predefined categories such as "Business", "Politics", "Education", "Technology", etc. In this thesis, a supervised machine learning approach is followed, in which a module is first trained with pre-classified training data and then the class of test data is predicted. Good feature extraction is an important step in the machine learning approach and hence the main component of this text classifier is semantic triplet based features in ...

Contributors
Karad, Ravi Chandravadan, Davulcu, Hasan, Corman, Steven, et al.
Created Date
2013
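
The train-then-predict workflow described in the abstract above can be sketched with a small multinomial naive Bayes classifier. Note the substitution: this sketch uses plain bag-of-words features and Laplace smoothing, not the semantic triplet features the thesis proposes, and every name in it is hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log-probability for `text`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed word likelihood.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A model trained on a handful of pre-classified "Business" and "Politics" snippets can then be queried with `predict("market profit", model)`.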

A semiconductor supply chain modeling and simulation platform using Linear Program (LP) optimization and parallel Discrete Event System Specification (DEVS) process models has been developed in a joint effort by ASU and Intel Corporation. A Knowledge Interchange Broker (KIBDEVS/LP) was developed to broker information synchronously between the DEVS and LP models. Recently a single-echelon heuristic Inventory Strategy Module (ISM) was added to correct for forecast bias in customer demand data using different smoothing techniques. The optimization model could then use information provided by the forecast model to make better decisions for the process model. The composition of ISM with LP ...

Contributors
Smith, James Melkon, Sarjoughian, Hessam S, Davulcu, Hasan, et al.
Created Date
2012

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management systems (DBMS). Also, unlike other client-driven work, this approach provides support for a richer set of schema updates including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference and intersection. The update process automatically generates a runtime update client from a mapping between the old and the new schemas. ...

Contributors
Tyagi, Preetika, Bazzi, Rida, Candan, Kasim S, et al.
Created Date
2011

This thesis research attempts to observe, measure and visualize the communication patterns among developers of an open source community and analyze how this can be inferred in terms of progress of that open source project. Here I attempted to analyze the Ubuntu open source project's email data (9 subproject log archives over a period of five years) and focused on drawing more precise metrics from different perspectives of the communication data. Also, I attempted to overcome the scalability issue by using Apache Pig libraries, which run on a MapReduce framework based Hadoop Cluster. I described four metrics based on which ...

Contributors
Motamarri, Lakshminarayana, Santanam, Raghu, Ye, Jieping, et al.
Created Date
2011

Navigating within non-linear structures is a challenge for all users when the space is large, but the problem is most pronounced when the users are blind or visually impaired. Such users access digital content through screen readers like JAWS, which read out the text on the screen. However, presenting non-linear narratives in such a manner, without visual cues and information about spatial dependencies, is very inefficient for such users. The NSDL Science Literacy StrandMaps are visual layouts to help students and teachers browse educational resources. A StrandMap shows relationships between concepts and how they build upon one another across ...

Contributors
Gaur, Shruti, Candan, Kasim Selçuk, Sundaram, Hari, et al.
Created Date
2011

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major difficulty in using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users also need to know the data schema. Keyword search provides us with opportunities to conveniently ...

Contributors
Liu, Ziyang, Chen, Yi, Candan, Kasim S, et al.
Created Date
2011

Currently, Java is making its way into embedded systems and mobile devices such as Android devices. Programs written in Java are compiled into machine independent binary class byte codes. A Java Virtual Machine (JVM) executes these classes. The Java platform additionally specifies the Java Native Interface (JNI). JNI allows Java code that runs within a JVM to interoperate with applications or libraries that are written in other languages and compiled to the host CPU ISA. JNI plays an important role in embedded systems as it provides a mechanism to interact with libraries specific to the platform. This thesis addresses the ...

Contributors
Chandrian, Preetham, Lee, Yann-Hang, Davulcu, Hasan, et al.
Created Date
2011

The pay-as-you-go economic model of cloud computing increases the visibility, traceability, and verifiability of software costs. Application developers must understand how their software uses resources when running in the cloud in order to stay within budgeted costs and/or produce expected profits. Cloud computing's unique economic model also leads naturally to an earn-as-you-go profit model for many cloud based applications. These applications can benefit from low level analyses for cost optimization and verification. Testing cloud applications to ensure they meet monetary cost objectives has not been well explored in the current literature. When considering revenues and costs for cloud applications, the ...

Contributors
Buell, Kevin, Collofello, James, Davulcu, Hasan, et al.
Created Date
2012

Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and user credit card purchase pattern monitoring, however, the matches to the user queries are in fact plentiful and the system has to efficiently sift through these many matches to locate only the few most preferable matches. In this work, we propose a complex pattern ranking (CPR) framework for specifying top-k ...

Contributors
Wang, Xinxin, Candan, K. Selcuk, Chen, Yi, et al.
Created Date
2011

Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a need to understand the origin, ideologies and behavior of Radical and Counter-Radical organizations and how they shape up over a period of time. Recognizing and supporting counter-radical organizations is one of the most important steps towards impeding radical organizations. A lot of research has already been done to categorize and recognize organizations, to understand their behavior, their interactions with other ...

Contributors
Nair, Shreejay, Davulcu, Hasan, Dasgpta, Partha, et al.
Created Date
2012

The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences in image quality (resolution, brightness, occlusion and color), changes in camera perspective, dissimilar backgrounds and an inherent diversity of the samples themselves. Machine learning techniques like transfer learning are employed to adapt computational models across distributions. Domain adaptation is a special case of transfer learning, where knowledge from a source ...

Contributors
Demakethepalli Venkateswara, Hemanth, Panchanathan, Sethuraman, Li, Baoxin, et al.
Created Date
2017

Bank institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach taken where individual customers are contacted by bank representatives with offers. These telemarketing strategies can be improved in combination with data mining techniques that allow predictability of customer information and interests. In this thesis, bank telemarketing data from a Portuguese banking institution were analyzed to determine predictability of several client demographic and financial attributes and find most contributing factors in each. Data were preprocessed to ensure quality, and then data mining models were generated for the attributes with ...

Contributors
Ejaz, Samira, Davulcu, Hasan, Balasooriya, Janaka, et al.
Created Date
2016

Keyword search provides a simple and user-friendly mechanism for information search, and has become increasingly popular for accessing structured or semi-structured data. However, there are two open issues of keyword search on semi/structured data which are not yet well addressed by existing work. First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency instead of search quality and may fail to deliver high quality results from semantic perspectives. The majority of the existing work generates minimal sub-graph results that are oblivious to the entity and relationship semantics embedded in the data ...

Contributors
Shan, Yi, Chen, Yi, Bansal, Srividya, et al.
Created Date
2016

There has been a lot of research in the field of artificial intelligence about thinking machines. Alan Turing proposed a test to observe a machine's intelligent behaviour with respect to natural language conversation. The Winograd schema challenge is suggested as an alternative to the Turing test. It needs inferencing capabilities, reasoning abilities and background knowledge to get the answer right. It involves a coreference resolution task in which a machine is given a sentence containing a situation which involves two entities, one pronoun and some more information about the situation and the machine has to come up with the right ...

Contributors
Budukh, Tejas Ulhas, Baral, Chitta, Vanlehn, Kurt, et al.
Created Date
2013

Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modal nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a. bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a ...

Contributors
Xiang, Shuo, Ye, Jieping, Mittelmann, Hans D, et al.
Created Date
2014

A major challenge in automated text analysis is that different words are used for related concepts. Analyzing text at the surface level would treat related concepts (i.e. actors, actions, targets, and victims) as different objects, potentially missing common narrative patterns. Generalized concepts are used to overcome this problem. Generalization may result in word sense disambiguation failing to find similarity. This is addressed by taking into account contextual synonyms. Concept discovery based on contextual synonyms reveals information about the semantic roles of the words leading to concepts. A merger engine generalizes the concepts so that they can be used as features in ...

Contributors
Kedia, Nitesh, Davulcu, Hasan, Corman, Steve R, et al.
Created Date
2015

Our research focuses on finding answers through decentralized search for complex, imprecise queries (such as "Which is the best hair salon nearby?") in situations where there is a spatiotemporal constraint (say, the answer needs to be found within 15 minutes) associated with the query. In general, human networks are good at answering imprecise queries. We try to use the social network of a person to answer his query. Our research aims at designing a framework that exploits the user's social network in order to maximize the answers for a given query. Exploiting a user's social network has several challenges. The major ...

Contributors
Swaminathan, Neelakantan, Sundaram, Hari, Davulcu, Hasan, et al.
Created Date
2013

Traditionally, visualization is one of the most important and commonly used methods of generating insight into large scale data. Particularly for spatiotemporal data, the translation of such data into a visual form allows users to quickly see patterns, explore summaries and relate domain knowledge about underlying geographical phenomena that would not be apparent in tabular form. However, several critical challenges arise when visualizing and exploring these large spatiotemporal datasets. While the underlying geographical component of the data lends itself well to univariate visualization in the form of traditional cartographic representations (e.g., choropleth, isopleth, dasymetric maps), as the data becomes multivariate, ...

Contributors
Zhang, Yifan, Maciejewski, Ross, Mack, Elizabeth, et al.
Created Date
2016

Nowadays, computing is so pervasive that it has indeed become the 5th utility (after water, electricity, gas, telephony), as Leonard Kleinrock once envisioned. Evolved from utility computing, cloud computing has emerged as a computing infrastructure that enables rapid delivery of computing resources as a utility in a dynamically scalable, virtualized manner. However, the current industrial cloud computing implementations promote segregation among different cloud providers, which leads to user lockdown because of prohibitive migration cost. On the other hand, Service-Oriented Computing (SOC), including service-oriented architecture (SOA) and Web Services (WS), promotes standardization and openness with its enabling standards and communication protocols. ...

Contributors
Sun, Xin, Tsai, Wei-Tek, Xue, Guoliang, et al.
Created Date
2016

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's computerized devices and displays largely engage--have become overloaded, creating possibilities for distractions, delays and high cognitive load; which in turn can lead to a loss of situational awareness, increasing chances for life threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, ...

Contributors
Mcdaniel, Troy Lee, Panchanathan, Sethuraman, Davulcu, Hasan, et al.
Created Date
2012

With the advent of social media and micro-blogging sites, people have become active in sharing their thoughts, opinions, ideologies and furthermore enforcing them on others. Users have become the source for the production and dissemination of real time information. The content posted by the users can be used to understand them and track their behavior. Using this content of the user, data analysis can be performed to understand their social ideology and affinity towards Radical and Counter-Radical Movements. In the process of expressing their opinions, people use hashtags in their messages on Twitter. These hashtags are a rich source of ...

Contributors
Garipalli, Sravan Kumar, Davulcu, Hasan, Shakarian, Paulo, et al.
Created Date
2015

In trading, volume is a measure of how much stock has been exchanged in a given period of time. Since every stock is distinctive and has an alternate measure of shares, volume can be contrasted with historical volume inside a stock to spot changes. It is likewise used to affirm value patterns, breakouts, and spot potential reversals. In my thesis, I hypothesize that the concept of trading volume can be extrapolated to social media (Twitter). The ubiquity of social media, especially Twitter, in financial markets has been overly resonant in the past couple of years. With the growth of its ...

Contributors
Awasthi, Piyush, Davulcu, Hasan, Tong, Hanghang, et al.
Created Date
2015

The US Senate is the venue of political debates where federal bills are formed and voted on. Senators show their support for or opposition to the bills with their votes. This information makes it possible to extract the polarity of the senators. Similarly, the blogosphere plays an increasingly important role as a forum for public debate. Authors display sentiment toward issues, organizations or people using natural language. In this research, given a mixed set of senators/blogs debating on a set of political issues from opposing camps, I use signed bipartite graphs for modeling debates, and I propose an algorithm for partitioning both the opinion ...

Contributors
Gokalp, Sedat, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2015

Multidimensional data have various representations. Thanks to their simplicity in modeling multidimensional data and the availability of various mathematical tools (such as tensor decompositions) that support multi-aspect analysis of such data, tensors are increasingly being used in many application domains including scientific data management, sensor data management, and social network data analysis. The relational model, on the other hand, enables semantic manipulation of data using relational operators, such as projection, selection, Cartesian product, and set operators. For many multidimensional data applications, tensor operations as well as relational operations need to be supported throughout the data life cycle. In this thesis, we introduce ...

Contributors
Kim, Mijung, Candan, K. Selcuk, Davulcu, Hasan, et al.
Created Date
2014

Predicting when an individual will adopt a new behavior is an important problem in application domains such as marketing and public health. This thesis examines the performance of a wide variety of social network based measurements proposed in the literature - which have not been previously compared directly. This research studies the probability of an individual becoming influenced based on measurements derived from neighborhood (i.e. number of influencers, personal network exposure), structural diversity, locality, temporal measures, cascade measures, and metadata. It also examines the ability to predict influence based on choice of the classifier and how the ratio of positive ...

Contributors
Nanda Kumar, Nikhil, Shakarian, Paulo, Sen, Arunabha, et al.
Created Date
2016

Skyline queries are a well-established technique used in multi-criteria decision applications. There is recent interest in the research community in efficiently computing skylines, but the problem of presenting a skyline that takes into account the preferences of the user is still open. Each user has varying interests towards each attribute and hence a "one size fits all" methodology might not satisfy all the users. True user satisfaction can be obtained only when the skyline is tailored specifically for each user based on his preferences. This research investigates the problem of preference aware skyline processing which consists of inferring the ...

Contributors
Rathinavelu, Sriram, Candan, Kasim Selcuk, Davulcu, Hasan, et al.
Created Date
2014

The wide adoption and continued advancement of information and communications technologies (ICT) have made it easier than ever for individuals and groups to stay connected over long distances. These advances have greatly contributed to dramatically changing the dynamics of the modern day workplace to the point where it is now commonplace to see large, distributed multidisciplinary teams working together on a daily basis. However, in this environment, motivating, understanding, and valuing the diverse contributions of individual workers in collaborative enterprises becomes challenging. To address these issues, this thesis presents the goals, design, and implementation of Taskville, a distributed workplace game ...

Contributors
Nikkila, Shawn, Sundaram, Hari, Byrne, Daragh, et al.
Created Date
2013

With the rise of social media, hundreds of millions of people all over the globe spend countless hours on social media to connect, interact, share, and create user-generated data. This rich environment provides tremendous opportunities for many different players to easily and effectively reach out to people, interact with them, influence them, or get their opinions. Two pieces of information attract the most attention on social media sites: user preferences and interactions. Businesses and organizations use this information to better understand and therefore provide customized services to social media users. This data can be used for different ...

Contributors
Abbasi, Mohammad Ali, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2014

Stock market news and investing tips are popular topics on Twitter. In this dissertation, I first utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in the Dow Jones Index (DJI) to train a directional stock price prediction system based on news content. Next, I proceed to show that information in articles indicated by breaking Tweet volumes leads to a statistically significant boost in the hourly directional prediction accuracies for the DJI stock prices mentioned in these articles. Secondly, I show that using document-level sentiment extraction does not yield a ...

Contributors
Alostad, Hana, Davulcu, Hasan, Corman, Steven, et al.
Created Date
2016

In contemporary society, sustainability and public well-being have been pressing challenges. Some of the important questions are: How can sustainable practices, such as reducing carbon emission, be encouraged? How can a healthy lifestyle be maintained? Even though individuals are interested, they are unable to adopt these behaviors due to resource constraints. Developing a framework to enable cooperative behavior adoption and to sustain it for a long period of time is a major challenge. As a part of developing this framework, I am focusing on methods to understand behavior diffusion over time. Facilitating behavior diffusion with resource constraints in a large population ...

Contributors
Dey, Anindita, Sundaram, Hari, Turaga, Pavan, et al.
Created Date
2013

The increasing usage of smart-phones and mobile devices in the work environment and the IT industry has brought about a unique set of challenges and opportunities. The ARM architecture in particular has evolved to a point where it supports implementations across a wide spectrum of performance points, and ARM based tablets and smart-phones are in demand. The enhancements to the basic ARM RISC architecture allow ARM to have high performance, small code size, low power consumption and small silicon area. Users want their devices to perform many tasks such as reading email, playing games, and running other online applications and organizations no longer desire to provision ...

Contributors
Chowdhary, Ankur, Huang, Dijiang, Tong, Hanghang, et al.
Created Date
2015

Corporations invest considerable resources to create, preserve and analyze their data; yet while organizations are interested in protecting against unauthorized data transfer, there is no comprehensive metric to discriminate what data are at risk of leaking. This thesis motivates the need for a quantitative leakage risk metric, and provides a risk assessment system, called Whispers, for computing it. Using unsupervised machine learning techniques, Whispers uncovers themes in an organization's document corpus, including previously unknown or unclassified data. Then, by correlating the document with its authors, Whispers can identify which data are easier to contain, and conversely which are at risk. ...

Contributors
Wright, Jeremy Lee, Syrotiuk, Violet, Davulcu, Hasan, et al.
Created Date
2014

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems. An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms ...

Contributors
Nagendra, Mithila, Candan, Kasim Selcuk, Chen, Yi, et al.
Created Date
2014
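
The notion of a skyline used in the abstract above (the set of non-dominated points) can be illustrated with a minimal sketch. This is not the skyline-join method of the dissertation; it is a naive quadratic scan over a single in-memory list, and it assumes that smaller values are preferred in every attribute. The function names and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b: a is no worse than b in every
    attribute and strictly better in at least one.
    Assumption: smaller values are better in all attributes."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(50, 8), (80, 2), (60, 5), (90, 3), (70, 6)]
print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5)]
```

Here (90, 3) is dropped because (80, 2) is both cheaper and closer, and (70, 6) is dropped because (60, 5) dominates it. The three remaining hotels are incomparable trade-offs between price and distance.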

Interactive remote e-learning is one of the youngest and most popular teaching methods in use today. WebRTC, on the other hand, has become a popular concept and method in real time communication. Unlike with the old-fashioned Adobe Flash, users communicate directly with each other rather than through a server as the middleman. The world is changing from plug-in to web browser. However, WebRTC has not been widely used for school education. Taking into consideration the WebRTC solution for data transfer, we propose new Cloud based interactive multimedia that enables a virtual lab learning environment. ...

Contributors
Li, Qingyun, Huang, Dijiang, Davulcu, Hasan, et al.
Created Date
2014

Lighting systems and air-conditioning systems are two of the largest energy consuming end-uses in buildings. Lighting control in smart buildings and homes can be automated by having computer controlled lights and window blinds along with illumination sensors that are distributed in the building, while temperature control can be automated by having computer controlled air-conditioning systems. However, programming actuators in a large-scale environment for buildings and homes can be time consuming and expensive. This dissertation presents an approach that algorithmically sets up the control system that can automate any building without requiring custom programming. This is achieved by imbibing the system ...

Contributors
Wang, Yuan, Dasgupta, Partha, Davulcu, Hasan, et al.
Created Date
2015

Node proximity measures are commonly used for quantifying how nearby or otherwise related two or more nodes in a graph are. Node significance measures are mainly used to find how important nodes are in a graph. The measures of node proximity/significance have been highly effective in many predictions and applications. Despite their effectiveness, however, there are various shortcomings. One such shortcoming is a scalability problem due to their high computation costs on large graphs, and another is low accuracy when the significance of a node and its degree in the graph are not related. ...

Contributors
Kim, Jung Hyun, Candan, K. Selcuk, Davulcu, Hasan, et al.
Created Date
2017

Bangladesh is a secular democracy in which Muslims constitute almost 90% of the population and the remaining 10% consists of minority groups that include Hindus, Christians, Buddhists, Ahmadi Muslims, Shia, Sufi, LGBT groups and Atheists. In recent years, Bangladesh has experienced an increase in attacks by religious extremist groups, such as IS and AQIS affiliates, hate-groups and politically motivated violence. Attacks have also become indiscriminate, with assailants targeting a wide variety of individuals, including religious minorities and foreigners. According to the telecoms regulator, the number of internet users in Bangladesh now stands at over 66.8 million, reaching 41% penetration. ...

Contributors
Chhabra, Pankaj, Davulcu, Hasan, Li, Baoxin, et al.
Created Date
2017

With the recent expansion in the use of wearable technology, a large number of users access personal data with these smart devices. The consumer market of wearables includes smartwatches, health and fitness bands, and gesture control armbands. These smart devices enable users to communicate with each other, control other devices, relax and work out more effectively. As part of their functionality, these devices store, transmit, and/or process sensitive user personal data, perhaps biological and location data, making them an abundant source of confidential user information. Thus, prevention of unauthorized access to wearables is necessary. In fact, it is important to ...

Contributors
Mukherjee, Tamalika, Yau, Sik-Sang, Ahn, Gail-Joon, et al.
Created Date
2017

Most current database management systems are optimized for single query execution. Yet, often, queries come as part of a query workload. Therefore, there is a need for index structures that can take into consideration existence of multiple queries in a query workload and efficiently produce accurate results for the entire query workload. These index structures should be scalable to handle large amounts of data as well as large query workloads. The main objective of this dissertation is to create and design scalable index structures that are optimized for range query workloads. Range queries are an important type of queries with ...

Contributors
Nagarkar, Parth, Candan, Kasim S, Davulcu, Hasan, et al.
Created Date
2017

The games held by the National Basketball Association (NBA) are the most popular basketball events on earth. Each year, tons of statistical data are generated from this industry. Meanwhile, managing teams, sports media, and scientists are digging deep into the data ocean. Recent research literature is reviewed with respect to whether NBA teams could be analyzed as connected networks. However, it becomes very time-consuming, if not impossible, for human labor to capture every detail of the large number of game events on court. In this study, an alternative method is proposed to parse public resources from NBA-related websites to build degenerated ...

Contributors
Zhang, Xiaoyu, Tong, Hanghang, He, Jingrui, et al.
Created Date
2017

Visual Question Answering (VQA) is a new research area involving technologies ranging from computer vision and natural language processing to other sub-fields of artificial intelligence such as knowledge representation. The fundamental task is to take as input one image and one question (in text) related to the given image, and to generate a textual answer to the input question. There are two key research problems in VQA: image understanding and question answering. My research mainly focuses on developing solutions to support solving these two problems. In image understanding, one important research area is semantic segmentation, which takes images as input ...

Contributors
Tian, Qiongjie, Li, Baoxin, Tong, Hanghang, et al.
Created Date
2017

With the rise of Online Social Networks (OSN) in the last decade, social network analysis has become a crucial research topic. The OSN graphs have unique properties that distinguish them from other types of graphs. In this thesis, a five-month Tweet corpus collected from Bangladesh between June 2016 and October 2016 is analyzed in order to detect accounts that belong to groups. These groups consist of official and non-official Twitter handles of political organizations and NGOs in Bangladesh. A set of network, temporal, spatial and behavioral features are proposed to discriminate between accounts belonging to individual Twitter users, news, ...

Contributors
Gore, Chinmay Chandrashekhar, Davulcu, Hasan, Hsiao, Ihan, et al.
Created Date
2017

The Internet and social media devices created a new public space for debate on political and social topics (Papacharissi 2002; Himelboim 2010). Hotly debated issues span all spheres of human activity, from liberal vs. conservative politics, to radical vs. counter-radical religious debate, to the climate change debate in the scientific community, to the globalization debate in economics, and to the nuclear disarmament debate in security. Many prominent ’camps’ have emerged within Internet debate rhetoric and practice (Dahlberg, n.d.). In this research, I utilized feature extraction and model fitting techniques to process the rhetoric found in the web sites of 23 Indonesian Islamic religious organizations, later ...

Contributors
Tikves, Sukru, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2016

Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser who has chosen to subscribe to the tweets from the originator by following the originator. Although hashtags are used to tag tweets in an effort to attach context to the tweets, many tweets do not have a hashtag. Such tweets are called orphan tweets and they adversely affect the experience of ...

Contributors
Mallapura Umamaheshwar, Tejas, Kambhampati, Subbarao, Liu, Huan, et al.
Created Date
2015

There has been extensive research on how news and Twitter feeds can affect the outcome of a given stock. However, a majority of this research has studied the short-term effects of sentiment on a given stock price. Within this research, I studied the long-term effects on a given stock price using fundamental analysis techniques. I collected both sentiment data and fundamental data for Apple Inc., Microsoft Corp., and Peabody Energy Corp. Using a neural network algorithm, I found that sentiment does have an effect on the annual growth of these companies but the fundamentals are more ...

Contributors
Reeves, Tyler Joseph, Davulcu, Hasan, Baral, Chitta, et al.
Created Date
2016

The connections between different entities define different kinds of networks, and many such networked phenomena are influenced by their underlying geographical relationships. By integrating network and geospatial analysis, the goal is to extract information about interaction topologies and the relationships to related geographical constructs. In recent decades, much work has been done analyzing the dynamics of spatial networks; however, many challenges still remain in this field. First, the development of social media and transportation technologies has greatly reshaped the typologies of communications between different geographical regions. Second, the distance metrics used in spatial analysis should also be enriched with ...

Contributors
Wang, Feng, Maciejewski, Ross, Davulcu, Hasan, et al.
Created Date
2017

Continuous Delivery, as one of the youngest and most popular members of the agile model family, has recently become a popular concept and method in the software development industry. Unlike the traditional software development method, in which requirements and solutions must be fixed before development starts, it promotes adaptive planning, evolutionary development and delivery, and encourages rapid and flexible response to change. However, several problems prevent Continuous Delivery from being introduced into the education world. Taking these barriers into consideration, we propose a new Cloud based Continuous Delivery Software Developing System. This system is designed to fully utilize the whole ...

Contributors
Deng, Yuli, Huang, Dijiang, Davulcu, Hasan, et al.
Created Date
2013

The dawn of the Internet of Things (IoT) has opened the opportunity for mainstream adoption of machine learning analytics. However, most research in machine learning has focused on discovery of new algorithms or fine-tuning the performance of existing algorithms. Little exists on the process of taking an algorithm from the lab environment into the real world, culminating in sustained value. Real-world applications are typically characterized by dynamic non-stationary systems with requirements around feasibility, stability and maintainability. Not much has been done to establish standards around the unique analytics demands of real-world scenarios. This research explores the problem of why so few of ...

Contributors
Shahapurkar, Som, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2016

Computational visual aesthetics has recently become an active research area. Existing state-of-the-art methods formulate this as a binary classification task where a given image is predicted to be beautiful or not. In many applications such as image retrieval and enhancement, it is more important to rank images based on their aesthetic quality instead of binary-categorizing them. Furthermore, in such applications, it may be possible that all images belong to the same category. Hence determining the aesthetic ranking of the images is more appropriate. To this end, a novel problem of ranking images with respect to their aesthetic quality is formulated ...

Contributors
Gattupalli, Jaya Vijetha R., Li, Baoxin, Davulcu, Hasan, et al.
Created Date
2016

Process migration is a heavily studied research area and has a number of applications in distributed systems. Process migration means transferring a process running on one machine to another such that it resumes execution from the point at which it was suspended. The conventional approach to implement process migration is to move the entire state information of the process (including hardware context, virtual memory, files etc.) from one machine to another. Copying all the state information is costly. This thesis proposes and demonstrates a new approach of migrating a process between two cores of Intel Single Chip Cloud (SCC), an ...

Contributors
Jain, Vaibhav, Dasgupta, Partha, Shriavstava, Aviral, et al.
Created Date
2013

Source selection is one of the foremost challenges for searching the deep web. For a user query, source selection involves selecting a subset of deep-web sources expected to provide relevant answers to the user query. Existing source selection models employ query-similarity based local measures for assessing source quality. These local measures are necessary but not sufficient as they are agnostic to source trustworthiness and result importance, which, given the autonomous and uncurated nature of the deep web, have become indispensable for searching the deep web. SourceRank provides a global measure for assessing source quality based on source trustworthiness and result importance. SourceRank's effectiveness has been evaluated ...

Contributors
Jha, Manishkumar, Kambhampati, Subbarao, Liu, Huan, et al.
Created Date
2011

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams. The research extends a preexisting relational framework for event stream processing to ...

Contributors
Shiva, Foruhar Ali, Urban, Susan D, Chen, Yi, et al.
Created Date
2012

Templates are widely used in Web site development. Finding the template for a given set of Web pages could be very important and useful for many applications like Web page classification and monitoring content and structure changes of Web pages. In this thesis, two novel sequence-based Web page template detection algorithms are presented. Different from tree mapping algorithms which are based on tree edit distance, sequence-based template detection algorithms operate on the Prüfer/Consolidated Prüfer sequences of trees. Since there are one-to-one correspondences between Prüfer/Consolidated Prüfer sequences and trees, sequence-based template detection algorithms identify the template by finding a common subsequence ...

Contributors
Huang, Wei, Candan, Kasim Selçuk, Sundaram, Hari, et al.
Created Date
2011
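
The one-to-one correspondence between trees and Prüfer sequences mentioned in the abstract above can be sketched as follows. This shows only the classical Prüfer encoding and decoding for an unrooted tree whose nodes are labeled 1..n. The thesis's Consolidated Prüfer sequences and its template detection algorithms are not reproduced here, and the function names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def prufer_sequence(edges: List[Edge], n: int) -> List[int]:
    """Encode a labeled tree on nodes 1..n as its Prüfer sequence:
    repeatedly remove the smallest-labeled leaf and record its neighbor."""
    adj: Dict[int, Set[int]] = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in adj if len(adj[v]) == 1]
    heapq.heapify(leaves)
    seq: List[int] = []
    for _ in range(n - 2):
        leaf = heapq.heappop(leaves)
        neighbor = next(iter(adj[leaf]))  # a leaf has exactly one neighbor
        seq.append(neighbor)
        adj[neighbor].remove(leaf)
        adj[leaf].clear()
        if len(adj[neighbor]) == 1:       # neighbor has just become a leaf
            heapq.heappush(leaves, neighbor)
    return seq


def tree_from_prufer(seq: List[int]) -> List[Edge]:
    """Decode a Prüfer sequence back into the edge list of its tree."""
    n = len(seq) + 2
    degree = {v: 1 for v in range(1, n + 1)}
    for v in seq:
        degree[v] += 1
    leaves = [v for v in degree if degree[v] == 1]
    heapq.heapify(leaves)
    edges: List[Edge] = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


tree = [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
code = prufer_sequence(tree, 6)
print(code)                    # [4, 4, 4, 5]
print(tree_from_prufer(code))  # [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
```

Because encoding followed by decoding returns the original edge set, two trees can be compared by comparing their sequences, which is what makes sequence-based matching possible.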

The overall contribution of the Minerva Initiative at ASU is to map social organizations in a multidimensional space that provides a measure of their radical or counter radical influence over the demographics of a nation. This tool serves as a simple content management system to store and track project resources like documents, images, videos and web links. It provides centralized and secure access to email conversations among project team members. Conversations are categorized into one of the seven pre-defined categories. Each category is associated with a certain set of keywords and we follow a frequency based approach for matching email ...

Contributors
Nair, Apurva, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2012

Analysis of political texts, which contain a huge amount of personal political opinions, sentiments, and emotions towards powerful individuals, leaders, organizations, and a large number of people, is an interesting task, which can lead to the discovery of interesting interactions between the political parties and people. Recently, the political blogosphere has played an increasingly important role in politics, as a forum for debating political issues. Most of the political weblogs are biased towards their political parties, and they generally express their sentiments towards their issues (i.e., leaders, topics, etc.) and also towards issues of the opposing parties. In this thesis, I have modeled the ...

Contributors
Thirumalai, Dananjayan, Davulcu, Hasan, Sarjoughian, Hessam, et al.
Created Date
2012

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work ...

Contributors
Lee, Jang, Gonzalez, Graciela, Ye, Jieping, et al.
Created Date
2011

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost ...

Contributors
Chaudhari, Mahesh Balkrishna, Dietrich, Suzanne W, Urban, Susan D, et al.
Created Date
2011
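
The idea of a materialized view in the abstract above (store the computed result once so later reads skip the recomputation) can be sketched with Python's built-in `sqlite3` module. SQLite has no native materialized views, so this sketch emulates one with a plain table that is rebuilt on demand; the table names, column names, and helper functions are assumptions for illustration and are not taken from the dissertation.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a small hypothetical base table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        [("a", 1.0), ("a", 3.0), ("b", 10.0)],
    )
    return conn


def materialize(conn):
    """Compute the view once and store its rows in a regular table."""
    conn.execute("DROP TABLE IF EXISTS avg_by_sensor")
    conn.execute(
        "CREATE TABLE avg_by_sensor AS "
        "SELECT sensor, AVG(value) AS avg_value FROM readings GROUP BY sensor"
    )


def read_view(conn):
    """Read the stored rows; the aggregate is not recomputed here."""
    return conn.execute(
        "SELECT sensor, avg_value FROM avg_by_sensor ORDER BY sensor"
    ).fetchall()


conn = build_demo()
materialize(conn)
first = read_view(conn)

# New base data does not appear until the view is refreshed.
conn.execute("INSERT INTO readings VALUES ('b', 20.0)")
stale = read_view(conn)
materialize(conn)
fresh = read_view(conn)
```

The stale read shows the trade-off: reads are cheap, but the stored result has to be refreshed when the underlying sources change.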

Embedded Networked Systems (ENS) consist of various devices, which are embedded into physical objects (e.g., home appliances, vehicles, buildings, people). With rapid advances in processing and networking technologies, these devices can be fully connected and pervasive in the environment. The devices can interact with the physical world, collaborate to share resources, and provide context-aware services. This dissertation focuses on collaboration in ENS to provide smart services. However, there are several challenges because the system must be scalable to a huge number of devices; robust against noise, loss and failure; and secure despite communicating with strangers. To address these challenges, ...

Contributors
Kim, Su Jin, Gupta, Sandeep K. S., Dasgupta, Partha, et al.
Created Date
2010

Text search is a very useful way of retrieving document information from a particular website. The public generally prefers internet search engines over local enterprise search engines, because enterprise content is not cross-linked and does not follow a page rank algorithm. On the other hand, an enterprise search engine uses metadata information, which allows the user to specify the conditions that any retrieved document should meet. Therefore, using metadata information for searching will also be very useful. My thesis aims at developing an enterprise search engine using metadata information by providing advanced features like faceted navigation. The ...

Contributors
Sanaka, Srinivasa Raviteja, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2010

This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries.

For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.