ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.




Online social networks are the hubs of social activity in cyberspace, and using them to exchange knowledge, experiences, and opinions is common. In this work, an advanced topic modeling framework is designed to analyze complex longitudinal health information from social media with minimal human annotation, and Adverse Drug Events and Reaction (ADR) information is extracted and automatically processed by using a biased topic modeling method. This framework improves and extends existing topic modeling algorithms that incorporate background knowledge. Using this approach, background knowledge such as ADR terms and other biomedical knowledge can be incorporated during the text mining process, with ...

Contributors
Yang, Jian, Gonzalez, Graciela, Davulcu, Hasan, et al.
Created Date
2017

Continuous Delivery, as one of the youngest and most popular members of the agile model family, has recently become a popular concept and method in the software development industry. Unlike the traditional software development method, in which requirements and solutions must be fixed before development starts, it promotes adaptive planning, evolutionary development and delivery, and encourages rapid and flexible response to change. However, several problems prevent Continuous Delivery from being introduced into the education world. Taking these barriers into consideration, we propose a new Cloud based Continuous Delivery Software Developing System. This system is designed to fully utilize the whole ...

Contributors
Deng, Yuli, Huang, Dijiang, Davulcu, Hasan, et al.
Created Date
2013

Computational visual aesthetics has recently become an active research area. Existing state-of-the-art methods formulate this as a binary classification task where a given image is predicted to be beautiful or not. In many applications such as image retrieval and enhancement, it is more important to rank images based on their aesthetic quality instead of binary-categorizing them. Furthermore, in such applications, it may be possible that all images belong to the same category. Hence determining the aesthetic ranking of the images is more appropriate. To this end, a novel problem of ranking images with respect to their aesthetic quality is formulated ...

Contributors
Gattupalli, Jaya Vijetha R., Li, Baoxin, Davulcu, Hasan, et al.
Created Date
2016

Text classification, in the artificial intelligence domain, is an activity in which text documents are automatically classified into predefined categories using machine learning techniques. An example of this is classifying uncategorized news articles into different predefined categories such as "Business", "Politics", "Education", "Technology", etc. In this thesis, a supervised machine learning approach is followed, in which a module is first trained with pre-classified training data and then the class of test data is predicted. Good feature extraction is an important step in the machine learning approach and hence the main component of this text classifier is semantic triplet based features in ...

Contributors
Karad, Ravi Chandravadan, Davulcu, Hasan, Corman, Steven, et al.
Created Date
2013

Proliferation of social media websites and discussion forums in the last decade has resulted in social media mining emerging as an effective mechanism to extract consumer patterns. Most research on social media and pharmacovigilance has concentrated on Adverse Drug Reaction (ADR) identification. Such methods employ a step of drug search followed by classification of the associated text as containing an ADR or not. Although this method works efficiently for ADR classifications, if ADR evidence is present in users' posts over time, drug mentions fail to capture such ADRs. It also fails to record additional user information which may provide an ...

Contributors
Chandrashekar, Pramod Bharadwaj Chandrashekar, Davulcu, Hasan, Gonzalez, Graciela, et al.
Created Date
2016

The overall contribution of the Minerva Initiative at ASU is to map social organizations in a multidimensional space that provides a measure of their radical or counter radical influence over the demographics of a nation. This tool serves as a simple content management system to store and track project resources like documents, images, videos and web links. It provides centralized and secure access to email conversations among project team members. Conversations are categorized into one of the seven pre-defined categories. Each category is associated with a certain set of keywords and we follow a frequency based approach for matching email ...

Contributors
Nair, Apurva, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2012
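
The frequency-based keyword matching mentioned in the abstract above can be illustrated with a minimal sketch. The abstract is truncated, so the details here are assumptions: the two category names and their keyword sets are hypothetical stand-ins (the thesis uses seven pre-defined categories that the listing does not name), and scoring by raw keyword counts is only one plausible reading of "frequency based".

```python
import re
from collections import Counter
from typing import Dict, Optional, Set

# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "meetings": {"meeting", "agenda", "schedule"},
    "data": {"dataset", "crawl", "corpus"},
}


def categorize(text: str, category_keywords: Dict[str, Set[str]]) -> Optional[str]:
    """Return the category whose keywords occur most often in the text.

    Returns None when no keyword from any category appears.
    """
    # Lowercase the text and count every alphabetic token.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each category by the total frequency of its keywords.
    scores = {
        category: sum(tokens[keyword] for keyword in keywords)
        for category, keywords in category_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


label = categorize("Agenda for the next meeting: schedule review", CATEGORY_KEYWORDS)
```

Ties go to whichever category comes first in the dictionary; a real system would need an explicit tie-breaking rule.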

Muslim radicalism is recognized as one of the greatest security threats for the United States and the rest of the world. Use of force to eliminate specific radical entities is ineffective in containing radicalism as a whole. There is a need to understand the origin, ideologies and behavior of Radical and Counter-Radical organizations and how they shape up over a period of time. Recognizing and supporting counter-radical organizations is one of the most important steps towards impeding radical organizations. A lot of research has already been done to categorize and recognize organizations, to understand their behavior, their interactions with other ...

Contributors
Nair, Shreejay, Davulcu, Hasan, Dasgupta, Partha, et al.
Created Date
2012

Cyber systems, including IoT (Internet of Things), are increasingly being used ubiquitously to vastly improve the efficiency and reduce the cost of critical application areas, such as finance, transportation, defense, and healthcare. Over the past two decades, computing efficiency and hardware cost have improved dramatically. These improvements have made cyber systems omnipotent, and they control many aspects of human lives. Emerging trends in successful cyber system breaches have shown increasing sophistication in attacks and that attackers are no longer limited by resources, including human and computing power. Most existing cyber defense systems for IoT systems have two major issues: (1) ...

Contributors
Buduru, Arun Balaji, Yau, Sik-Sang, Ahn, Gail-Joon, et al.
Created Date
2016

With the recent expansion in the use of wearable technology, a large number of users access personal data with these smart devices. The consumer market of wearables includes smartwatches, health and fitness bands, and gesture control armbands. These smart devices enable users to communicate with each other, control other devices, relax and work out more effectively. As part of their functionality, these devices store, transmit, and/or process sensitive user personal data, perhaps biological and location data, making them an abundant source of confidential user information. Thus, prevention of unauthorized access to wearables is necessary. In fact, it is important to ...

Contributors
Mukherjee, Tamalika, Yau, Sik-Sang, Ahn, Gail-Joon, et al.
Created Date
2017

Predicting when an individual will adopt a new behavior is an important problem in application domains such as marketing and public health. This thesis examines the performance of a wide variety of social network based measurements proposed in the literature - which have not been previously compared directly. This research studies the probability of an individual becoming influenced based on measurements derived from neighborhood (i.e. number of influencers, personal network exposure), structural diversity, locality, temporal measures, cascade measures, and metadata. It also examines the ability to predict influence based on choice of the classifier and how the ratio of positive ...

Contributors
Nanda Kumar, Nikhil, Shakarian, Paulo, Sen, Arunabha, et al.
Created Date
2016

There has been a lot of research in the field of artificial intelligence about thinking machines. Alan Turing proposed a test to observe a machine's intelligent behaviour with respect to natural language conversation. The Winograd schema challenge is suggested as an alternative to the Turing test. It needs inferencing capabilities, reasoning abilities and background knowledge to get the answer right. It involves a coreference resolution task in which a machine is given a sentence containing a situation which involves two entities, one pronoun and some more information about the situation and the machine has to come up with the right ...

Contributors
Budukh, Tejas Ulhas, Baral, Chitta, Vanlehn, Kurt, et al.
Created Date
2013

With the advent of social media (such as Twitter and Facebook), people are easily sharing their opinions, sentiments and enforcing their ideologies on others like never before. Even people who are otherwise socially inactive would like to share their thoughts on current affairs by tweeting and sharing news feeds with their friends and acquaintances. In this thesis study, we chose Twitter as our main data platform to analyze shifts and movements of 27 political organizations in Indonesia. So far, we have collected over 30 million tweets and 150,000 news articles from RSS feeds of the corresponding organizations for our analysis. For ...

Contributors
Poornachandran, Sathishkumar, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2013

The purpose of this research is to efficiently analyze certain data provided and to see if a useful trend can be observed as a result. This trend can be used to analyze certain probabilities. There are three main pieces of data which are being analyzed in this research: The value for δ of the call and put option, the %B value of the stock, and the amount of time until expiration of the stock option. The %B value is the most important. The purpose of analyzing the data is to see the relationship between the variables and, given certain values, ...

Contributors
Reeves, Michael Thomas, Richa, Andrea, McCarville, Daniel, et al.
Created Date
2015

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams. The research extends a preexisting relational framework for event stream processing to ...

Contributors
Shiva, Foruhar Ali, Urban, Susan D, Chen, Yi, et al.
Created Date
2012

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work ...

Contributors
Lee, Jang, Gonzalez, Graciela, Ye, Jieping, et al.
Created Date
2011

With the advent of the Internet, the data being added online is increasing at an enormous rate. Though search engines are using IR techniques to facilitate the search requests from users, the results are not effective towards the search query of the user. The search engine user has to go through certain webpages before reaching the webpage he/she wanted. This problem of Information Overload can be solved using Automatic Text Summarization. Summarization is a process of obtaining an abridged version of documents so that the user can have a quick view to understand what exactly the document is about. Email threads from ...

Contributors
Nadella, Sravan, Davulcu, Hasan, Li, Baoxin, et al.
Created Date
2015

Online learning platforms such as massive open online courses (MOOCs) and intelligent tutoring systems (ITSs) have made learning more accessible and personalized. These systems generate unprecedented amounts of behavioral data and open the way for predicting students’ future performance based on their behavior, and for assessing their strengths and weaknesses in learning. This thesis attempts to mine students’ working patterns using a programming problem solving system, and build predictive models to estimate students’ learning. QuizIT, a programming problem solving system, was used to collect students’ problem-solving activities from a lower-division computer science programming course in the Fall 2016 semester. Differential mining techniques ...

Contributors
Mandal, Partho Pratim, Hsiao, I-Han, Davulcu, Hasan, et al.
Created Date
2017

Continuous advancements in biomedical research have resulted in the production of vast amounts of scientific data and literature discussing them. The ultimate goal of computational biology is to translate these large amounts of data into actual knowledge of the complex biological processes and accurate life science models. The ability to rapidly and effectively survey the literature is necessary for the creation of large scale models of the relationships among biomedical entities as well as hypothesis generation to guide biomedical research. To reduce the effort and time spent in performing these activities, an intelligent search system is required. Even though many ...

Contributors
Kanwar, Pradeep, Davulcu, Hasan, Dinu, Valentin, et al.
Created Date
2010

In trading, volume is a measure of how much stock has been exchanged in a given period of time. Since every stock is distinctive and has an alternate measure of shares, volume can be contrasted with historical volume inside a stock to spot changes. It is likewise used to affirm value patterns, breakouts, and spot potential reversals. In my thesis, I hypothesize that the concept of trading volume can be extrapolated to social media (Twitter). The ubiquity of social media, especially Twitter, in financial markets has been overly resonant in the past couple of years. With the growth of its ...

Contributors
Awasthi, Piyush, Davulcu, Hasan, Tong, Hanghang, et al.
Created Date
2015

In visualizing information hierarchies, icicle plots are efficient diagrams in that they provide the user a straightforward layout for different levels of data in a hierarchy and enable the user to compare items based on the item width. However, as the size of the hierarchy grows large, the items in an icicle plot end up being small and indistinguishable. In this thesis, by maintaining the positive characteristics of traditional icicle plots and incorporating new features such as dynamic diagram and active layer, we developed an interactive visualization that allows the user to selectively drill down or roll up to review ...

Contributors
Wu, Bi, Maciejewski, Ross, Runger, George, et al.
Created Date
2014

The pay-as-you-go economic model of cloud computing increases the visibility, traceability, and verifiability of software costs. Application developers must understand how their software uses resources when running in the cloud in order to stay within budgeted costs and/or produce expected profits. Cloud computing's unique economic model also leads naturally to an earn-as-you-go profit model for many cloud based applications. These applications can benefit from low level analyses for cost optimization and verification. Testing cloud applications to ensure they meet monetary cost objectives has not been well explored in the current literature. When considering revenues and costs for cloud applications, the ...

Contributors
Buell, Kevin, Collofello, James, Davulcu, Hasan, et al.
Created Date
2012

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management systems (DBMS). Also, unlike other client-driven work, this approach provides support for a richer set of schema updates including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference and intersection. The update process automatically generates a runtime update client from a mapping between the old and the new schemas. ...

Contributors
Tyagi, Preetika, Bazzi, Rida, Candan, Kasim S, et al.
Created Date
2011

Embedded Networked Systems (ENS) consist of various devices, which are embedded into physical objects (e.g., home appliances, vehicles, buildings, people). With rapid advances in processing and networking technologies, these devices can be fully connected and pervasive in the environment. The devices can interact with the physical world, collaborate to share resources, and provide context-aware services. This dissertation focuses on collaboration in ENS to provide smart services. However, there are several challenges because the system must be scalable to a huge number of devices; robust against noise, loss and failure; and secure despite communicating with strangers. To address these challenges, ...

Contributors
Kim, Su Jin, Gupta, Sandeep K. S., Dasgupta, Partha, et al.
Created Date
2010

Interactive remote e-learning is one of the youngest and most popular methods used in teaching today. WebRTC, on the other hand, has become a popular concept and method in real time communication. Unlike with the old fashioned Adobe Flash, users communicate directly with each other rather than calling a server as the middle man. The world is changing from plug-in to web-browser. However, WebRTC has not been widely used for school education. Taking the WebRTC solution for data transfer into consideration, we propose a new Cloud based interactive multimedia which enables a virtual lab learning environment. ...

Contributors
Li, Qingyun, Huang, Dijiang, Davulcu, Hasan, et al.
Created Date
2014

The US Senate is the venue of political debates where federal bills are formed and voted on. Senators show their support for or opposition to the bills with their votes. This information makes it possible to extract the polarity of the senators. Similarly, the blogosphere plays an increasingly important role as a forum for public debate. Authors display sentiment toward issues, organizations or people using natural language. In this research, given a mixed set of senators/blogs debating on a set of political issues from opposing camps, I use signed bipartite graphs for modeling debates, and I propose an algorithm for partitioning both the opinion ...

Contributors
Gokalp, Sedat, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2015

Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and user credit card purchase pattern monitoring, however, the matches to the user queries are in fact plentiful and the system has to efficiently sift through these many matches to locate only the few most preferable matches. In this work, we propose a complex pattern ranking (CPR) framework for specifying top-k ...

Contributors
Wang, Xinxin, Candan, K. Selcuk, Chen, Yi, et al.
Created Date
2011

The dawn of the Internet of Things (IoT) has opened the opportunity for mainstream adoption of machine learning analytics. However, most research in machine learning has focused on discovery of new algorithms or fine-tuning the performance of existing algorithms. Little exists on the process of taking an algorithm from the lab-environment into the real-world, culminating in sustained value. Real-world applications are typically characterized by dynamic non-stationary systems with requirements around feasibility, stability and maintainability. Not much has been done to establish standards around the unique analytics demands of real-world scenarios. This research explores the problem of why so few of ...

Contributors
Shahapurkar, Som, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2016

Our research focuses on finding answers through decentralized search, for complex, imprecise queries (such as "Which is the best hair salon nearby?") in situations where there is a spatiotemporal constraint (say, the answer needs to be found within 15 minutes) associated with the query. In general, human networks are good at answering imprecise queries. We try to use the social network of a person to answer his query. Our research aims at designing a framework that exploits the user's social network in order to maximize the answers for a given query. Exploiting a user's social network has several challenges. The major ...

Contributors
Swaminathan, Neelakantan, Sundaram, Hari, Davulcu, Hasan, et al.
Created Date
2013

The subliminal impact of framing of social, political and environmental issues such as climate change has been studied for decades in political science and communications research. Media framing offers an “interpretative package” for average citizens on how to make sense of climate change and its consequences to their livelihoods, how to deal with its negative impacts, and which mitigation or adaptation policies to support. A line of related work has used bag of words and word-level features to detect frames automatically in text. Such works face limitations since standard keyword based features may not generalize well to accommodate surface variations ...

Contributors
Alashri, Saud, Davulcu, Hasan, Desouza, Kevin C., et al.
Created Date
2018

With the rise of Online Social Networks (OSN) in the last decade, social network analysis has become a crucial research topic. The OSN graphs have unique properties that distinguish them from other types of graphs. In this thesis, a five-month Tweet corpus collected from Bangladesh between June 2016 and October 2016 is analyzed in order to detect accounts that belong to groups. These groups consist of official and non-official Twitter handles of political organizations and NGOs in Bangladesh. A set of network, temporal, spatial and behavioral features are proposed to discriminate between accounts belonging to individual Twitter users, news, ...

Contributors
Gore, Chinmay Chandrashekhar, Davulcu, Hasan, Hsiao, I-Han, et al.
Created Date
2017

Social Computing is an area of computer science concerned with dynamics of communities and cultures, created through computer-mediated social interaction. Various social media platforms, such as social network services and microblogging, enable users to come together and create social movements expressing their opinions on diverse sets of issues, events, complaints, grievances, and goals. Methods for monitoring and summarizing these types of sociopolitical trends, their leaders and followers, messages, and dynamics are needed. In this dissertation, a framework comprising community and content-based computational methods is presented to provide insights for multilingual and noisy political social media content. First, a model ...

Contributors
Alzahrani, Sultan, Davulcu, Hasan, Corman, Steve R., et al.
Created Date
2018

Stock market news and investing tips are popular topics on Twitter. In this dissertation, first I utilize a 5-year financial news corpus comprising over 50,000 articles collected from the NASDAQ website matching the 30 stock symbols in the Dow Jones Index (DJI) to train a directional stock price prediction system based on news content. Next, I proceed to show that information in articles indicated by breaking Tweet volumes leads to a statistically significant boost in the hourly directional prediction accuracies for the DJI stock prices mentioned in these articles. Secondly, I show that using document-level sentiment extraction does not yield a ...

Contributors
Alostad, Hana, Davulcu, Hasan, Corman, Steven, et al.
Created Date
2016

The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences in image quality (resolution, brightness, occlusion and color), changes in camera perspective, dissimilar backgrounds and an inherent diversity of the samples themselves. Machine learning techniques like transfer learning are employed to adapt computational models across distributions. Domain adaptation is a special case of transfer learning, where knowledge from a source ...

Contributors
Demakethepalli Venkateswara, Hemanth, Panchanathan, Sethuraman, Li, Baoxin, et al.
Created Date
2017

Currently, Java is making its way into embedded systems and mobile devices such as Android devices. The programs written in Java are compiled into machine independent binary class byte codes. A Java Virtual Machine (JVM) executes these classes. The Java platform additionally specifies the Java Native Interface (JNI). JNI allows Java code that runs within a JVM to interoperate with applications or libraries that are written in other languages and compiled to the host CPU ISA. JNI plays an important role in embedded systems as it provides a mechanism to interact with libraries specific to the platform. This thesis addresses the ...

Contributors
Chandrian, Preetham, Lee, Yann-Hang, Davulcu, Hasan, et al.
Created Date
2011

Node proximity measures are commonly used for quantifying how nearby or otherwise related two or more nodes in a graph are. Node significance measures are mainly used to find how important nodes are in a graph. The measures of node proximity/significance have been highly effective in many predictions and applications. Despite their effectiveness, however, there are various shortcomings. One such shortcoming is a scalability problem due to their high computation costs on large graphs, and another is low accuracy when the significance of a node and its degree in the graph are not related. ...

Contributors
Kim, Jung Hyun, Candan, K. Selcuk, Davulcu, Hasan, et al.
Created Date
2017

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems. An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms ...

Contributors
Nagendra, Mithila, Candan, Kasim Selcuk, Chen, Yi, et al.
Created Date
2014
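
The notion of a skyline as the set of non-dominated points, described in the abstract above, can be illustrated with a minimal sketch. It covers only the basic single-source skyline, not the skyline-join setting the thesis studies. It assumes that smaller values are preferable on every attribute and uses a naive quadratic scan; the function names and the sample data are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Assumption: lower values are better on every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(50.0, 3.0), (80.0, 1.0), (60.0, 2.0), (90.0, 2.5), (50.0, 4.0)]
result = skyline(hotels)
```

With this data, (90.0, 2.5) is dominated by (60.0, 2.0) and (50.0, 4.0) is dominated by (50.0, 3.0), so three hotels remain in the skyline.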

In this thesis, multiple approaches are explored to enhance sentiment analysis of tweets. A standard sentiment analysis model with customized features is first trained and tested to establish a baseline. This is compared to an existing topic based mixture model and a new proposed topic based vector model, both of which use Latent Dirichlet Allocation (LDA) for topic modeling. The proposed topic based vector model has higher accuracies in terms of averaged F scores than the other two models.

Contributors
Baskaran, Swetha, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2016

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major difficulty in using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently ...

Contributors
Liu, Ziyang, Chen, Yi, Candan, Kasim S, et al.
Created Date
2011

Text search is a very useful way of retrieving document information from a particular website. The public generally use internet search engines over the local enterprise search engines, because the enterprise content is not cross linked and does not follow a page rank algorithm. On the other hand, the enterprise search engine uses metadata information, which allows the user to specify the conditions that any retrieved document should meet. Therefore, using metadata information for searching will also be very useful. My thesis aims at developing an enterprise search engine using metadata information by providing advanced features like faceted navigation. The ...

Contributors
Sanaka, Srinivasa Raviteja, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2010

Process migration is a heavily studied research area and has a number of applications in distributed systems. Process migration means transferring a process running on one machine to another such that it resumes execution from the point at which it was suspended. The conventional approach to implement process migration is to move the entire state information of the process (including hardware context, virtual memory, files etc.) from one machine to another. Copying all the state information is costly. This thesis proposes and demonstrates a new approach of migrating a process between two cores of Intel Single Chip Cloud (SCC), an ...

Contributors
Jain, Vaibhav, Dasgupta, Partha, Shrivastava, Aviral, et al.
Created Date
2013

In supervised learning, machine learning techniques can be applied to learn a model on a small set of labeled documents which can be used to classify a larger set of unknown documents. Machine learning techniques can be used to analyze a political scenario in a given society. A lot of research has been going on in this field to understand the interactions of various people in the society in response to actions taken by their organizations. This paper talks about understanding the Russian influence on people in Latvia. This is done by building an effective model learnt on an initial set ...

Contributors
Bollapragada, Lakshmi Gayatri Niharika, Davulcu, Hasan, Sen, Arunabha, et al.
Created Date
2016

Contemporary online social platforms present individuals with social signals in the form of a news feed on their peers' activities. On networks such as Facebook and Quora, the network operator decides how that information is shown to an individual. Then the user, with her own interests and resource constraints, selectively acts on a subset of items presented to her. The network operator, again, shows that activity to a selection of peers, thus creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions such as: can the network operator design social signals to promote a particular activity like ...

Contributors
Le, Tien Dinh, Sundaram, Hari, Davulcu, Hasan, et al.
Created Date
2014

Browsing Twitter users, or browsers, often find it increasingly cumbersome to attach meaning to tweets that are displayed on their timeline as they follow more and more users or pages. The tweets being browsed are created by Twitter users called originators, and are of some significance to the browser, who has chosen to subscribe to the tweets from the originator by following the originator. Although hashtags are used to tag tweets in an effort to attach context to the tweets, many tweets do not have a hashtag. Such tweets are called orphan tweets and they adversely affect the experience of ...

Contributors
Mallapura Umamaheshwar, Tejas, Kambhampati, Subbarao, Liu, Huan, et al.
Created Date
2015

Bangladesh is a secular democracy in which almost 90% of the population is Muslim and the remaining 10% consists of minority groups that include Hindus, Christians, Buddhists, Ahmadi Muslims, Shia, Sufi, LGBT groups and Atheists. In recent years, Bangladesh has experienced an increase in attacks by religious extremist groups, such as IS and AQIS affiliates, and by hate groups, as well as in politically motivated violence. Attacks have also become indiscriminate, with assailants targeting a wide variety of individuals, including religious minorities and foreigners. According to the telecoms regulator, the number of internet users in Bangladesh now stands at over 66.8 million, reaching 41% penetration. ...

Contributors
Chhabra, Pankaj, Davulcu, Hasan, Li, Baoxin, et al.
Created Date
2017

This thesis research attempts to observe, measure and visualize the communication patterns among developers of an open source community and to analyze what these patterns imply about the progress of that open source project. Here I analyze the Ubuntu open source project's email data (9 subproject log archives over a period of five years) and focus on drawing more precise metrics from different perspectives of the communication data. I also attempt to overcome the scalability issue by using Apache Pig libraries, which run on a MapReduce-based Hadoop cluster. I describe four metrics based on which ...

Contributors
Motamarri, Lakshminarayana, Santanam, Raghu, Ye, Jieping, et al.
Created Date
2011

As the size and scope of valuable datasets have exploded across many industries and fields of research in recent years, an increasingly diverse audience has sought out effective tools for its large-scale data analytics needs. Over this period, machine learning researchers have also been very prolific in designing improved algorithms that are capable of finding the hidden structure within these datasets. As consumers of popular Big Data frameworks have sought to apply and benefit from these improved learning algorithms, the problems encountered with the frameworks have motivated a new generation of Big Data tools to address the shortcomings of the ...

Contributors
Krouse, Brian Richard, Ye, Jieping, Liu, Huan, et al.
Created Date
2014

Machine learning models convert raw data in the form of video, images, audio, text, etc. into feature representations that are convenient for computational processing. Deep neural networks have proven to be very efficient feature extractors for a variety of machine learning tasks. Generative models based on deep neural networks introduce constraints on the feature space to learn transferable and disentangled representations. Transferable feature representations help in training machine learning models that are robust across different distributions of data. For example, with the application of transferable features in domain adaptation, models trained on a source distribution can be applied ...

Contributors
Eusebio, Jose Miguel Ang, Panchanathan, Sethuraman, Davulcu, Hasan, et al.
Created Date
2018

One of the most remarkable outcomes of the evolution of the web into Web 2.0 has been the propelling of blogging into a widely adopted and globally accepted phenomenon. While the unprecedented growth of the Blogosphere has added diversity and enriched the media, it has also added complexity. To cope with the relentless expansion, many enthusiastic bloggers have embarked on voluntarily writing, tagging, labeling, and cataloguing their posts in hopes of reaching the widest possible audience. Unbeknown to them, this reaching-for-others process triggers the generation of a new kind of collective wisdom, a result of shared collaboration, and the ...

Contributors
Galan, Magdiel Francisco, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2015

In recent years, a growing number of applications have used multi-variate time series data, in which multiple uni-variate time series coexist. However, there is a lack of systematic analysis of multi-variate time series. This thesis focuses on (a) defining a simplified inter-related multi-variate time series (IMTS) model and (b) developing a robust multi-variate temporal (RMT) feature extraction algorithm that can be used for locating, filtering, and describing salient features in multi-variate time series data sets. The proposed RMT feature can also be used for supporting multiple analysis tasks, such as visualization, segmentation, and searching / retrieving based on multi-variate time series similarities. ...

Contributors
Wang, Xiaolan, Candan, Kasim Selcuk, Sapino, Maria Luisa, et al.
Created Date
2013

Similarity search in high-dimensional spaces is popular for applications like image processing, time series, and genome data. In higher dimensions, the curse of dimensionality undermines the effectiveness of most index structures, giving way to approximate methods like Locality Sensitive Hashing (LSH) to answer similarity searches. In addition to range searches and k-nearest neighbor searches, there is a need to answer negative queries formed by excluded regions in high-dimensional data. Though there have been a slew of variants of LSH to improve efficiency, reduce storage, and provide better accuracies, none of the techniques are capable of answering ...

Contributors
Bhat, Aneesha, Candan, Kasim Selcuk, Davulcu, Hasan, et al.
Created Date
2016