## ASU Electronic Theses and Dissertations

This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Contributor
Language
• English
Date Range
2010 2018

## Recent Submissions

Advances in the area of ubiquitous, pervasive and wearable computing have resulted in the development of low band-width, data rich environmental and body sensor networks, providing a reliable and non-intrusive methodology for capturing activity data from humans and the environments they inhabit. Assistive technologies that promote independent living amongst elderly and individuals with cognitive impairment are a major motivating factor for sensor-based activity recognition systems. However, the process of discerning relevant activity information from these sensor streams such as accelerometers is a non-trivial task and is an on-going research area. The difficulty stems from factors such as spatio-temporal variations in …

Contributors
Chatapuram Krishnan, Narayanan, Panchanathan, Sethuraman, Sundaram, Hari, et al.
Created Date
2010

Under the framework of intelligent management of power grids by leveraging advanced information, communication and control technologies, a primary objective of this study is to develop novel data mining and data processing schemes for several critical applications that can enhance the reliability of power systems. Specifically, this study is broadly organized into the following two parts: I) spatio-temporal wind power analysis for wind generation forecast and integration, and II) data mining and information fusion of synchrophasor measurements toward secure power grids. Part I is centered around wind power generation forecast and integration. First, a spatio-temporal analysis approach for short-term wind …

Contributors
He, Miao, Zhang, Junshan, Vittal, Vijay, et al.
Created Date
2013

Real-world environments are characterized by non-stationary and continuously evolving data. Learning a classification model on this data would require a framework that is able to adapt itself to newer circumstances. Under such circumstances, transfer learning has come to be a dependable methodology for improving classification performance with reduced training costs and without the need for explicit relearning from scratch. In this thesis, a novel instance transfer technique that adapts a "Cost-sensitive" variation of AdaBoost is presented. The method capitalizes on the theoretical and functional properties of AdaBoost to selectively reuse outdated training instances obtained from a "source" domain to effectively …

Contributors
Venkatesan, Ashok, Panchanathan, Sethuraman, Li, Baoxin, et al.
Created Date
2011

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating …

Contributors
Leaman, James Robert, Gonzalez, Graciela, Baral, Chitta, et al.
Created Date
2013

Natural Language Processing is a subject that combines computer science and linguistics, aiming to provide computers with the ability to understand natural language and to develop a more intuitive human-computer interaction. The research community has developed ways to translate natural language to mathematical formalisms. It has not yet been shown, however, how to automatically translate different kinds of knowledge in English to distinct formal languages. Most of the recent work presents the problem that the translation method aims to a specific formal language or is hard to generalize. In this research, I take a first step to overcome this difficulty …

Contributors
Alvarez Gonzalez, Marcos, Baral, Chitta, Lee, Joohyung, et al.
Created Date
2010

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work …

Contributors
Lee, Jang, Gonzalez, Graciela, Ye, Jieping, et al.
Created Date
2011

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating them with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities …

Contributors
Chakraborty, Shayok, Panchanathan, Sethuraman, Balasubramanian, Vineeth N., et al.
Created Date
2013

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and biomedical domain. Many of these applications which deal with physiological and biomedical data require person specific or person adaptive systems. The greatest challenge in developing such systems is the subject-dependent data variations or subject-based variability in physiological and biomedical data, which leads to difference in data distributions making the task of modeling these data, using traditional machine learning algorithms, complex and challenging. As a result, despite the wide application of …

Contributors
Chattopadhyay, Rita, Panchanathan, Sethuraman, Ye, Jieping, et al.
Created Date
2013

In blindness research, the corpus callosum (CC) is the most frequently studied sub-cortical structure, due to its important involvement in visual processing. While most callosal analyses from brain structural magnetic resonance images (MRI) are limited to the 2D mid-sagittal slice, we propose a novel framework to capture a complete set of 3D morphological differences in the corpus callosum between two groups of subjects. The CCs are segmented from whole brain T1-weighted MRI and modeled as 3D tetrahedral meshes. The callosal surface is divided into superior and inferior patches on which we compute a volumetric harmonic field by solving the Laplace's …

Contributors
Xu, Liang, Wang, Yalin, Maciejewski, Ross, et al.
Created Date
2013

With the introduction of compressed sensing and sparse representation,many image processing and computer vision problems have been looked at in a new way. Recent trends indicate that many challenging computer vision and image processing problems are being solved using compressive sensing and sparse representation algorithms. This thesis assays some applications of compressive sensing and sparse representation with regards to image enhancement, restoration and classication. The first application deals with image Super-Resolution through compressive sensing based sparse representation. A novel framework is developed for understanding and analyzing some of the implications of compressive sensing in reconstruction and recovery of an image …

Contributors
Kulkarni, Naveen, Li, Baoxin, Ye, Jieping, et al.
Created Date
2011

Medical images constitute a special class of images that are captured to allow diagnosis of disease, and their "correct" interpretation is vitally important. Because they are not "natural" images, radiologists must be trained to visually interpret them. This training process includes implicit perceptual learning that is gradually acquired over an extended period of exposure to medical images. This dissertation proposes novel computational methods for evaluating and facilitating perceptual training in radiologists. Part 1 of this dissertation proposes an eye-tracking-based metric for measuring the training progress of individual radiologists. Six metrics were identified as potentially useful: time to complete task, fixation …

Contributors
Created Date
2012

A myriad of social media services are emerging in recent years that allow people to communicate and express themselves conveniently and easily. The pervasive use of social media generates massive data at an unprecedented rate. It becomes increasingly difficult for online users to find relevant information or, in other words, exacerbates the information overload problem. Meanwhile, users in social media can be both passive content consumers and active content producers, causing the quality of user-generated content can vary dramatically from excellence to abuse or spam, which results in a problem of information credibility. Trust, providing evidence about with whom users …

Contributors
Tang, Jiliang, Liu, Huan, Xue, Guoliang, et al.
Created Date
2015

The fields of pattern recognition and machine learning are on a fundamental quest to design systems that can learn the way humans do. One important aspect of human intelligence that has so far not been given sufficient attention is the capability of humans to express when they are certain about a decision, or when they are not. Machine learning techniques today are not yet fully equipped to be trusted with this critical task. This work seeks to address this fundamental knowledge gap. Existing approaches that provide a measure of confidence on a prediction such as learning algorithms based on the …

Contributors
Nallure Balasubramanian, Vineeth, Panchanathan, Sethuraman, Ye, Jieping, et al.
Created Date
2010

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our connections and the expansion of our social networks easier. The aggregation of people who share common interests forms social groups, which are fundamental parts of our social lives. Social behavioral analysis at a group level is an active research area and attracts many interests from the industry. Challenges of my …

Contributors
Wang, Xufei, Liu, Huan, Kambhampati, Subbarao, et al.
Created Date
2013

Prognostics and health management (PHM) is a method that permits the reliability of a system to be evaluated in its actual application conditions. This work involved developing a robust system to determine the advent of failure. Using the data from the PHM experiment, a model was developed to estimate the prognostic features and build a condition based system based on measured prognostics. To enable prognostics, a framework was developed to extract load parameters required for damage assessment from irregular time-load data. As a part of the methodology, a database engine was built to maintain and monitor the experimental data. This …

Contributors
Varadarajan, Gayathri, Liu, Huan, Ye, Jieping, et al.
Created Date
2010

The widespread adoption of computer vision models is often constrained by the issue of domain mismatch. Models that are trained with data belonging to one distribution, perform poorly when tested with data from a different distribution. Variations in vision based data can be attributed to the following reasons, viz., differences in image quality (resolution, brightness, occlusion and color), changes in camera perspective, dissimilar backgrounds and an inherent diversity of the samples themselves. Machine learning techniques like transfer learning are employed to adapt computational models across distributions. Domain adaptation is a special case of transfer learning, where knowledge from a source …

Contributors
Demakethepalli Venkateswara, Hemanth, Panchanathan, Sethuraman, Li, Baoxin, et al.
Created Date
2017

Drosophila melanogaster, as an important model organism, is used to explore the mechanism which governs cell differentiation and embryonic development. Understanding the mechanism will help to reveal the effects of genes on other species or even human beings. Currently, digital camera techniques make high quality Drosophila gene expression imaging possible. On the other hand, due to the advances in biology, gene expression images which can reveal spatiotemporal patterns are generated in a high-throughput pace. Thus, an automated and efficient system that can analyze gene expression will become a necessary tool for investigating the gene functions, interactions and developmental processes. One …

Contributors
Pan, Cheng, Ye, Jieping, Li, Baoxin, et al.
Created Date
2012

Understanding the complexity of temporal and spatial characteristics of gene expression over brain development is one of the crucial research topics in neuroscience. An accurate description of the locations and expression status of relative genes requires extensive experiment resources. The Allen Developing Mouse Brain Atlas provides a large number of in situ hybridization (ISH) images of gene expression over seven different mouse brain developmental stages. Studying mouse brain models helps us understand the gene expressions in human brains. This atlas collects about thousands of genes and now they are manually annotated by biologists. Due to the high labor cost of …

Contributors
Zhao, Xinlin, Ye, Jieping, Wang, Yalin, et al.
Created Date
2016

Sparse learning is a powerful tool to generate models of high-dimensional data with high interpretability, and it has many important applications in areas such as bioinformatics, medical image processing, and computer vision. Recently, the a priori structural information has been shown to be powerful for improving the performance of sparse learning models. A graph is a fundamental way to represent structural information of features. This dissertation focuses on graph-based sparse learning. The first part of this dissertation aims to integrate a graph into sparse learning to improve the performance. Specifically, the problem of feature grouping and selection over a given …

Contributors
Yang, Sen, Ye, Jieping, Wonka, Peter, et al.
Created Date
2014

The widespread adoption of mobile devices gives rise to new opportunities and challenges for authentication mechanisms. Many traditional authentication mechanisms become unsuitable for smart devices. For example, while password is widely used on computers as user identity authentication, inputting password on small smartphone screen is error-prone and not convenient. In the meantime, there are emerging demands for new types of authentication. Proximity authentication is an example, which is not needed for computers but quite necessary for smart devices. These challenges motivate me to study and develop novel authentication mechanisms specific for smart devices. In this dissertation, I am interested in …

Contributors
Li, Lingjun, Xue, Guoliang, Ahn, Gail-Joon, et al.
Created Date
2014

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, the multifactor analysis techniques, including the multilinear analysis and the multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing various key contributing factors, such as pose, view angle, and body shape, in the generation of the image observations. Experimental results have shown that the resulting pose features extracted using the proposed methods exhibit excellent invariance properties to changes in view angles and body shapes. Furthermore, …

Contributors
Peng, Bo, Qian, Gang, Ye, Jieping, et al.
Created Date
2011

This thesis research attempts to observe, measure and visualize the communication patterns among developers of an open source community and analyze how this can be inferred in terms of progress of that open source project. Here I attempted to analyze the Ubuntu open source project's email data (9 subproject log archives over a period of five years) and focused on drawing more precise metrics from different perspectives of the communication data. Also, I attempted to overcome the scalability issue by using Apache Pig libraries, which run on a MapReduce framework based Hadoop Cluster. I described four metrics based on which …

Contributors
Motamarri, Lakshminarayana, Santanam, Raghu, Ye, Jieping, et al.
Created Date
2011

As the size and scope of valuable datasets has exploded across many industries and fields of research in recent years, an increasingly diverse audience has sought out effective tools for their large-scale data analytics needs. Over this period, machine learning researchers have also been very prolific in designing improved algorithms which are capable of finding the hidden structure within these datasets. As consumers of popular Big Data frameworks have sought to apply and benefit from these improved learning algorithms, the problems encountered with the frameworks have motivated a new generation of Big Data tools to address the shortcomings of the …

Contributors
Krouse, Brian Richard, Ye, Jieping, Liu, Huan, et al.
Created Date
2014

One of the most remarkable outcomes resulting from the evolution of the web into Web 2.0, has been the propelling of blogging into a widely adopted and globally accepted phenomenon. While the unprecedented growth of the Blogosphere has added diversity and enriched the media, it has also added complexity. To cope with the relentless expansion, many enthusiastic bloggers have embarked on voluntarily writing, tagging, labeling, and cataloguing their posts in hopes of reaching the widest possible audience. Unbeknown to them, this reaching-for-others process triggers the generation of a new kind of collective wisdom, a result of shared collaboration, and the …

Contributors
Galan, Magdiel Francisco, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2015

Alzheimer's Disease (AD) is the most common form of dementia observed in elderly patients and has significant social-economic impact. There are many initiatives which aim to capture leading causes of AD. Several genetic, imaging, and biochemical markers are being explored to monitor progression of AD and explore treatment and detection options. The primary focus of this thesis is to identify key biomarkers to understand the pathogenesis and prognosis of Alzheimer's Disease. Feature selection is the process of finding a subset of relevant features to develop efficient and robust learning models. It is an active research topic in diverse areas such …

Contributors
Dubey, Rashmi, Ye, Jieping, Wang, Yalin, et al.
Created Date
2012

Major Depression, clinically called Major Depressive Disorder, is a mood disorder that affects about one eighth of population in US and is projected to be the second leading cause of disability in the world by the year 2020. Recent advances in biotechnology have enabled us to collect a great variety of data which could potentially offer us a deeper understanding of the disorder as well as advancing personalized medicine. This dissertation focuses on developing methods for three different aspects of predictive analytics related to the disorder: automatic diagnosis, prognosis, and prediction of long-term treatment outcome. The data used for each …

Contributors
Nie, Zhi, Ye, Jieping, He, Jingrui, et al.
Created Date
2017

Learning from high dimensional biomedical data attracts lots of attention recently. High dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features of biomedical data, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect the model performance. In this thesis, I focus on developing learning methods for the high-dimensional imbalanced biomedical data. In the first part, a sparse canonical correlation analysis (CCA) method is presented. The penalty terms is used to control the sparsity of the projection matrices of CCA. The sparse CCA method …

Contributors
Yang, Tao, Ye, Jieping, Wang, Yalin, et al.
Created Date
2013

Users often join an online social networking (OSN) site, like Facebook, to remain social, by either staying connected with friends or expanding social networks. On an OSN site, users generally share variety of personal information which is often expected to be visible to their friends, but sometimes vulnerable to unwarranted access from others. The recent study suggests that many personal attributes, including religious and political affiliations, sexual orientation, relationship status, age, and gender, are predictable using users' personal data from an OSN site. The majority of users want to remain socially active, and protect their personal data at the same …

Contributors
Gundecha, Pritam Sureshlal, Liu, Huan, Ahn, Gail-Joon, et al.
Created Date
2015

Social networking services have emerged as an important platform for large-scale information sharing and communication. With the growing popularity of social media, spamming has become rampant in the platforms. Complex network interactions and evolving content present great challenges for social spammer detection. Different from some existing well-studied platforms, distinct characteristics of newly emerged social media data present new challenges for social spammer detection. First, texts in social media are short and potentially linked with each other via user connections. Second, it is observed that abundant contextual information may play an important role in distinguishing social spammers and normal users. Third, …

Contributors
Hu, Xia, Liu, Huan, Kambhampati, Subbarao, et al.
Created Date
2015

Models using feature interactions have been applied successfully in many areas such as biomedical analysis, recommender systems. The popularity of using feature interactions mainly lies in (1) they are able to capture the nonlinearity of the data compared with linear effects and (2) they enjoy great interpretability. In this thesis, I propose a series of formulations using feature interactions for real world problems and develop efficient algorithms for solving them. Specifically, I first propose to directly solve the non-convex formulation of the weak hierarchical Lasso which imposes weak hierarchy on individual features and interactions but can only be approximately solved …

Contributors
Liu, Yashu, Ye, Jieping, Xue, Guoliang, et al.
Created Date
2018

Bridging semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signal with their high-level semantic interpretation is mainly due to the fact that semantics are often conveyed implicitly in a context, relying on interactions among multiple levels of concepts or low-level data entities. Also, additional domain knowledge may often be indispensable for uncovering the underlying semantics, but in most cases such domain knowledge is not readily available from the acquired media streams. Thus, making use of various types of contextual information and leveraging corresponding domain knowledge are vital for effectively …

Contributors
Wang, Zheshen, Li, Baoxin, Sundaram, Hari, et al.
Created Date
2011

Software-as-a-Service (SaaS) has received significant attention in recent years as major computer companies such as Google, Microsoft, Amazon, and Salesforce are adopting this new approach to develop software and systems. Cloud computing is a computing infrastructure to enable rapid delivery of computing resources as a utility in a dynamic, scalable, and virtualized manner. Computer Simulations are widely utilized to analyze the behaviors of software and test them before fully implementations. Simulation can further benefit SaaS application in a cost-effective way taking the advantages of cloud such as customizability, configurability and multi-tendency. This research introduces Modeling, Simulation and Analysis for Software-as-Service …

Contributors
Li, Wu, Tsai, Wei-Tek, Sarjoughian, Hessam, et al.
Created Date
2015

Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this …

Contributors
Sun, Liang, Ye, Jieping, Li, Baoxin, et al.
Created Date
2011

In many fields one needs to build predictive models for a set of related machine learning tasks, such as information retrieval, computer vision and biomedical informatics. Traditionally these tasks are treated independently and the inference is done separately for each task, which ignores important connections among the tasks. Multi-task learning aims at simultaneously building models for all tasks in order to improve the generalization performance, leveraging inherent relatedness of these tasks. In this thesis, I firstly propose a clustered multi-task learning (CMTL) formulation, which simultaneously learns task models and performs task clustering. I provide theoretical analysis to establish the equivalence …

Contributors
Zhou, Jiayu, Ye, Jieping, Mittelmann, Hans, et al.
Created Date
2014

Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of the MTL algorithms. In this dissertation, I propose …

Contributors
Chen, Jianhui, Ye, Jieping, Kumar, Sudhir, et al.
Created Date
2011

Rapid advance in sensor and information technology has resulted in both spatially and temporally data-rich environment, which creates a pressing need for us to develop novel statistical methods and the associated computational tools to extract intelligent knowledge and informative patterns from these massive datasets. The statistical challenges for addressing these massive datasets lay in their complex structures, such as high-dimensionality, hierarchy, multi-modality, heterogeneity and data uncertainty. Besides the statistical challenges, the associated computational approaches are also considered essential in achieving efficiency, effectiveness, as well as the numerical stability in practice. On the other hand, some recent developments in statistics and …

Contributors
Huang, Shuai, Li, Jing, Li, Jing, et al.
Created Date
2012

Detecting anatomical structures, such as the carina, the pulmonary trunk and the aortic arch, is an important step in designing a CAD system of detection Pulmonary Embolism. The presented CAD system gets rid of the high-level prior defined knowledge to become a system which can easily extend to detect other anatomic structures. The system is based on a machine learning algorithm --- AdaBoost and a general feature --- Haar. This study emphasizes on off-line and on-line AdaBoost learning. And in on-line AdaBoost, the thesis further deals with extremely imbalanced condition. The thesis first reviews several knowledge-based detection methods, which are …

Contributors
Wu, Hong, Liang, Jianming, Farin, Gerald, et al.
Created Date
2011

The rapid growth in the high-throughput technologies last few decades makes the manual processing of the generated data to be impracticable. Even worse, the machine learning and data mining techniques seemed to be paralyzed against these massive datasets. High-dimensionality is one of the most common challenges for machine learning and data mining tasks. Feature selection aims to reduce dimensionality by selecting a small subset of the features that perform at least as good as the full feature set. Generally, the learning performance, e.g. classification accuracy, and algorithm complexity are used to measure the quality of the algorithm. Recently, the stability …

Contributors
Alelyani, Salem, Liu, Huan, Xue, Guoliang, et al.
Created Date
2013

Nowadays, wireless communications and networks have been widely used in our daily lives. One of the most important topics related to networking research is using optimization tools to improve the utilization of network resources. In this dissertation, we concentrate on optimization for resource-constrained wireless networks, and study two fundamental resource-allocation problems: 1) distributed routing optimization and 2) anypath routing optimization. The study on the distributed routing optimization problem is composed of two main thrusts, targeted at understanding distributed routing and resource optimization for multihop wireless networks. The first thrust is dedicated to understanding the impact of full-duplex transmission on wireless …

Contributors
Fang, Xi, Xue, Guoliang, Yau, Sik-Sang, et al.
Created Date
2013

The rapid urban expansion has greatly extended the physical boundary of our living area, along with a large number of POIs (points of interest) being developed. A POI is a specific location (e.g., hotel, restaurant, theater, mall) that a user may find useful or interesting. When exploring the city and neighborhood, the increasing number of POIs could enrich people's daily life, providing them with more choices of life experience than before, while at the same time also brings the problem of "curse of choices", resulting in the difficulty for a user to make a satisfied decision on "where to go" …

Contributors
Gao, Huiji, Liu, Huan, Xue, Guoliang, et al.
Created Date
2014

Online social networks, including Twitter, have expanded in both scale and diversity of content, which has created significant challenges to the average user. These challenges include finding relevant information on a topic and building social ties with like-minded individuals. The fundamental question addressed by this thesis is if an individual can leverage social network to search for information that is relevant to him or her. We propose to answer this question by developing computational algorithms that analyze a user's social network. The features of the social network we analyze include the network topology and member communications of a specific user's …

Contributors
Xu, Ke, Sundaram, Hari, Ye, Jieping, et al.
Created Date
2010

Currently, to interact with computer based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a module which understands natural language and automatically translates it to the interface language of the system. NL2KR (Natural language to knowledge representation) v.1 system is a prototype of such a system. It is a learning based system that learns new meanings of words in terms of lambda-calculus formulas given an initial lexicon of some words and their meanings and a …

Contributors
Kumbhare, Kanchan R., Baral, Chitta, Ye, Jieping, et al.
Created Date
2013

Large-scale $\ell_1$-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. In many applications, it remains challenging to apply the sparse learning model to large-scale problems that have massive data samples with high-dimensional features. One popular and promising strategy is to scaling up the optimization problem in parallel. Parallel solvers run multiple cores on a shared memory system or a distributed environment to speed up the computation, while the practical usage is limited by the huge dimension in the feature space and synchronization problems. In this dissertation, I carry …

Contributors
Li, Qingyang, Ye, Jieping, Xue, Guoliang, et al.
Created Date
2017

A story is defined as "an actor(s) taking action(s) that culminates in a resolution(s)''. I present novel sets of features to facilitate story detection among text via supervised classification and further reveal different forms within stories via unsupervised clustering. First, I investigate the utility of a new set of semantic features compared to standard keyword features combined with statistical features, such as density of part-of-speech (POS) tags and named entities, to develop a story classifier. The proposed semantic features are based on <Subject, Verb, Object> triplets that can be extracted using a shallow parser. Experimental results show that a model …

Contributors
Ceran, Saadet Betul, Davulcu, Hasan, Corman, Steven R, et al.
Created Date
2016

Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many of such sparse learning methods focus on designing or application of some learning techniques for certain feature space without much explicit consideration on possible interaction between the underlying semantics of the visual data and the employed learning technique. Rich semantic information in most visual data, if properly incorporated into algorithm …

Contributors
Zhang, Qiang, Li, Baoxin, Turaga, Pavan, et al.
Created Date
2014

Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modality nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a …

Contributors
Xiang, Shuo, Ye, Jieping, Mittelmann, Hans D, et al.
Created Date
2014

Sparse learning is a technique in machine learning for feature selection and dimensionality reduction, to find a sparse set of the most relevant features. In any machine learning problem, there is a considerable amount of irrelevant information, and separating relevant information from the irrelevant information has been a topic of focus. In supervised learning like regression, the data consists of many features and only a subset of the features may be responsible for the result. Also, the features might require special structural requirements, which introduces additional complexity for feature selection. The sparse learning package, provides a set of algorithms for …

Contributors
Thulasiram, Ramesh L., Ye, Jieping, Xue, Guoliang, et al.
Created Date
2011

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, medical image processing, etc. Via the penalization of l-1 norm based regularization, the structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real world applications which can benefit from the group structured sparse learning technique. In the first application, I …

Contributors
Yuan, Lei, Ye, Jieping, Wang, Yalin, et al.
Created Date
2013

Imaging genetics is an emerging and promising technique that investigates how genetic variations affect brain development, structure, and function. By exploiting disorder-related neuroimaging phenotypes, this class of studies provides a novel direction to reveal and understand the complex genetic mechanisms. Oftentimes, imaging genetics studies are challenging due to the relatively small number of subjects but extremely high-dimensionality of both imaging data and genomic data. In this dissertation, I carry on my research on imaging genetics with particular focuses on two tasks---building predictive models between neuroimaging data and genomic data, and identifying disorder-related genetic risk factors through image-based biomarkers. To this …

Contributors
Yang, Tao, Ye, Jieping, Xue, Guoliang, et al.
Created Date
2017

Multidimensional data have various representations. Thanks to their simplicity in modeling multidimensional data and the availability of various mathematical tools (such as tensor decompositions) that support multi-aspect analysis of such data, tensors are increasingly being used in many application domains including scientific data management, sensor data management, and social network data analysis. Relational model, on the other hand, enables semantic manipulation of data using relational operators, such as projection, selection, Cartesian-product, and set operators. For many multidimensional data applications, tensor operations as well as relational operations need to be supported throughout the data life cycle. In this thesis, we introduce …

Contributors
Kim, Mijung, Candan, K. Selcuk, Davulcu, Hasan, et al.
Created Date
2014

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities, however the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing his ability to see the role that domain level concepts play in the system. In this …

Contributors
Carey, Maurice, Colbourn, Charles, Collofello, James, et al.
Created Date
2013

Despite significant advances in digital pathology and automation sciences, current diagnostic practice for cancer detection primarily relies on a qualitative manual inspection of tissue architecture and cell and nuclear morphology in stained biopsies using low-magnification, two-dimensional (2D) brightfield microscopy. The efficacy of this process is limited by inter-operator variations in sample preparation and imaging, and by inter-observer variability in assessment. Over the past few decades, the predictive value quantitative morphology measurements derived from computerized analysis of micrographs has been compromised by the inability of 2D microscopy to capture information in the third dimension, and by the anisotropic spatial resolution inherent …

Contributors
Nandakumar, Vivek, Meldrum, Deirdre R, Nelson, Alan C, et al.
Created Date
2013

Cloud computing has received significant attention recently as it is a new computing infrastructure to enable rapid delivery of computing resources as a utility in a dynamic, scalable, and visualized manner. SaaS (Software-as-a-Service) provide a now paradigm in cloud computing, which goal is to provide an effective and intelligent way to support end users' on-demand requirements to computing resources, including maturity levels of customizable, multi-tenancy and scalability. To meet requirements of on-demand, my thesis discusses several critical research problems and proposed solutions using real application scenarios. Service providers receive multiple requests from customers, how to prioritize those service requests to …

Contributors
Shao, Qihong, Tsai, Wei-Tek, Askin, Ronald, et al.
Created Date
2011

Discriminative learning when training and test data belong to different distributions is a challenging and complex task. Often times we have very few or no labeled data from the test or target distribution, but we may have plenty of labeled data from one or multiple related sources with different distributions. Due to its capability of migrating knowledge from related domains, transfer learning has shown to be effective for cross-domain learning problems. In this dissertation, I carry out research along this direction with a particular focus on designing efficient and effective algorithms for BioImaging and Bilingual applications. Specifically, I propose deep …

Contributors
Sun, Qian, Ye, Jieping, Ye, Jieping, et al.
Created Date
2015

With the rise of social media, hundreds of millions of people spend countless hours all over the globe on social media to connect, interact, share, and create user-generated data. This rich environment provides tremendous opportunities for many different players to easily and effectively reach out to people, interact with them, influence them, or get their opinions. There are two pieces of information that attract most attention on social media sites, including user preferences and interactions. Businesses and organizations use this information to better understand and therefore provide customized services to social media users. This data can be used for different …

Contributors
Abbasi, Mohammad Ali, Liu, Huan, Davulcu, Hasan, et al.
Created Date
2014