Skip to main content

ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Contributor
Subject
Date Range
2011 2019


Dynamic software update (DSU) enables a program to update while it is running. DSU aims to minimize the loss due to program downtime for updates. Usually DSU is done in three steps: suspending the execution of an old program, mapping the execution state from the old program to a new one, and resuming execution of the new program with the mapped state. The semantic correctness of DSU depends largely on the state mapping which is mostly composed by developers manually nowadays. However, the manual construction of a state mapping does not necessarily ensure sound and dependable state mapping. This dissertation …

Contributors
Shen, Jun, Bazzi, Rida A, Fainekos, Georgios, et al.
Created Date
2015

One of the main goals of computer architecture design is to improve performance without much increase in the power consumption. It cannot be achieved by adding increasingly complex intelligent schemes in the hardware, since they will become increasingly less power-efficient. Therefore, parallelism comes up as the solution. In fact, the irrevocable trend of computer design in near future is still to keep increasing the number of cores while reducing the operating frequency. However, it is not easy to scale number of cores. One important challenge is that existing cores consume too much power. Another challenge is that cache-based memory hierarchy …

Contributors
Lu, Jing, Shrivastava, Aviral, Sarjoughian, Hessam, et al.
Created Date
2019

In recent years, we have observed the prevalence of stream applications in many embedded domains. Stream programs distinguish themselves from traditional sequential programming languages through well defined independent actors, explicit data communication, and stable code/data access patterns. In order to achieve high performance and low power, scratch pad memory (SPM) has been introduced in today's embedded multicore processors. Current design frameworks for developing stream applications on SPM enhanced embedded architectures typically do not include a compiler that can perform automatic partitioning, mapping and scheduling under limited on-chip SPM capacities and memory access delays. Consequently, many designs are implemented manually, which …

Contributors
Che, Weijia, Chatha, Karam Singh, Chatha, Karam Singh, et al.
Created Date
2012

The holy grail of computer hardware across all market segments has been to sustain performance improvement at the same pace as silicon technology scales. As the technology scales and the size of transistors shrinks, the power consumption and energy usage per transistor decrease. On the other hand, the transistor density increases significantly by technology scaling. Due to technology factors, the reduction in power consumption per transistor is not sufficient to offset the increase in power consumption per unit area. Therefore, to improve performance, increasing energy-efficiency must be addressed at all design levels from circuit level to application and algorithm levels. …

Contributors
Hamzeh, Mahdi, Vrudhula, Sarma, Gopalakrishnan, Kailash, et al.
Created Date
2015

We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are believed will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have scalable …

Contributors
Bai, Ke, Shrivastava, Aviral, Chatha, Karamvir, et al.
Created Date
2014

The Internet of Things ecosystem has spawned a wide variety of embedded real-time systems that complicate the identification and resolution of bugs in software. The methods of concurrent checkpoint provide a means to monitor the application state with the ability to replay the execution on like hardware and software, without holding off and delaying the execution of application threads. In this thesis, it is accomplished by monitoring physical memory of the application using a soft-dirty page tracker and measuring the various types of overhead when employing concurrent checkpointing. The solution presented is an advancement of the Checkpoint and Replay In …

Contributors
Prinke, Michael L, Lee, Yann-Hang, Shrivastava, Aviral, et al.
Created Date
2018

Software Managed Manycore (SMM) architectures - in which each core has only a scratch pad memory (instead of caches), - are a promising solution for scaling memory hierarchy to hundreds of cores. However, in these architectures, the code and data of the tasks mapped to the cores must be explicitly managed in the software by the compiler. State-of-the-art compiler techniques for SMM architectures require inter-procedural information and analysis. A call graph of the program does not have enough information, and Global CFG, i.e., combining all the control flow graphs of the program has too much information, and becomes too big. …

Contributors
Holton, Bryce Harvard, Shrivastava, Aviral, Collofello, James, et al.
Created Date
2014

A benchmark suite that is representative of the programs a processor typically executes is necessary to understand a processor's performance or energy consumption characteristics. The first contribution of this work addresses this need for mobile platforms with MobileBench, a selection of representative smartphone applications. In smartphones, like any other portable computing systems, energy is a limited resource. Based on the energy characterization of a commercial widely-used smartphone, application cores are found to consume a significant part of the total energy consumption of the device. With this insight, the subsequent part of this thesis focuses on the portion of energy that …

Contributors
Pandiyan, Dhinakaran, Wu, Carole-Jean, Shrivastava, Aviral, et al.
Created Date
2014

Most embedded applications are constructed with multiple threads to handle concurrent events. For optimization and debugging of the programs, dynamic program analysis is widely used to collect execution information while the program is running. Unfortunately, the non-deterministic behavior of multithreaded embedded software makes the dynamic analysis difficult. In addition, instrumentation overhead for gathering execution information may change the execution of a program, and lead to distorted analysis results, i.e., probe effect. This thesis presents a framework that tackles the non-determinism and probe effect incurred in dynamic analysis of embedded software. The thesis largely consists of three parts. First of all, …

Contributors
Song, Young Wn, Lee, Yann-Hang, Shrivastava, Aviral, et al.
Created Date
2015

In this thesis we deal with the problem of temporal logic robustness estimation. We present a dynamic programming algorithm for the robust estimation problem of Metric Temporal Logic (MTL) formulas regarding a finite trace of time stated sequence. This algorithm not only tests if the MTL specification is satisfied by the given input which is a finite system trajectory, but also quantifies to what extend does the sequence satisfies or violates the MTL specification. The implementation of the algorithm is the DP-TALIRO toolbox for MATLAB. Currently it is used as the temporal logic robust computing engine of S-TALIRO which is …

Contributors
Yang, Hengyi, Fainekos, Georgios, Sarjoughian, Hessam, et al.
Created Date
2013

As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but they become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a …

Contributors
Rawat, Tushar Shishpal, Shrivastava, Aviral, Dasgupta, Partha, et al.
Created Date
2014

The use of energy-harvesting in a wireless sensor network (WSN) is essential for situations where it is either difficult or not cost effective to access the network's nodes to replace the batteries. In this paper, the problems involved in controlling an active sensor network that is powered both by batteries and solar energy are investigated. The objective is to develop control strategies to maximize the quality of coverage (QoC), which is defined as the minimum number of targets that must be covered and reported over a 24 hour period. Assuming a time varying solar profile, the problem is to optimally …

Contributors
Gaudette, Benjamin David, Vrudhula, Sarma, Shrivastava, Aviral, et al.
Created Date
2012

Cyber-Physical Systems (CPS) are being used in many safety-critical applications. Due to the important role in virtually every aspect of human life, it is crucial to make sure that a CPS works properly before its deployment. However, formal verification of CPS is a computationally hard problem. Therefore, lightweight verification methods such as testing and monitoring of the CPS are considered in the industry. The formal representation of the CPS requirements is a challenging task. In addition, checking the system outputs with respect to requirements is a computationally complex problem. In this dissertation, these problems for the verification of CPS are …

Contributors
Dokhanchi, Adel, Fainekos, Georgios, Lee, Yann-Hang, et al.
Created Date
2017

Several decades of transistor technology scaling has brought the threat of soft errors to modern embedded processors. Several techniques have been proposed to protect these systems from soft errors. However, their effectiveness in protecting the computation cannot be ascertained without accurate and quantitative estimation of system reliability. Vulnerability -- a metric that defines the probability of system-failure (reliability) through analytical models -- is the most effective mechanism for our current estimation and early design space exploration needs. Previous vulnerability estimation tools are based around the Sim-Alpha simulator which has been to shown to have several limitations. In this thesis, I …

Contributors
Tanikella, Srinivas Karthik, Shrivastava, Aviral, Bazzi, Rida, et al.
Created Date
2016

Designers employ a variety of modeling theories and methodologies to create functional models of discrete network systems. These dynamical models are evaluated using verification and validation techniques throughout incremental design stages. Models created for these systems should directly represent their growing complexity with respect to composition and heterogeneity. Similar to software engineering practices, incremental model design is required for complex system design. As a result, models at early increments are significantly simpler relative to real systems. While experimenting (verification or validation) on models at early increments are computationally less demanding, the results of these experiments are less trustworthy and less …

Contributors
Gholami, Soroosh, Sarjoughian, Hessam S, Fainekos, Georgios, et al.
Created Date
2017

Performance improvements have largely followed Moore's Law due to the help from technology scaling. In order to continue improving performance, power-efficiency must be reduced. Better technology has improved power-efficiency, but this has a limit. Multi-core architectures have been shown to be an additional aid to this crusade of increased power-efficiency. Accelerators are growing in popularity as the next means of achieving power-efficient performance. Accelerators such as Intel SSE are ideal, but prove difficult to program. FPGAs, on the other hand, are less efficient due to their fine-grained reconfigurability. A middle ground is found in CGRAs, which are highly power-efficient, but …

Contributors
Pager, Jared, Shrivastava, Aviral, Gupta, Sandeep, et al.
Created Date
2011

Soft errors are considered as a key reliability challenge for sub-nano scale transistors. An ideal solution for such a challenge should ultimately eliminate the effect of soft errors from the microprocessor. While forward recovery techniques achieve fast recovery from errors by simply voting out the wrong values, they incur the overhead of three copies execution. Backward recovery techniques only need two copies of execution, but suffer from check-pointing overhead. In this work I explored the efficiency of integrating check-pointing into the application and the effectiveness of recovery that can be performed upon it. After evaluating the available fine-grained approaches to …

Contributors
Lokam, Sai Ram Dheeraj, Shrivastava, Aviral, Clark, Lawrence T, et al.
Created Date
2016

With the massive multithreading execution feature, graphics processing units (GPUs) have been widely deployed to accelerate general-purpose parallel workloads (GPGPUs). However, using GPUs to accelerate computation does not always gain good performance improvement. This is mainly due to three inefficiencies in modern GPU and system architectures. First, not all parallel threads have a uniform amount of workload to fully utilize GPU’s computation ability, leading to a sub-optimal performance problem, called warp criticality. To mitigate the degree of warp criticality, I propose a Criticality-Aware Warp Acceleration mechanism, called CAWA. CAWA predicts and accelerates the critical warp execution by allocating larger execution …

Contributors
Lee, Shin-Ying, Wu, Carole-Jean, Chakrabarti, Chaitali, et al.
Created Date
2017

The availability of a wide range of general purpose as well as accelerator cores on modern smartphones means that a significant number of applications can be executed on a smartphone simultaneously, resulting in an ever increasing demand on the memory subsystem. While the increased computation capability is intended for improving user experience, memory requests from each concurrent application exhibit unique memory access patterns as well as specific timing constraints. If not considered, this could lead to significant memory contention and result in lowered user experience. This work first analyzes the impact of memory degradation caused by the interference at the …

Contributors
SHINGARI, DAVESH, Wu, Carole-Jean, Vrudhula, Sarma, et al.
Created Date
2016

General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions. Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in …

Contributors
Arunkumar, Akhil, Wu, Carole-Jean, Shrivastava, Aviral, et al.
Created Date
2018

Error correcting systems have put increasing demands on system designers, both due to increasing error correcting requirements and higher throughput targets. These requirements have led to greater silicon area, power consumption and have forced system designers to make trade-offs in Error Correcting Code (ECC) functionality. Solutions to increase the efficiency of ECC systems are very important to system designers and have become a heavily researched area. Many such systems incorporate the Bose-Chaudhuri-Hocquenghem (BCH) method of error correcting in a multi-channel configuration. BCH is a commonly used code because of its configurability, low storage overhead, and low decoding requirements when compared …

Contributors
Dill, Russell, Shrivastava, Aviral, Oh, Hyunok, et al.
Created Date
2015

Caches pose a serious limitation in scaling many-core architectures since the demand of area and power for maintaining cache coherence increases rapidly with the number of cores. Scratch-Pad Memories (SPMs) provide a cheaper and lower power alternative that can be used to build a more scalable many-core architecture. The trade-off of substituting SPMs for caches is however that the data must be explicitly managed in software. Heap management on SPM poses a major challenge due to the highly dynamic nature of of heap data access. Most existing heap management techniques implement a software caching scheme on SPM, emulating the behavior …

Contributors
Lin, Jinn-Pean, Shrivastava, Aviral, Ren, Fengbo, et al.
Created Date
2017

Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of achieving high performance at low power consumption. While CGRAs can efficiently accelerate loop kernels, accelerating loops with control flow (loops with if-then-else structures) is quite challenging. Techniques that handle control flow execution in CGRAs generally use predication. Such techniques execute both branches of an if-then-else structure and select outcome of either branch to commit based on the result of the conditional. This results in poor utilization of CGRA s computational resources. Dual-issue scheme which is the state of the art technique for control flow fetches instructions from both paths of …

Contributors
Rajendran Radhika, Shri Hari, Shrivastava, Aviral, Christen, Jennifer Blain, et al.
Created Date
2014

Rapid technology scaling, the main driver of the power and performance improvements of computing solutions, has also rendered our computing systems extremely susceptible to transient errors called soft errors. Among the arsenal of techniques to protect computation from soft errors, Control Flow Checking (CFC) based techniques have gained a reputation of effective, yet low-cost protection mechanism. The basic idea is that, there is a high probability that a soft-fault in program execution will eventually alter the control flow of the program. Therefore just by making sure that the control flow of the program is correct, significant protection can be achieved. …

Contributors
Rhisheekesan, Abhishek, Shrivastava, Aviral, Colbourn, Charles Joseph, et al.
Created Date
2013

Coarse-Grained Reconfigurable Architectures (CGRA) are a promising fabric for improving the performance and power-efficiency of computing devices. CGRAs are composed of components that are well-optimized to execute loops and rotating register file is an example of such a component present in CGRAs. Due to the rotating nature of register indexes in rotating register file, it is very challenging, if at all possible, to hold and properly index memory addresses (pointers) and static values. In this Thesis, different structures for CGRA register files are investigated. Those structures are experimentally compared in terms of performance of mapped applications, design frequency, and area. …

Contributors
SALUJA, Dipal, Shrivastava, Aviral, Lee, Yann-Hang, et al.
Created Date
2014

Coarse-grained Reconfigurable Arrays (CGRAs) are promising accelerators capable of accelerating even non-parallel loops and loops with low trip-counts. One challenge in compiling for CGRAs is to manage both recurring and nonrecurring variables in the register file (RF) of the CGRA. Although prior works have managed recurring variables via rotating RF, they access the nonrecurring variables through either a global RF or from a constant memory. The former does not scale well, and the latter degrades the mapping quality. This work proposes a hardware-software codesign approach in order to manage all the variables in a local nonrotating RF. Hardware provides modulo …

Contributors
Dave, Shail, Shrivastava, Aviral, Ren, Fengbo, et al.
Created Date
2016

Caches have long been used to reduce memory access latency. However, the increased complexity of cache coherence brings significant challenges in processor design as the number of cores increases. While making caches scalable is still an important research problem, some researchers are exploring the possibility of a more power-efficient SRAM called scratchpad memories or SPMs. SPMs consume significantly less area, and are more energy-efficient per access than caches, and therefore make the design of on-chip memories much simpler. Unlike caches, which fetch data from memories automatically, an SPM requires explicit instructions for data transfers. SPM-only architectures are thus named as …

Contributors
Cai, Jian, Shrivastava, Aviral, Wu, Carole, et al.
Created Date
2017

Thanks to continuous technology scaling, intelligent, fast and smaller digital systems are now available at affordable costs. As a result, digital systems have found use in a wide range of application areas that were not even imagined before, including medical (e.g., MRI, remote or post-operative monitoring devices, etc.), automotive (e.g., adaptive cruise control, anti-lock brakes, etc.), security systems (e.g., residential security gateways, surveillance devices, etc.), and in- and out-of-body sensing (e.g., capsule swallowed by patients measuring digestive system pH, heart monitors, etc.). Such computing systems, which are completely embedded within the application, are called embedded systems, as opposed to general …

Contributors
Jeyapaul, Reiley, Shrivastava, Aviral, Vrudhula, Sarma, et al.
Created Date
2012

Advances in semiconductor technology have brought computer-based systems intovirtually all aspects of human life. This unprecedented integration of semiconductor based systems in our lives has significantly increased the domain and the number of safety-critical applications – application with unacceptable consequences of failure. Software-level error resilience schemes are attractive because they can provide commercial-off-the-shelf microprocessors with adaptive and scalable reliability. Among all software-level error resilience solutions, in-application instruction replication based approaches have been widely used and are deemed to be the most effective. However, existing instruction-based replication schemes only protect some part of computations i.e. arithmetic and logical instructions and leave …

Contributors
Didehban, Moslem, Shrivastava, Aviral, Wu, Carole-Jean, et al.
Created Date
2018

Software has a great impact on the energy efficiency of any computing system--it can manage the components of a system efficiently or inefficiently. The impact of software is amplified in the context of a wearable computing system used for activity recognition. The design space this platform opens up is immense and encompasses sensors, feature calculations, activity classification algorithms, sleep schedules, and transmission protocols. Design choices in each of these areas impact energy use, overall accuracy, and usefulness of the system. This thesis explores methods software can influence the trade-off between energy consumption and system accuracy. In general the more energy …

Contributors
Boyd, Jeffrey, Sundaram, Hari, Li, Baoxin, et al.
Created Date
2014

Limited Local Memory (LLM) multicore architectures are promising powerefficient architectures will scalable memory hierarchy. In LLM multicores, each core can access only a small local memory. Accesses to a large shared global memory can only be made explicitly through Direct Memory Access (DMA) operations. Standard Template Library (STL) is a powerful programming tool and is widely used for software development. STLs provide dynamic data structures, algorithms, and iterators for vector, deque (double-ended queue), list, map (red-black tree), etc. Since the size of the local memory is limited in the cores of the LLM architecture, and data transfer is not automatically …

Contributors
Lu, Di, Shrivastava, Aviral, Chatha, Karamvir, et al.
Created Date
2012

Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements. This work presents StreamWorks, …

Contributors
Panda, Amrit Kumar, Chatha, Karam S., Wu, Carole-Jean, et al.
Created Date
2014

Energy consumption of the data centers worldwide is rapidly growing fueled by ever-increasing demand for Cloud computing applications ranging from social networking to e-commerce. Understandably, ensuring energy-efficiency and sustainability of Cloud data centers without compromising performance is important for both economic and environmental reasons. This dissertation develops a cyber-physical multi-tier server and workload management architecture which operates at the local and the global (geo-distributed) data center level. We devise optimization frameworks for each tier to optimize energy consumption, energy cost and carbon footprint of the data centers. The proposed solutions are aware of various energy management tradeoffs that manifest due …

Contributors
Abbasi, Zahra, Gupta, Sandeep K. S., Chakrabarti, Chaitali, et al.
Created Date
2014

In recent years we have witnessed a shift towards multi-processor system-on-chips (MPSoCs) to address the demands of embedded devices (such as cell phones, GPS devices, luxury car features, etc.). Highly optimized MPSoCs are well-suited to tackle the complex application demands desired by the end user customer. These MPSoCs incorporate a constellation of heterogeneous processing elements (PEs) (general purpose PEs and application-specific integrated circuits (ASICS)). A typical MPSoC will be composed of a application processor, such as an ARM Coretex-A9 with cache coherent memory hierarchy, and several application sub-systems. Each of these sub-systems are composed of highly optimized instruction processors, graphics/DSP …

Contributors
Leary, Glenn, Chatha, Karamvir S, Vrudhula, Sarma, et al.
Created Date
2013

Threshold logic has been studied by at least two independent group of researchers. One group of researchers studied threshold logic with the intention of building threshold logic circuits. The earliest research to this end was done in the 1960's. The major work at that time focused on studying mathematical properties of threshold logic as no efficient circuit implementations of threshold logic were available. Recently many post-CMOS (Complimentary Metal Oxide Semiconductor) technologies that implement threshold logic have been proposed along with efficient CMOS implementations. This has renewed the effort to develop efficient threshold logic design automation techniques. This work contributes to …

Contributors
Linge Gowda, Tejaswi, Vrudhula, Sarma, Shrivastava, Aviral, et al.
Created Date
2012

Cyber-physical systems and hard real-time systems have strict timing constraints that specify deadlines until which tasks must finish their execution. Missing a deadline can cause unexpected outcome or endanger human lives in safety-critical applications, such as automotive or aeronautical systems. It is, therefore, of utmost importance to obtain and optimize a safe upper bound of each task’s execution time or the worst-case execution time (WCET), to guarantee the absence of any missed deadline. Unfortunately, conventional microarchitectural components, such as caches and branch predictors, are only optimized for average-case performance and often make WCET analysis complicated and pessimistic. Caches especially have …

Contributors
Kim, Yooseong, Shrivastava, Aviral, Broman, David, et al.
Created Date
2017