Skip to main content

ASU Electronic Theses and Dissertations


This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.


Subject
Date Range
2011 2019


Performance improvements have largely followed Moore's Law due to the help from technology scaling. In order to continue improving performance, power-efficiency must be reduced. Better technology has improved power-efficiency, but this has a limit. Multi-core architectures have been shown to be an additional aid to this crusade of increased power-efficiency. Accelerators are growing in popularity as the next means of achieving power-efficient performance. Accelerators such as Intel SSE are ideal, but prove difficult to program. FPGAs, on the other hand, are less efficient due to their fine-grained reconfigurability. A middle ground is found in CGRAs, which are highly power-efficient, but …

Contributors
Pager, Jared, Shrivastava, Aviral, Gupta, Sandeep, et al.
Created Date
2011

Limited Local Memory (LLM) multicore architectures are promising powerefficient architectures will scalable memory hierarchy. In LLM multicores, each core can access only a small local memory. Accesses to a large shared global memory can only be made explicitly through Direct Memory Access (DMA) operations. Standard Template Library (STL) is a powerful programming tool and is widely used for software development. STLs provide dynamic data structures, algorithms, and iterators for vector, deque (double-ended queue), list, map (red-black tree), etc. Since the size of the local memory is limited in the cores of the LLM architecture, and data transfer is not automatically …

Contributors
Lu, Di, Shrivastava, Aviral, Chatha, Karamvir, et al.
Created Date
2012

Stream processing has emerged as an important model of computation especially in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements. This work presents StreamWorks, …

Contributors
Panda, Amrit Kumar, Chatha, Karam S., Wu, Carole-Jean, et al.
Created Date
2014

Coarse Grain Reconfigurable Arrays (CGRAs) are promising accelerators capable of achieving high performance at low power consumption. While CGRAs can efficiently accelerate loop kernels, accelerating loops with control flow (loops with if-then-else structures) is quite challenging. Techniques that handle control flow execution in CGRAs generally use predication. Such techniques execute both branches of an if-then-else structure and select outcome of either branch to commit based on the result of the conditional. This results in poor utilization of CGRA s computational resources. Dual-issue scheme which is the state of the art technique for control flow fetches instructions from both paths of …

Contributors
Rajendran Radhika, Shri Hari, Shrivastava, Aviral, Christen, Jennifer Blain, et al.
Created Date
2014

Android has been the dominant platform in which most of the mobile development is being done. By the end of the second quarter of 2014, 84.7 percent of the entire world mobile phones market share had been captured by Android. The Android library internally uses the modified Linux kernel as the part of its stack. The I/O scheduler, is a part of the Linux kernel, responsible for scheduling data requests to the internal and the external memory devices that are attached to the mobile systems. The usage of solid state drives in the Android tablet has also seen a rise …

Contributors
Sivasankaran, Jeevan Prasath, Lee, Yann Hang, Wu, Carole-Jean, et al.
Created Date
2014

A benchmark suite that is representative of the programs a processor typically executes is necessary to understand a processor's performance or energy consumption characteristics. The first contribution of this work addresses this need for mobile platforms with MobileBench, a selection of representative smartphone applications. In smartphones, like any other portable computing systems, energy is a limited resource. Based on the energy characterization of a commercial widely-used smartphone, application cores are found to consume a significant part of the total energy consumption of the device. With this insight, the subsequent part of this thesis focuses on the portion of energy that …

Contributors
Pandiyan, Dhinakaran, Wu, Carole-Jean, Shrivastava, Aviral, et al.
Created Date
2014

As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but they become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a …

Contributors
Rawat, Tushar Shishpal, Shrivastava, Aviral, Dasgupta, Partha, et al.
Created Date
2014

There has been exciting progress in the area of Unmanned Aerial Vehicles (UAV) in the last decade, especially for quadrotors due to their nature of easy manipulation and simple structure. A lot of research has been done on achieving autonomous and robust control for quadrotors. Recently researchers have been utilizing linear temporal logic as mission specification language for robot motion planning due to its expressiveness and scalability. Several algorithms have been proposed to achieve autonomous temporal logic planning. Also, several frameworks are designed to compose those discrete planners and continuous controllers to make sure the actual trajectory also satisfies the …

Contributors
Zhang, Xiaotong, Fainekos, Georgios, Ben Amor, Heni, et al.
Created Date
2016

The availability of a wide range of general purpose as well as accelerator cores on modern smartphones means that a significant number of applications can be executed on a smartphone simultaneously, resulting in an ever increasing demand on the memory subsystem. While the increased computation capability is intended for improving user experience, memory requests from each concurrent application exhibit unique memory access patterns as well as specific timing constraints. If not considered, this could lead to significant memory contention and result in lowered user experience. This work first analyzes the impact of memory degradation caused by the interference at the …

Contributors
SHINGARI, DAVESH, Wu, Carole-Jean, Vrudhula, Sarma, et al.
Created Date
2016

Soft errors are considered as a key reliability challenge for sub-nano scale transistors. An ideal solution for such a challenge should ultimately eliminate the effect of soft errors from the microprocessor. While forward recovery techniques achieve fast recovery from errors by simply voting out the wrong values, they incur the overhead of three copies execution. Backward recovery techniques only need two copies of execution, but suffer from check-pointing overhead. In this work I explored the efficiency of integrating check-pointing into the application and the effectiveness of recovery that can be performed upon it. After evaluating the available fine-grained approaches to …

Contributors
Lokam, Sai Ram Dheeraj, Shrivastava, Aviral, Clark, Lawrence T, et al.
Created Date
2016

Coarse-grained Reconfigurable Arrays (CGRAs) are promising accelerators capable of accelerating even non-parallel loops and loops with low trip-counts. One challenge in compiling for CGRAs is to manage both recurring and nonrecurring variables in the register file (RF) of the CGRA. Although prior works have managed recurring variables via rotating RF, they access the nonrecurring variables through either a global RF or from a constant memory. The former does not scale well, and the latter degrades the mapping quality. This work proposes a hardware-software codesign approach in order to manage all the variables in a local nonrotating RF. Hardware provides modulo …

Contributors
Dave, Shail, Shrivastava, Aviral, Ren, Fengbo, et al.
Created Date
2016

Cyber-physical systems and hard real-time systems have strict timing constraints that specify deadlines until which tasks must finish their execution. Missing a deadline can cause unexpected outcome or endanger human lives in safety-critical applications, such as automotive or aeronautical systems. It is, therefore, of utmost importance to obtain and optimize a safe upper bound of each task’s execution time or the worst-case execution time (WCET), to guarantee the absence of any missed deadline. Unfortunately, conventional microarchitectural components, such as caches and branch predictors, are only optimized for average-case performance and often make WCET analysis complicated and pessimistic. Caches especially have …

Contributors
Kim, Yooseong, Shrivastava, Aviral, Broman, David, et al.
Created Date
2017

Caches pose a serious limitation in scaling many-core architectures since the demand of area and power for maintaining cache coherence increases rapidly with the number of cores. Scratch-Pad Memories (SPMs) provide a cheaper and lower power alternative that can be used to build a more scalable many-core architecture. The trade-off of substituting SPMs for caches is however that the data must be explicitly managed in software. Heap management on SPM poses a major challenge due to the highly dynamic nature of of heap data access. Most existing heap management techniques implement a software caching scheme on SPM, emulating the behavior …

Contributors
Lin, Jinn-Pean, Shrivastava, Aviral, Ren, Fengbo, et al.
Created Date
2017

With the massive multithreading execution feature, graphics processing units (GPUs) have been widely deployed to accelerate general-purpose parallel workloads (GPGPUs). However, using GPUs to accelerate computation does not always gain good performance improvement. This is mainly due to three inefficiencies in modern GPU and system architectures. First, not all parallel threads have a uniform amount of workload to fully utilize GPU’s computation ability, leading to a sub-optimal performance problem, called warp criticality. To mitigate the degree of warp criticality, I propose a Criticality-Aware Warp Acceleration mechanism, called CAWA. CAWA predicts and accelerates the critical warp execution by allocating larger execution …

Contributors
Lee, Shin-Ying, Wu, Carole-Jean, Chakrabarti, Chaitali, et al.
Created Date
2017

User satisfaction is pivotal to the success of mobile applications. At the same time, it is imperative to maximize the energy efficiency of the mobile device to ensure optimal usage of the limited energy source available to mobile devices while maintaining the necessary levels of user satisfaction. However, this is complicated due to user interactions, numerous shared resources, and network conditions that produce substantial uncertainty to the mobile device's performance and power characteristics. In this dissertation, a new approach is presented to characterize and control mobile devices that accurately models these uncertainties. The proposed modeling framework is a completely data-driven …

Contributors
Gaudette, Benjamin David, Vrudhula, Sarma, Wu, Carole-Jean, et al.
Created Date
2017

Cyber-Physical Systems (CPS) are being used in many safety-critical applications. Due to the important role in virtually every aspect of human life, it is crucial to make sure that a CPS works properly before its deployment. However, formal verification of CPS is a computationally hard problem. Therefore, lightweight verification methods such as testing and monitoring of the CPS are considered in the industry. The formal representation of the CPS requirements is a challenging task. In addition, checking the system outputs with respect to requirements is a computationally complex problem. In this dissertation, these problems for the verification of CPS are …

Contributors
Dokhanchi, Adel, Fainekos, Georgios, Lee, Yann-Hang, et al.
Created Date
2017

General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions. Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in …

Contributors
Arunkumar, Akhil, Wu, Carole-Jean, Shrivastava, Aviral, et al.
Created Date
2018

Advances in semiconductor technology have brought computer-based systems intovirtually all aspects of human life. This unprecedented integration of semiconductor based systems in our lives has significantly increased the domain and the number of safety-critical applications – application with unacceptable consequences of failure. Software-level error resilience schemes are attractive because they can provide commercial-off-the-shelf microprocessors with adaptive and scalable reliability. Among all software-level error resilience solutions, in-application instruction replication based approaches have been widely used and are deemed to be the most effective. However, existing instruction-based replication schemes only protect some part of computations i.e. arithmetic and logical instructions and leave …

Contributors
Didehban, Moslem, Shrivastava, Aviral, Wu, Carole-Jean, et al.
Created Date
2018

The Internet of Things ecosystem has spawned a wide variety of embedded real-time systems that complicate the identification and resolution of bugs in software. The methods of concurrent checkpoint provide a means to monitor the application state with the ability to replay the execution on like hardware and software, without holding off and delaying the execution of application threads. In this thesis, it is accomplished by monitoring physical memory of the application using a soft-dirty page tracker and measuring the various types of overhead when employing concurrent checkpointing. The solution presented is an advancement of the Checkpoint and Replay In …

Contributors
Prinke, Michael L, Lee, Yann-Hang, Shrivastava, Aviral, et al.
Created Date
2018

Automated driving systems are in an intensive research and development stage, and the companies developing these systems are targeting to deploy them on public roads in a very near future. Guaranteeing safe operation of these systems is crucial as they are planned to carry passengers and share the road with other vehicles and pedestrians. Yet, there is no agreed-upon approach on how and in what detail those systems should be tested. Different organizations have different testing approaches, and one common approach is to combine simulation-based testing with real-world driving. One of the expectations from fully-automated vehicles is never to cause …

Contributors
Tuncali, Cumhur Erkan, Fainekos, Georgios, Ben Amor, Heni, et al.
Created Date
2019