An Information Based Optimal Subdata Selection Algorithm for Big Data Linear Regression and a Suitable Variable Selection Algorithm


Abstract This article proposes a new information-based optimal subdata selection (IBOSS) algorithm, the Squared Scaled Distance Algorithm (SSDA). It is based on the invariance of the determinant of the information matrix under orthogonal transformations, especially rotations. Extensive simulation results show that the new IBOSS algorithm retains the desirable asymptotic properties of IBOSS and gives a larger determinant of the subdata information matrix. It has the same order of time complexity as the D-optimal IBOSS algorithm. However, it exploits vectorized calculation, avoiding for loops, and is approximately 6 times as fast as the D-optimal IBOSS algorithm in R. The robustness of SSDA is studied from three aspects: nonorthogonality, including interac...
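The invariance property cited in the abstract can be checked numerically. The sketch below is not the SSDA algorithm itself, whose details are not given in this record. It only illustrates that, for a linear model with an intercept and two covariates, rotating the covariates leaves the determinant of the information matrix Z^T Z unchanged. The function names, the simulated data, and the rotation angle are all assumptions made for illustration.

```python
import math
import random


def information_matrix(rows):
    """Return Z^T Z for a linear model with intercept, where each row of Z is [1, x1, x2]."""
    p = len(rows[0]) + 1
    m = [[0.0] * p for _ in range(p)]
    for x in rows:
        z = [1.0] + list(x)
        for i in range(p):
            for j in range(p):
                m[i][j] += z[i] * z[j]
    return m


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def rotate(rows, theta):
    """Rotate every covariate pair (x1, x2) by the angle theta (an orthogonal transformation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x1 - s * x2, s * x1 + c * x2) for x1, x2 in rows]


# Hypothetical data: 50 points with unequal spread in the two covariates.
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 2)) for _ in range(50)]

d_original = det(information_matrix(data))
d_rotated = det(information_matrix(rotate(data, 0.7)))

# The two determinants agree up to floating-point error.
print(math.isclose(d_original, d_rotated, rel_tol=1e-9))
```

The check works because rotating the covariates multiplies Z on the right by a block-diagonal orthogonal matrix (1 for the intercept, the rotation for the covariates), and the determinant of an orthogonal matrix is plus or minus one.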
Created Date 2017
Contributor Zheng, Yi (Author) / Stufken, John (Advisor) / Reiser, Mark (Committee member) / McCulloch, Robert (Committee member) / Arizona State University (Publisher)
Subject Statistics / Computer science / Big Data / IBOSS / Subdata Selection / Variable Selection
Type Masters Thesis
Extent 46 pages
Language English
Copyright
Reuse Permissions All Rights Reserved
Note Masters Thesis Statistics 2017
Collaborating Institutions Graduate College / ASU Library
Additional Formats MODS / OAI Dublin Core / RIS

Description Dissertation/Thesis