Minimizing Dataset Size Requirements for Machine Learning


Abstract Machine learning methods are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. Classification mostly relies on labeled data, which is difficult to obtain: accurately labeling a dataset into different classes takes both high cost and effort. As data becomes abundant, all of it would need to be labeled to be used properly, and this work focuses on reducing the labeling effort for large datasets. The thesis compares the performance of different classifiers to test whether a small set of labeled data can be used to build accurate models with a high prediction rate. The use of small dataset ...
Created Date 2017
Contributor Batra, Salil (Author) / Femiani, John (Advisor) / Amresh, Ashish (Advisor) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)
Subject Computer science / Active Learning / Machine Learning / One Class Classification
Type Masters Thesis
Extent 60 pages
Language English
Copyright
Reuse Permissions All Rights Reserved
Note Masters Thesis Engineering 2017
Collaborating Institutions Graduate College / ASU Library


  Full Text
2.4 MB application/pdf
Download Count: 345

Description Dissertation/Thesis