Improved Feature Selection Using Neural Networks on Human Gene Expression Data
With advances in computational power and storage, thousands of data sets are now available for almost any domain. However, not all data sets are created equal. It is a well-known problem in machine learning that as the number of features increases, the volume of the feature space grows so quickly that the data becomes sparse, producing data sets that are hard to fit with conventional models. This is referred to as the curse of dimensionality, and it is one of the most fundamental problems in computation. The problem is further compounded in data sets with few labeled records. The most common solution is to use feature selection techniques to reduce the number of dimensions. This thesis explores the effectiveness of several of these techniques as applied to gene expression data. In addition, a new model designed for training on high-dimensional data is presented and compared with these known techniques.