Example of dimensionality reduction improving classification accuracy (not just computational efficiency):

"For these reasons, we use information gain criteria to select the most informative attributes that represent each author as a class. This reduces vector size and sparsity while increasing accuracy and machine learning model training speed. For example, we get 200,000 features from the 900 executable binary samples of 100 programmers. If we use all of these features in classification, the accuracy is slightly above 20% because the random forest might be randomly selecting features with values of zero in the sparse feature vectors. Once the dimension of the feature vector is reduced, we get less than 500 information gain features. Extracting less than 500 features or training a machine learning model where each instance has less than 500 attributes is computationally efficient. On the other hand, no sparsity remains in the feature vectors after dimensionality reduction which is one reason for the performance benefits of dimensionality reduction. After dimensionality reduction, the correct classification accuracy of 100 programmers increases from 20% to close to 80%. ... We employed the dimensionality reduction step using WEKA’s [23] information gain [35] attribute selection criterion, which evaluates the difference between the entropy of the distribution of classes and the entropy of the conditional distribution of classes given a particular feature ... we retained only those features that individually had non-zero information gain. We refer to these features as IG-features throughout the rest of the paper. Note that ... information gain is always non-negative" --
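A minimal Python sketch of the filter the quote describes: compute IG(C; F) = H(C) - H(C | F) for each feature independently and keep only the features with non-zero gain. It assumes discrete feature values (e.g. binary present/absent). WEKA's information gain evaluator discretizes numeric attributes first, and that step is omitted here. The function names, the toy data, and the `tol` threshold (used to absorb floating-point round-off) are hypothetical illustrations, not WEKA's API or the paper's code.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical class distribution."""
    n = len(labels)
    counts = Counter(labels)
    # Sum of p * log2(1/p) over observed classes; never negative.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(C; F) = H(C) - H(C | F) for one discrete feature column."""
    n = len(labels)
    # Group the class labels by the value the feature takes on each instance.
    groups: dict = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # H(C | F): entropy within each group, weighted by the group's share of instances.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_ig_features(rows: List[List[Hashable]], labels: Sequence[Hashable],
                       tol: float = 1e-12) -> List[int]:
    """Indices of columns whose information gain is non-zero (greater than tol)."""
    selected = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if information_gain(column, labels) > tol:
            selected.append(j)
    return selected


# Hypothetical toy data: 4 samples, 3 binary features, 2 authors.
rows = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
]
labels = ["alice", "alice", "bob", "bob"]

print(select_ig_features(rows, labels))  # → [0]
```

Column 0 separates the two authors perfectly (IG = 1 bit). Column 1 is zero on every sample, the extreme case of a sparse feature, so it carries no class information and gets IG of exactly 0. Column 2 is split evenly across both authors, so its gain is also 0. Only column 0 survives, which mirrors how the filter discards the uninformative zero-valued features the quote blames for the low random-forest accuracy.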


(rec. by )