Describe principal component analysis.
PCA reduces the dimensionality of a data set by projecting the data onto a lower-dimensional subspace. In general, an n-dimensional data set can be reduced by projecting it onto a k-dimensional subspace, where k < n. More formally, PCA finds a set of orthonormal vectors spanning the k-dimensional subspace that minimizes the sum of the squared distances between the (mean-centered) data points and their projections. Equivalently, this projection retains the greatest possible proportion of the original data set's variance.
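A minimal sketch of the projection described above, assuming NumPy and computing the principal directions with a singular value decomposition of the centered data (one common way to do it, not the only one). The function name `pca` and the synthetic random data are illustrative. It also compares the squared reconstruction error against an arbitrary 2-D subspace to show the minimization property.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (n_samples x n_features) onto the top-k principal components.

    Returns (components, projected, reconstructed):
      components:    k x n_features array with orthonormal rows
      projected:     n_samples x k coordinates in the k-dimensional subspace
      reconstructed: the projections mapped back into the original n-dimensional space
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # work with mean-centered data
    # Rows of Vt are right singular vectors, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    projected = Xc @ components.T
    reconstructed = projected @ components + mean
    return components, projected, reconstructed

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # synthetic data: 200 points in n = 5 dimensions
components, Z, X_hat = pca(X, k=2)
print(Z.shape)  # (200, 2)

# Sum of squared errors between each point and its projection.
pca_error = np.sum((X - X_hat) ** 2)

# For comparison, project onto an arbitrary 2-D subspace (random orthonormal basis).
Q, _ = np.linalg.qr(rng.normal(size=(5, 2)))
Xc = X - X.mean(axis=0)
random_error = np.sum((Xc - Xc @ Q @ Q.T) ** 2)
print(pca_error <= random_error)  # True
```

Using the SVD avoids forming the covariance matrix explicitly, which tends to be more numerically stable than an eigendecomposition of that matrix.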
The first principal component points in the direction of greatest variance. Each subsequent principal component captures the maximum amount of the remaining variance, subject to the constraint that it is orthogonal to all preceding principal components.
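The ordering and orthogonality can be checked numerically. This sketch assumes NumPy and uses an eigendecomposition of the sample covariance matrix. The correlated 3-feature data set and its `mixing` matrix are hypothetical, chosen only so that the variances differ across directions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data: 3 features built from 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
mixing = np.array([[2.0, 0.5, 1.0],
                   [0.0, 1.0, -1.0]])
X = latent @ mixing + 0.1 * rng.normal(size=(500, 3))

Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)          # 3x3 sample covariance (ddof=1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # reorder so variance is descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are the principal components; they are mutually orthonormal.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))  # True

# Variance of the data along each component equals the matching eigenvalue.
variances = (Xc @ eigvecs).var(axis=0, ddof=1)
print(np.allclose(variances, eigvals))  # True
```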
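PCA is most useful when the variance in a data set is distributed unevenly across dimensions, for example when features are strongly correlated, so that a few principal components capture most of the total variance.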
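To illustrate, the sketch below (assuming NumPy; the helper name `explained_variance_ratio` and both synthetic data sets are hypothetical) compares the fraction of variance captured by each component for correlated features versus independent features of equal variance.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, largest first."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values in descending order
    var = s ** 2
    return var / var.sum()

rng = np.random.default_rng(2)
n = 1000
# Uneven: 4 features that are noisy copies of one underlying signal.
signal = rng.normal(size=(n, 1))
X_uneven = signal @ np.ones((1, 4)) + 0.1 * rng.normal(size=(n, 4))
# Even: 4 independent features with equal variance.
X_even = rng.normal(size=(n, 4))

r_uneven = explained_variance_ratio(X_uneven)
r_even = explained_variance_ratio(X_even)
print(r_uneven.round(3))
print(r_even.round(3))
```

In the correlated case the first component should account for nearly all of the variance, so projecting onto k = 1 loses little. In the independent case the ratios are roughly equal, and dropping any component discards about a quarter of the variance.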