Principal Component Analysis (PCA) is a data analysis technique used for dimensionality reduction; it is frequently used when a dataset has a very large number of features. The first principal component captures the largest possible amount of variance in the dataset, and each successive component captures the largest possible variance not accounted for by the previous components. In other words, PCA takes linear combinations of the input features to produce new features that are uncorrelated with one another.
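To make the idea concrete, here is a minimal sketch of standard PCA computed with NumPy's singular value decomposition. It is not VIP's implementation; the function name `pca` and the synthetic data are illustrative assumptions. The sketch checks two properties described above: the variance captured by each component never increases from one component to the next, and the projected component values are uncorrelated.

```python
import numpy as np

def pca(X, n_components):
    """Minimal PCA via SVD of the mean-centered data matrix (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each feature on zero
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]             # shape (n_components, n_features)
    scores = Xc @ components.T                 # data expressed in component coordinates
    variances = S[:n_components] ** 2 / (X.shape[0] - 1)  # variance along each component
    return components, scores, variances

# Synthetic data: feature 2 is almost a copy of feature 1, feature 3 is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.1, size=200), rng.normal(size=200)])
components, scores, variances = pca(X, 3)

print(variances)                                        # non-increasing
print(np.round(np.corrcoef(scores, rowvar=False), 6))   # close to the identity matrix
```

Because the first two input features are strongly correlated, most of their shared variance collapses into the first component, which is the kind of redundancy PCA is meant to remove.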
PCA in VIP
To access PCA, click the PCA icon in the toolbar. Drag the features you want to include into the Features field, or click Input All to use all the features. To remove a feature from the list, hover over it and click the X. Then enter the number of principal components to compute; the default is 3, which allows intuitive spatial visualization. PCA works only with numerical features and does not accept features in which all the values are the same (such features have zero variance).
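The feature rules above can be illustrated with a small validation sketch. The helper `select_pca_features` and the dictionary-of-columns input are hypothetical and only mirror the documented rules (numeric features only, no constant features); the check that the number of components does not exceed the number of usable features is a general PCA limit, assumed here rather than taken from VIP.

```python
import numpy as np

def select_pca_features(columns, n_components=3):
    """Hypothetical filter: keep features that are numeric and not constant.

    columns: dict mapping feature name -> list of values (None = missing).
    """
    usable = []
    for name, values in columns.items():
        try:
            arr = np.asarray(values, dtype=float)   # None becomes NaN
        except (TypeError, ValueError):
            continue                                # non-numeric feature: skipped
        present = arr[~np.isnan(arr)]
        if present.size == 0 or present.max() == present.min():
            continue                                # constant feature: zero variance
        usable.append(name)
    # PCA cannot produce more components than there are input features
    if n_components > len(usable):
        raise ValueError("more components requested than usable features")
    return usable

data = {
    "height": [1.6, 1.7, 1.8, 1.75],
    "weight": [60, 72, 80, None],
    "site_id": [4, 4, 4, 4],         # all values identical: excluded
    "label": ["a", "b", "a", "b"],   # non-numeric: excluded
}
print(select_pca_features(data, n_components=2))  # ['height', 'weight']
```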
Click Run to compute the principal components (rows with missing values are ignored). Each component receives a score indicating how much of the variance in the input features it captures. Click Apply to visualize the first 3 components on the X, Y and Z axes. If a feature is mapped to Color, that mapping persists in the new visualization; all other dimensions are reset when you click Apply.
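The Run and Apply steps can be approximated as follows. This is a sketch under stated assumptions: it treats each component's score as the fraction of total variance it captures (the explained-variance ratio), which VIP may report differently, and the function name `run_pca` is hypothetical. Rows containing any missing value are dropped before fitting, and the first three projected columns correspond to the X, Y and Z axes.

```python
import numpy as np

def run_pca(X, n_components=3):
    """Hypothetical Run/Apply sketch: drop incomplete rows, fit PCA,
    return per-component variance fractions and projected coordinates."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)          # True for rows with no missing values
    kept = X[complete]
    Xc = kept - kept.mean(axis=0)                # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / (Xc.shape[0] - 1)             # variance along every component
    ratio = var[:n_components] / var.sum()       # fraction of total variance captured
    coords = Xc @ Vt[:n_components].T            # columns map to X, Y, Z
    return ratio, coords, complete

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X[3, 2] = np.nan                                 # one missing value: row 3 is ignored
ratio, coords, kept = run_pca(X)
print(ratio, ratio.sum())
print(coords.shape)                              # (99, 3)
```

In this sketch the returned `complete` mask records which original rows the coordinates belong to, so any other per-row attribute (for example a feature shown on Color) would need the same mask applied to stay aligned.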
You can select an individual component to view the relative importance of each input variable for that component.
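One common way to express the relative importance of each input variable for a component is the magnitude of its loading, that is, its weight in the component's linear combination. The sketch below normalizes the absolute loadings so they sum to 1; this measure and the function name `component_importance` are assumptions for illustration, and VIP may compute importance differently.

```python
import numpy as np

def component_importance(X, feature_names, component):
    """Hypothetical measure: |loading| of each feature, normalized to sum to 1."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    weights = np.abs(Vt[component])              # loadings for the chosen component
    weights = weights / weights.sum()            # relative importance, sums to 1
    return sorted(zip(feature_names, weights), key=lambda p: p[1], reverse=True)

rng = np.random.default_rng(2)
X = np.column_stack([
    rng.normal(scale=10, size=500),   # "a": large spread
    rng.normal(scale=1, size=500),    # "b": medium spread
    rng.normal(scale=0.1, size=500),  # "c": small spread
])
for name, w in component_importance(X, ["a", "b", "c"], component=0):
    print(f"{name}: {w:.3f}")
```

Because this sketch works on unscaled data, features with larger numeric ranges tend to dominate the first components; standardizing each feature before PCA is a common alternative when features use different units.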