PCA

Principal Component Analysis (PCA) is a dimensionality reduction technique that is frequently used when a dataset has a very large number of features. The first principal component captures the largest possible amount of variance in the dataset, and each successive component captures the largest possible variance not accounted for by the previous components. In other words, PCA transforms the data into new features that are uncorrelated with one another, each formed as a linear combination of the input features.
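The idea can be illustrated with a minimal NumPy sketch. It is not VIP's implementation: it assumes the data is only centered (not rescaled) before the decomposition, and the function name pca and the synthetic data are made up for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Return (scores, components, explained_variance) for X (rows = samples)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    # SVD of the centered data: the rows of Vt are the principal directions
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # each row is a linear combination of input features
    scores = Xc @ components.T                   # the data expressed in the new features
    explained_variance = S ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance[:n_components]

# Synthetic example: the second feature is almost a multiple of the first
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

scores, components, explained_variance = pca(X, n_components=3)
```

In this sketch the variance of each column of scores equals the corresponding entry of explained_variance, the entries are sorted from largest to smallest, and the covariance between any two different columns of scores is zero up to rounding error.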

PCA in VIP

To access PCA, click the PCA icon in the toolbar. Drag the features you would like to include in the routine into the Features field, or click Input All to use all the features. To remove a feature from the list, hover over it and click the X. To remove all features with many missing values, click the “Remove sparse features” button next to Input All. The trash can icon removes all input features so you can start fresh. Then enter the number of principal components you would like to compute; this is set to 3 by default to allow intuitive spatial visualization. PCA works with numerical features, but it does not allow features in which all the values are the same.
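If you prepare data outside VIP, a rough pre-check along the same lines might look like the sketch below. Everything in it is an assumption made for illustration: the function select_features, the table layout, and especially the 50% threshold for a "sparse" feature, since the cutoff VIP uses is not stated here.

```python
import math

def select_features(table, max_missing_fraction=0.5):
    """table maps feature name -> list of values (None or NaN means missing).
    Returns the names of features that pass the assumed checks."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    usable = []
    for name, values in table.items():
        present = [v for v in values if not is_missing(v)]
        # Keep numerical features only (booleans are excluded here by choice)
        if not all(isinstance(v, (int, float)) and not isinstance(v, bool)
                   for v in present):
            continue
        # "Sparse" feature: too many missing values (the threshold is an assumption)
        if len(values) == 0 or 1 - len(present) / len(values) > max_missing_fraction:
            continue
        # Constant feature: it has zero variance, so PCA cannot use it
        if len(set(present)) <= 1:
            continue
        usable.append(name)
    return usable

table = {
    "height": [1.7, 1.8, 1.6, 1.75],
    "city": ["Paris", "Rome", "Oslo", "Lima"],   # not numerical
    "flag": [1, 1, 1, 1],                        # constant
    "sparse": [None, None, None, 2.0],           # mostly missing
    "weight": [65, 80, None, 72],
}
usable = select_features(table)
```

Here only height and weight pass all three checks.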

PCA Results

Clicking Run computes the principal components. Rows with missing values are ignored. Each component receives a score corresponding to the amount of variance in the input variables that the component captures. VIP then suggests two plots: the first is based on the principal components, and the second is based on the most relevant original features among those used as input (the most relevant feature for each of the first three components). The first suggested plot is displayed automatically. If you have a feature on Color, it persists in the new visualization; all other dimensions are reset when the visualization is shown.
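As a rough illustration of what such a score can look like, the sketch below drops rows with missing values and reports the share of total variance captured by each component. Whether VIP reports a share, a raw variance, or some other scaling is not stated here, so treat the function explained_variance_ratio and its output format as assumptions.

```python
import numpy as np

def explained_variance_ratio(X, n_components=3):
    """Return (share of variance per component, number of rows actually used)."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)          # True for rows with no missing values
    Xc = X[complete]
    Xc = Xc - Xc.mean(axis=0)                    # center each feature
    S = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first
    variances = S ** 2 / (Xc.shape[0] - 1)       # variance captured by each component
    return variances[:n_components] / variances.sum(), int(complete.sum())

X = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.1],
    [3.0, np.nan, 0.9],   # ignored: contains a missing value
    [4.0, 8.2, 0.3],
    [5.0, 9.9, 0.7],
])
ratios, rows_used = explained_variance_ratio(X, n_components=3)
```

Four of the five rows are used, and because all three components are requested, the shares add up to 1.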

Relative Importances

You can view the relative importance of all input variables for each component by selecting a component from the Show Importance By Component section in the PCA menu.
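One common way to express relative importance is to take the absolute weight of each input feature in a component and normalize the weights so they sum to 1. The sketch below does this with NumPy; it is an assumption about how such importances can be derived, not a description of how VIP computes them, and relative_importances is a hypothetical helper.

```python
import numpy as np

def relative_importances(X, feature_names, n_components=3):
    """Return one dict per component mapping feature name -> share of absolute weight."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                              # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)    # rows of Vt = principal directions
    weights = np.abs(Vt[:n_components])                  # size of each feature's weight
    shares = weights / weights.sum(axis=1, keepdims=True)
    return [dict(zip(feature_names, row)) for row in shares]

# Synthetic data: three independent features with very different spreads
rng = np.random.default_rng(1)
a = rng.normal(scale=5.0, size=200)
b = rng.normal(scale=1.0, size=200)
c = rng.normal(scale=0.2, size=200)
X = np.column_stack([a, b, c])

importances = relative_importances(X, ["a", "b", "c"], n_components=2)
# The feature with the largest share is the most relevant one for that component
top_feature = [max(d, key=d.get) for d in importances]
```

With this data, feature a dominates the first component and feature b dominates the second, which mirrors the idea of picking the most relevant original feature for each component.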