Principal Component Analysis (PCA) reduces the complexity of a dataset by condensing information from many different features into just a few representative features. It is especially helpful when the dataset has a very large number of features.
Each principal component is built to capture as much of the variance in the dataset as possible. PCA is an unsupervised machine learning routine, which means that you do not need a target feature in mind when running it. This makes it a quick way to get a broad understanding of the trends and relationships in your data.
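To make the idea concrete, here is a minimal NumPy sketch of PCA computed through a singular value decomposition. It is an illustration of the general technique, not the implementation behind the PCA panel; the function name `pca` and the synthetic data are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (rows = points, columns = features) onto its top principal components."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of every point in the reduced space.
    scores = X_centered @ components.T
    # Variance captured along each kept direction.
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return scores, components, explained_variance[:n_components]

rng = np.random.default_rng(0)
# Synthetic data (assumption): 5 features driven by 2 hidden factors plus a little noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

scores, components, variances = pca(X, n_components=2)
print(scores.shape)  # (200, 2)
```

No target column appears anywhere in the computation, which is what "unsupervised" means here: the five features are condensed into two using only the features themselves.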
Using PCA
- Click the PCA icon in the left-side toolbar to open the PCA panel.
- Under Number of Components, choose how many principal components the selected features should be reduced to.
- Select the features on which to run PCA by either choosing them from the Select Features drop-down or using the Input All button.
- (Optional) Remove unwanted features by hovering over the feature and clicking the Remove button.
- Click Run at the bottom of the panel.
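For readers who want to reproduce a comparable workflow outside the panel, the steps above can be sketched in NumPy. The feature names, the `data` dictionary, and the decision to standardize the features are all assumptions made for illustration; this article does not say how the panel preprocesses data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical dataset: feature name -> column of values.
data = {
    "height": rng.normal(170, 10, 100),
    "weight": rng.normal(70, 12, 100),
    "age": rng.normal(40, 15, 100),
    "row_id": np.arange(100, dtype=float),
}

n_components = 2            # choose the number of components
selected = list(data)       # start from every feature (like Input All)
selected.remove("row_id")   # remove a feature that should not take part

X = np.column_stack([data[name] for name in selected])
# Standardize so no feature dominates only because of its units (assumption).
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Run: the principal directions are the rows of Vt.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt[:n_components].T
print(scores.shape)  # (100, 2)
```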
Reviewing PCA Outputs
Visualizations
When the routine finishes, a visualization plots the first three principal components on the X, Y, and Z axes. To see visualizations of the key features that strongly influence the first three principal components, toggle through the Suggested Mappings at the bottom of the PCA panel.
Feature Importance
You can view the aggregate rank of the features' importance within the PCA routine:
- Click the Menu icon at the top right of the PCA panel.
- Select Show Feature Importance.
The ranking indicates which features contribute the most to the differences between points in your dataset.
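One common heuristic for an aggregate ranking is to weight each feature's absolute loading on a component by the share of variance that component explains, then sum across components. The sketch below follows that heuristic; it is an assumption for illustration and not necessarily how the PCA panel computes its ranking. The function `feature_importance` and the features "a", "b", and "c" are hypothetical.

```python
import numpy as np

def feature_importance(X, feature_names, n_components):
    """Rank features by variance-weighted absolute loadings (one heuristic among several)."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Share of total variance explained by each component.
    ratio = (S ** 2) / np.sum(S ** 2)
    # Absolute loadings: one row per component, one column per feature.
    per_component = np.abs(Vt[:n_components])
    # Aggregate score: loadings summed across components, weighted by explained variance.
    aggregate = ratio[:n_components] @ per_component
    order = np.argsort(aggregate)[::-1]
    ranking = [(feature_names[i], float(aggregate[i])) for i in order]
    return ranking, per_component

rng = np.random.default_rng(2)
# Hypothetical features: "a" varies a lot, "b" moderately, "c" hardly at all.
X = np.column_stack([
    rng.normal(0, 10.0, 300),
    rng.normal(0, 3.0, 300),
    rng.normal(0, 0.1, 300),
])

ranking, per_component = feature_importance(X, ["a", "b", "c"], n_components=2)
for name, score in ranking:
    print(name, round(score, 3))
```

Each row of `per_component` is the matching per-component view: it shows how strongly every feature loads on that single component.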
To view the feature importance per principal component:
- Click the Menu icon at the top right of the PCA panel.
- Hover over Show Importance By Component.
- Select the component you would like to investigate.
Each successive component captures the largest possible share of the variance not already accounted for by the previous components.
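This ordering can be checked numerically. The sketch below, which uses synthetic data as an assumption, computes the share of variance explained by each component and confirms that the shares never increase from one component to the next.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic data (assumption): 4 correlated features built by mixing random columns.
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
# Share of total variance captured by each component, in component order.
explained_ratio = S ** 2 / np.sum(S ** 2)

for i, r in enumerate(explained_ratio, start=1):
    print(f"PC{i}: {r:.1%} of total variance")

# Cross-check: variance of the data projected onto each principal direction.
projected_var = (Xc @ Vt.T).var(axis=0, ddof=1)
```

The printed shares sum to 100%, and `projected_var` matches the variances implied by the singular values, so each component accounts only for variance that the earlier ones left unexplained.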