Normalization spreads out tightly clustered data points and reels in skewed ones, producing clearer visualizations that let you read the data quickly.
For example, if your data contains a few outliers, they can obscure the significant trends in the bulk of the data until a normalization is applied. Outliers of this kind are very common when the data contains quality errors. Normalization is also helpful when the data is expected to be heavily skewed, because it lets you draw insights from data that would otherwise be difficult to interpret.
Using Normalization
Normalization can be applied in three ways:
- Option 1: Right-click on the plot, then select Axes Normalization. This applies the same method to all three axes at once.
- Option 2: Click the desired Dimension Icon in the Mapping Panel, then select a Normalization from the Norm drop-down. This applies the selected method only to the selected feature.
- Option 3: Create a new normalized feature by right-clicking a numerical feature and selecting an option from the Normalize section.
Additionally, if a feature has been applied to the Color dimension in the Mapping Panel, you can apply Normalization to the gradient to more easily see the separation between points.
- Related Article: Setting Key Parameters in the Mapping Panel
Working with Normalization
There are four normalization techniques that you can use to improve the visualization of your data:
| Normalization | Description |
|---|---|
| Log10 | Maps the values of a feature to a logarithmic scale. This transformation can be particularly helpful when visualizing data that has a very wide range between the minimum and maximum values (multiple orders of magnitude). It is not applicable for features that contain a value of 0 or less. |
| IHST | Inverse Hyperbolic Sine Transformation. This normalization option is commonly used in financial applications. Useful for visualizing data that is heavily skewed, particularly when a feature contains 0 or negative values. |
| Softmax | One of the most robust normalization options, Softmax tends to reduce the impact of outliers on the spread of points while maintaining some visual similarity to the unnormalized distribution. Also useful for visualizing data that is heavily skewed, particularly when a feature contains 0 or negative values. |
| CDF | This applies a cumulative distribution function to the feature being normalized. Also useful for visualizing data that is heavily skewed, particularly when a feature contains 0 or negative values. |
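To make the four techniques more concrete, the following is a minimal Python sketch of how transformations of this kind are commonly defined. It is not the application's own implementation: the exact formulas the product uses are not stated here, so the Softmax form (a logistic function of the z-score) and the CDF form (an empirical, rank-based CDF) are assumptions, and the function names and sample data are hypothetical.

```python
import bisect
import math
from statistics import mean, pstdev


def log10_norm(values):
    """Log10: map each value to its base-10 logarithm (values must be > 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("Log10 is not applicable to values of 0 or less")
    return [math.log10(v) for v in values]


def ihst_norm(values):
    """IHST: asinh(x) = ln(x + sqrt(x^2 + 1)); defined for 0 and negatives."""
    return [math.asinh(v) for v in values]


def softmax_norm(values):
    """Softmax scaling (assumed form): logistic function of the z-score."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.5] * len(values)  # constant feature: every point at the midpoint
    # 0.5 * (1 + tanh(z / 2)) equals 1 / (1 + exp(-z)) but cannot overflow.
    return [0.5 * (1.0 + math.tanh((v - mu) / sigma / 2.0)) for v in values]


def cdf_norm(values):
    """Empirical CDF (assumed form): share of values <= each value."""
    ordered = sorted(values)
    return [bisect.bisect_right(ordered, v) / len(ordered) for v in values]


if __name__ == "__main__":
    feature = [-5, 0, 1, 2, 3, 1000]  # made-up data with one large outlier
    for name, fn in [("IHST", ihst_norm), ("Softmax", softmax_norm), ("CDF", cdf_norm)]:
        print(name, [round(x, 3) for x in fn(feature)])
    print("Log10", log10_norm([1, 10, 1000]))
```

In this sketch, Log10 rejects the sample feature because it contains 0 and a negative value, while the other three accept it. The Softmax and CDF variants also bound their output between 0 and 1, which is why a single large outlier no longer stretches the axis.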