What Is It?
Normalization rescales the values of a feature so that tightly clustered points spread out and skewed or outlying points are pulled in, which makes visualizations clearer and easier to understand.
Why Is This Important?
Normalization transformations make visualizations clearer and let consumers read data quickly. For example, a few outliers can stretch an axis so far that the significant trends in the bulk of the data are obscured until a normalization is applied. Outliers of this kind are often caused by data quality errors. Normalization is also very helpful when data is expected to be heavily skewed (such as household income), because it lets the user obtain insights from data that would otherwise be difficult to interpret.
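To make the outlier effect concrete, here is a minimal Python sketch. The helper name `fraction_of_axis` and the sample values are illustrative assumptions, not part of the application. It measures how much of the axis range the bulk of the data occupies before and after a log10 transformation.

```python
import math

def fraction_of_axis(values, bulk, transform=lambda v: v):
    """Share of the full axis range occupied by the points in `bulk`."""
    t_all = [transform(v) for v in values]
    t_bulk = [transform(v) for v in bulk]
    return (max(t_bulk) - min(t_bulk)) / (max(t_all) - min(t_all))

# Hypothetical data: ten typical values plus a single outlier.
bulk = list(range(1, 11))   # 1, 2, ..., 10
values = bulk + [10_000]

linear_share = fraction_of_axis(values, bulk)
log_share = fraction_of_axis(values, bulk, math.log10)

print(f"linear axis: {linear_share:.4f}")  # 0.0009
print(f"log10 axis:  {log_share:.4f}")     # 0.2500
```

On the unnormalized axis the ten typical points are squeezed into less than 0.1% of the range, so they appear as a single dot. After the log10 transformation they occupy a quarter of the axis and their spread becomes visible.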
Normalization can be applied in four ways:
- Click on the Normalization button in the toolbar, which allows you to apply the same Normalization method to all three axes at once.
- From the right-click Contextual Menu, you can select Axis Normalization. This will also apply the same method to all three axes at once.
- Click on the desired Dimension Icon in the Mapping panel. Select a Normalization from the “Norm” drop-down options, which will apply the selected method only to the selected feature.
- Create a new normalized feature by right-clicking a numerical feature and selecting an option from the Normalize section.
Note that the first three options are visual normalizations only: the underlying data values are not changed. The fourth option creates a new column in the dataset that contains the normalized values.
Four normalization techniques are available to improve visualization of the data:
- Log10 – Maps the values of a feature to a logarithmic scale. This transformation can be particularly helpful when visualizing data that spans multiple orders of magnitude between the minimum and maximum values. It is not applicable to features that contain values of 0 or less, because the logarithm is undefined for them.
- IHST – Inverse hyperbolic sine transformation. This normalization option is commonly used in financial applications. It is useful for visualizing data that is heavily skewed, particularly when a feature contains 0 or negative values.
- Softmax – One of the most robust normalization options, softmax tends to reduce the impact of outliers on the spread of points while maintaining some visual similarity to the unnormalized distribution. It is also useful for visualizing data that is heavily skewed, particularly when a feature contains 0 or negative values.
- CDF – Applies a cumulative distribution function to the feature being normalized. It is also useful for visualizing data that is heavily skewed, particularly when a feature contains 0 or negative values.
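The four techniques can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the application's actual implementation. The function names are hypothetical. Softmax is assumed to be the common "softmax scaling" form (a logistic function of the z-score), and CDF is assumed to be the empirical cumulative distribution of the feature. The exact formulas used by the tool may differ.

```python
import bisect
import math
import statistics

def log10_norm(values):
    """Log10: only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 requires values greater than 0")
    return [math.log10(v) for v in values]

def ihst_norm(values):
    """IHST: asinh(x) = ln(x + sqrt(x^2 + 1)); accepts 0 and negatives."""
    return [math.asinh(v) for v in values]

def softmax_norm(values):
    """Assumed form: logistic function of the z-score, output in (0, 1)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.5 for _ in values]  # constant feature: every z-score is 0
    # 0.5 * (1 + tanh(z / 2)) equals 1 / (1 + exp(-z)) but cannot overflow.
    return [0.5 * (1.0 + math.tanh((v - mean) / std / 2.0)) for v in values]

def cdf_norm(values):
    """Assumed form: empirical CDF, the fraction of values <= each value."""
    ordered = sorted(values)
    n = len(values)
    return [bisect.bisect_right(ordered, v) / n for v in values]

# Hypothetical skewed feature with a negative value, a zero, and an outlier.
sample = [-5.0, 0.0, 1.0, 2.0, 3.0, 1000.0]

print(ihst_norm(sample))
print(softmax_norm(sample))
print(cdf_norm(sample))

try:
    log10_norm(sample)
except ValueError as err:
    print("Log10 rejected the sample:", err)
```

In this sketch, Log10 rejects the sample because it contains 0 and a negative value, while the other three functions accept it, which mirrors the guidance in the list above.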