Anomaly Detection examines your data and automatically identifies points that deviate significantly from the norm. It's especially useful when extreme data points are not visually obvious, which may be the case with datasets containing a large number of features.
There are two Anomaly Detection routines available in Virtualitics Explore Desktop: Threshold-based and Standard Deviation-based. The Threshold method is well-suited for finding data points with extreme values across any number of different features, while the Standard Deviation method will help find data points that are statistical outliers for one or a few features.
Using Anomaly Detection
- Click the Anomaly Detection icon in the top toolbar to open the Anomaly Detection panel.
- Under the Methods dropdown, select the method you would like to use:
  - Threshold (default): The top N% of points are returned, ranked by how far they are from the "average" point.
    - Choose the value of N with the Select Number of Outliers slider or by typing an exact percentage in the input box. The default threshold is 1%.
  - Standard Deviation: Data that falls more than N standard deviations from the mean for the selected input features is labeled as an outlier.
    - Under the Define Range dropdown:
      - Select + (upper extremes), - (lower extremes), or +/- (both extremes) to determine which extremes are considered.
      - Choose the value of N, which can range from 0.5 to 5.5.
    - Under Options:
      - If “And” is selected, points that are more than N standard deviations from the mean for all input features are identified as outliers. Note that rows with missing values will always be identified as non-outliers.
      - If “Or” is selected, points that are more than N standard deviations from the mean for one or more of the input features are identified as outliers, even if the point has missing values for some of the input features.
- Add the features you would like to check for outliers by clicking and dragging them from the Features Panel to the Add Features section, or by using the Input All button.
- (Optional) Remove unwanted features by hovering over the feature and clicking the Remove button.
- Click Run at the bottom of the panel.
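To build intuition for what the two methods compute, here is a minimal Python sketch using pandas. It is not the Virtualitics implementation. This article does not say how distance from the "average" point is measured, so the Threshold function assumes Euclidean distance on standardized features and lets a missing value contribute nothing to a row's distance. The function names, parameter names, and the use of the population standard deviation are also illustrative assumptions, and constant features (zero standard deviation) are not handled.

```python
import numpy as np
import pandas as pd


def _z_scores(df, features):
    """Standardize each feature; missing values stay NaN."""
    x = df[features].astype(float)
    return (x - x.mean()) / x.std(ddof=0)


def threshold_outliers(df, features, top_percent=1.0):
    """Flag the top N% of rows, ranked by distance from the 'average' point.

    Assumption: distance is Euclidean on standardized features, and a
    missing value contributes nothing to a row's distance.
    """
    z = _z_scores(df, features)
    distance = np.sqrt((z ** 2).sum(axis=1))  # sum() skips NaN entries
    n_flagged = max(1, int(round(len(df) * top_percent / 100.0)))
    top_rows = distance.nlargest(n_flagged).index
    return pd.Series(df.index.isin(top_rows), index=df.index)


def std_dev_outliers(df, features, n=3.0, direction="+/-", combine="or"):
    """Flag rows more than n standard deviations from the mean.

    direction: "+" (upper extremes), "-" (lower extremes), "+/-" (both).
    combine:   "and" (every feature must be extreme) or "or" (any feature).
    """
    z = _z_scores(df, features)
    if direction == "+":
        beyond = z > n
    elif direction == "-":
        beyond = z < -n
    elif direction == "+/-":
        beyond = z.abs() > n
    else:
        raise ValueError("direction must be '+', '-', or '+/-'")

    # Comparisons against NaN are False, so under "and" a row with any
    # missing value is never flagged, while under "or" its other
    # features can still flag it.
    if combine == "and":
        return beyond.all(axis=1)
    if combine == "or":
        return beyond.any(axis=1)
    raise ValueError("combine must be 'and' or 'or'")
```

Both functions return a boolean Series aligned with the input rows. For example, `std_dev_outliers(df, ["a", "b"], n=2.0, direction="+", combine="and")` marks only the rows whose values in both hypothetical columns `a` and `b` lie more than two standard deviations above their means.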
Reviewing Anomaly Detection Outputs
Haloed Results
Anomalies are automatically shown with halos. A new binary data feature called “(X) Anomaly Result” is also generated, where X corresponds to the number of times you have run Anomaly Detection. You can click and drag the newly created feature to any dimension or to the Features Panel, allowing you to use this feature in future mappings.
To view just the points flagged as anomalies, right-click anywhere on the plot (not on a data point) to open the Contextual Menu. Within the Contextual Menu, select Show Only Haloed.
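If you work with the same data outside the application, the result feature and the Show Only Haloed filter can be mimicked roughly as follows. This is a sketch only: the helper names are hypothetical, and encoding the binary feature as 0/1 is an assumption, since this article says only that the feature is binary.

```python
import pandas as pd


def add_anomaly_result(df, flags, run_number):
    """Return a copy of df with a binary '(X) Anomaly Result' column added."""
    column = f"({run_number}) Anomaly Result"
    out = df.copy()
    # Assumption: 1 marks an anomaly and 0 marks a normal point.
    out[column] = pd.Series(flags, index=df.index).astype(int)
    return out, column


def show_only_haloed(df, result_column):
    """Keep only the rows flagged as anomalies."""
    return df[df[result_column] == 1]
```

Here `flags` is any sequence of True/False values with one entry per row, and `run_number` plays the role of X in the feature name.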
Adding Pulsation
You can make specific anomalies even easier to see in Mapping View by adding a pulsation animation to those data points. After you have run Anomaly Detection:
- Navigate to the Mapping Panel.
- Locate Pulsation.
- From the Features Panel, click and drag the data feature you'd like to be displayed with a pulsation animation to the Pulsation box.
- Click Apply.
- On the dropdown that appears under Pulsation, choose the data value you'd like to see represented.
Data points on the plot that match the selected value will begin to pulsate.