Clustering

What Is It? 

Clustering is a machine learning technique used to group points by numerical similarity. The computed clusters are automatically mapped to the Color dimension and can be saved as a feature. 


Why Is This Important?

Use Clustering to quickly reveal distinct groups within your data. Clustering is highly applicable in a variety of use cases:

  • Identify customer segments within sales data.
  • Categorize new products by their features to inform how they are priced for the market.
  • Characterize target patient populations for a new health program. 
  • Categorize IT tickets to identify common problems and spot opportunities to automate.
  • Improve the performance of recommender systems.


How?

Explore uses the k-means algorithm for Clustering. Clustering accepts only numerical features, and it rejects any feature whose values are all the same.
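For readers who want to see the idea behind k-means, the following is a minimal sketch in plain Python. It is a generic textbook version (Lloyd's algorithm), not Explore's implementation, which this article does not show. The function names `drop_constant_features` and `kmeans`, the random initialization, and the sample data are all assumptions made for illustration.

```python
import random


def drop_constant_features(rows):
    """Return the column indices whose values are not all identical."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Generic k-means (Lloyd's algorithm). Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k data points picked at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up sample data: the middle column is constant and is dropped first.
rows = [(1.0, 5.0, 1.1), (1.2, 5.0, 0.9), (8.0, 5.0, 8.2), (8.1, 5.0, 7.9)]
keep = drop_constant_features(rows)  # [0, 2]
points = [tuple(row[j] for j in keep) for row in rows]
labels, centroids = kmeans(points, k=2)
```

A constant column adds nothing to the distance between any two points, which is one plausible reason such features are not allowed as input.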



Steps for running Clustering:

  1. Open the Clustering panel by clicking the icon in the toolbar or selecting Clustering from the Data Analytics menu.
  2. (Optional) Set the number of clusters. By default, the Clustering routine in Explore automatically determines the best number of clusters using the elbow method.
    1. To specify the exact number of clusters to find, enter a number between 2 and 16 in the Number of Clusters input.
  3. Select the features you would like to use to determine the clusters by either dragging and dropping them into the Add Features area or using the Input All button. 
  4. (Optional) Remove unwanted features by hovering over the feature and clicking the "x" button.
    • Tip: Removing sparse features (with many missing values) will produce more complete results. Click the red pound sign to remove sparse features.
  5. Click the "Run" button. 
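The elbow method mentioned in step 2 can be sketched as follows. The article does not state which elbow criterion Explore applies, so this example uses one common heuristic as an assumption: after scaling both axes to the range 0 to 1, pick the candidate whose point lies farthest from the straight line joining the first and last points of the curve. The names `elbow_index` and `choose_k` are hypothetical, and the inertia values (within-cluster sums of squared distances) are made up for illustration.

```python
import math


def elbow_index(inertias):
    """Index of the point farthest from the line joining the curve's endpoints."""
    n = len(inertias)
    lo, hi = min(inertias), max(inertias)
    if n < 3 or hi == lo:
        return 0  # too few points, or a flat curve: no elbow to find
    # Scale both axes to [0, 1] so neither one dominates the distance.
    xs = [i / (n - 1) for i in range(n)]
    ys = [(v - lo) / (hi - lo) for v in inertias]
    slope = ys[-1] - ys[0]  # line from (0, ys[0]) to (1, ys[-1])
    norm = math.sqrt(slope ** 2 + 1)
    distances = [abs(slope * x - y + ys[0]) / norm for x, y in zip(xs, ys)]
    return max(range(n), key=lambda i: distances[i])


def choose_k(candidate_ks, inertias):
    """Return the candidate number of clusters sitting at the elbow."""
    return candidate_ks[elbow_index(inertias)]


# Made-up inertia values for k = 2..6; in practice each value would come
# from one clustering run at that k.
candidate_ks = [2, 3, 4, 5, 6]
inertias = [1000.0, 400.0, 120.0, 100.0, 90.0]
best_k = choose_k(candidate_ks, inertias)  # 4
```

Inertia always falls as more clusters are added, so the goal is not the lowest value but the point where further clusters stop paying off.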


Additional Details

The computed clusters are automatically mapped to the Color dimension. A new feature is generated called "(X) Cluster Result", where X corresponds to the number of times you have run Clustering. If you would like to save the resulting clusters as a new column in your dataset, you can drag the newly created Cluster Result feature into the Feature List. 
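Outside of Explore, the same idea of saving cluster labels as a new column can be sketched as below. This is only an illustration of the naming pattern described above: representing the dataset as a list of dictionaries and the helper name `add_cluster_result` are assumptions, and in Explore itself the column is saved by dragging the feature into the Feature List.

```python
def add_cluster_result(rows, labels, run_number):
    """Return copies of the rows with a '(X) Cluster Result' column added."""
    if len(rows) != len(labels):
        raise ValueError("need exactly one cluster label per row")
    column = f"({run_number}) Cluster Result"
    # Build new dictionaries so the original rows are left unchanged.
    return [{**row, column: label} for row, label in zip(rows, labels)]


# Made-up rows and labels for a first Clustering run.
rows = [{"spend": 12.0}, {"spend": 15.0}, {"spend": 90.0}]
labels = [0, 0, 1]
result = add_cluster_result(rows, labels, run_number=1)
```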
