Differences Between the KMeans and KMedoids Algorithms
Install scikit-learn-extra, which provides the KMedoids implementation.
!pip install scikit-learn-extra
Imports.
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn_extra.cluster import KMedoids
First, create a random sample of 2-d points. Add a single outlier far from every other point in the sample.
sample = np.random.choice(50, (100, 2))    # 100 points with integer coordinates in [0, 50)
outlier = np.array([[100, 100]])           # a single point far from the rest
sample = np.append(sample, outlier, axis=0)
Fit a KMeans model and a KMedoids model to the sample, using the implementations from scikit-learn and scikit-learn-extra, with the same number of clusters and the same random seed.
kmeans = KMeans(n_clusters=4, random_state=37).fit(sample)
kmedoids = KMedoids(n_clusters=4, random_state=37).fit(sample)
Plot the cluster assignments produced by each algorithm (run each call in its own cell so the plots don't overlap).
sns.scatterplot(x=sample[:, 0], y=sample[:, 1], hue=kmeans.predict(sample))
sns.scatterplot(x=sample[:, 0], y=sample[:, 1], hue=kmedoids.predict(sample))
Results
KMeans minimizes squared Euclidean distances, so the far-away outlier is expensive to absorb: it gets a cluster all to itself, and the remaining points are split into three clusters. KMedoids, whose cluster centers (medoids) must be actual data points, instead assigns the outlier to the cluster with the nearest medoid, so all four clusters cover the main body of the sample rather than one being wasted on a single outlier.
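The difference comes down to how each algorithm defines a cluster center: a centroid (mean) is pulled toward outliers, while a medoid must be one of the actual data points. A minimal sketch of this with a toy 1-D set (not part of the notebook above):

```python
import numpy as np

# Four 1-D points, one of which is a large outlier.
points = np.array([1.0, 2.0, 3.0, 100.0])

# The mean is dragged far toward the outlier.
mean = points.mean()

# The medoid is the data point minimizing total distance to all others,
# so it stays inside the dense part of the data.
pairwise = np.abs(points[:, None] - points[None, :])
medoid = points[pairwise.sum(axis=1).argmin()]

print(mean)    # 26.5
print(medoid)  # 2.0
```

The same effect drives the clustering results: a KMeans centroid can be yanked around by (or stranded on) an outlier, while a KMedoids center cannot leave the data.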