Differences Between the KMeans and KMedoids Algorithms.
Installing scikit-learn-extra.
!pip install scikit-learn-extra
Imports.
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn_extra.cluster import KMedoids
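First, create a random sample of 100 two-dimensional points with integer coordinates from 0 to 49. Then add a single outlier at (100, 100), far from every other point in the sample.
sample = np.random.choice(50, (100, 2))  # 100 points with integer coordinates in [0, 50); unseeded, so each run differs
outlier = np.array([[100, 100]])  # a single point far from all the others
sample = np.append(sample, outlier, axis=0)  # the outlier becomes the last row (shape (101, 2))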
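Fit a KMeans model and a KMedoids model, each with four clusters. I'm using the implementations from scikit-learn and scikit-learn-extra.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=37).fit(sample)  # n_init=10 keeps the older default; newer scikit-learn defaults to "auto"
kmedoids = KMedoids(n_clusters=4, random_state=37).fit(sample)  # cluster centers are actual sample points (medoids)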
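The two estimators differ mainly in how they define a cluster's center. KMeans uses the mean of a cluster's members (a centroid, which need not be one of the points), while KMedoids uses the member whose summed distance to the other members is smallest (a medoid). The numpy-only sketch below computes both for a small, made-up cluster that contains one outlier; the helper names `centroid` and `medoid` are hypothetical and are not part of scikit-learn or scikit-learn-extra.

```python
import numpy as np

def centroid(points):
    """KMeans-style center: the coordinate-wise mean; need not be a data point."""
    return points.mean(axis=0)

def medoid(points):
    """KMedoids-style center: the member with the smallest summed Euclidean distance to the others."""
    diffs = points[:, None, :] - points[None, :, :]  # shape (n, n, 2): all pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)           # shape (n, n): pairwise Euclidean distances
    return points[dists.sum(axis=1).argmin()]        # row with the lowest total distance

# Hypothetical cluster: three nearby points plus one far-away outlier.
cluster = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 3.0], [100.0, 100.0]])
print("centroid:", centroid(cluster))
print("medoid:  ", medoid(cluster))
```

Here the outlier drags the centroid to (25.5, 25.75), far from all three nearby points, while the medoid stays at (2, 0), one of the original points.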
Plot the clusters resulting from each clustering algorithm.
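import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, figsize=(12, 5))  # side-by-side panels so the two plots don't draw over each other
sns.scatterplot(x=sample[:, 0], y=sample[:, 1], hue=kmeans.predict(sample), ax=axes[0]).set_title("KMeans");
sns.scatterplot(x=sample[:, 0], y=sample[:, 1], hue=kmedoids.predict(sample), ax=axes[1]).set_title("KMedoids");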
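To confirm what the plots show without relying on colors alone, one option is to check whether the outlier's label is shared with any other point. The helpers below are a hypothetical numpy-only sketch; they assume the outlier is the last row (as it is after `np.append` above) and use toy label vectors in place of real model output.

```python
import numpy as np

def cluster_sizes(labels):
    """Number of points assigned to each (non-negative integer) cluster label."""
    return np.bincount(np.asarray(labels))

def outlier_is_singleton(labels, outlier_index=-1):
    """True if the point at outlier_index is the only member of its cluster."""
    labels = np.asarray(labels)
    return int((labels == labels[outlier_index]).sum()) == 1

# Toy label vectors (hypothetical values), outlier in the last position.
isolated = np.array([0, 1, 2, 0, 1, 2, 3])  # outlier has a cluster to itself
absorbed = np.array([0, 1, 2, 0, 1, 2, 2])  # outlier shares cluster 2
print(cluster_sizes(isolated), outlier_is_singleton(isolated))
print(cluster_sizes(absorbed), outlier_is_singleton(absorbed))
```

In the notebook, the same check could be run as `outlier_is_singleton(kmeans.labels_)` and `outlier_is_singleton(kmedoids.labels_)`, since both fitted estimators expose a `labels_` attribute.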
Results
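In this run, KMeans places the outlier in a cluster containing only itself and splits the remaining 100 points into three clusters. KMedoids instead assigns the outlier to the cluster whose medoid is nearest to it, rather than giving it a group of its own. A likely reason is the objective each algorithm minimizes: KMeans minimizes squared distances to the cluster means, so a single far-away point adds a very large cost unless it gets its own center, whereas KMedoids (with its default Euclidean metric) minimizes plain distances to the medoids, so the outlier's penalty grows only linearly and it can be cheaper to keep all four medoids among the main body of points.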
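One way to see this effect is to compare the two objectives on a tiny, made-up 1-D example: two tight groups and one outlier, with k = 2. The sketch scores two candidate partitions (outlier isolated vs. outlier absorbed into the nearest group) under a KMeans-style cost (sum of squared distances to each cluster mean) and a KMedoids-style cost (sum of distances to each cluster's best member; in 1-D the Euclidean distance is just |x - y|). It is an illustration under these assumptions, not a reproduction of either library's search procedure, which explores many more partitions; the function names are hypothetical.

```python
import numpy as np

def kmeans_cost(clusters):
    """Sum of squared distances to each cluster's mean (the KMeans objective)."""
    return sum(float(((c - c.mean()) ** 2).sum()) for c in clusters)

def kmedoids_cost(clusters):
    """Sum of absolute distances to each cluster's best medoid (a member point)."""
    total = 0.0
    for c in clusters:
        pairwise = np.abs(c[:, None] - c[None, :])   # |x_i - x_j| for every pair of members
        total += float(pairwise.sum(axis=1).min())   # cost when the best member is the medoid
    return total

# Hypothetical 1-D toy data: two tight groups and one outlier.
a = np.array([0.0, 1.0, 2.0])
b = np.array([10.0, 11.0, 12.0])
outlier = np.array([30.0])

isolate = [np.concatenate([a, b]), outlier]   # outlier alone, the two groups merged
absorb = [a, np.concatenate([b, outlier])]    # outlier joins the nearest group

for name, cost in [("KMeans (squared)", kmeans_cost), ("KMedoids (absolute)", kmedoids_cost)]:
    print(f"{name}: isolate={cost(isolate):.2f}, absorb={cost(absorb):.2f}")
# KMeans (squared): isolate=154.00, absorb=274.75
# KMedoids (absolute): isolate=30.00, absorb=23.00
```

With squared distances, isolating the outlier is cheaper (154 vs. 274.75); with plain distances and medoids, absorbing it is cheaper (23 vs. 30), which mirrors the pattern in the plots.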