Flower Classifier Using K-Means
1. Exploring the dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import seaborn as sns
iris = sns.load_dataset('iris')
iris['species'].value_counts()
sns.pairplot(iris, hue='species')
sns.heatmap(iris.drop(columns='species').corr(), annot=True)  # drop the non-numeric column before computing correlations
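Before moving on, it is worth a quick sanity check that the dataset is complete; a minimal sketch using standard pandas calls (the seaborn iris dataset should have 150 rows and no missing values):
# Check for missing values and look at basic summary statistics
iris.isna().sum()
iris.describe()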
2. Constructing our ML Model
As we saw in the previous section, the most informative column is petal_length: it separates the species more cleanly than any other feature.
Also, petal_length and petal_width are highly correlated (0.96), so we can safely keep only one of them.
A good candidate for improving our predictions is sepal_length, as the pair plot of petal_length x sepal_length shows.
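To double-check the 0.96 figure quoted above, we can compute the correlation directly; a small sketch using the seaborn iris column names:
# Confirm the petal_length / petal_width correlation shown in the heatmap
iris['petal_length'].corr(iris['petal_width'])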
x = iris[['petal_length', 'sepal_length']].values
2.1. Using the Elbow Method to find the best K
ks = np.arange(2,21)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, max_iter=1000)
    model.fit(x)
    inertias.append(model.inertia_)
# print(ks)
# print(inertias)
plt.title('K vs Inertia')
plt.xlabel('K')
plt.ylabel('Inertia')
plt.plot(ks, inertias, marker='o')
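As a cross-check of the elbow plot, we can also look at the silhouette score for each k. This is only a sketch (it reuses the ks and x defined above) and is not part of the original pipeline; higher silhouette values indicate better-separated clusters:
from sklearn.metrics import silhouette_score

sil_scores = []
for k in ks:
    # Fit a fresh model for each k and score the resulting labels
    labels = KMeans(n_clusters=k, max_iter=1000).fit_predict(x)
    sil_scores.append(silhouette_score(x, labels))

plt.figure()
plt.title('K vs Silhouette Score')
plt.xlabel('K')
plt.ylabel('Silhouette Score')
plt.plot(ks, sil_scores, marker='o')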
2.2. Using the best k to create the model
# Construct our model
model = KMeans(n_clusters=3, max_iter=1000)
model.fit(x)
# Plotting the result
y_predicted = model.predict(x)
sns.scatterplot(x=iris['petal_length'], y=iris['sepal_length'], hue=y_predicted)
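To make the clusters easier to read, we can also overlay the fitted centroids on the same axes; a small sketch, noting that the columns of cluster_centers_ follow the order of x (petal_length, then sepal_length):
# Overlay the three centroids found by KMeans
centers = model.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', marker='X', s=200, label='centroids')
plt.legend()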
3. Accuracy
y_predicted
# codes = {'virginica':2, 'versicolor':1, 'setosa':0}
# y_expected = np.array([codes[s] for s in iris['species']])
# y_expected
from sklearn import metrics
# adjusted_rand_score measures the similarity between two clusterings,
# ignoring the actual label values: for example [1,1,2,2,3,3] and
# [2,2,3,3,0,0] describe the same grouping, so they score 1.0
accuracy = metrics.adjusted_rand_score(iris['species'], y_predicted)
print(accuracy)
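A complementary way to read this score is to cross-tabulate the predicted clusters against the true species; a sketch using pandas:
# Rows: true species, columns: cluster labels assigned by KMeans
pd.crosstab(iris['species'], y_predicted)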