df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 137880 entries, 0 to 137879
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 main_id 137880 non-null int64
1 age 137880 non-null int64
2 r 137873 non-null float64
3 f 137880 non-null int64
4 m 137880 non-null float64
5 cost 137880 non-null float64
dtypes: float64(3), int64(3)
memory usage: 6.3 MB
df.describe()
       main_id (float64)  age (float64)
count             137873         137873
mean         84945668864    683.2416717
std           9789204147    447.5840812
min           8400000000            156
25%          84835191146            302
50%          84938088566            589
75%          84974821269            955
max         842873096767           2507
<class 'pandas.core.frame.DataFrame'>
Int64Index: 137871 entries, 0 to 137879
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 main_id 137871 non-null int64
1 recency 137871 non-null float64
2 frequency 137871 non-null int64
3 monetary 137871 non-null float64
4 cost 137871 non-null float64
dtypes: float64(3), int64(2)
memory usage: 6.3 MB
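The two `info()` outputs above differ in three ways: `age` is gone, `r`/`f`/`m` have become `recency`/`frequency`/`monetary`, and the row count drops from 137880 to 137871. The cell that did this is not shown, so the following is only a minimal sketch of one way to get there. The helper name `clean_rfm` and the toy values are hypothetical, and `dropna()` alone would not explain all nine removed rows (only seven values are null in `r`), so some other filter was presumably applied as well.

```python
import numpy as np
import pandas as pd

# Tiny stand-in frame with the same column names as the raw data (values are made up).
raw = pd.DataFrame({
    "main_id": [1, 2, 3, 4],
    "age": [156, 302, 589, 955],
    "r": [15.0, np.nan, 5.0, 1.0],
    "f": [2, 3, 7, 20],
    "m": [10.0, 25.0, 40.0, 300.0],
    "cost": [1.0, 2.0, 3.0, 4.0],
})

def clean_rfm(frame):
    """Drop rows with missing values, drop `age`, and use descriptive RFM names.

    Hypothetical helper: the original cleaning step is not shown in the notebook.
    """
    out = frame.dropna().drop(columns=["age"])
    return out.rename(columns={"r": "recency", "f": "frequency", "m": "monetary"})

cleaned = clean_rfm(raw)
cleaned.info()
```

Because `dropna()` keeps the original index labels, the result has a non-contiguous integer index, which matches the `Int64Index: ... 0 to 137879` line in the second output.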
/shared-libs/python3.7/py/lib/python3.7/site-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
       recency (float64)  frequency (float64)
count             137871               137871
mean          2.85813223          2.110887973
std          1.138551005          1.121942274
min                    0         0.6931471806
25%          1.791759469          1.098612289
50%          3.091042453          1.945910149
75%          3.713572067          2.995732274
max          7.523481313          5.579729826
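Several of the summary values above are natural logarithms of small integers (0.6931471806 = ln 2, 1.098612289 = ln 3, 1.791759469 = ln 6), which suggests the columns were log-transformed to reduce skew before scaling. The transform cell is not shown, so this is a sketch under that assumption; using `np.log1p` rather than `np.log` is a guess, and `log_transform` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Made-up raw values; a recency of 0 and a frequency of 1 are the smallest cases.
demo = pd.DataFrame({"recency": [0.0, 5.0, 21.0], "frequency": [1, 2, 19]})

def log_transform(frame, columns):
    """Return a copy with log(1 + x) applied to the given columns (assumed transform)."""
    out = frame.copy()
    # log1p is defined at x = 0, unlike a plain log.
    out[columns] = np.log1p(out[columns].astype(float))
    return out

logged = log_transform(demo, ["recency", "frequency"])
print(logged)
```

Under this assumption a raw recency of 0 maps to 0 and a raw frequency of 1 maps to ln 2, which would be consistent with the two `min` values in the table.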
recency frequency monetary cost
count 137871.00 137871.00 137871.00 137871.00
mean 0.00 0.00 0.00 0.00
std 1.00 1.00 1.00 1.00
min -2.51 -1.26 -1.19 -0.98
25% -0.94 -0.90 -0.90 -0.98
50% 0.20 -0.15 -0.18 -0.09
75% 0.75 0.79 0.68 0.82
max 4.10 3.09 6.40 3.46
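Every column in the table above has mean 0.00 and standard deviation 1.00, so the features were standardized (z-scored) before clustering. The scaling cell is not shown; the sketch below does the same thing in plain pandas with a hypothetical `standardize` helper. It uses the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses. The name `df_normalized` is taken from the commented-out silhouette cell.

```python
import pandas as pd

# Made-up values standing in for the log-transformed RFM columns.
demo = pd.DataFrame({
    "recency": [0.0, 1.0, 2.0, 3.0],
    "cost": [10.0, 10.0, 20.0, 40.0],
})

def standardize(frame):
    """Z-score every column: subtract the mean, divide by the population std."""
    return (frame - frame.mean()) / frame.std(ddof=0)

df_normalized = standardize(demo)
print(df_normalized.describe().round(2))
```

Standardizing matters for k-means because the algorithm uses Euclidean distance, so a column with a large numeric range would otherwise dominate the cluster assignment.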
<class 'pandas.core.frame.DataFrame'>
Int64Index: 137871 entries, 0 to 137879
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 recency 137871 non-null float64
1 frequency 137871 non-null float64
2 monetary 137871 non-null float64
3 cost 137871 non-null float64
dtypes: float64(4)
memory usage: 5.3 MB
# from sklearn.cluster import KMeans
# from sklearn.metrics import silhouette_score
# import matplotlib.pyplot as plt
#
# k = [2, 3, 4, 5, 6, 7]
# score = []
# for n_cluster in k:
#     kmeans = KMeans(n_clusters=n_cluster).fit(df_normalized)
#     score.append(silhouette_score(df_normalized, kmeans.labels_))
# plt.plot(k, score, 'o-')
# plt.xlabel("Value for k")
# plt.ylabel("Silhouette score")
# plt.title('Silhouette Method')
# plt.show()
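The silhouette cell above is commented out. One plausible reason is cost, since the full silhouette score needs pairwise distances and the frame has 137871 rows, though the notebook does not say so. A sketch of a cheaper variant, assuming that was the obstacle: `silhouette_score` accepts a `sample_size` argument that scores a random subset. The helper `silhouette_by_k`, the synthetic blobs, and the chosen sample size are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated 2-D blobs standing in for df_normalized.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in centers])

def silhouette_by_k(data, ks, sample_size=300, seed=0):
    """Fit k-means for each k and score it on a random subsample of the rows."""
    scores = {}
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(data)
        scores[k] = silhouette_score(
            data, labels, sample_size=sample_size, random_state=seed
        )
    return scores

scores = silhouette_by_k(X, [2, 3, 4, 5])
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The subsampled score is an estimate, so it is worth repeating with a few seeds before settling on a value of k.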
   main_id (int64)  recency (float64)
0      84777306699                 15
1      84974127126                  3
2      84978443555                  5
3      84967382555                  1
4      84968422222                  1
   label (int64)  cluster_size (int64)
0              0                 23735
1              1                 56343
2              2                 34533
3              3                 23260
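The four cluster sizes above sum to 137871, the number of rows in the cleaned frame, so every customer received exactly one label. The cell that produced the table is not shown; a minimal sketch of one way to build it, assuming `df_clustered` carries a `label` column (the helper `cluster_sizes` and the toy rows are hypothetical):

```python
import pandas as pd

# Made-up labelled rows standing in for df_clustered.
df_clustered = pd.DataFrame({
    "main_id": [101, 102, 103, 104, 105, 106],
    "label": [0, 1, 1, 2, 1, 3],
})

def cluster_sizes(frame, label_col="label"):
    """Count rows per cluster label and return a two-column summary frame."""
    return frame.groupby(label_col).size().reset_index(name="cluster_size")

sizes = cluster_sizes(df_clustered)
print(sizes)
```

Checking that `sizes["cluster_size"].sum()` equals `len(df_clustered)` is a quick guard against rows lost during a merge before exporting.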
df_clustered.to_csv('final_clustering.csv')