K-Means clustering is an unsupervised learning algorithm: unlike supervised learning, it works on data that has no labels. It divides objects into clusters so that objects in the same cluster are similar to one another and dissimilar to objects in other clusters.
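
To make "grouping similar objects" concrete, here is a minimal NumPy sketch of one common way K-Means is computed: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. The function name kmeans_sketch, the toy points, and the choice to start from the first k points are illustrative assumptions; scikit-learn's default initialisation in particular is more careful than this.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=20):
    """Illustrative K-Means loop: assign to nearest centroid, then re-average."""
    # Simplistic start (an assumption for this sketch): use the first k points.
    centroids = points[:k].copy()
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if it has none).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids

# Two obvious groups of 2-D points (made-up values).
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

Even though both starting centroids sit in the same group here, the loop pulls one of them over to the second group within a couple of iterations.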

‘K’ is the number of clusters, and you have to choose it before running the algorithm. For example, K = 2 produces two clusters. The elbow method, used at the end of this notebook, is one way to estimate a good value of K for a given dataset.
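
As a small illustration of how K is passed in, the sketch below fits scikit-learn's KMeans to the same data with K = 2 and K = 3 and collects the labels it assigns. The toy points and the name labels_by_k are made up for this example; n_init and random_state are set only to keep the result reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D points, not the customer dataset.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

labels_by_k = {}
for k in (2, 3):
    # n_clusters is where K goes; the model returns one label per point.
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    labels_by_k[k] = model.labels_.tolist()
    print(k, labels_by_k[k])
```

The number of distinct labels matches the K that was requested, whether or not the data naturally has that many groups.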

import pandas as pd
import seaborn as sns

df = pd.read_csv('https://raw.githubusercontent.com/theleadio/datascience_demo/master/rfm-agg.csv')

df

sns.scatterplot(data=df, x='Frequency', y='AmountSpent')

from sklearn.cluster import KMeans

# Number of clusters to fit, and the feature columns to cluster on
k = 4
X = ['Frequency', 'AmountSpent']
kmeans = KMeans(n_clusters=k).fit(df[X])

df['cluster'] = kmeans.labels_.astype(str)

df

sns.scatterplot(data=df, x='Frequency', y='AmountSpent', hue='cluster')

df.to_csv('cluster.csv')
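
One way to sanity-check the labels before relying on the exported file is to summarise each cluster. The sketch below uses a small hypothetical frame (toy) with the same column names as the notebook's df; the rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows that mimic the notebook's columns; not the real data.
toy = pd.DataFrame({
    'Frequency':   [3, 5, 2, 130, 120],
    'AmountSpent': [4.0, 80.0, 2.0, 230.0, 210.0],
    'cluster':     ['0', '0', '0', '3', '3'],
})

# Size and average behaviour of each cluster.
summary = toy.groupby('cluster').agg(
    customers=('Frequency', 'size'),
    mean_frequency=('Frequency', 'mean'),
    mean_amount=('AmountSpent', 'mean'),
)
print(summary)
```

Running the same groupby on the real df would show how many customers fall in each cluster and how the clusters differ in Frequency and AmountSpent.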

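# Elbow method: fit K-Means for k = 1..9 and record the inertia
# (within-cluster sum of squared distances) for each k
distortions = []
K = range(1, 10)
for k in K:
    kmeanModel = KMeans(n_clusters=k)
    # Fit on the two feature columns only; df also holds CustomerID and the 'cluster' label
    kmeanModel.fit(df[X])
    distortions.append(kmeanModel.inertia_)

distortions

The values below come from the recorded run, which fit on every column of df (including CustomerID), so they will differ from what the loop above produces:

[12423553328.08133,
 3352212193.438077,
 1632058999.1078076,
 1046448933.0939847,
 759111212.95807,
 561101201.1299284,
 418427812.44608366,
 327194165.33339405,
 272561781.22020674]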

import matplotlib.pyplot as plt
plt.figure(figsize=(16,8))
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
plt.show()
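
The elbow is usually read off the plot by eye. As a rough cross-check, one simple heuristic (an assumption here, not part of the notebook's method) is to pick the k where the curve's second difference is largest, meaning the point where the drop in inertia flattens most sharply. The sketch below tries this on three synthetic blobs rather than the customer data; the names centers, inertias and elbow_k are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three synthetic blobs at the corners of a triangle (not the customer data).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

ks = list(range(1, 8))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
    for k in ks
]

# Second difference of the inertia curve: large where the decrease
# suddenly flattens. second_diff[i] is centred on ks[i + 1].
second_diff = np.diff(inertias, n=2)
elbow_k = ks[int(np.argmax(second_diff)) + 1]
print(elbow_k)
```

On real data the curve is rarely this clean, so a heuristic like this is best treated as a hint to compare against the plot, not a replacement for it.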

Output of the first df display (before clustering):

	CustomerID	Frequency	AmountSpent
0	12347	27	75.54
1	12348	5	81.13
2	12349	16	53.57
3	12350	3	3.95
4	12352	17	109.70
...	...	...	...
4089	18280	3	12.25
4090	18281	2	2.07
4091	18282	2	19.50
4092	18283	132	227.15
4093	18287	15	20.37

Output of the second df display (after adding the cluster column):

	CustomerID	Frequency	AmountSpent	cluster
0	12347	27	75.54	0
1	12348	5	81.13	0
2	12349	16	53.57	0
3	12350	3	3.95	0
4	12352	17	109.70	0
...	...	...	...	...
4089	18280	3	12.25	0
4090	18281	2	2.07	0
4091	18282	2	19.50	0
4092	18283	132	227.15	3
4093	18287	15	20.37	0