
One of the simplest methods is K-means clustering. In this
method, the number of clusters is initialized and the center of each of the
cluster is randomly chosen. The Euclidean distance between each data point and
all the center of the clusters is computed and based on the minimum distance
each data point is assigned to certain cluster. The new center for the cluster
is defined and the Euclidean distance is calculated. This procedure iterates
till convergence is reached.
Let’s see how to generate some random data points and some
random cluster points
%Generate sample data
points and assign random centre for each cluster
%Number of data points
sz=100;
sz1=250;
X = random('unid',sz1,[sz 1]); %Value
Xp = random('unid',sz1,[sz 1]); %Position
%Number of clusters
c=6;
V=random('unid',sz1,[c 1]); %Value
Vp=random('unid',sz1,[c 1]); %Position
figure,plot(Xp,X,'*',Vp,V,'r+');title('Data points
and the initial Cluster centers');
Explanation:
100 data points are generated and the number of clusters are
assumed to be 6 and 6 random cluster points are generated.
Group the data points:
MATLAB CODE:
%Preallocate the vectors
V1=zeros(size(V));
Vp1=zeros(size(Vp));
flag=1;
while flag==1
%Find the euclidean
distance between the data points and all the center of
%the clusters
J=sqrt(abs(repmat(Xp,[1 c])-repmat(Vp,[1
sz])').^2+abs(repmat(X,[1 c])-repmat(V,[1 sz])').^2);
%Find the minimum distance
between the data point and the cluster
[mv,Gpos]=min(J,[],2);
CGroup=zeros([sz c]);
colr=colormap(jet(c));
figure(3),
for i = 1:c
Temp = find(Gpos==i);
CGroup(1:numel(Temp),i)=Temp;
V1(i,:)=mean(X(Temp));
Vp1(i,:)=mean(Xp(Temp));
Pos=ones(numel(Temp)*2,1)*Vp1(i);
Pos(2:2:end)=Xp(Temp);
Value=ones(numel(Temp)*2,1)*V1(i);
Value(2:2:end)=X(Temp);
%Define the new centre for each
cluster
plot(Pos,Value,'Color',colr(i,:),'LineStyle','-','Marker','o');hold on;
plot(Vp1(i),V1(i),'k+');
end
hold off;
Diffv=abs(V-V1);
DiffVp=abs(Vp-Vp1);
%Iterate the process till
there is no change in the cluster position
if(Diffv < 1)
flag=0;
else
V=V1;
Vp=Vp1;
end
end
figure,plot(Xp,X,'*',Vp,V,'g+');title('Data points
and the Final Cluster centers');
Fig. The new position(red circle)
of the clusters after the final iteration.
Explanation:
In the above figure, the data
points are represented in blue color stars and the cluster centers are
represented in red color cross shape.
Let’s consider one particular
data point and all the cluster centers.
Data point position X = 13, Y =
20
Cluster 1 position X = 8,
Y = 19
Cluster 2 position X = 13, Y = 15
Step
1: Find the Euclidean Distance:
Find the Euclidean distance(D1)
between data point and the cluster 1 similarly, find the Euclidean distance(D2)
between data point and the cluster 2
Distance D1 = sqrt((13-8).^2+(20-19).^2))
= 5.0990
Distance D2 =
sqrt((13-13).^2+(20-15).^2))= 5.0000
Step
2: Find the minimum and assign the data point to a cluster
Now the minimum distance among
the two results is for the cluster 2.
So the data point with
(X,Y)=(13,20) is assigned to the cluster/group 2.
Step
3: Perform the step 1 and step 2 for all the data points and assign
group accordingly.
Step
4: Assign a new position for the clusters based on the clustering
done.
Find the average position of the newly
assigned data points to a particular cluster and use that average as the new
position for the cluster.
Step
5: Iterate this procedure till the position of the clusters are
unchanged.
Number of clusters = 3
Number of clusters = 15
0 comments:
Enjoyed Reading? Share Your Views