Clustering problem
I have a 150x2 matrix (data set) and I'd like to determine the number of clusters without visualizing the data. Is there a way of doing that? I had a look at Wikipedia, but the rule of thumb K = (n/2)^(1/2) (approximately) does not work: for n = 150 it gives K of about 8.7, whereas when I plotted the data I saw that K = 3.
Comments
[IDX, C] = KMEANS(X, K) returns the K cluster centroid locations in
the K-by-P matrix C.
[IDX, C, SUMD] = KMEANS(X, K) returns the within-cluster sums of
point-to-centroid distances in the 1-by-K vector sumD.
These are commands in MATLAB. I used the second call to get SUMD; however, it requires K (the number of clusters). The problem is that if I run a number of trials with K = 1, 2, ..., what is the maximum value of K at which I should stop? Keep in mind that the purpose of my assignment is to identify the two features (2 columns) from the 150x5 training data set that give the best result, so I have to run trials on the different combinations of columns. I know that plotting the data would solve both problems. That is the first step. The second step is to find an alternative way of identifying the number of clusters, and hence the 2 features that give the best result.
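One possible way to do this without plotting is sketched below. It is not the only approach: it swaps in the mean silhouette width as the selection criterion, and also prints the total of SUMD so the elbow can be read from numbers. The upper bound maxK = 6 is an assumption (there is no fixed rule for where to stop; a small bound is usually enough for 150 points), the matrix `data` is synthetic stand-in data, and the variable names are hypothetical. It requires the Statistics Toolbox for kmeans and silhouette.

```matlab
% Stand-in data (assumption): replace with your own 150-by-5 training matrix.
rng(1);                                   % make the random starts reproducible
data = [randn(50,5); randn(50,5)+4; randn(50,5)+8];

maxK  = 6;                                % assumed upper bound on K, adjust as needed
pairs = nchoosek(1:size(data,2), 2);      % every combination of two columns

bestScore = -Inf;
bestPair  = [];
bestK     = NaN;

for p = 1:size(pairs,1)
    X = data(:, pairs(p,:));              % 150-by-2 matrix for this pair of columns
    for k = 2:maxK                        % silhouette is not meaningful for k = 1
        % Several replicates reduce the risk of a poor local minimum.
        [idx, ~, sumd] = kmeans(X, k, 'Replicates', 5);

        % Mean silhouette width: close to 1 means compact, well-separated clusters.
        score = mean(silhouette(X, idx));

        % Total within-cluster distance, useful for an elbow check by eye.
        fprintf('cols [%d %d], k = %d: sum(SUMD) = %.2f, silhouette = %.3f\n', ...
                pairs(p,1), pairs(p,2), k, sum(sumd), score);

        if score > bestScore
            bestScore = score;
            bestPair  = pairs(p,:);
            bestK     = k;
        end
    end
end

fprintf('Best pair: columns [%d %d] with K = %d (mean silhouette %.3f)\n', ...
        bestPair(1), bestPair(2), bestK, bestScore);
```

Note that sum(SUMD) always decreases as K grows, so it cannot be minimized directly; look for the K after which the decrease flattens out. The silhouette score does not have that problem, which is why it is used here to pick both K and the pair of columns.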