Clustering problem

edited March 2009 in Science & Tech
I have a 150x2 matrix (a data set) and I'd like to determine the number of clusters without visualizing the data. Is there a way of doing that? I had a look at Wikipedia, but the rule of thumb K ≈ (n/2)^(1/2) does not work: for n = 150 it gives about 8.7, whereas when I plotted the data I could see that K = 3.

Comments

  • shwaip bluffin' with my muffin Icrontian
    edited March 2009
    Choosing K is often one of the harder parts of clustering. The right value of K depends on your problem. If you make a scatter plot of the data, you may be able to see roughly how many clusters you should use.
  • edited March 2009
    Yeah, I know. However, at the end of this assignment I am asked to describe an alternative way to determine the number of clusters, rather than visualizing the data by plotting the points in a single figure.
  • shwaip bluffin' with my muffin Icrontian
    edited March 2009
    Maybe minimize (the sum of each point's distance to the center of its cluster + some penalty that grows with K). Without the penalty, the sum only shrinks as K increases.
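One way to read the suggestion above is sketched below. It is only an illustration: the placeholder data, the upper bound `Kmax`, and the penalty weight `lambda` are assumptions that do not come from the thread, and the right penalty depends on the problem. It assumes the Statistics Toolbox (`kmeans`) and a release that accepts `~` for unused outputs.

```matlab
% Sketch: pick K by minimizing (within-cluster distance + penalty on K).
rng(0);                                    % fix the seed so runs are repeatable
% Placeholder data (assumption): three blobs, 150-by-2 overall.
X = [randn(50,2); randn(50,2) + 6; [randn(50,1) + 6, randn(50,1) - 6]];

Kmax = 10;                                 % hypothetical upper bound on K

% Total scatter around the overall mean (the within-cluster sum for K = 1).
centered = bsxfun(@minus, X, mean(X, 1));
total    = sum(sum(centered .^ 2));
lambda   = 0.1 * total;                    % hypothetical penalty per extra cluster

cost = zeros(Kmax, 1);
for K = 1:Kmax
    % sumD: within-cluster sums of point-to-centroid distances
    % (squared Euclidean by default); replicates reduce bad local minima.
    [~, ~, sumD] = kmeans(X, K, 'Replicates', 5);
    cost(K) = sum(sumD) + lambda * K;
end

[~, bestK] = min(cost);
fprintf('Penalized cost is smallest at K = %d\n', bestK);
```

A larger `lambda` favours fewer clusters and a smaller one favours more, so the result should be checked for sensitivity to that choice.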
  • edited March 2009
    [IDX, C] = KMEANS(X, K) returns the K cluster centroid locations in
    the K-by-P matrix C.

    [IDX, C, SUMD] = KMEANS(X, K) returns the within-cluster sums of
    point-to-centroid distances in the 1-by-K vector sumD.


    These are MATLAB commands. I used the second call to get SUMD; however, it requires K (the number of clusters). The problem is that if I run a series of trials with K = 1, 2, ..., what is the maximum value of K at which I should stop? Keep in mind that the purpose of my assignment is to identify the two features (2 columns) of the 150x5 training data set that give the best result, so I have to run trials on the different combinations of columns. I KNOW THAT PLOTTING THE DATA WOULD SOLVE BOTH PROBLEMS. That is the first step. The second step is to find an alternative way of identifying the number of clusters, and hence the 2 features that give the best result...
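The trials described above (every pair of columns, several values of K) could be automated roughly as follows. This is a sketch under assumptions, not the assignment's intended method: the thread does not define "best result", so the mean silhouette value is swapped in as the score, `Kmax` is an arbitrary cut-off, all five columns are treated as candidate features, and `data` is random placeholder input. It assumes the Statistics Toolbox (`kmeans`, `silhouette`).

```matlab
% Sketch: score every two-column combination and every K without plotting.
rng(0);                                    % repeatable runs
data  = randn(150, 5);                     % placeholder for the 150-by-5 training set
pairs = nchoosek(1:size(data, 2), 2);      % all two-column combinations (10 rows)
Kmax  = 6;                                 % hypothetical upper bound on K

best = struct('score', -Inf, 'cols', [], 'K', NaN);
for p = 1:size(pairs, 1)
    X = data(:, pairs(p, :));
    for K = 2:Kmax                         % silhouette needs at least 2 clusters
        idx   = kmeans(X, K, 'Replicates', 5);
        % With an output argument, silhouette returns values and draws nothing.
        score = mean(silhouette(X, idx));  % closer to 1: tighter, better separated
        if score > best.score
            best.score = score;
            best.cols  = pairs(p, :);
            best.K     = K;
        end
    end
end

fprintf('Best pair: columns %d and %d, K = %d (mean silhouette %.3f)\n', ...
        best.cols(1), best.cols(2), best.K, best.score);
```

Because the silhouette is undefined for a single cluster, this sweep starts at K = 2 and cannot report that the data form one cluster.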
  • shwaip bluffin' with my muffin Icrontian
    edited March 2009
    If you're allowed to, try using PCA (principal component analysis) or LDA (linear discriminant analysis).
  • edited March 2009
    I am not allowed... I have found agglomerative hierarchical clustering. The problem is that I don't know how to implement it in MATLAB...
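For the agglomerative hierarchical clustering mentioned above, a minimal MATLAB sketch could look like the following. It assumes the Statistics Toolbox (`pdist`, `linkage`, `cluster`). The placeholder data, the choice of Ward linkage, and the "largest jump in merge height" rule for picking K are illustrative assumptions, not something stated in the thread.

```matlab
% Sketch: agglomerative hierarchical clustering, choosing K without a plot.
rng(0);                                    % repeatable runs
% Placeholder data (assumption): three blobs, 150-by-2 overall.
X = [randn(50,2); randn(50,2) + 6; [randn(50,1) + 6, randn(50,1) - 6]];

D = pdist(X);                              % pairwise Euclidean distances
Z = linkage(D, 'ward');                    % (n-1)-by-3 merge tree; column 3 = merge height

% Heuristic (assumption): cut the tree where consecutive merge heights
% jump the most. After merge number i there are n - i clusters left.
heights   = Z(:, 3);
[~, iGap] = max(diff(heights));
n         = size(X, 1);
K         = n - iGap;

idx = cluster(Z, 'maxclust', K);           % cluster label for every row of X
fprintf('Largest jump in merge heights suggests K = %d\n', K);
```

The jump heuristic is crude and can favour very small K, so it is worth comparing its answer with another criterion before relying on it.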