
On 4/22/14, Richard Tsai <richard9404@gmail.com> wrote:
2014-03-21 22:18 GMT+08:00 Richard Tsai <richard9404@gmail.com>:
Hi all,
I've posted my proposal to Melange, but there are still some potential features for the package (cluster) that I'd like to discuss here.
The first is the stopping criterion of kmeans/kmeans2. Both functions currently use the average distance from the observations to their corresponding centroids, but the average *squared* distance would be a more accurate exit condition. The average distance the centroids move, or changes in the results of vq, would also both be better than the current criterion.
Second, finding the convex hulls of the clusters produced by hierarchical clustering seems interesting, but I'm not sure whether there's any demand for it.
The third is the gap statistic for automatically determining k in kmeans. David suggested that this is scikit-learn territory, so I plan to leave it until the end.
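To make the comparison of stopping criteria concrete, here is a minimal pure-Python sketch of Lloyd's iteration with the four criteria mentioned above side by side. It is not SciPy's implementation: the name `lloyd_kmeans`, the `criterion` strings, passing explicit initial centroids via `init`, and testing an absolute change against `thresh` are all assumptions made for illustration.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(data, centroids):
    """Assign each observation to its nearest centroid (the role vq plays).

    Returns the labels and each observation's squared distance to its centroid.
    """
    labels, sq = [], []
    for p in data:
        d = [_sq_dist(p, c) for c in centroids]
        j = min(range(len(centroids)), key=d.__getitem__)
        labels.append(j)
        sq.append(d[j])
    return labels, sq


def _objective(sq, criterion):
    """Mean distance (like the current criterion) or mean squared distance."""
    if criterion == "mean_dist":
        return sum(math.sqrt(d) for d in sq) / len(sq)
    return sum(sq) / len(sq)


def lloyd_kmeans(data, init, criterion="mean_sq_dist", thresh=1e-5, max_iter=100):
    """Hypothetical Lloyd's k-means with a pluggable stopping criterion.

    criterion is one of:
      "mean_dist"    - absolute change in mean distance to the nearest centroid
      "mean_sq_dist" - absolute change in mean squared distance (k-means objective)
      "shift"        - mean distance the centroids moved in one iteration
      "labels"       - no observation changed cluster
    """
    if criterion not in ("mean_dist", "mean_sq_dist", "shift", "labels"):
        raise ValueError("unknown criterion: %r" % (criterion,))
    centroids = [list(c) for c in init]
    k = len(centroids)
    labels, sq = _assign(data, centroids)
    prev_obj = _objective(sq, criterion)
    it = 0
    for it in range(1, max_iter + 1):
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster in place
        shift = sum(math.sqrt(_sq_dist(a, b))
                    for a, b in zip(centroids, new_centroids)) / k
        centroids = new_centroids

        # Assignment step, then test the chosen criterion.
        new_labels, sq = _assign(data, centroids)
        obj = _objective(sq, criterion)
        if criterion == "shift":
            done = shift <= thresh
        elif criterion == "labels":
            done = new_labels == labels
        else:
            done = abs(prev_obj - obj) <= thresh
        labels, prev_obj = new_labels, obj
        if done:
            break
    return centroids, labels, it


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, n_iter = lloyd_kmeans(data, init=[(0, 0), (0, 1)], criterion="labels")
print(labels, n_iter)  # [0, 0, 0, 1, 1, 1] 2
```

The design point behind preferring the squared distance: each assignment and update step of Lloyd's algorithm can only lower (or keep) the sum of squared distances, so the mean squared distance decreases monotonically, whereas the unsquared mean distance is not the quantity being minimized and carries no such guarantee. The "labels" criterion here stops one iteration earlier than the others on this toy data because it does not wait for the centroids to settle.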
I'm not sure whether these features belong in cluster, and Ralf is concerned that they may overlap with scikit-learn, so at his suggestion I'm posting them here for discussion. I've also made my proposal public: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/richardts... Comments and suggestions are welcome.
Regards, Richard
Hi all,
I've received emails from GSoC saying that my proposal has been accepted. Thanks to everyone who helped me with my application!
I'll submit the required materials soon, then make a more detailed plan and prepare for coding. If you have any thoughts about my project, please discuss them with me!
Richard
Congratulations, Richard! That's great news. Warren