Data Mining in Python
Aleks Jakulin
jakulin@acm.org
Mon, 7 Jan 2002 08:45:31 +0100
I have completed a small collection of libraries useful for machine learning and
data mining. The collection requires win32, Python 2.1, and the Orange machine
learning framework; Python and Orange are both freely downloadable.
Features:
Clustering / Unsupervised Learning:
- k-means (medoid) clustering
- fuzzy clustering
- hierarchical agglomerative clustering
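For readers unfamiliar with medoid-based clustering, here is a minimal sketch of the general idea in plain Python. It is not the code from this collection and does not use the Orange API; the names k_medoids and assign, the seed and max_iter parameters, and the toy data are all made up for illustration. It assumes the points are distinct and that a distance function is supplied by the caller.

```python
import random

def assign(points, medoids, distance):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: distance(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, k, distance, max_iter=100, seed=0):
    """Illustrative alternating k-medoids; assumes distinct points."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        clusters = assign(points, medoids, distance)
        new_medoids = []
        for m, members in clusters.items():
            if not members:
                # Only possible with duplicate points; keep the old medoid.
                new_medoids.append(m)
                continue
            # The new medoid is the member with the smallest total distance
            # to all members of its cluster.
            best = min(
                members,
                key=lambda c: sum(distance(points[c], points[j]) for j in members),
            )
            new_medoids.append(best)
        if set(new_medoids) == set(medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    return medoids, assign(points, medoids, distance)

# Toy usage on one-dimensional data (hypothetical values).
points = [1.0, 2.0, 3.0, 20.0, 21.0, 22.0]
medoids, clusters = k_medoids(points, 2, lambda a, b: abs(a - b))
print(sorted(medoids), sorted(sorted(c) for c in clusters.values()))
```

Unlike a k-means centroid, a medoid is always one of the data points, so only a distance function is needed and the data need not be numeric vectors.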
Supervised Learning:
- multiclass logistic regression
- multiclass SVM for classification, regression, and density estimation
- wrapped multiclass SVM classifier which outputs class probabilities
- general wrappers for multiclass classification with binary classifiers
- several ensemble construction methods
- support for "merging" the final classification from ensembles of
classifiers that output probability distributions
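To illustrate what merging the outputs of an ensemble can look like, here is a minimal sketch that combines class probability distributions by weighted averaging. The announcement does not say which merging rule the library uses, so averaging is an assumption, and the names merge_by_averaging and predict, as well as the example distributions, are hypothetical rather than part of the library or of Orange.

```python
def merge_by_averaging(distributions, weights=None):
    """Weighted average of class probability distributions.

    Each distribution is a dict mapping a class label to a probability.
    A class missing from a distribution is treated as probability 0.
    """
    if weights is None:
        weights = [1.0] * len(distributions)
    total_weight = float(sum(weights))
    classes = set()
    for dist in distributions:
        classes.update(dist)
    merged = {}
    for c in classes:
        weighted = sum(w * d.get(c, 0.0) for w, d in zip(weights, distributions))
        merged[c] = weighted / total_weight
    return merged

def predict(merged):
    """Return the class with the highest merged probability."""
    return max(merged, key=merged.get)

# Hypothetical outputs of three ensemble members for one example.
outputs = [
    {"a": 0.6, "b": 0.4},
    {"a": 0.2, "b": 0.8},
    {"a": 0.9, "b": 0.1},
]
merged = merge_by_averaging(outputs)
print(merged, predict(merged))
```

Averaging keeps the result a valid distribution whenever the inputs are; other rules, such as taking a normalized product, are equally possible and may behave differently when classifiers disagree strongly.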
Note that Orange itself provides a great many features: discretization,
k-NN, classification and regression trees, naive Bayes classifiers, evaluation
techniques (stratified cross validation, random sampling, aROC, etc.),
constructive induction, and more. Check out http://magix.fri.uni-lj.si/orange/
The libraries can all be found at http://ai.fri.uni-lj.si/~aleks/orng/
I'm sorry for not supporting Python 2.2 or platforms other than win32 at the
moment, but that support will come given sufficient user interest.
Best regards,
Aleks
Aleks Jakulin (jakulin@ieee.org)
Faculty of Computer and Information Science
University of Ljubljana
Slovenia
+386 41 379 137