Data Mining in Python

Aleks Jakulin jakulin@acm.org
Mon, 7 Jan 2002 08:45:31 +0100


I have completed a small collection of libraries useful for machine learning and
data mining. They require win32, Python 2.1, and the Orange machine learning
framework; Python and Orange are both freely downloadable.

Features:
    Clustering / Unsupervised Learning:
        - k-means (medoid) clustering
        - fuzzy clustering
        - hierarchical agglomerative clustering

    Supervised Learning:
        - multiclass logistic regression
        - multiclass SVM for classification, regression, and density estimation
        - wrapped multiclass SVM classifier which outputs class probabilities
        - general wrappers for multiclass classification with binary classifiers
        - several ensemble construction methods
        - support for "merging" the final classification from ensembles of
          classifiers that output probability distributions
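
To illustrate the "merging" idea in the last item: below is a minimal sketch,
assuming the merge is a (optionally weighted) average of the class probability
distributions produced by the ensemble members. It is not the code from the
libraries announced here; the names merge_distributions and predict are
hypothetical, and it is written for a current Python rather than 2.1.

```python
def merge_distributions(distributions, weights=None):
    """Average per-class probabilities over several classifiers.

    distributions: list of dicts mapping class label -> probability.
    weights: optional list of non-negative weights, one per classifier.
    (Hypothetical helper for illustration only.)
    """
    if not distributions:
        raise ValueError("need at least one distribution")
    if weights is None:
        weights = [1.0] * len(distributions)
    total = float(sum(weights))
    merged = {}
    for dist, w in zip(distributions, weights):
        for label, p in dist.items():
            # Accumulate the normalized, weighted probability for each class.
            merged[label] = merged.get(label, 0.0) + w * p / total
    return merged


def predict(distributions, weights=None):
    """Return the class with the highest merged probability."""
    merged = merge_distributions(distributions, weights)
    return max(merged, key=merged.get)


# Three ensemble members voting on a two-class problem.
votes = [
    {"a": 0.6, "b": 0.4},
    {"a": 0.3, "b": 0.7},
    {"a": 0.2, "b": 0.8},
]
merged = merge_distributions(votes)
```

Averaging keeps the result a valid probability distribution as long as every
input distribution sums to one, which is why it is a common default; other
merging rules (products, voting) are equally possible.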

Note that Orange itself provides a great many features: discretization,
k-NN, classification and regression trees, naive Bayes classifiers, evaluation
techniques (stratified cross-validation, random sampling, aROC, etc.),
constructive induction, and more. See http://magix.fri.uni-lj.si/orange/
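
For readers unfamiliar with stratified cross-validation, here is a minimal
sketch of how examples might be assigned to folds so that each fold keeps
roughly the same class proportions. This is an illustration under my own
assumptions, not Orange's API; the function name stratified_folds is
hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return a fold index (0..k-1) for every example.

    Examples of each class are shuffled and dealt out to the folds in
    round-robin order, so class counts differ by at most one between folds.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [None] * len(labels)
    next_fold = 0
    # Visit classes in a fixed order so the result is reproducible.
    for y in sorted(by_class, key=str):
        indices = by_class[y]
        rng.shuffle(indices)
        for i in indices:
            folds[i] = next_fold
            next_fold = (next_fold + 1) % k
    return folds


labels = ["pos"] * 6 + ["neg"] * 4
folds = stratified_folds(labels, k=2)
```

Each fold then serves once as the test set while the remaining folds are used
for training.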

The libraries can all be found at http://ai.fri.uni-lj.si/~aleks/orng/
I'm sorry that Python 2.2 and platforms other than win32 are not supported at
the moment. Support will follow if there is sufficient user interest.

Best regards,
                Aleks

Aleks Jakulin (jakulin@ieee.org)
Faculty of Computer and Information Science
University of Ljubljana
Slovenia
+386 41 379 137