[scikit-learn] MLPClassifier

Fri Jan 12 11:40:21 EST 2018

Hi,

I am trying to apply MLPClassifier to a subset (100 data points, 2 classes)
of the 20newsgroups dataset. I built (OK, copied) the following pipeline:

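from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neural_network import MLPClassifier
from sklearn import metrics

# twenty_train / twenty_test are assumed to be loaded beforehand, e.g. with
# sklearn.datasets.fetch_20newsgroups(categories=['alt.atheism', 'sci.electronics'])
model_MLP = Pipeline([('vect', CountVectorizer()),
                      ('tfidf', TfidfTransformer()),
                      ('model_MLP', MLPClassifier(solver='lbfgs',
                                                  alpha=1e-5,
                                                  hidden_layer_sizes=(5, 2),
                                                  random_state=1))])

model_MLP.fit(twenty_train.data, twenty_train.target)

predicted_MLP = model_MLP.predict(twenty_test.data)

print(metrics.classification_report(twenty_test.target, predicted_MLP,
                                    target_names=twenty_test.target_names))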

The numbers I get are hopeless:

                 precision    recall  f1-score   support
    alt.atheism       0.00      0.00      0.00        34
sci.electronics       0.66      1.00      0.80        66
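These numbers are exactly what a classifier that assigns every test document to sci.electronics would produce: recall 1.00 means all 66 sci.electronics documents were labelled correctly, and precision 0.66 = 66/100 means all 100 test documents received that label. The sketch below reproduces the report from such a constant predictor. The labels are hypothetical, built only from the supports shown (34 and 66), and `zero_division=0` assumes a recent scikit-learn (0.22 or later). In the original script, `numpy.unique(predicted_MLP, return_counts=True)` would show directly whether the MLP collapsed to a single class.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical labels matching the supports in the report:
# 34 documents of class 0 (alt.atheism), 66 of class 1 (sci.electronics).
y_true = np.array([0] * 34 + [1] * 66)

# A degenerate classifier that always predicts class 1.
y_pred = np.ones_like(y_true)

# zero_division=0 reports 0.0 (instead of warning) for the never-predicted class.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, zero_division=0)

for name, p, r, f, s in zip(["alt.atheism", "sci.electronics"],
                            precision, recall, f1, support):
    print(f"{name:>15} {p:10.2f} {r:9.2f} {f:9.2f} {int(s):9d}")
```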

The only explanation I can think of is that the vocabularies of the training
and test sets differ (test set: 5204 words, training set: 5402 words). That
should not be a problem (if I understand Bayes correctly), but the result is
certainly rubbish (see the numbers above).
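In a Pipeline, `fit()` calls `CountVectorizer.fit_transform` on the training documents and `predict()` calls `CountVectorizer.transform` on the test documents, so the feature space is fixed by the training vocabulary alone. Words that occur only in the test set are dropped rather than causing a mismatch. A minimal sketch with hypothetical toy documents standing in for `twenty_train.data` and `twenty_test.data`:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpora standing in for twenty_train.data / twenty_test.data.
train_docs = ["god exists", "circuit voltage resistor", "atheism and god"]
test_docs = ["voltage divider circuit", "new unseen words here"]

vect = CountVectorizer()
X_train = vect.fit_transform(train_docs)  # vocabulary is learned here only
X_test = vect.transform(test_docs)        # reuses the training vocabulary

print(sorted(vect.vocabulary_))
print(X_train.shape, X_test.shape)        # same number of columns
```

The second test document consists only of unseen words, so its row is all zeros; "divider" is simply ignored in the first.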

The same setup with the SVM routine works great: all values are around 0.95.
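To compare the two classifiers under identical preprocessing, the pipeline can be parametrized by its final estimator. This is a sketch assuming the routine mentioned above is a linear SVM such as `SGDClassifier(loss='hinge')` from the scikit-learn text-classification tutorial; the toy documents and labels are hypothetical, and `text_pipeline` is a helper introduced only for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier


def text_pipeline(clf):
    # Hypothetical helper: same counts + tf-idf front end, only the final estimator changes.
    return Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', clf)])


# Hypothetical toy data standing in for the 20newsgroups subset.
train_docs = ["god exists", "atheism and god", "belief in god",
              "circuit voltage resistor", "voltage divider", "resistor circuit board"]
train_y = [0, 0, 0, 1, 1, 1]
test_docs = ["is there a god", "a simple voltage circuit"]

models = {
    # Assumed linear SVM: hinge loss trained by stochastic gradient descent.
    'svm': text_pipeline(SGDClassifier(loss='hinge', alpha=1e-3, random_state=42)),
    # The MLP settings from the pipeline above.
    'mlp': text_pipeline(MLPClassifier(solver='lbfgs', alpha=1e-5,
                                       hidden_layer_sizes=(5, 2), random_state=1)),
}

predictions = {}
for name, model in models.items():
    model.fit(train_docs, train_y)
    predictions[name] = model.predict(test_docs)
    print(name, predictions[name])
```

Keeping the vectorizer steps identical means any difference in the reports comes from the final estimator. Note that `hidden_layer_sizes=(5, 2)` means two hidden layers of 5 and 2 units, so the tf-idf features pass through a two-unit layer; varying that setting is a natural next experiment.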

Thanks,

Andreas