[scikit-learn] MPLclassifier

Joel Nothman joel.nothman at gmail.com
Sat Jan 13 21:56:51 EST 2018


I don't think this issue is directly related to scikit-learn. Your
classifier is learning to always predict the majority class. If training
performance is also poor, then you either need more data or your model is
inappropriate. You're trying to learn lots of parameters from 100
examples. Use a simpler model. Use stronger regularisation (higher alpha).
Work through some tutorials on machine learning diagnostics and modelling
choices.
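
Both points can be checked with a little arithmetic. The sketch below is
illustrative only: the helper names are hypothetical, and it assumes the
vectorizer produces one input feature per training-set word (5402, as
reported in the quoted message) and that the network has a single output
unit for the two-class problem. It counts the weights and biases of a
fully connected network with hidden layers (5, 2), then computes the
scores a constant majority-class predictor would get on the reported
class supports (34 and 66).

```python
# Illustrative sketch; helper names are hypothetical and the figures
# (5402 features, supports of 34 and 66) come from the quoted message.

def mlp_parameter_count(n_features, hidden_layer_sizes, n_outputs):
    """Count weights and biases in a fully connected feed-forward network."""
    sizes = [n_features, *hidden_layer_sizes, n_outputs]
    # Each layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def majority_baseline(support):
    """Scores for a predictor that always outputs the most frequent class."""
    total = sum(support.values())
    label = max(support, key=support.get)
    precision = support[label] / total  # every prediction is `label`
    recall = 1.0                        # every true `label` item is found
    f1 = 2 * precision * recall / (precision + recall)
    return label, precision, recall, f1


# Assumption: one output unit for the binary problem.
n_params = mlp_parameter_count(5402, (5, 2), 1)

label, precision, recall, f1 = majority_baseline(
    {"alt.atheism": 34, "sci.electronics": 66}
)

print(n_params)
print(label, round(precision, 2), round(recall, 2), round(f1, 2))
```

Under these assumptions the network has about 27,000 parameters to fit
from 100 documents, and the baseline scores (0.66, 1.00, 0.80) match the
sci.electronics row of the report, which is what a model that always
predicts the majority class produces.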

On 13 Jan 2018 3:42 am, "andreas heiner" <ap.heiner at gmail.com> wrote:

> Hi,
>
> I am trying to apply the MLPClassifier to a subset (100 data points, 2 classes)
> of the 20newsgroup dataset. I created (ok, copied) the following pipeline
>
> model_MLP = Pipeline([('vect', CountVectorizer()),
>                       ('tfidf', TfidfTransformer()),
>                       ('model_MLP', MLPClassifier(solver='lbfgs',
>                                                   alpha=1e-5,
>                                                   hidden_layer_sizes=(5, 2),
>                                                   random_state=1)),
>                       ])
>
> model_MLP.fit(twenty_train.data, twenty_train.target)
>
> predicted_MLP = model_MLP.predict(twenty_test.data)
>
> print(metrics.classification_report(twenty_test.target, predicted_MLP,
>                                     target_names=twenty_test.target_names))
>
> The numbers I get are hopeless,
>
>                  precision    recall  f1-score   support
>     alt.atheism       0.00      0.00      0.00        34
> sci.electronics       0.66      1.00      0.80        66
>
> The only reason I can think of is that the dictionaries of the training
> and the test set are not the same (testset: 5204 words, training set: 5402
> words). That should not be a problem (if I understand Bayes correctly), but
> it certainly gives rubbish (see the numbers).
>
> The same setup with the SVD routine works great, all values are around .95
>
> thanks,
>
> Andreas