[scikit-learn] HashingVectorizer slow in version 0.18
Piotr Bialecki
piotr.bialecki at hotmail.de
Tue Oct 11 10:03:29 EDT 2016
I just tested it on my Ubuntu machine and could not see any significant
performance difference (5.68 seconds in scikit-learn 0.17 vs. 6.67 seconds
in scikit-learn 0.18).
However, on a Windows 10 machine I could indeed reproduce the issue:
scikit-learn 0.17.1. Numpy 1.11.1. Python 2.7.12 AMD64
Vectorizing 20newsgroup 11314 documents
('Vectorization completed in ', 5.608999967575073, ' seconds, resulting
shape ', (11314, 1048576))
scikit-learn 0.18. Numpy 1.11.1. Python 2.7.12 AMD64
Vectorizing 20newsgroup 11314 documents
('Vectorization completed in ', 27.924000024795532, ' seconds, resulting
shape ', (11314, 1048576))
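The script that produced the timings above is not included in this message. A minimal sketch of such a measurement follows, assuming the 20 newsgroups training subset (11314 documents) and a HashingVectorizer with default settings (2**20 = 1048576 features), which matches the reported shape. The helper names `time_vectorization` and `run_20newsgroups_benchmark` are hypothetical, and the sketch uses Python 3 syntax rather than the Python 2.7 used in the runs above.

```python
import time


def time_vectorization(vectorize, documents):
    """Return (elapsed_seconds, result) for a single call vectorize(documents)."""
    start = time.perf_counter()
    result = vectorize(documents)
    return time.perf_counter() - start, result


def run_20newsgroups_benchmark():
    # Hypothetical reconstruction of the benchmark, not the original script.
    # Imports are local so the timing helper above works without scikit-learn.
    # fetch_20newsgroups downloads the dataset on first use.
    import sklearn
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import HashingVectorizer

    documents = fetch_20newsgroups(subset="train").data
    print("scikit-learn %s" % sklearn.__version__)
    print("Vectorizing 20newsgroup %d documents" % len(documents))
    # HashingVectorizer is stateless, so transform() needs no prior fit().
    elapsed, X = time_vectorization(HashingVectorizer().transform, documents)
    print("Vectorization completed in %.3f seconds, resulting shape %s"
          % (elapsed, X.shape))
```

Calling `run_20newsgroups_benchmark()` under each scikit-learn version on the same machine should give numbers comparable to the ones listed above.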
On 11.10.2016 15:44, Olivier Grisel wrote:
> That's really weird. I don't have a windows machine handy at the
> moment. It would be nice if someone else could confirm.
>
> Could you please run the Python profiler on this to see where the time
> is spent on the slow setup?
>
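One way to collect the profile requested above is the standard library's cProfile module. The following is only a sketch: `profile_call` and `profile_hashing_vectorizer` are hypothetical helper names, and it assumes that the slow step is the call to `HashingVectorizer().transform` on the list of documents.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=20, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def profile_hashing_vectorizer(documents):
    # Local import so profile_call itself has no scikit-learn dependency.
    from sklearn.feature_extraction.text import HashingVectorizer

    _, report = profile_call(HashingVectorizer().transform, documents)
    return report
```

Printing the report from `profile_hashing_vectorizer(documents)` on both the fast and the slow setup and comparing the top entries should show where the additional time is spent.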