[scikit-learn] HashingVectorizer slow in version 0.18

Piotr Bialecki piotr.bialecki at hotmail.de
Tue Oct 11 10:03:29 EDT 2016


I just tested this on my Ubuntu machine and could not see any significant 
slowdown (5.68 seconds with scikit-learn 0.17 vs. 6.67 seconds with 
scikit-learn 0.18).

However, on a separate Windows 10 machine I can indeed reproduce the issue 
(roughly a 5x slowdown):

scikit-learn 0.17.1. Numpy 1.11.1. Python 2.7.12 AMD64
Vectorizing 20newsgroup 11314 documents
('Vectorization completed in ', 5.608999967575073, ' seconds, resulting shape ', (11314, 1048576))


scikit-learn 0.18. Numpy 1.11.1. Python 2.7.12 AMD64
Vectorizing 20newsgroup 11314 documents
('Vectorization completed in ', 27.924000024795532, ' seconds, resulting shape ', (11314, 1048576))

On 11.10.2016 15:44, Olivier Grisel wrote:
> That's really weird. I don't have a windows machine handy at the
> moment. It would be nice if someone else could confirm.
>
> Could you please run the Python profiler on this to see where the time
> is spent on the slow setup?
>
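
For anyone who wants to collect the profile asked for above, here is a 
minimal sketch using the standard library's cProfile and pstats modules. 
It is not the script that produced the timings in this thread. The helper 
names profile_call and profile_hashing_vectorizer are made up for 
illustration, and the sketch assumes that the benchmark calls 
HashingVectorizer().transform on the 20 newsgroups training set with 
default parameters, which matches the reported shape (11314, 1048576).

```python
import cProfile
import pstats


def profile_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, stats), where stats is a pstats.Stats object
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative")
    return result, stats


def profile_hashing_vectorizer(top=20):
    """Profile HashingVectorizer.transform on the 20 newsgroups data.

    Assumption: scikit-learn is installed and the dataset is cached
    locally or can be downloaded. Imports are kept inside the function
    so that profile_call stays usable without scikit-learn.
    """
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import HashingVectorizer

    docs = fetch_20newsgroups(subset="train").data
    vectorizer = HashingVectorizer()  # default settings (assumption)
    X, stats = profile_call(vectorizer.transform, docs)
    print(X.shape)
    # Show the `top` entries with the largest cumulative time.
    stats.print_stats(top)
    return stats
```

Calling profile_hashing_vectorizer() once under 0.17.1 and once under 0.18 
on the slow Windows setup and comparing the top entries should show which 
calls account for the extra time.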


