[scikit-learn] clustering on big dataset