[scikit-learn] MemoryError when evaluate clustering with gridsearchcv

lampahome pahome.chen at mirlab.org
Thu May 30 04:42:20 EDT 2019

I loaded a large dataset into memory, and it takes about 2 GB of RAM (I have 4 GB in total).

I got that size from sys.getsizeof(train_X).
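How much sys.getsizeof reports depends on the type of train_X, so the 2 GB figure may be worth cross-checking. The sketch below is only an illustration: it uses a small hypothetical NumPy array as a stand-in for train_X (the real data is not shown in this post) and compares sys.getsizeof with the array's nbytes attribute for an array, a view, and a nested list.

```python
import sys
import numpy as np

# Hypothetical stand-in for train_X: 1000 samples x 50 float64 features.
train_X = np.zeros((1000, 50), dtype=np.float64)

# Size of the underlying data buffer: 1000 * 50 * 8 bytes.
data_bytes = train_X.nbytes

# For an array that owns its data, getsizeof covers the buffer plus a small header.
reported = sys.getsizeof(train_X)

# A view shares its parent's buffer, so getsizeof reports only the header.
view_reported = sys.getsizeof(train_X[:])

# For a nested list, getsizeof counts only the outer list, not the rows inside it.
list_reported = sys.getsizeof(train_X.tolist())

print(data_bytes, reported, view_reported, list_reported)
```

If train_X is a view or a list of lists, the real footprint can be much larger than what sys.getsizeof shows.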

Then I evaluate the clustering with GridSearchCV as below:
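    import numpy as np
    from sklearn import cluster, metrics
    from sklearn.model_selection import GridSearchCV

    def grid_search_clu(X):
        def cv_scorer(estimator, X):
            # Use the labels computed during fit when they are available.
            # NOTE: the else branch was cut off in the original post;
            # estimator.predict(X) is an assumed fallback.
            cluster_labels = (estimator.labels_ if hasattr(estimator, 'labels_')
                              else estimator.predict(X))
            num_labels = len(set(cluster_labels))
            num_samples = len(X)
            # Davies-Bouldin is undefined for one cluster or one cluster per sample.
            if num_labels == 1 or num_labels == num_samples:
                return -1
            # Lower Davies-Bouldin is better, so negate it for GridSearchCV.
            return -metrics.davies_bouldin_score(X, cluster_labels)

        m = cluster.Birch(n_clusters=None, compute_labels=True)
        m_param = {'branching_factor': range(10, 60, 10),
                   'threshold': np.arange(0.1, 0.6, 0.1).round(decimals=3)}

        # A single "fold" that fits and scores on the whole of X.
        clf = GridSearchCV(m, m_param, cv=[(slice(None), slice(None))],
                           scoring=cv_scorer, verbose=1, n_jobs=1,
                           return_train_score=False).fit(X)
        return clf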

This raises a MemoryError. How can I solve this?
Should I adjust the parameter ranges?
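
One direction that might be worth trying, sketched here without any confirmation that it avoids the MemoryError: loop over the parameter grid by hand on a random subsample, keeping only one fitted model alive at a time. The helper name search_birch and its max_samples and seed arguments are hypothetical and not part of scikit-learn. Unlike the scorer above, degenerate clusterings score -inf instead of -1 so that they can never win over a valid one.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.metrics import davies_bouldin_score
from sklearn.model_selection import ParameterGrid

def search_birch(X, param_grid, max_samples=10000, seed=0):
    """Hypothetical helper: manual grid search for Birch on a subsample of X."""
    rng = np.random.default_rng(seed)
    if len(X) > max_samples:
        # Draw rows without replacement to cap the memory used per fit.
        idx = rng.choice(len(X), size=max_samples, replace=False)
        X = X[idx]

    best_params, best_score = None, -np.inf
    for params in ParameterGrid(param_grid):
        model = Birch(n_clusters=None, **params).fit(X)
        labels = model.labels_
        n_labels = len(set(labels))
        if n_labels == 1 or n_labels == len(X):
            # Davies-Bouldin is undefined here; never select this setting.
            score = -np.inf
        else:
            # Negate so that a higher score is better.
            score = -davies_bouldin_score(X, labels)
        if score > best_score:
            best_params, best_score = params, score
        del model  # drop the fitted CF-tree before the next fit
    return best_params, best_score
```

A call would look like search_birch(train_X, {'branching_factor': [10, 20], 'threshold': [0.3, 0.5]}). Since GridSearchCV uses refit=True by default, it also fits the best setting once more on all of X at the end, a step this loop leaves out.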

