[scikit-learn] Need help in dealing with large dataset
CHETHAN MURALI
chethanmuralisv at gmail.com
Mon Mar 5 11:19:39 EST 2018
Dear All,
I am working on building a CNN model for an image classification problem.
As part of it I have converted all of my images to a NumPy array.
Now, when I try to split the array into training and test sets, I am
getting a memory error.
The details are below:
import numpy as np

X = np.load("./data/X_train.npy", mmap_mode='r')   # memory-mapped: data stays on disk
train_pct_index = int(0.8 * len(X))                # 80/20 split point
X_train, X_test = X[:train_pct_index], X[train_pct_index:]   # slices are still memmap views
X_train = X_train.reshape(X_train.shape[0], 256, 256, 3)
X_train = X_train.astype('float32')                # this copy must fit entirely in RAM
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-46-9180807e01dc> in <module>()
      2 print("Normalizing Data")
      3
----> 4 X_train = X_train.astype('float32')
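
If my arithmetic is right, the astype('float32') call is the problem: the
memmap only helps while the data stays on disk, and the float32 copy has to
fit entirely in RAM. A quick size check for an array of the shape printed
below:

import numpy as np

n_images, h, w, c = 85108, 256, 256, 3     # shape of X_train (see below)
bytes_needed = n_images * h * w * c * 4    # float32 = 4 bytes per element
print(bytes_needed / 1024**3)              # ~62 GiB, roughly twice my 32 GB of RAM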
More information:

1. My Python version:

   python --version
   Python 3.6.4 :: Anaconda custom (64-bit)

2. I am running the code on Ubuntu 16.04.
3. I have 32 GB of RAM.
4. The X_train.npy file that I load into the array is about 20 GB:

   print("X_train Shape: ", X_train.shape)
   X_train Shape: (85108, 256, 256, 3)
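
One workaround I have been considering (a minimal sketch only; the output
file name and chunk size are placeholders I made up, assuming a float32 copy
on disk would be acceptable) is to cast the memmapped array in chunks into a
second writable memmap with np.lib.format.open_memmap, so that no full copy
is ever held in RAM:

import numpy as np

X = np.load("./data/X_train.npy", mmap_mode='r')
# hypothetical output path; creates a writable .npy file backed by disk
out = np.lib.format.open_memmap("./data/X_train_f32.npy", mode='w+',
                                dtype=np.float32, shape=X.shape)

chunk = 1024                               # images per batch; tune to available RAM
for start in range(0, len(X), chunk):
    # each batch is read, cast to float32, and written straight back to disk
    out[start:start + chunk] = X[start:start + chunk].astype('float32')
out.flush()

Is an approach like this reasonable, or is there a better way?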
I would be really glad if you could help me overcome this problem.
Regards,
-
Chethan