[SciPy-User] scipy.io.loadmat throws TypeError with large files
Matthew Brett
matthew.brett at gmail.com
Sat Aug 10 21:28:39 EDT 2013
Hi,
On Thu, Aug 8, 2013 at 4:41 PM, Richard Llewellyn <llewelr at gmail.com> wrote:
> Hi Matthew,
>
> A short script below shows that increasing the density triggers the
> error, on my machine, at file sizes over 4GB. Originally I increased
> either M or N to trigger the error as well.
> I suspect you'll run into a problem with available RAM. I run this on my
> 32GB machine with 64GB swap, and it swaps, so it takes at least several
> minutes to process. Pain, I know. Once I get more RAM it will be easier
> for me to test various permutations, but that will be a while.
>
> Maybe a generator could be used to build the matrix (see the sketch just
> below)? Still, I think RAM will be an issue.
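A rough sketch of that generator idea (illustrative only, not code from the thread; the shape and density are made up): stream (row, col, value) triplets so only the nonzeros are ever held, then hand them to scipy.sparse:

import numpy as np
from scipy import sparse

def triplets(m, n, density, rng):
    # Yield one (row, col, value) entry at a time instead of
    # materializing a dense m-by-n array.
    for _ in range(int(m * n * density)):
        yield rng.integers(m), rng.integers(n), rng.random()

rng = np.random.default_rng(0)
rows, cols, vals = zip(*triplets(1000, 1000, 0.01, rng))
mat = sparse.coo_matrix((vals, (rows, cols)), shape=(1000, 1000))

The triplet sequences still have to fit in memory, so as you say, RAM remains the binding constraint at these sizes.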
Aha - thanks for tracking that down a little further.
The problem is that the MATLAB 5-7 file format (non-HDF) uses a uint32
to store the number of bytes the matrix takes up on disk. The matrices
triggering your error are a little larger than 2**32 bytes, so the byte
count overflows.
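Here's the arithmetic in miniature (plain Python, not scipy's actual writer code; the 56-byte excess is made up):

nbytes = 2**32 + 56            # on-disk size of a matrix just over 4 GiB
wrapped = nbytes & 0xFFFFFFFF  # the low 32 bits are all the header field can hold
print(wrapped)                 # -> 56, so the reader misparses everything after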
Here's a relevant thread:
http://www.mathworks.de/matlabcentral/newsreader/view_thread/307845
It's not hard to reproduce the error with non-sparse arrays (script
appended below).
We certainly need a better error message for this - I'll try putting one
in.
Cheers,
Matthew
from io import BytesIO
import numpy as np
from scipy.io import loadmat, savemat

fobj = BytesIO()
# 2**32 int8 elements: just past what the uint32 byte count can record.
m = np.empty(2**32, dtype=np.int8)
n = np.arange(10).reshape((2, 5))
savemat(fobj, {'mat': m, 'n': n})
fobj.seek(0)  # rewind so loadmat reads from the start
# fails here with TypeError
m = loadmat(fobj)['mat']
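As an aside, the HDF5-based v7.3 MAT format doesn't have this 4 GiB limit, but scipy.io can't write it. If you can save from MATLAB with -v7.3, something like this reads it back (filename and variable name are placeholders):

import h5py  # separate dependency; scipy.io does not handle v7.3 files

with h5py.File('big.mat', 'r') as f:
    mat = f['mat'][()]  # note: MATLAB's column-major arrays come back transposed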