[SciPy-User] scipy.io.loadmat throws TypeError with large files

Matthew Brett matthew.brett at gmail.com
Sat Aug 10 21:28:39 EDT 2013


Hi,

On Thu, Aug 8, 2013 at 4:41 PM, Richard Llewellyn <llewelr at gmail.com> wrote:
> Hi Matthew,
>
> A short script below shows that increasing the density triggers the
> error, on my machine, at file sizes over 4GB.  Originally I had
> increased either M or N to trigger the error as well.
> I suspect you'll run into a problem with available RAM.  I run this on my
> 32GB machine with 64GB swap, and it swaps, so it takes several minutes to
> process at least.  Pain, I know.  Once I get more RAM it would be easier for
> me to test various permutations, but that will be a while.
>
> Maybe a generator could be used to build the matrix?  Still, I think RAM
> will be an issue.

Aha - thanks for tracking that down a little further.
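
On the generator idea: you could build the sparse matrix in row blocks
and stack the blocks, which keeps peak RAM close to the size of the
final matrix.  An untested sketch, assuming scipy.sparse.rand / vstack
fit your use:

import scipy.sparse as sp

def sparse_rand_blocks(m, n, density, block_rows=100000):
    # Build an m x n random sparse matrix block-by-block to cap peak RAM.
    blocks = [sp.rand(min(block_rows, m - start), n,
                      density=density, format='csr')
              for start in range(0, m, block_rows)]
    return sp.vstack(blocks, format='csr')

That only helps with RAM while generating, though - the 4GB limit on
saving remains.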

The problem is that the MATLAB 5-7 file format (non-HDF) uses a uint32
to store the number of bytes each matrix takes up on disk.  The
matrices causing your error are a little larger than 2**32 bytes, so
that count overflows, hence the error.
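
To make the overflow concrete, here is a small illustration (the exact
on-disk size also includes some per-matrix header overhead, so treat
the numbers as approximate):

import struct

nbytes = 2**32 + 16   # a matrix record just over the 4GB limit
try:
    struct.pack('<I', nbytes)   # uint32 cannot hold values >= 2**32
except struct.error as e:
    print(e)
print(nbytes % 2**32)  # 16 - the count a wrapping writer would record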

Here's a relevant thread:

http://www.mathworks.de/matlabcentral/newsreader/view_thread/307845

It's not hard to reproduce the error with non-sparse arrays (see the
appended script).

We certainly need a better error for this - I'll try putting one in.

Cheers,

Matthew

from io import BytesIO

import numpy as np
from scipy.io import loadmat, savemat

fobj = BytesIO()

# 2**32 int8 values: the on-disk matrix size just exceeds what the
# format's uint32 byte count can represent.
m = np.empty(2**32, dtype=np.int8)
n = np.arange(10).reshape((2, 5))

savemat(fobj, {'mat': m, 'n': n})
fobj.seek(0)  # rewind before reading back

# fails here with TypeError
m = loadmat(fobj)['mat']
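
Until a clearer error lands, one workaround is to check sizes before
calling savemat.  A sketch only, for dense arrays, ignoring the small
per-matrix header overhead (sparse matrices would need their own
accounting):

import numpy as np

MAT5_MAX_BYTES = 2**32  # uint32 byte-count field in the mat5 header

def check_mat5_sizes(mdict):
    # Raise early instead of writing an unreadable file.
    for name, value in mdict.items():
        if np.asarray(value).nbytes >= MAT5_MAX_BYTES:
            raise ValueError("variable %r too large for MATLAB 5 "
                             "format; consider HDF5 instead" % name)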


