[Numpy-discussion] Memory error with numpy.loadtxt()

Chris Colbert sccolbert at gmail.com
Fri Feb 25 13:38:34 EST 2011


On Fri, Feb 25, 2011 at 12:52 PM, Joe Kington <jkington at wisc.edu> wrote:

> Do you expect to have very large integer values, or only values over a
> limited range?
>
> If your integer values will fit into the 16-bit range (or even 32-bit;
> the default dtype is float64...) you can potentially halve your memory
> usage (int32) or quarter it (int16).
>
> I.e., something like:
> data = numpy.loadtxt(filename, dtype=numpy.int16)
>
> Alternately, if you're already planning on using a (scipy) sparse array
> anyway, it's easy to do something like this:
>
> import numpy as np
> import scipy.sparse
> # Row indices, column indices and values of the nonzero entries
> I, J, V = [], [], []
> with open('infile.txt') as infile:
>     for i, line in enumerate(infile):
>         # Parse one row at a time so the dense array is never built
>         line = np.array(line.strip().split(), dtype=int)
>         nonzeros, = line.nonzero()
>         I.extend([i]*nonzeros.size)
>         J.extend(nonzeros)
>         V.extend(line[nonzeros])
> data = scipy.sparse.coo_matrix((V, (I, J)), dtype=int,
>                                shape=(i+1, line.size))
>
> This will be much slower than numpy.loadtxt(...), but if you're just
> converting the output of loadtxt to a sparse array, regardless, this would
> avoid memory usage problems (assuming the array is mostly sparse, of
> course).
>
> Hope that helps,
> -Joe
>
>
>
> On Fri, Feb 25, 2011 at 9:37 AM, Jaidev Deshpande <
> deshpande.jaidev at gmail.com> wrote:
>
>> Hi
>>
>> Is it possible to load a text file 664 MB large with integer values and
>> about 98% sparse? numpy.loadtxt() shows a memory error.
>>
>> If it's not possible, what alternatives could I have?
>>
>> The usable RAM on my machine running Windows 7 is 3.24 GB.
>>
>> Thanks.
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
In addition to this, it is helpful to remember that just because you have
3.24 GB available doesn't mean that 664 MB of that is contiguous, which is
what NumPy would need to hold it all in memory.
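
To put rough numbers on the size of that single block, here is a minimal
sketch. The helper name dense_nbytes and the value count are assumptions
for illustration only: the count guesses that a mostly-zero text file
spends about two bytes per value (a "0" plus a delimiter), which the
original post does not confirm.

```python
import numpy as np

def dense_nbytes(n_values, dtype):
    """Bytes needed for one contiguous array of n_values items of dtype."""
    return n_values * np.dtype(dtype).itemsize

# Assumption: ~2 bytes of text per value in a 664 MB, mostly-zero file.
n_values = 664 * 2**20 // 2

for dtype in (np.float64, np.int32, np.int16):
    gib = dense_nbytes(n_values, dtype) / 2**30
    print(f"{np.dtype(dtype).name:>8}: {gib:.2f} GiB in one block")
```

Under that assumption the in-memory array can be several times larger than
the file itself, so the contiguous block NumPy needs may be well beyond
664 MB unless a smaller dtype or a sparse format is used.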