[Numpy-discussion] Memory error with numpy.loadtxt()

Joe Kington jkington at wisc.edu
Fri Feb 25 12:52:27 EST 2011


Do you expect to have very large integer values, or only values over a
limited range?

If your integer values fit into a 16-bit range (or even a 32-bit one;
loadtxt's default dtype is float64 on any machine), you can potentially
cut the size of the resulting array in half (int32) or to a quarter (int16).

E.g., something like:
import numpy
data = numpy.loadtxt(filename, dtype=numpy.int16)
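A quick way to see the effect of the dtype argument is to load the same small input three ways and compare the array sizes. The three-row text below is made-up stand-in data, not the original file. Note that int16 only holds values from -32768 to 32767 (see numpy.iinfo(numpy.int16)), and depending on the NumPy version, peak memory while parsing can be higher than the size of the final array.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file: 3 rows x 3 columns of integers
text = "0 0 3\n0 7 0\n1 0 0\n"

as_float = np.loadtxt(io.StringIO(text))                  # default dtype: float64
as_int32 = np.loadtxt(io.StringIO(text), dtype=np.int32)
as_int16 = np.loadtxt(io.StringIO(text), dtype=np.int16)

# 9 elements at 8, 4 and 2 bytes each
print(as_float.nbytes, as_int32.nbytes, as_int16.nbytes)  # 72 36 18
```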

Alternatively, if you're planning to use a scipy sparse matrix anyway, you
can build it directly from the file without ever creating the dense array:

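import numpy as np
import scipy.sparse

# Row indices, column indices and values of the nonzero entries
I, J, V = [], [], []
with open('infile.txt') as infile:
    for i, line in enumerate(infile):
        # Parse one row of whitespace-separated integers
        # (np.int no longer exists in recent NumPy; use np.int64)
        row = np.array(line.split(), dtype=np.int64)
        # Keep only the nonzero entries of this row
        nonzeros, = row.nonzero()
        I.extend([i] * nonzeros.size)
        J.extend(nonzeros)
        V.extend(row[nonzeros])
# Assumes a non-empty file whose rows all have the same length
data = scipy.sparse.coo_matrix((V, (I, J)), dtype=np.int64,
                               shape=(i + 1, row.size))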
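To sanity-check the loop on a tiny input, the same logic can be wrapped in a function that accepts any iterable of text lines (an open file or a plain list of strings). The name load_sparse_rows is hypothetical, and the sketch still assumes every row has the same number of columns; it tracks the shape explicitly instead of relying on the loop variables after the loop ends.

```python
import numpy as np
import scipy.sparse

def load_sparse_rows(lines, dtype=np.int64):
    """Hypothetical helper: build a COO matrix from whitespace-separated rows,
    storing only the nonzero entries. Assumes all rows have equal length."""
    rows, cols, vals = [], [], []
    n_rows, n_cols = 0, 0
    for i, line in enumerate(lines):
        row = np.array(line.split(), dtype=dtype)
        nz, = row.nonzero()            # column positions of nonzeros
        rows.extend([i] * nz.size)
        cols.extend(nz)
        vals.extend(row[nz])
        n_rows, n_cols = i + 1, row.size
    return scipy.sparse.coo_matrix((vals, (rows, cols)), dtype=dtype,
                                   shape=(n_rows, n_cols))

text = ["0 0 3 0", "0 0 0 0", "5 0 0 1"]
m = load_sparse_rows(text)
print(m.shape, m.nnz)   # (3, 4) 3
```

With a real file you would pass the open file object, e.g. load_sparse_rows(open('infile.txt')), and call .tocsr() on the result if you need fast row slicing or arithmetic.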

This will be much slower than numpy.loadtxt(...), but if you're going to
convert the output of loadtxt to a sparse array anyway, it avoids ever
holding the full dense array in memory (assuming the array is mostly
sparse, of course).
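The saving comes from COO storing only a value plus a row and a column index per nonzero entry. The sketch below compares the two layouts on a synthetic 1000 x 1000 matrix that is about 98% zeros; the random data is made up for illustration, and the index dtype scipy picks (typically int32 for a matrix this size) can vary by version.

```python
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
n_rows, n_cols = 1000, 1000

# Synthetic matrix: roughly 2% of entries nonzero (98% sparse)
dense = np.zeros((n_rows, n_cols), dtype=np.int64)
mask = rng.random((n_rows, n_cols)) < 0.02
dense[mask] = rng.integers(1, 100, size=mask.sum())

coo = scipy.sparse.coo_matrix(dense)

# Dense: 8 bytes per entry. COO: value + row index + column index per nonzero.
dense_bytes = dense.nbytes
coo_bytes = coo.data.nbytes + coo.row.nbytes + coo.col.nbytes
print(dense_bytes, coo_bytes, coo_bytes / dense_bytes)
```

With int64 values and int32 indices that is 16 bytes per nonzero, so at 2% density the sparse form needs roughly 0.02 * 16 / 8 = 4% of the dense size.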

Hope that helps,
-Joe



On Fri, Feb 25, 2011 at 9:37 AM, Jaidev Deshpande <
deshpande.jaidev at gmail.com> wrote:

> Hi
>
> Is it possible to load a 664 MB text file of integer values that is
> about 98% sparse? numpy.loadtxt() raises a memory error.
>
> If it's not possible, what alternatives could I have?
>
> The usable RAM on my machine running Windows 7 is 3.24 GB.
>
> Thanks.
>

