Memory error with numpy.loadtxt()
Hi

Is it possible to load a 664 MB text file of integer values that is about 98% sparse? numpy.loadtxt() gives a memory error. If it's not possible, what alternatives could I have?

The usable RAM on my machine running Windows 7 is 3.24 GB.

Thanks.
Do you expect to have very large integer values, or only values over a limited range?

If your integer values fit into the 16-bit range (or even 32-bit, if you're on a 64-bit machine), you can cut your memory usage considerably: loadtxt's default dtype is float64, which takes four times as much space per value as int16. I.e. something like:

    data = numpy.loadtxt(filename, dtype=numpy.int16)

Alternately, if you're already planning on using a (scipy) sparse array anyway, it's easy to do something like this:

    import numpy as np
    import scipy.sparse

    I, J, V = [], [], []
    with open('infile.txt') as infile:
        for i, line in enumerate(infile):
            row = np.array(line.strip().split(), dtype=int)
            nonzeros, = row.nonzero()
            I.extend([i] * nonzeros.size)
            J.extend(nonzeros)
            V.extend(row[nonzeros])

    # i and row keep their values from the last iteration of the loop
    data = scipy.sparse.coo_matrix((V, (I, J)), dtype=int,
                                   shape=(i + 1, row.size))

This will be much slower than numpy.loadtxt(...), but if you're just going to convert the output of loadtxt to a sparse array anyway, it avoids the memory usage problem (assuming the array is mostly sparse, of course).

Hope that helps,
-Joe

On Fri, Feb 25, 2011 at 9:37 AM, Jaidev Deshpande <deshpande.jaidev@gmail.com> wrote:
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
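As a quick sanity check on the dtype suggestion above (a small sketch, not part of the original thread; the element count is an illustrative guess, not derived from the actual file):

```python
import numpy as np

# loadtxt's default dtype is float64: 8 bytes per value.
float64_size = np.dtype(np.float64).itemsize
# int16 stores each value in 2 bytes -- a quarter of the space.
int16_size = np.dtype(np.int16).itemsize

# Illustrative only: a dense array of 100 million integers
# (very roughly the scale of a 664 MB text file of small numbers).
n_values = 100_000_000
print("float64:", n_values * float64_size / 1e9, "GB")  # 0.8 GB
print("int16:  ", n_values * int16_size / 1e9, "GB")    # 0.2 GB
```

On a 32-bit process with ~3 GB usable, that difference can decide whether the load succeeds at all.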
On Fri, Feb 25, 2011 at 12:52 PM, Joe Kington wrote:
In addition, it's helpful to remember that just because you have 3.24 GB available doesn't mean that 664 MB of it is contiguous, which is what NumPy would need to hold it all in memory.
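To make the sparse-storage point concrete, here is a small sketch (synthetic toy data, not from the thread; the 1000x1000 shape and value range are made up) comparing a dense array's footprint to the COO representation, which stores only the nonzero entries:

```python
import numpy as np
import scipy.sparse

# Toy array that is ~98% zeros, like the data described in the thread.
rng = np.random.default_rng(0)
dense = np.zeros((1000, 1000), dtype=np.int16)
mask = rng.random(dense.shape) < 0.02
dense[mask] = rng.integers(1, 100, size=mask.sum(), dtype=np.int16)

sparse = scipy.sparse.coo_matrix(dense)

dense_bytes = dense.nbytes
# COO keeps three parallel arrays: one (row, col, value) triple per nonzero.
sparse_bytes = sparse.data.nbytes + sparse.row.nbytes + sparse.col.nbytes

print("dense: ", dense_bytes, "bytes")
print("sparse:", sparse_bytes, "bytes")
```

At 2% density the COO triples take roughly a tenth of the dense footprint, even though each stored entry costs extra index bytes on top of its value.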
participants (3)

- Chris Colbert
- Jaidev Deshpande
- Joe Kington