[Numpy-discussion] Create numpy array from a list error

David Warde-Farley dwf at cs.toronto.edu
Wed Sep 23 13:26:25 EDT 2009


On 23-Sep-09, at 10:06 AM, Dave Wood wrote:

> Hi all,
>
> I've got a fairly large (but not huge, 58 MB) tab-separated text file,
> with approximately 200 columns and 56k rows of numbers and strings.
>
> Here's a snippet of my code to create a numpy matrix from the data
> file...
>
> ####
>
> data = map(lambda x : x.strip().split('\t'), sys.stdin.readlines())
> data = array(data)

In general I have found that the pattern you're using is a bad one,
because readlines() first reads the entire file into memory as a list
of lines, and map then builds a second complete list from it.

I would instead use

	data = [x.strip().split('\t') for x in sys.stdin]

or even defer the loop with a generator (note that array() will not
iterate a generator itself, so wrap it in list() before passing it in):

	data = (x.strip().split('\t') for x in sys.stdin)

The extra copy alone shouldn't be causing a memory error with only
58 MB of data, but avoiding it will at least make things go faster.
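
As a rough sketch of the streaming approach, assuming Python 3 and that
every row has the same number of fields: the helper name
rows_from_stream and the io.StringIO sample are made up for
illustration and stand in for sys.stdin. It uses rstrip('\n') rather
than strip(), since strip() also removes leading and trailing tabs and
would silently drop empty fields at the ends of a row.

```python
import io

import numpy as np


def rows_from_stream(stream):
    """Yield one list of tab-separated fields per line, lazily."""
    for line in stream:
        # Only remove the newline; keep empty leading/trailing fields.
        yield line.rstrip('\n').split('\t')


# Hypothetical stand-in for sys.stdin, so the sketch runs on its own.
sample = io.StringIO("a\t1\t2.5\nb\t2\t3.5\n")

# array() does not consume a generator, so materialize the rows once.
data = np.array(list(rows_from_stream(sample)))

print(data.shape)  # (2, 3)
```

Mixed numbers and strings end up as a single string dtype here; if the
rows have differing lengths, array() cannot build a regular 2-D array,
so it is worth checking the field counts before converting.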

David


