python: ascii read
Brian van den Broek
bvande at po-box.mcgill.ca
Thu Sep 16 17:56:35 CEST 2004
Alex Martelli said unto the world upon 2004-09-16 07:22:
> Sebastian Krause <canopus at gmx.net> wrote:
>>I tried to read in some large ascii files (200MB-2GB) in Python using
>>scipy.io.read_array, but it did not work as I expected. The whole idea
>>was to find a fast Python routine to read in arbitrary ascii files, to
>>replace Yorick (which I use right now and which is really fast, but not
>>as general as Python). The problem with scipy.io.read_array was, that it
>>is really slow, returns errors when trying to process large files and it
>>also changes (cuts) the files (after scipy.io.read_array processed a 2GB
>>file its size was only 64MB).
>>Can someone give me hint how to use Python to do this job correctly and
>>fast? (Maybe with another read-in routine.)
> If all you need is what you say -- read a huge amount of ASCII data into
> memory -- it's hard to beat
> data = open('thefile.txt').read()
> mmap may in fact be preferable for many uses, but it doesn't actually
> read (it _maps_ the file into memory instead).
[neophyte question warning]
I'd not been aware of mmap until this post. Looking at the Library
Reference and my trusty copy of Python in a Nutshell, I've gotten some
idea of the differences between using mmap and the .read() method on a
file object -- for example, mmap returns a mutable object rather than an
immutable string, and slice assignment requires that len(oldslice) equal
len(newslice).
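
For concreteness, here is a minimal sketch of those two differences as I understand them. It is written for a current Python 3, where a binary-mode .read() gives an immutable bytes object rather than a str. The function name compare_read_and_mmap is my own, made up for illustration, and the file it is given must be small, non-empty and safe to overwrite.

```python
import mmap


def compare_read_and_mmap(path):
    """Contrast file.read() with mmap on a small scratch file (hypothetical helper).

    Returns (read_is_mutable, resize_allowed, bytes_on_disk_afterwards).
    """
    # .read() copies the file contents into an immutable bytes object.
    with open(path, "rb") as f:
        data = f.read()
    try:
        data[0:5] = b"HELLO"  # bytes do not support item assignment
        read_is_mutable = True
    except TypeError:
        read_is_mutable = False

    # mmap needs a file opened for update; length 0 maps the whole file.
    with open(path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)
        try:
            mm[0:5] = b"HELLO"  # same length as the old slice: allowed
            try:
                mm[0:5] = b"HI"  # different length: rejected
                resize_allowed = True
            except (IndexError, ValueError):
                # CPython raises IndexError here; ValueError is caught too
                # as a precaution, in case this differs between versions.
                resize_allowed = False
            mm.flush()  # push the in-memory change back to the file
        finally:
            mm.close()

    # The same-length assignment went through the mapping into the file.
    with open(path, "rb") as f:
        on_disk = f.read()
    return read_is_mutable, resize_allowed, on_disk
```

Run against a scratch file containing b"hello world\n", this should report that the .read() result cannot be changed in place, that the mmap slice cannot change length, and that the file itself now begins with HELLO.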
But I don't really feel I've a handle on the significance of saying it
maps the file into memory versus reading the file. The naive thought is
that since the data gets into memory, the file must be read. But this
makes me sure I'm missing a distinction in the terminology. Explanations
and pointers for what to read gratefully received.
And, since mmap behaves differently on different platforms, I should add
that I'm mostly a win32 user looking to transition to Linux.
Best to all,