reading large file

Bengt Richter bokr at oz.net
Fri Sep 5 12:12:32 EDT 2003


On Fri, 5 Sep 2003 08:26:12 +0200, "Sophie Alléon" <alleon at club-internet.fr> wrote:

<toppost moved to preferred location below ;-) />

>"Bengt Richter" <bokr at oz.net> wrote in message news:
>bj5e61$pjr$0 at 216.39.172.122...
>> On 3 Sep 2003 05:00:39 -0700, g_alleon at yahoo.fr (guillaume) wrote:
>>
>> >I have to read and process a large ASCII file containing a mesh: a
>> >list of points and triangles.
>> >The file is 100 MBytes.
>> >
>> >I first tried to do it all in memory, but I think I am running out of
>> >memory, so I decided to use the shelve
>> >module to store my points and elements on disk.
>> >It is slow, though ... Any hint? I think I have the same
>> >memory problem but I don't understand why,
>> >since my aPoint should be removed by the gc.
>> >
>> >Have you any idea ?
>> >
>> Since your data is very homogeneous, why don't you store it in a couple of
>> homogeneous arrays? You could easily create a class to give you convenient
>> access via indices or iterators etc. Also you could write load and store
>> methods that could write both arrays in binary to a file. You could
>> consider doing this as a separate conversion from your source file, and
>> then run your app using the binary files and wrapper class.
>>
>> Arrays are described in the array module docs ;-)
>> I imagine you'd want to use the 'd' type for points and 'l' for faces.
>>
>> Regards,
>> Bengt Richter
>
>
<topPostText>
>Thanks to your comments, it is now possible to read my large file in a
>couple of minutes
>on my machine.
>
>Guillaume
</topPostText>

Well, so long as you're happy, glad to have played a role ;-)
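For the record, the wrapper class I had in mind would be along these lines -- just a minimal sketch, with hypothetical names, keeping everything in two flat homogeneous arrays:

```python
from array import array

class Mesh:
    """Flat homogeneous storage: 3 doubles per point, 3 longs per face."""
    def __init__(self):
        self.points = array('d')   # x0, y0, z0, x1, y1, z1, ...
        self.faces = array('l')    # a0, b0, c0, a1, b1, c1, ...

    def add_point(self, x, y, z):
        # Returns the index of the new point.
        self.points.extend((x, y, z))
        return len(self.points) // 3 - 1

    def add_face(self, a, b, c):
        # Returns the index of the new triangle.
        self.faces.extend((a, b, c))
        return len(self.faces) // 3 - 1

    def point(self, i):
        # Point i as an (x, y, z) tuple.
        return tuple(self.points[3*i:3*i+3])

    def face(self, i):
        # The three vertex indices of triangle i.
        return tuple(self.faces[3*i:3*i+3])
```

Indexing and iteration helpers can be added the same way; the point is that a million points costs 24 MB of doubles rather than a million Python objects.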

But I would think that time could still be cut a fair amount. E.g., I imagine just copying
your file at the command line might take 20-25 sec, depending on your system,
and if you have a fast processor, you should be I/O-bound much of the time, so a lot of
the conversions etc. should be able to happen while waiting for the disk.
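E.g., the per-line conversion can be done while iterating over the file, extending the arrays as lines come in. A minimal sketch, assuming a layout with 'v'-prefixed point lines and 'f'-prefixed triangle lines (your actual format may differ):

```python
from array import array

def load_mesh_ascii(path):
    # Points as doubles, triangle vertex indices as longs.
    points = array('d')
    faces = array('l')
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            # Assumed layout: "v x y z" for a point, "f i j k" for a triangle.
            if fields[0] == 'v':
                points.extend(float(x) for x in fields[1:4])
            elif fields[0] == 'f':
                faces.extend(int(x) for x in fields[1:4])
    return points, faces
```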

There doesn't seem to be any way to tell the array module an estimated full capacity (exact
or an over-estimate) for an array yet to be populated, but I would think such a feature would
be good for your kind of application. (Of course, hopefully the fromfile method grows the array
with a single memory allocation, but you can't use fromfile if your data requires conversion or
filtering. Per-line scanf/printf-style conversion from/to ASCII files might be another useful feature?)
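Once the data is binary, though, reloading is one tofile/fromfile call per array, and fromfile does handle its own allocation. A sketch of the round trip; the two-integer count header is just my own convention, not anything the array module requires:

```python
from array import array

def save_mesh_binary(points, faces, path):
    # Write the two element counts first so the loader
    # knows how many items to read back for each array.
    with open(path, 'wb') as f:
        array('l', [len(points), len(faces)]).tofile(f)
        points.tofile(f)
        faces.tofile(f)

def load_mesh_binary(path):
    with open(path, 'rb') as f:
        header = array('l')
        header.fromfile(f, 2)
        points = array('d')
        points.fromfile(f, header[0])
        faces = array('l')
        faces.fromfile(f, header[1])
    return points, faces
```

Note the binary layout is machine-dependent (item size and byte order), so it is a cache format, not an interchange format.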

Anyway, even as is, I'd bet we could get the time down to under a minute, if it was important.
Of course, a couple of minutes is not bad if you're not going to do it over and over.

Regards,
Bengt Richter
