Memory Hungry in Windows

Andrew Bennetts andrew-pythonlist at puzzling.org
Thu Nov 27 02:19:39 EST 2003


On Thu, Nov 27, 2003 at 02:00:27PM +0800, Geiger Ho wrote:
> Hi,
> 
>   I have a program. It opens a text file of about 2 MB. It reads in every
> line and appends it to a string. It then does a re.sub() to replace the
> contents of the long string and then writes the result to a file.

Repeatedly appending to a string is slow (each append can copy everything
accumulated so far, so the total work grows roughly quadratically with file
size), and it is likely to cause memory fragmentation.

Does your re.sub have to operate on the whole file at once, or can it work
one line at a time?  If it can, I would apply the re.sub to each line as I
read it in, and immediately write it out to the output file.
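
Something like this, for example (the pattern and filenames here are just
placeholders for whatever you're actually using):

    import re

    pattern = re.compile('...')           # your actual pattern here
    outfile = open('output.txt', 'w')
    for line in open('input.txt'):
        # substitute one line at a time; peak memory stays small
        outfile.write(pattern.sub('replacement', line))
    outfile.close()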

If you do need to operate on the whole file, you should probably do:
    long_string = open('some_file').read()

rather than what it sounds like you're doing:
    long_string = ''
    for line in open('some_file'):
        long_string += line

This should be faster and require less memory.
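
If you do need to build the string from pieces, the usual idiom is to
collect them in a list and join once at the end, which avoids the repeated
copying (a sketch, producing the same result as the += loop above):

    lines = []
    for line in open('some_file'):
        lines.append(line)
    long_string = ''.join(lines)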

>   In Linux, it consumes about 5 MB, but in W2k, it consumes 20 MB! Why
> is there so much difference for the same piece of code? This has
> frightened me: I don't know if the program will crash some day from
> running out of memory.

I expect this is a result of differences in how Linux and Windows report
memory usage (particularly with things like shared libraries), as well as
differences between the platform malloc implementations (although with
pymalloc being the default in 2.3, that shouldn't be as large a factor).

-Andrew.