[BangPypers] How to handle files efficiently in python

Vishal vsapre80 at gmail.com
Thu Mar 24 06:36:14 CET 2011


On Thu, Mar 24, 2011 at 11:03 AM, Vishal <vsapre80 at gmail.com> wrote:

>
>
> On Thu, Mar 24, 2011 at 7:56 AM, Senthil Kumaran <orsenthil at gmail.com> wrote:
>
>> On Thu, Mar 24, 2011 at 02:25:04AM +0530, Vishal wrote:
>> > if you could read the entire file in one go...(i.e. unless your file is
>> > more than 50MB)...how about the following?
>>
>> >>> for line in reversed((open('filename').readlines()[-1:-n:-1])):
>> ...     print line
>>
>> Some comments:
>>
>> > # n is the number of lines you want to read.
>> > l = open(filename).read().rsplit('\n', n+1)
>>
>> - readlines would be better.
>>
>> > # following is to keep the memory requirement low.
>> > # but this is optional, if you only want to print the lines, and then
>> > # end the python process.
>> > l[0] = None
>>
>> - Could not get why you are setting the first item to None.
>>
>> > gc.collect()
>>
>> This does not free anything. Where is something un-referenced for it
>> to garbage collect?
>>
>> --
>> Senthil
>>
>
Ok, I should have explained a little bit more.

The way rsplit() is used in my code snippet puts the last n lines (five in
this case) into l[1:], while all the other lines (the rest of the entire
file) stay in l[0] as one string. That is why I thought of releasing that
memory, in case you use this snippet as part of a larger program.
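
To make the split concrete, here is a minimal sketch (Python 3, with a
made-up in-memory string standing in for the file contents; the sample data
and the names head and tail are only for illustration):

```python
# Hypothetical sample data standing in for open(filename).read().
text = "line1\nline2\nline3\nline4\nline5\nline6\nline7\n"
n = 3  # number of lines wanted from the end

# Split at most n + 1 times, starting from the right.
l = text.rsplit("\n", n + 1)

head = l[0]   # everything before the last n lines, as one string
tail = l[1:]  # the last n lines, plus '' left over from the final newline

print(repr(head))
print(tail)
```

If the data does not end with a newline, the same call leaves n + 1 lines
in l[1:], so the trailing empty string is worth checking for.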

Now, setting l[0] to None drops the reference to the earlier string data
held in that slot, which is then (force) collected by the gc.collect() call.
I have tried it multiple times (on my Windows box) and it works perfectly.
In fact, I found it to be the only way to make sure memory consumption stays
low when I have to read data columns from files that are hundreds of
megabytes in size.
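
A rough way to check where the memory is actually released is to measure it
with tracemalloc around each step. The sketch below is Python 3 and uses a
made-up string of about 2 MB in place of a real file; the sizes and variable
names are assumptions for illustration only.

```python
import gc
import tracemalloc

tracemalloc.start()

# Hypothetical stand-in for a large file: 100000 lines of 20 bytes each.
text = ("x" * 19 + "\n") * 100000
n = 5

l = text.rsplit("\n", n + 1)
del text  # l[0] is now the only reference to the bulk of the data

before, _ = tracemalloc.get_traced_memory()

l[0] = None  # drop the last reference to the big string
after_assign, _ = tracemalloc.get_traced_memory()

gc.collect()  # explicit collection pass
after_collect, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

print("before assignment :", before)
print("after l[0] = None :", after_assign)
print("after gc.collect():", after_collect)
```

On CPython the drop should show up right after the assignment, because
reference counting frees an object as soon as nothing refers to it, and
gc.collect() mainly matters for reference cycles. Other interpreters may
behave differently.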

I would love to know of other deterministic ways of freeing memory in Python.

-- 
Thanks and best regards,
Vishal Sapre

