Python garbage collector/memory manager behaving strangely
Dave Angel
d at davea.name
Sun Sep 16 22:12:46 EDT 2012
On 09/16/2012 09:07 PM, Jadhav, Alok wrote:
> Hi Everyone,
>
> I have a simple program which reads a large file containing a few million
> rows, parses each row (`numpy array`), converts it into an array of
> doubles (`python array`), and later writes it into an `hdf5 file`. I repeat
> this loop for multiple days. After reading each file, I delete all the
> objects and call the garbage collector. When I run the program, the first
> day is parsed without any error, but on the second day I get `MemoryError`.
> I monitored the memory usage of my program: during the first day of
> parsing, memory usage is around **1.5 GB**. When the first day's parsing
> is finished, memory usage goes down to **50 MB**. Now when the 2nd day
> starts and I try to read the lines from the file, I get `MemoryError`.
> Following is the output of the program:
>
> source file extracted at C:\rfadump\au\2012.08.07.txt
> parsing started
> current time: 2012-09-16 22:40:16.829000
> 500000 lines parsed
> 1000000 lines parsed
> 1500000 lines parsed
> 2000000 lines parsed
> 2500000 lines parsed
> 3000000 lines parsed
> 3500000 lines parsed
> 4000000 lines parsed
> 4500000 lines parsed
> 5000000 lines parsed
> parsing done.
> end time is 2012-09-16 23:34:19.931000
> total time elapsed 0:54:03.102000
> repacking file
> done
>
> > s:\users\aaj\projects\pythonhf\rfadumptohdf.py(132)generateFiles()
> -> while single_date <= self.end_date:
> (Pdb) c
> *** 2012-08-08 ***
> source file extracted at C:\rfadump\au\2012.08.08.txt
> cought an exception while generating file for day 2012-08-08.
> Traceback (most recent call last):
>   File "rfaDumpToHDF.py", line 175, in generateFile
>     lines = self.rawfile.read().split('|\n')
> MemoryError
>
>
> I am very sure that the Windows Task Manager shows the memory usage
> as **50 MB** for this process. It looks like the garbage collector or
> memory manager for Python is not calculating the free memory correctly.
> There should be a lot of free memory, but it thinks there is not enough.
>
> Any idea?
>
> Thanks.
>
> Alok Jadhav
> CREDIT SUISSE AG
> GAT IT Hong Kong, KVAG 67
> International Commerce Centre | Hong Kong | Hong Kong
> Phone +852 2101 6274 | Mobile +852 9169 7172
> alok.jadhav at credit-suisse.com | www.credit-suisse.com
Don't blame CPython. You're doing a read() of a large file, which
produces a single large string, and only then splitting it into lines.
Why not just read it in as lines, so the large string is never needed?
Take a look at the readlines() method. Chances are that even that is
unnecessary, but I can't tell without seeing more of the code.

Instead of:

    lines = self.rawfile.read().split('|\n')

try:

    lines = self.rawfile.readlines()

(Since your records end in '|\n', each line will keep its trailing '|',
which you can strip as you parse.)
When a single large item is being allocated, it's not enough to have
sufficient free space; the space also has to be contiguous. After a
program runs for a while, its address space naturally gets more and
more fragmented. That's the nature of the C runtime, and CPython is
stuck with it.
--
DaveA