Suitability for long-running text processing?
tsuraan
tsuraan at gmail.com
Mon Jan 8 10:41:23 EST 2007
I have a pair of python programs that parse and index files on my computer
to make them searchable. The problem that I have is that they continually
grow until my system is out of memory, and then things get ugly. I
remember, when I was first learning python, reading that the python
interpreter doesn't gc small strings, but I assumed that was outdated and
sort of forgot about it. Unfortunately, it seems this is still the case. A
sample program (to type/copy and paste into the python REPL):
a = []
for i in xrange(33, 127):
    for j in xrange(33, 127):
        for k in xrange(33, 127):
            for l in xrange(33, 127):
                a.append(chr(i) + chr(j) + chr(k) + chr(l))
del a
import gc
gc.collect()
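For what it's worth, a smaller experiment along the same lines suggests the strings really are freed at the Python level; the interpreter just hangs on to the pages for reuse rather than handing them back to the OS. This is only a sketch (the `build` helper is mine, not from the program above), but it shows that a second identical allocation doesn't grow the process further:

```python
import gc

def build(n):
    # build n short strings, similar in spirit to the loop above
    return [chr(33 + (i % 94)) * 4 for i in range(n)]

a = build(200000)
del a
gc.collect()

# A second identical round reuses the blocks freed above, so the
# process size plateaus instead of doubling.
b = build(200000)
count = len(b)
del b
```

Watching the process in top while this runs shows the size climb once and then hold steady, which is consistent with memory being recycled internally rather than leaked.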
The loop is deep enough that I always interrupt it once python's size is
around 250 MB. Once the gc.collect() call finishes, python's size has not
changed at all. Even though there are no locals and no references left to
any of the strings that were created, python will not reduce its size. This
example is obviously artificial, but I am getting exactly the same behaviour
in my real programs. Is there some way to convince python to get rid of the
data that is no longer referenced, or do I need to use a different
language?
This has been tried under python 2.4.3 on gentoo linux and python 2.3 under
OS X 10.3. Any suggestions/workarounds would be much appreciated.
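One workaround I've been considering (just a sketch, and `index_files` here is a hypothetical stand-in for the real parser, not code from my programs): since the interpreter doesn't seem to return freed memory to the OS, the memory-hungry work could be done in a forked child process, so that everything it allocated is reclaimed by the OS when it exits:

```python
import os

def index_files(paths):
    # hypothetical worker: stands in for the real parse/index step,
    # which builds lots of short-lived strings
    results = [p.upper() for p in paths]
    # ... the real worker would write results to disk or a pipe here ...
    return results

def index_in_child(paths):
    # run the memory-hungry work in a forked child; when the child
    # exits, the OS reclaims every byte it allocated
    pid = os.fork()
    if pid == 0:
        index_files(paths)
        os._exit(0)      # exit the child immediately, releasing its memory
    os.waitpid(pid, 0)   # parent blocks until the child is done
```

This only helps if the results can be communicated back cheaply (a file or pipe), and os.fork is POSIX-only, but both of my platforms are POSIX.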