Leakage vs Cyclops

Sun Jan 5 00:57:29 EST 2003

[Robin Becker]
> Hi I've a question relating to leakage. A user says his 40000 page
> document is using large amounts of memory even when he splits it into
> bite size chunks.
>
> I dummy tested with the code below and the output of the Cyclops tests
> seems to indicate that no obvious cycles are being created. Even so my
> resource usage seems to increase monotonically.

Beyond what Martin said, Cyclops can only chase pointers in objects of types
it knows (or has been taught) about.  Objects created by extension modules
stop it cold (they appear to be sinks), unless Cyclops is taught how to find
and extract all the references they contain.  I understand you don't think
extensions are relevant to your problem.
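
To make the "sink" behavior concrete, here is a minimal sketch of a
pointer chaser that only follows types it has an extractor for.  It is
written in current Python rather than 2.2, and every name in it
(EXTRACTORS, reachable, Opaque) is hypothetical: it illustrates the idea
and is not Cyclops's code or its API.

```python
# Hypothetical names throughout; this is not Cyclops's API.
# Map each known type to a function returning the objects it references.
EXTRACTORS = {
    list: list,
    tuple: list,
    set: list,
    dict: lambda d: list(d.keys()) + list(d.values()),
}

def reachable(root):
    """Return every object reachable from root via known types only."""
    seen = {}                      # id(obj) -> obj (also keeps obj alive)
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen[id(obj)] = obj
        extract = EXTRACTORS.get(type(obj))
        if extract is not None:    # unknown types act as sinks
            stack.extend(extract(obj))
    return list(seen.values())

class Opaque(object):
    """Stand-in for an extension type the tracer was never taught about."""
    def __init__(self, payload):
        self.payload = payload

hidden = ["only reachable through Opaque"]
root = [1, Opaque(hidden)]

found_before = any(obj is hidden for obj in reachable(root))
EXTRACTORS[Opaque] = lambda o: [o.payload]   # "teach" the tracer
found_after = any(obj is hidden for obj in reachable(root))
print(found_before, found_after)
```

Until an extractor is registered for Opaque, the traversal stops at the
instance and never sees the list behind it, so any cycle passing through
such an object would go unreported.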

> My first thought was that one or more of our extensions could be the
> cause, but they are only accelerators and aren't required. The
> resources increase with or without them.
>
> With an upper limit of 900 pages the peak working set is 15M; with 2700 it
> rises to 45M. This seems to be a real leak, but how can I track it
> down?
>
> I used Cyclops to great effect in the past, but has it lost its touch
> with GC or 2.2? I'm using Python 2.2.2 with win32.

I wrote Cyclops, but haven't used it since cyclic gc was added.  I don't
know of any reason it would fail now, as the fundamentals of objects
pointing to other objects haven't changed, and Cyclops's magic was shallow.
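
One way to cross-check a "no cycles" report, assuming the standard gc
module is available, is to ask the cyclic collector itself: gc.collect()
returns the number of unreachable objects it found.  The sketch below is
in current Python, and the Node class and helper functions are made up
for illustration; they stand in for whatever the real program builds.

```python
import gc

class Node(object):
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.other = None

def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a      # a <-> b: only cyclic gc can free these

def make_no_cycle():
    a, b = Node(), Node()
    a.other = b                  # plain chain: refcounting frees it

gc.collect()                     # start from a clean slate
gc.disable()                     # keep automatic collections out of the way
try:
    make_no_cycle()
    acyclic_found = gc.collect() # unreachable objects found by the collector
    make_cycle()
    cyclic_found = gc.collect()
finally:
    gc.enable()

print(acyclic_found, cyclic_found)
```

If the count stays at zero after a batch of real work while memory still
climbs, the growth is coming from objects that are still referenced
somewhere, not from cyclic garbage.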

Using 2.2.2 on Windows here, and after installing from

    ReportLab_1_17.zip

I'm not seeing what you're seeing in this program:

> from reportlab.lib.styles import ParagraphStyle
> from reportlab.platypus import *
> def main():
>         i = 0
>         story = []
>         for x in xrange(100,900):
>                 story.append(Paragraph(str(x),ParagraphStyle('normal')))
>                 story.append(PageBreak())
>                 if i % 100==0:
>                         fn = "test_%03d.pdf"%i
>                         SimpleDocTemplate(fn).build(story)
>                         print 'Build',fn
>                         story = []
>                 i += 1
>         if story:
>                 fn = "test_%03d.pdf"%i
>                 SimpleDocTemplate(fn).build(story)
>                 print 'Build',fn
>                 story = []
>
> if __name__=='__main__':
>         #import Cyclops
>         #z = Cyclops.CycleFinder()
>         #z.run(main)
>         #z.find_cycles(purge_dead_roots=0)
>         #z.show_stats()
>         #z.show_cycles()
>         main()

Note that I commented out the Cyclops calls.  When you say "with an upper
limit of 900 pages", I'm not sure what that corresponds to in the code
above.  I'm guessing it refers to the upper limit in xrange(100,900).  When
I run that, the Python process stays well under 6MB, and ditto if I change
900 to 2700, or even boost it to 10000.  It does seem to grow very slowly
for me, but nothing like the growth rate you report.

I bet I'm seeing a message at the start you don't see, and *maybe* that's a
clue:

C:\Python22>python temp.py
Warn: Python Imaging Library not available
Build test_000.pdf
Build test_100.pdf
Build test_200.pdf
Build test_300.pdf

... etc ...

Build test_9700.pdf
Build test_9800.pdf
Build test_9900.pdf

C:\Python22>

I don't know whether this program would use PIL if it were installed.  If
so, that's one of only two differences I can see between what you
described and what I did.

The other difference is that I didn't run Cyclops.  Note this part of the
Cyclops module docstring:

"""
+ A (at least one) reference to each root-set object is maintained
  internally, so roots cannot die before invoking .clear() (or the
  CycleFinder is finalized).  This can distort the truth of your
  program, if a __del__ method of some root object that's not involved
  in a cycle could have caused cycles to get broken (this is unusual,
  but possible!).
"""

IOW, every object created via an __init__ method will stay alive, simply
because you *are* using Cyclops.  The test program's

         story = []

line will clear that list from time to time, but won't free any instance
objects.  So it's not surprising to me if the test program exactly as you
gave it shows larger memory use the larger the upper limit in the xrange().
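
The effect can be sketched without Cyclops at all.  In the following
current-Python example the module-level list named roots is a
hypothetical stand-in for any tool that keeps a reference to each
instance it sees; Page, build and alive are likewise made-up names.
Weak references are used only to observe whether the instances are still
alive, and the immediate freeing relies on CPython's reference counting.

```python
import weakref

class Page(object):
    """Hypothetical stand-in for the objects appended to story."""
    def __init__(self, n):
        self.n = n

roots = []   # stand-in for a tool holding a reference to every root

def build(track):
    """Fill and then rebind story; return weak refs to the Pages created."""
    story = []
    probes = []
    for n in range(5):
        page = Page(n)
        if track:
            roots.append(page)   # the tool's extra reference
        story.append(page)
        probes.append(weakref.ref(page))
    del page
    story = []                   # same rebinding as in the test program
    return probes

def alive(probes):
    """Count how many of the weakly referenced objects still exist."""
    return sum(1 for ref in probes if ref() is not None)

untracked = alive(build(track=False))
tracked_probes = build(track=True)
tracked = alive(tracked_probes)
roots.clear()                    # analogous to releasing the tool's roots
after_clear = alive(tracked_probes)
print(untracked, tracked, after_clear)
```

Rebinding story frees the instances only when nothing else refers to
them; with the extra references in place they all survive until those
references are dropped too.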