[Guppy-pe-list] An iteration idiom (Was: Re: loading files containing multiple dumps)

Chris Withers chris at simplistix.co.uk
Wed Sep 9 14:47:01 CEST 2009

Sverker Nilsson wrote:
> But I don't think I would want to risk breaking someone's code just for
> this when we could just add a new method.

I don't think anyone will be relying on StopIteration being raised.
If you're worried, do the next release as a 0.10.0 release and explain 
the backwards incompatible change in the release announcement.

> Or we could have an option to hpy() to redefine load() as loadall(), but
> I think it is cleaner (and easier) to just define a new method...

-1 to options to hpy, +1 to loadall, but also -1 to leaving load() as 
broken as it is...
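For what it's worth, the loadall I have in mind is just a generator wrapped around repeated load() calls, so callers iterate instead of catching StopIteration themselves. A minimal sketch (the load() below is a stand-in, not guppy's actual implementation):

```python
def make_loader(dumps):
    """Stand-in for guppy's load(): returns one dump per call,
    then raises StopIteration when the file is exhausted."""
    it = iter(dumps)
    def load():
        return next(it)  # raises StopIteration at end of input
    return load

def loadall(load):
    """Yield every dump from a multi-dump source until load() is exhausted."""
    while True:
        try:
            yield load()
        except StopIteration:
            return

load = make_loader(["dump1", "dump2", "dump3"])
print(list(loadall(load)))  # -> ['dump1', 'dump2', 'dump3']
```

That way the StopIteration stays an implementation detail rather than part of the public interface.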

> As the enclosing class or frame is deallocated, so is its attribute h
> itself. 

Right, but as long as the h hangs around, it hangs on to all the memory 
it's used to build its stats, right? This caused me problems in my most 
recent use of guppy...

> themselves, but I am talking about more severe data that can be hundreds
> of megabytes or more).

Me too ;-) I've been profiling situations where the memory usage was 
over 1GB for processing a 30MB file when I started ;-)

> For example, the setref() method sets a reference point somewhere in h.
> Further calls to heap() would report only objects allocated after that
> call. But you could use a new hpy() instance to see all objects again.
> Multiple threads come to mind, where each thread would have its own
> hpy() object. (Thread safety may still be a problem but at least it
> should be improved by not sharing the hpy() structures.)
> Even in the absence of multiple threads, you might have an outer
> invocation of hpy() that is used for global analysis, with its specific
> options, setref()'s etc, and inner invocations that make some local
> analysis perhaps in a single method.

Fair points :-)
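To spell out the workflow you're describing, as I understand it (this requires the guppy package; the method names are guppy's real API as far as I know, but treat it as illustrative rather than definitive):

```python
# Sketch, assuming guppy is installed: an outer hpy() instance for
# global analysis and an inner one for local analysis in a method,
# each with its own setref() reference point.
from guppy import hpy

h_global = hpy()      # outer instance for whole-process analysis
h_global.setref()     # reference point: later heap() calls report
                      # only objects allocated after this line

def some_method():
    h_local = hpy()   # independent instance for local analysis;
    h_local.setref()  # its reference point doesn't disturb h_global
    data = [object() for _ in range(1000)]
    print(h_local.heap())   # objects allocated since the local setref()
    return data

some_method()
print(h_global.heap())      # objects allocated since the global setref()
```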

>>> http://guppy-pe.sourceforge.net/heapy-thesis.pdf
>> I'm afraid, while I'd love to, I don't have the time to read a thesis...
> But it is (an important) part of the documentation. 

That may be, but I'd wager a fair amount of beer that by far the most 
common uses for heapy are:

- finding out what's using the memory consumed by a python process

- logging what the memory consumption is made up of while running a 
large python process

- finding out how much memory is being used

...in that order. Usually on a very tight deadline and with unhappy 
users breathing down their necks. At times like that, reading a thesis 
doesn't really figure into it ;-)

> I'm afraid, while I'd love to, I don't have the time to duplicate the
> thesis here...;-)

I don't think that would help. Succinct help and easy to use functions 
to get those 3 cases above solved are all that's needed ;-)

> Do you mean we should actually _remove_ features to create a new
> standalone system?

Absolutely, why provide more than is used or needed?

> You are free to wrap functions as you find suitable; a minimal wrapper
> module could be just like this:
> # Module heapyheap
> from guppy import hpy
> h=hpy()
> heap=heap()

I don't follow this... did you mean heap = h.heap()? If so, isn't that 
using all the gubbinz in Use, etc, anyway?
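Presumably what was meant is something like this (binding the method rather than calling it, so callers can do heapyheap.heap() themselves; again this assumes guppy is installed):

```python
# Module heapyheap -- minimal wrapper, as I read the intent.
from guppy import hpy
h = hpy()
heap = h.heap   # the bound method, not the result of calling heap()
```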

>>>> Less minor rant: this applies to most things to do with heapy... Having 
>>>> __repr__ return the same as __str__ and having that be a long lump of 
>>>> text is rather annoying. If you really must, make __str__ return the big 
>>>> lump of text but have __repr__ return a simple, short, item containing 
>>>> the class, the id, and maybe the number of contained objects...
>>> I thought it was cool to not have to use print but get the result
>>> directly at the prompt.
>> That's fine, that's what __str__ is for. __repr__ should be short.
> No, it's the other way around: __repr__ is used when evaluating directly
> at the prompt.

The docs give the idea: __repr__ computes the "official" string 
representation of an object, while __str__ computes the "informal" 
one, which is what print uses.

I believe your "big strings" would be classed as "informal" and so would 
be computed by __str__.
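Concretely, the split I'm arguing for looks like this (the class is a made-up stand-in, not heapy's actual IdentitySet):

```python
# Short __repr__ for the interactive prompt, long __str__ for
# explicit printing. FakeIdentitySet is hypothetical, for illustration.
class FakeIdentitySet:
    def __init__(self, elements):
        self.elements = list(elements)

    def __repr__(self):
        # short: class, id, and number of contained objects
        return "<FakeIdentitySet 0x%x, %d objects>" % (
            id(self), len(self.elements))

    def __str__(self):
        # long: the "big lump of text", one line per element
        return "\n".join("Row %d: %r" % (i, e)
                         for i, e in enumerate(self.elements))

s = FakeIdentitySet(["a", "b"])
print(repr(s))  # what the interactive prompt would show (short)
print(str(s))   # the full table, only when explicitly printed
```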

>> Yeah, but an item in a set is not a set. __getitem__ should return an 
>> item, not a subset...
> Usually I think it is called an 'element' of a set rather than an
> 'item'. Python builtin sets can't even do indexing at all.

...'cos it doesn't make sense ;-)

> Likewise, Heapy IdentitySet objects don't support indexing to get at the
> elements directly. 

...then they shouldn't have a __getitem__ method!

> The index (__getitem__) method was available so I
> used it to take the subset of the i'ths row in the partition defined by
> its equivalence order.

That should have another name... I don't know what a partition or 
equivalence order are in the contexts you're using them, but I do know 
that hijacking __getitem__ for this is wrong.
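To show what I mean: give the "subset forming the i'th row" operation an explicit name instead of overloading []. The class and method names here are made up for illustration, not heapy's API:

```python
# A set-like container where row-subsetting has a plain name
# instead of hijacking __getitem__. All names are hypothetical.
class PartitionedSet:
    def __init__(self, rows):
        self._rows = rows   # each row: elements of one kind

    def byrow(self, i):
        """Return the subset forming row i of the partition."""
        return PartitionedSet([self._rows[i]])

    def __len__(self):
        return sum(len(r) for r in self._rows)

s = PartitionedSet([[1, 2, 3], ["a", "b"]])
print(len(s.byrow(0)))  # -> 3: row 0's subset, via a clearly named method
```

Then [] stays free to mean what every Python programmer expects, or simply isn't provided at all.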

> The subset indexing, being the more well-defined operation, and also
> IMHO more generally useful, thus got the honor to have the [] syntax.

Except it misleads anyone who's programmed in Python for a significant 
period of time and causes problems when combined with the bug in .load :-(

> It would just be another syntax. I don't see the conceptual problem
> since e.g. indexing works just fine like this with strings.

Strings are a bad example...

>>> objects. Each row is still an IdentitySet, and has the same attributes.
>> Why? It's semantically different. 
> No, it's semantically identical. :-)
> Each row is an IdentitySet just like the top level set, but one which
> happens to contain elements being of one particular kind as defined by
> the equivalence relation in use. So it has only 1 row. The equivalence
> relation can be changed by creating a new set by using some of
> the .byxxx attribute: then the set could be made to contain many kinds
> of objects again, getting more rows albeit the objects themselves don't
> change.

Fine, I'll stop arguing, but just be aware that this is confusing and 
you're likely the only person who understands what's really going on or 
how it's supposed to work...


Simplistix - Content Management, Batch Processing & Python Consulting
            - http://www.simplistix.co.uk
