[Guppy-pe-list] An iteration idiom (Was: Re: loading files containing multiple dumps)

Thu Sep 10 07:18:32 EDT 2009

On Wed, 2009-09-09 at 13:47 +0100, Chris Withers wrote:
> Sverker Nilsson wrote:
> > As the enclosing class or frame is deallocated, so is its attribute h
> > itself. 
> 
> Right, but as long as the h hangs around, it hangs on to all the memory 
> it's used to build its stats, right? This caused me problems in my most 
> recent use of guppy...

If you just use heap(), and only want total memory not relative to a
reference point, you can just use hpy() directly. So rather than:

CASE 1:

h=hpy()
h.heap().dump(...)
#other code, the data internal to h is still around
h.heap().dump(...)

you'd do:

CASE 2:

hpy().heap().dump(...)
#other code. No data from Heapy is hanging around
hpy().heap().dump(...)

The difference is that in case 1, the second call to heap() could reuse
the internal data in h, whereas in case 2, it would have to be recreated,
which would take longer. (The data would be such things as the
dictionary owner map.)

However, if you measure memory relative to a reference point, you would
have to keep h around, as in case 1.
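
To make the trade-off concrete, here is a toy model in plain Python. It
is only a sketch: the Session class, its _owner_map attribute and the
build counter are made up for illustration and say nothing about how
Heapy is actually implemented.

```python
class Session:
    """Toy stand-in for the object returned by hpy(); not the real thing."""

    builds = 0  # counts how many times the expensive data was built

    def __init__(self):
        self._owner_map = None  # hypothetical cached internal data

    def heap(self):
        if self._owner_map is None:
            # Expensive step, done at most once per Session instance.
            Session.builds += 1
            self._owner_map = {"built": True}
        return "snapshot"


# Case 1: keep the session. The second call reuses the cached data,
# but the cache stays alive for as long as h does.
h = Session()
h.heap()
h.heap()
case1_builds = Session.builds

# Case 2: a fresh session each time. Nothing is kept between calls,
# but the data is rebuilt on every call.
Session.builds = 0
Session().heap()
Session().heap()
case2_builds = Session.builds

print(case1_builds, case2_builds)
```

In this model, case 1 builds the data once and case 2 builds it twice,
which is the speed-versus-memory trade described above.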

[snip]

> > Do you mean we should actually _remove_ features to create a new
> > standalone system?
> 
> Absolutely, why provide more than is used or needed?

How should we understand this? Should we have to support 2 or more
systems depending on what functionality you happen to need? Or do
you mean most functionality is actually _never_ used by
_anybody_ (and will not be in the future)? That would be quite gross,
wouldn't it?

I'd be hard pressed to support several versions just so that some
of them would contain only the methods most commonly used in
certain situations.

That would be like creating an additional Python dialect that
contained, say, only the 10 % of the functionality that is used 90 % of
the time. Quite naturally, this is not going to be done anytime soon.
Even though one could perhaps argue it would be easier to use for
children etc., the extra work to support this has not been deemed
worthwhile.

> 
> > You are free to wrap functions as you find suitable; a minimal wrapper
> > module could be just like this:
> > 
> > # Module heapyheap
> > from guppy import hpy
> > h=hpy()
> > heap=heap()
> 
> I don't follow this.. did you mean heap = h.heap()? 

Actually I meant heap=h.heap
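
The difference between the two spellings can be shown in plain Python.
The Hpy class below is a made-up stand-in, not guppy's own class:
h.heap is the bound method, so the wrapper module exports something
callable, whereas h.heap() would store the result of a single call made
at import time.

```python
class Hpy:
    """Hypothetical stand-in for the session object; not guppy's class."""

    def __init__(self):
        self.calls = 0

    def heap(self):
        self.calls += 1
        return "snapshot %d" % self.calls


h = Hpy()

heap = h.heap                  # binds the method; nothing is computed yet
calls_after_binding = h.calls

first = heap()                 # each call through the alias does fresh work
second = heap()

frozen = h.heap()              # by contrast, this stores one result only
```

A user of such a wrapper module would then write heapyheap.heap() and
get a new result each time.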

> If so, isn't that using all the gubbinz in Use, etc, anyway?

Depends on what you mean by 'using', but I would say no.

> >>>> Less minor rant: this applies to most things to do with heapy... Having 
> >>>> __repr__ return the same as __str__ and having that be a long lump of 
> >>>> text is rather annoying. If you really must, make __str__ return the big 
> >>>> lump of text but have __repr__ return a simple, short, item containing 
> >>>> the class, the id, and maybe the number of contained objects...
> >>> I thought it was cool to not have to use print but get the result
> >>> directly at the prompt.
> >> That's fine, that's what __str__ is for. __repr__ should be short.
> > 
> > No, it's the other way around: __repr__ is used when evaluating directly
> > at the prompt.
> 
> The docs give the idea:
> 
> http://docs.python.org/reference/datamodel.html?highlight=__repr__#object.__repr__
> 
> I believe you "big strings" would be classed as "informal" and so would 
> be computed by __str__.

Informal or not, they contain the information I thought was most useful
and are returned by __str__, and also by __repr__ because that is what
is used when an expression is evaluated at the prompt.

According to the doc you linked to above, __repr__ should preferably be
a Python expression that could be used to recreate it. I think this has
been discussed and criticized before and in general there is no way to
create such an expression. For example, for the result of h.heap(),
there is no expression that can recreate it later (since the heap
changes) and the object returned is just an IdentitySet, which doesn't
know how it was created.

It also gives as an alternative, "If this is not possible, a string of
the form <...some useful description...> should be returned"

The __repr__ I use doesn't have the enclosing <>, granted; maybe I missed
this, or it wasn't in the docs in 2005, or I didn't think it was important
(still don't), but was that really what the complaint was about?

The docs also say that "it is important that the representation is
information-rich and unambiguous."

I thought it was more useful to actually get information about what was
contained in the object directly at the prompt than to try to show how
to recreate it, which wasn't possible anyway.
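
The two choices can be compared in a small, self-contained sketch for
present-day Python. The class names and the sample text are invented for
illustration; the only library behaviour relied on is that the default
display hook, sys.__displayhook__, writes repr(value) to stdout, which
is what the interactive prompt does with an expression result.

```python
import contextlib
import io
import sys


class ShortRepr:
    """__str__ is the long text; __repr__ is a short tag."""

    def __str__(self):
        return "Partition of a set of 3 objects. Total size = 120 bytes."

    def __repr__(self):
        return "<ShortRepr: 3 objects>"


class SameRepr(ShortRepr):
    """__repr__ returns the same text as __str__."""

    __repr__ = ShortRepr.__str__


def at_prompt(value):
    """Return what the interactive prompt would echo for value."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        sys.__displayhook__(value)  # the default hook writes repr(value)
    return buf.getvalue().rstrip("\n")


short_echo = at_prompt(ShortRepr())  # the short tag; print() is needed for more
same_echo = at_prompt(SameRepr())    # the full text, without using print()
```

With the first class, the prompt shows only the tag and the long text
needs an explicit print; with the second, the prompt shows the long text
directly.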

[snip]

> > The index (__getitem__) method was available so I
> > used it to take the subset of the i'th row in the partition defined by
> > its equivalence order.
> 
> That should have another name... I don't know what a partition or 
> equivalence order are in the contexts you're using them, but I do know 
> that hijacking __getitem__ for this is wrong.

Opinions may differ, I'd say one can in principle never 'know' if such a
thing is 'right' or 'wrong', but that gets us into philosophical territory. Anyway...

For a tutorial by someone who did not seem to share your conviction
about indexing, and who seemed to regard the way Heapy does it as natural
(although the author has other valid complaints, and the text is somewhat
outdated, e.g. with respect to 64 bit), see:

http://www.pkgcore.org/trac/pkgcore/doc/dev-notes/heapy.rst

which is also available from the Documentation section of the guppy-pe
home page.
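
For readers unsure what indexing into a partition means, here is a toy
version in plain Python. It is a sketch under stated assumptions, not
Heapy's code: the Partition class is hypothetical, the equivalence
relation is simply "has the same type", and rows are ordered by object
count, which is a choice made for this example only.

```python
class Partition:
    """Toy set of objects partitioned by an equivalence relation."""

    def __init__(self, objects, classifier=type):
        self.objects = list(objects)
        self.classifier = classifier  # maps an object to its kind

    def rows(self):
        """Group the objects by kind, largest group first."""
        groups = {}
        for obj in self.objects:
            groups.setdefault(self.classifier(obj), []).append(obj)
        # Stable sort: groups of equal size keep first-seen order.
        return sorted(groups.values(), key=len, reverse=True)

    def __getitem__(self, i):
        """Return the subset of objects in the i'th row of the partition."""
        return Partition(self.rows()[i], self.classifier)

    def __len__(self):
        return len(self.objects)


p = Partition([1, "a", 2, 3.0, "b", 4])
ints = p[0]     # the three ints form the largest row
strs = p[1]     # then the two strings
floats = p[2]   # then the single float
```

The result of indexing is again a set of the same kind, so it can be
partitioned or indexed further.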

Cheers,

Sverker

-- 
Expertise in Linux, embedded systems, image processing, C, Python...
        http://sncs.se