[Guppy-pe-list] An iteration idiom (Was: Re: loading files containing multiple dumps)
Chris Withers
chris at simplistix.co.uk
Fri Sep 11 11:32:42 EDT 2009
Sverker Nilsson wrote:
> If you just use heap(), and only want total memory not relative to a
> reference point, you can just use hpy() directly. So rather than:
>
> CASE 1:
>
> h=hpy()
> h.heap().dump(...)
> #other code, the data internal to h is still around
> h.heap().dump(...)
>
> you'd do:
>
> CASE 2:
>
> hpy().heap().dump(...)
> #other code. No data from Heapy is hanging around
> hpy().heap().dump(...)
>
> The difference is that in case 1, the second call to heap() could reuse
> the internal data in h,
But that internal data would have to hang around, right? (which might,
in itself, cause memory problems?)
> whereas in case 2, it would have to be recreated
> which would take longer time. (The data would be such things as the
> dictionary owner map.)
How long is longer? Do you have any metrics that would help make good
decisions about when to keep a hpy() instance around and when it's best
to save memory?
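The tradeoff being discussed can be sketched with a toy model (this is my own illustration, not Heapy's real internals: `Analyzer` and its `rebuilds` counter are hypothetical stand-ins for an `hpy()` context and its owner map):

```python
class Analyzer:
    """Toy stand-in for an hpy() context (hypothetical, not Heapy's code).

    A kept instance builds its expensive internal table once and reuses
    it, at the cost of that table staying in memory between calls."""

    def __init__(self):
        self._table = None
        self.rebuilds = 0  # counts how often the table is recomputed

    def _build_table(self):
        self.rebuilds += 1
        # Stands in for expensive internal data such as the owner map.
        return {i: i * i for i in range(100_000)}

    def heap(self):
        if self._table is None:  # cached between calls on the same instance
            self._table = self._build_table()
        return len(self._table)

# Case 1: keep one instance around -- table built once, memory retained.
a = Analyzer()
a.heap(); a.heap()
print("case 1 rebuilds:", a.rebuilds)   # 1

# Case 2: fresh instance per call -- nothing lingers, but it rebuilds each time.
total = 0
for _ in range(2):
    b = Analyzer()
    b.heap()
    total += b.rebuilds
print("case 2 rebuilds:", total)        # 2
```

Timing the two patterns on a real workload (e.g. with `timeit`) would give the metrics asked about above.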
>>> Do you mean we should actually _remove_ features to create a new
>>> standalone system?
>> Absolutely, why provide more than is used or needed?
>
> How should we understand this? Should we have to support 2 or more
> systems depending on what functionality you happen to need? Or do
> you mean most functionality is actually _never_ used by
> _anybody_ (and will not be in the future)? That would be quite gross,
> wouldn't it?
I'm saying have one project and dump all the excess stuff that no-one
but you uses ;-)
Or, maybe easier, have a separate core package that just has the
essentials in a simple, clean fashion, and then another package that
builds on it to add all the other stuff...
> It also gives as an alternative, "If this is not possible, a string of
> the form <...some useful description...> should be returned"
>
> The __repr__ I use doesn't have the enclosing <>; granted, maybe I missed
> this, or it wasn't in the docs in 2005, or I didn't think it was important
> (still don't), but was that really what the complaint was about?
No, it was about the fact that when I do repr(something_from_heapy) I
get a shedload of text.
> I thought it was more useful to actually get information of what was
> contained in the object directly at the prompt, than try to show how to
> recreate it which wasn't possible anyway.
Agreed, but I think the stuff you currently have in __repr__ would be
better placed in its own method:
>>> heap()
<IdentitySet object at 0x0000 containing 10 items>
>>> _.show()
... all the current __repr__ output
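The split being suggested might look like this (a hypothetical sketch, not Heapy's actual classes: `IdentitySetLike` is an invented name, and the terse repr follows the `<...>` form the Python docs recommend):

```python
class IdentitySetLike:
    """Sketch of the proposal: terse __repr__ for the prompt,
    with the detailed breakdown moved behind an explicit show()."""

    def __init__(self, items):
        self.items = list(items)

    def __repr__(self):
        # One short line in the <...some useful description...> style.
        return "<IdentitySet object at 0x%x containing %d items>" % (
            id(self), len(self.items))

    def show(self):
        # The verbose per-item output that currently lives in __repr__.
        for i, item in enumerate(self.items):
            print("%3d: %r" % (i, item))

s = IdentitySetLike(range(10))
print(repr(s))  # short summary only
s.show()        # full detail, but only on request
```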
>> That should have another name... I don't know what a partition or
>> equivalence order are in the contexts you're using them, but I do know
>> that hijacking __getitem__ for this is wrong.
>
> Opinions may differ; I'd say one can in principle never 'know' whether such
> a thing is 'right' or 'wrong', but that gets us into philosophical
> territory. Anyway...
I would bet that if you asked 100 experienced python programmers, most
of them would tell you that what you're doing with __getitem__ is wrong,
some might even say evil ;-)
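The conventional alternative would be to keep `__getitem__` for plain item access and give the partition-by-equivalence-relation operation an explicit name (again a hypothetical sketch, not Heapy's implementation; `Partition` and `by` are invented here for illustration):

```python
class Partition:
    """Sketch: expose 'group by equivalence relation' as a named method
    rather than overloading __getitem__, which most readers expect to
    mean positional or key lookup."""

    def __init__(self, objects):
        self.objects = list(objects)

    # Conventional __getitem__: ordinary indexing, no surprises.
    def __getitem__(self, index):
        return self.objects[index]

    # The partitioning becomes a self-describing call instead.
    def by(self, key):
        groups = {}
        for obj in self.objects:
            groups.setdefault(key(obj), []).append(obj)
        return groups

p = Partition([1, "a", 2.0, 3, "b"])
print(p[0])        # plain indexing -> 1
print(p.by(type))  # the grouping is spelled out, not hidden behind []
```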
> For a tutorial provided by someone who did not seem to share your
> conviction about indexing, but seemed to regard the way Heapy does it as
> natural (though he has other valid complaints, and it is somewhat
> outdated, e.g. wrt 64-bit), see:
>
> http://www.pkgcore.org/trac/pkgcore/doc/dev-notes/heapy.rst
This link has become broken recently, but I don't remember reading the
author's comments as liking the indexing stuff...
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk