[Guppy-pe-list] An iteration idiom (Was: Re: loading files containing multiple dumps)
chris at simplistix.co.uk
Wed Sep 9 14:47:01 CEST 2009
Sverker Nilsson wrote:
> But I don't think I would want to risk breaking someone's code just for
> this when we could just add a new method.
I don't think anyone will be relying on StopIteration being raised.
If you're worried, do the next release as a 0.10.0 release and explain
the backwards incompatible change in the release announcement.
> Or we could have an option to hpy() to redefine load() as loadall(), but
> I think it is cleaner (and easier) to just define a new method...
-1 to options to hpy, +1 to loadall but also -1 to leaving load() as
broken as it is...
> As the enclosing class or frame is deallocated, so is its attribute h
Right, but as long as the h hangs around, it hangs on to all the memory
it's used to build its stats, right? This caused me problems in my most
recent use of guppy...
> themselves, but I am talking about more severe data that can be hundreds
> of megabytes or more).
Me too ;-) I've been profiling situations where the memory usage was
over 1GB for processing a 30MB file when I started ;-)
> For example, the setref() method sets a reference point somewhere in h.
> Further calls to heap() would report only objects allocated after that
> call. But you could use a new hpy() instance to see all objects again.
> Multiple threads come to mind, where each thread would have its own
> hpy() object. (Thread safety may still be a problem but at least it
> should be improved by not sharing the hpy() structures.)
> Even in the absence of multiple threads, you might have an outer
> invocation of hpy() that is used for global analysis, with its specific
> options, setref()'s etc, and inner invocations that make some local
> analysis perhaps in a single method.
Fair points :-)
>> I'm afraid, while I'd love to, I don't have the time to read a thesis...
> But it is (an important) part of the documentation.
That may be, but I'd wager a fair amount of beer that by far the most
common uses for heapy are:
- finding out what's using the memory consumed by a python process
- logging what the memory consumption is made up of while running a
large python process
- finding out how much memory is being used
...in that order. Usually on a very tight deadline and with unhappy
users breathing down their necks. At times like that, reading a thesis
doesn't really figure into it ;-)
> I'm afraid, while I'd love to, I don't have the time to duplicate the
> thesis here...;-)
I don't think that would help. Succinct help and easy-to-use functions
that solve those 3 cases above are all that's needed ;-)
> Do you mean we should actually _remove_ features to create a new
> standalone system?
Absolutely, why provide more than is used or needed?
> You are free to wrap functions as you find suitable; a minimal wrapper
> module could be just like this:
> # Module heapyheap
> from guppy import hpy
I don't follow this... did you mean heap = h.heap()? If so, isn't that
using all the gubbinz in Use, etc, anyway?
>>>> Less minor rant: this applies to most things to do with heapy... Having
>>>> __repr__ return the same as __str__ and having that be a long lump of
>>>> text is rather annoying. If you really must, make __str__ return the big
>>>> lump of text but have __repr__ return a simple, short, item containing
>>>> the class, the id, and maybe the number of contained objects...
>>> I thought it was cool to not have to use print but get the result
>>> directly at the prompt.
>> That's fine, that's what __str__ is for. __repr__ should be short.
> No, it's the other way around: __repr__ is used when evaluating directly
> at the prompt.
The docs give the idea:
I believe your "big strings" would be classed as "informal" and so would
be computed by __str__.
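To illustrate the split I mean, here's a minimal sketch. The `Partition` class and its `rows` attribute are made up for this example and are not heapy's actual classes; the point is only that __repr__ stays on one line while __str__ carries the big table:

```python
class Partition(object):
    """Hypothetical stand-in for a heapy result set (not guppy's real class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __repr__(self):
        # Short form: class name, id, and number of contained rows.
        return "<%s at 0x%x, %d rows>" % (
            type(self).__name__, id(self), len(self.rows))

    def __str__(self):
        # Long, human-readable form: one header line plus one line per row.
        lines = ["Partition of %d rows:" % len(self.rows)]
        lines.extend("%5d  %s" % (i, row) for i, row in enumerate(self.rows))
        return "\n".join(lines)


p = Partition(["dict", "str", "tuple"])
short = repr(p)      # what the interactive prompt shows for a bare `p`
long_text = str(p)   # what `print(p)` shows
```

With this split, typing `p` at the prompt gives a one-line summary, and `print(p)` still gives the full table.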
>> Yeah, but an item in a set is not a set. __getitem__ should return an
>> item, not a subset...
> Usually I think it is called an 'element' of a set rather than an
> 'item'. Python builtin sets can't even do indexing at all.
...'cos it doesn't make sense ;-)
> Likewise, Heapy IdentitySet objects don't support indexing to get at the
> elements directly.
...then they shouldn't have a __getitem__ method!
> The index (__getitem__) method was available so I
> used it to take the subset of the i'ths row in the partition defined by
> its equivalence order.
That should have another name... I don't know what a partition or
equivalence order are in the contexts you're using them, but I do know
that hijacking __getitem__ for this is wrong.
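Something along these lines is what I have in mind. This is only a sketch: `KindSet`, `rows()` and `row()` are hypothetical names I've invented here, not anything in heapy, and grouping by type name is just an assumed stand-in for whatever the equivalence relation really is:

```python
class KindSet(object):
    """Hypothetical set-like container; NOT heapy's IdentitySet."""

    def __init__(self, objects):
        self._objects = list(objects)

    def __len__(self):
        return len(self._objects)

    def __iter__(self):
        return iter(self._objects)

    def rows(self):
        # Group members by type name (assumed equivalence relation),
        # largest group first, ties broken alphabetically.
        groups = {}
        for obj in self._objects:
            groups.setdefault(type(obj).__name__, []).append(obj)
        ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        return [KindSet(members) for _, members in ordered]

    def row(self, i):
        # Explicitly named subset access, instead of overloading s[i].
        return self.rows()[i]


s = KindSet([1, 2, "a", 3.0, "b", 4])
first_row = list(s.row(0))    # the three ints
second_row = list(s.row(1))   # the two strings

try:
    s[0]                      # no __getitem__, same as a builtin set
    indexable = True
except TypeError:
    indexable = False
```

The subset is still a `KindSet`, so nothing is lost, but nobody reading `s.row(0)` will expect to get a single element back.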
> The subset indexing, being the more well-defined operation, and also
> IMHO more generally useful, thus got the honor to have the  syntax.
Except it misleads anyone who's programmed in Python for a significant
period of time and causes problems when combined with the bug in .load :-(
> It would just be another syntax. I don't see the conceptual problem
> since e.g. indexing works just fine like this with strings.
Strings are a bad example...
>>> objects. Each row is still an IdentitySet, and has the same attributes.
>> Why? It's semantically different.
> No, it's semantically identical. :-)
> Each row is an IdentitySet just like the top level set, but one which
> happens to contain elements being of one particular kind as defined by
> the equivalence relation in use. So it has only 1 row. The equivalence
> relation can be changed by creating a new set by using some of
> the .byxxx attribute: then the set could be made to contain many kinds
> of objects again, getting more rows albeit the objects themselves don't
Fine, I'll stop arguing, but just be aware that this is confusing and
you're likely the only person who understands what's really going on or
how it's supposed to work...
Simplistix - Content Management, Batch Processing & Python Consulting