[Guppy-pe-list] An iteration idiom (Was: Re: loading files containing multiple dumps)
Chris Withers
chris at simplistix.co.uk
Wed Sep 9 08:47:01 EDT 2009
Sverker Nilsson wrote:
> But I don't think I would want to risk breaking someone's code just for
> this when we could just add a new method.
I don't think anyone will be relying on StopIteration being raised.
If you're worried, do the next release as a 0.10.0 release and explain
the backwards incompatible change in the release announcement.
> Or we could have an option to hpy() to redefine load() as loadall(), but
> I think it is cleaner (and easier) to just define a new method...
-1 to options to hpy(), +1 to loadall(), but also -1 to leaving load()
as broken as it is...
> As the enclosing class or frame is deallocated, so is its attribute h
> itself.
Right, but as long as the h hangs around, it hangs on to all the memory
it's used to build its stats, right? This caused me problems in my most
recent use of guppy...
> themselves, but I am talking about more severe data that can be hundreds
> of megabytes or more).
Me too ;-) I've been profiling situations where the memory usage was
over 1GB for processing a 30MB file when I started ;-)
> For example, the setref() method sets a reference point somewhere in h.
> Further calls to heap() would report only objects allocated after that
> call. But you could use a new hpy() instance to see all objects again.
>
> Multiple threads come to mind, where each thread would have its own
> hpy() object. (Thread safety may still be a problem but at least it
> should be improved by not sharing the hpy() structures.)
>
> Even in the absence of multiple threads, you might have an outer
> invocation of hpy() that is used for global analysis, with its specific
> options, setref()'s etc, and inner invocations that make some local
> analysis perhaps in a single method.
Fair points :-)
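To check I've understood the per-instance behaviour: each hpy() carries its own reference point, so an inner instance can setref() without disturbing an outer one. A toy model of that pattern (not guppy's actual implementation) would be:

```python
class Tracker(object):
    # Toy model of per-instance hpy() state: each instance keeps its
    # own reference point over a shared record of allocations, so a
    # setref() on one doesn't affect the other.
    def __init__(self, allocations):
        self.allocations = allocations
        self.ref = 0

    def setref(self):
        # Only objects recorded after this call show up in heap().
        self.ref = len(self.allocations)

    def heap(self):
        return self.allocations[self.ref:]
```

With that, an "inner" tracker can narrow its view while an "outer" one still sees everything.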
>>> http://guppy-pe.sourceforge.net/heapy-thesis.pdf
>> I'm afraid, while I'd love to, I don't have the time to read a thesis...
>
> But it is (an important) part of the documentation.
That may be, but I'd wager a fair amount of beer that by far the most
common uses for heapy are:
- finding out what's using the memory consumed by a python process
- logging what the memory consumption is made up of while a large
python process runs
- finding out how much memory is being used
...in that order. Usually on a very tight deadline and with unhappy
users breathing down their necks. At times like that, reading a thesis
doesn't really figure into it ;-)
> I'm afraid, while I'd love to, I don't have the time to duplicate the
> thesis here...;-)
I don't think that would help. Succinct help and easy-to-use functions
to get those 3 cases above solved are all that's needed ;-)
> Do you mean we should actually _remove_ features to create a new
> standalone system?
Absolutely, why provide more than is used or needed?
> You are free to wrap functions as you find suitable; a minimal wrapper
> module could be just like this:
>
> # Module heapyheap
> from guppy import hpy
> h=hpy()
> heap=heap()
I don't follow this... did you mean heap = h.heap()? If so, isn't that
using all the gubbinz in Use, etc., anyway?
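Either way, a version of that wrapper that at least imports cleanly would presumably bind the method rather than call an undefined heap(). Sketching it with a stand-in for guppy, just to show the binding:

```python
class _FakeHpy(object):
    # Stand-in for the object guppy's hpy() returns, just to make the
    # point runnable; the real thing comes from `from guppy import hpy`.
    def heap(self):
        return "heap stats"

def hpy():
    return _FakeHpy()

# Module heapyheap, with the correction applied:
h = hpy()
heap = h.heap   # bind the method -- note: no parentheses

# Callers can now do heapyheap.heap() without touching h directly.
```

Though as I say, that still drags in everything behind h under the covers.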
>>>> Less minor rant: this applies to most things to do with heapy... Having
>>>> __repr__ return the same as __str__ and having that be a long lump of
>>>> text is rather annoying. If you really must, make __str__ return the big
>>>> lump of text but have __repr__ return a simple, short, item containing
>>>> the class, the id, and maybe the number of contained objects...
>>> I thought it was cool to not have to use print but get the result
>>> directly at the prompt.
>> That's fine, that's what __str__ is for. __repr__ should be short.
>
> No, it's the other way around: __repr__ is used when evaluating directly
> at the prompt.
The docs give the idea:
http://docs.python.org/reference/datamodel.html?highlight=__repr__#object.__repr__
I believe your "big strings" would be classed as "informal" and so would
be computed by __str__.
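You're right that the prompt calls __repr__; my point is just about which method gets the big lump of text. The split I'm after looks like this (a sketch, not heapy's actual classes):

```python
class IdentitySet(object):
    # Sketch of the __repr__/__str__ split being argued for:
    # __repr__ stays short and identifying, __str__ carries the big
    # formatted output.
    def __init__(self, elements):
        self.elements = elements

    def __repr__(self):
        return "<IdentitySet at 0x%x: %d objects>" % (
            id(self), len(self.elements))

    def __str__(self):
        # Stand-in for the full multi-line partition table.
        return "Partition of a set of %d objects.\n..." % len(self.elements)
```

Then `print s` still gives the full table, while a bare `s` at the prompt, or a set sitting inside a list, stays readable.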
>> Yeah, but an item in a set is not a set. __getitem__ should return an
>> item, not a subset...
>
> Usually I think it is called an 'element' of a set rather than an
> 'item'. Python builtin sets can't even do indexing at all.
...'cos it doesn't make sense ;-)
> Likewise, Heapy IdentitySet objects don't support indexing to get at the
> elements directly.
...then they shouldn't have a __getitem__ method!
> The index (__getitem__) method was available so I
> used it to take the subset of the i'ths row in the partition defined by
> its equivalence order.
That should have another name... I don't know what a partition or
equivalence order are in the contexts you're using them, but I do know
that hijacking __getitem__ for this is wrong.
> The subset indexing, being the more well-defined operation, and also
> IMHO more generally useful, thus got the honor to have the [] syntax.
Except it misleads anyone who's programmed in Python for a significant
period of time and causes problems when combined with the bug in .load :-(
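If the row-of-the-partition operation really is the useful one, give it a name that says so and it stops being a trap. A hypothetical shape — `part` and `PartitionedSet` are my names for illustration, not heapy's:

```python
class PartitionedSet(object):
    # Stand-in for heapy's IdentitySet: the i'th row of the partition
    # comes back via a named method, so nobody mistakes it for
    # element access.
    def __init__(self, rows):
        self._rows = rows   # each row is itself a subset

    def part(self, i):
        # Returns a *subset* -- the name makes that explicit, where
        # [] would suggest an element.
        return self._rows[i]
```

Same operation, same power, but no-one reading `s.part(0)` expects an element back the way they do with `s[0]`.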
> It would just be another syntax. I don't see the conceptual problem
> since e.g. indexing works just fine like this with strings.
Strings are a bad example...
>>> objects. Each row is still an IdentitySet, and has the same attributes.
>> Why? It's semantically different.
>
> No, it's semantically identical. :-)
>
> Each row is an IdentitySet just like the top level set, but one which
> happens to contain elements being of one particular kind as defined by
> the equivalence relation in use. So it has only 1 row. The equivalence
> relation can be changed by creating a new set by using some of
> the .byxxx attribute: then the set could be made to contain many kinds
> of objects again, getting more rows albeit the objects themselves don't
> change.
Fine, I'll stop arguing, but just be aware that this is confusing and
you're likely the only person who understands what's really going on or
how it's supposed to work...
Chris
--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk