[Guppy-pe-list] An iteration idiom (Was: Re: loading files containing multiple dumps)

Mon Sep 7 11:53:31 EDT 2009

Sverker Nilsson wrote:
> I hope the new loadall method as I wrote about before will resolve this.
> 
> def loadall(self,f):
>     ''' Generates all objects from an open file f or a file named f'''
>     if isinstance(f,basestring):
>         f=open(f)
>     while True:
>         yield self.load(f)

It would be great if load either returned just one result ever, or 
properly implemented the iterator protocol, rather than half 
implementing it...
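
To make that concrete, here is a minimal sketch of a generator that ends cleanly at end of file instead of looping forever. It assumes a hypothetical load_one(f) that returns None when nothing is left to read. Both that function and the blank-line-separated record format are invented for illustration; they are not heapy's actual API or dump format.

```python
import io


def load_one(f):
    """Hypothetical single load: read one blank-line-terminated record
    from f, or return None at end of file."""
    lines = []
    for line in f:
        line = line.strip()
        if not line:
            if lines:
                break      # a blank line ends the current record
            continue       # skip blank lines before a record starts
        lines.append(line)
    return lines or None


def loadall(f):
    """Yield every record in f, then stop cleanly at end of file."""
    while True:
        record = load_one(f)
        if record is None:
            return  # ends the generator; a for loop simply finishes
        yield record


dumps = io.StringIO("a\nb\n\nc\n")
records = list(loadall(dumps))
```

With that shape, "for record in loadall(f)" just works, and load itself only ever returns a single result.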

> Should we call it loadall? It is a generator so it doesn't really load
> all immediately, just lazily. Maybe call it iload? Or redefine load,
> but that might break existing code so would not be good.

loadall works for me, iload doesn't.

>> Minor rant, why do I have to instantiate a
>> <class 'guppy.heapy.Use._GLUECLAMP_'>
>> to do anything with heapy?
>> Why doesn't heapy just expose load, dump, etc?
> 
> Basically, the need for the h=hpy() idiom is to avoid any global
> variables. 

Eh? What's h then? (And h will reference whatever globals you were 
worried about, surely?)

> Heapy uses some rather big internal data structures, to cache
> such things as dict ownership. I didn't want to have all those things in
> global variables. 

What about attributes of a class instance of some sort then?

> the other objects you created. Also, it allows for several parallel
> invocations of Heapy.

When is that helpful?

> However, I am aware of the extra initial overhead to do h=hpy(). I
> discussed this in my thesis. "Section 4.7.8 Why not importing Use
> directly?" page 36, 
> 
> http://guppy-pe.sourceforge.net/heapy-thesis.pdf

I'm afraid, while I'd love to, I don't have the time to read a thesis...

> Try sunglasses:) (Well, I am aware of this, it was a
> research/experimental system and could have some refactoring :-)

I would suggest creating a minimal system that allows you to do heap() 
and then let other people build what they need from there. Simple is 
*always* better...

>> Less minor rant: this applies to most things to do with heapy... Having 
>> __repr__ return the same as __str__ and having that be a long lump of 
>> text is rather annoying. If you really must, make __str__ return the big 
>> lump of text but have __repr__ return a simple, short, item containing 
>> the class, the id, and maybe the number of contained objects...
> 
> I thought it was cool to not have to use print but get the result
> directly at the prompt.

That's fine, that's what __str__ is for. __repr__ should be short.
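
A rough sketch of the split I mean, using a made-up Measurement class (hypothetical, not a heapy class): __repr__ gives the class, the id and a count, while __str__ gives the full report.

```python
class Measurement:
    """Hypothetical container, used only to illustrate repr vs str."""

    def __init__(self, items):
        self.items = list(items)

    def __repr__(self):
        # Short: class name, id, and the number of contained objects.
        return "<%s at 0x%x, %d objects>" % (
            type(self).__name__, id(self), len(self.items))

    def __str__(self):
        # Long, human-readable report, produced only by print() or str().
        rows = ["%5d  %r" % (i, item) for i, item in enumerate(self.items)]
        return "\n".join(["Index  Item"] + rows)


m = Measurement(["spam", 42])
short = repr(m)    # what the interactive prompt and containers display
report = str(m)    # what print(m) displays
```

Typing m at the prompt then gives one short line, and print(m) still gives the whole table.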

>> Hmmm, I'm sure there's a good reason why an item in a set has the exact 
>> same class and interface as a whole set?
> 
> Um, perhaps no very good reason but... a subset of a set is still a set,
> isn't it?

Yeah, but an item in a set is not a set. __getitem__ should return an 
item, not a subset...

I really think that, by the sounds of it, what is currently implemented 
as __getitem__ should be a `filter` or `subset` method on IdentitySets 
instead...
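
Something along these lines, as a sketch only. ItemSet and its subset method are hypothetical names for illustration; this is not how IdentitySet is implemented.

```python
class ItemSet:
    """Hypothetical collection; not heapy's IdentitySet."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Indexing hands back the contained item itself.
        return self._items[index]

    def subset(self, predicate):
        # Filtering hands back another ItemSet, so a subset is still a set.
        return ItemSet(item for item in self._items if predicate(item))


s = ItemSet([1, "two", 3.0, "four"])
first = s[0]
strings = s.subset(lambda item: isinstance(item, str))
```

Here s[0] is the item 1, while s.subset(...) is another ItemSet, so the two operations can no longer be confused.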

> objects. Each row is still an IdentitySet, and has the same attributes.

Why? It's semantically different. .load() returns a set of measurements, 
each measurement contains a set of something else, but I don't know what...

> This is also how Python strings work: there is no special character
> type, a character is just a string of length 1.

Strings are *way* simpler in terms of what they are though...

cheers,

Chris

-- 
Simplistix - Content Management, Batch Processing & Python Consulting
            - http://www.simplistix.co.uk