[Python-Dev] PySet API

Thu Mar 30 17:54:09 CEST 2006

On Wed, 2006-03-29 at 23:09 -0500, Raymond Hettinger wrote:

> Yes, _PySet_Next() is a good compromise for you and me -- it saves you from 
> writing a hack and saves my API from including a bug factory.  The only issue is 
> that Martin thinks it to be a crummy idea.  Personally, I have no problem with 
> putting-in an undocumented hook for daring people who aspire to swim in 
> quicksand ;-)

Of course if it was "just" a bug factory I might agree.  But since it's
instead a powerful tool that can be misused if misunderstood, I'd tend
to want to document it and explain to people where and why it might or
might not be the right hammer for the nail you're trying to pound in.
But that's just me. :)

> The idea is not yet ready for prime-time.  If I do it for one of the named 
> operations, I will do it for all (to keep the interface uniform).  

Which named operations are you thinking of?

> I haven't yet 
> had the time to work-out the math on whether it would be worthwhile and provide 
> some differential advantage over simply repeating the same operation several 
> times over.  My research question is whether work can be saved by controlling 
> the order of operations -- the concept is somewhat like optimizing multi-term 
> matrix multiplication where the total work effort can vary dramatically 
> depending on which matrices are multiplied together first, A((BC)D) vs (AB)(CD) 
> vs (A(BC))D etc.  Put in business terms, the question is whether I'm able to 
> leverage the associative and commutative properties of some chained set 
> operations.   FWIW, the module already has optimizations to take advantage of 
> the commutative property of binary AND, OR, and SYMMETRIC_DIFFERENCE operations. 
> However, the multi-term optimization will probably wait until Py2.6 -- it is too 
> experimental for now.
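
To make the ordering question concrete, here is a minimal sketch in Python
rather than C.  The helper name intersect_all is hypothetical and is not
part of the set module or the C API.  It assumes that intersecting the
smallest sets first keeps the intermediate result small; whether that
actually saves measurable work is the open question above.

```python
def intersect_all(*sets):
    """Intersect several sets, starting with the smallest.

    Hypothetical helper for illustration only.  Intersection is
    associative and commutative, so the operands can be reordered
    without changing the result.
    """
    if not sets:
        raise ValueError("need at least one set")
    # Sort by size so the running result starts as small as possible.
    ordered = sorted(sets, key=len)
    result = set(ordered[0])
    for other in ordered[1:]:
        if not result:
            # Nothing left to intersect; the answer stays empty.
            break
        result &= other
    return result
```

The result is the same for any ordering of the arguments; only the amount
of intermediate work changes.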

Does that mean you want to make sure the function is insanely fast
before you'll add it?  Shouldn't you instead decide whether there's even
a need for vararg update first and then figure out how to optimize it?
IOW, if there's a need for vararg update, let's add the API now so that
people can start using it, even if it's not as fast as it could be.
Then they'll be especially grateful when you figure out how to make it
insanely fast in Python 2.6.  
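
As a rough illustration of "add the API now, optimize later", here is a
sketch in Python rather than C.  The name update_many is hypothetical and
is not the PySet_Update() signature under discussion.  It is just the
obvious loop, with the vararg interface fixed so that a faster
implementation could be swapped in later without touching callers.

```python
def update_many(target, *iterables):
    """Add the elements of every iterable to the set `target`, in place.

    Hypothetical helper for illustration only: the naive implementation
    simply repeats the single-argument update once per argument.
    """
    for iterable in iterables:
        target.update(iterable)


seen = {1}
update_many(seen, [2, 3], (4,), {5})
# seen now holds the elements 1 through 5
```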

If vararg update isn't useful, then there's no point in adding the API,
even if it can be made insanely fast.  You'd just be wasting your time
because no one would use it.

It seems backwards to design the implementation first and then the API.
An API represents how you want people to use your objects, what
operations and semantics you want them to have, what contracts you're
guaranteeing and so on.  Optimization then is a very nice side benefit.

Let me ask this: if you can't make vararg PySet_Update() insanely fast,
does that mean you won't add a vararg version?  Or you won't add the
function at all?  I'm all for making things fast, but I just don't
believe that in general that should be the primary driver for how you
want people to /use/ your objects.

-Barry
