[Python-Dev] PySet API

Wed Mar 22 03:31:13 CET 2006

[Barry]
> Is it your intent to push for more use of the abstract API instead of
> the concrete APIs for all of Python's C data structures?  Current API
> aside, are you advocating this approach for all new built-in types?
> Would you argue that Python 3.0's C API be stripped of everything but
> the abstract API and the bare essentials of the concrete API?

It's not up to me.  Perhaps someone else can chime-in about the philosophy
of how the C API is supposed to balance abstract and concrete APIs.

For concrete APIs, it does make sense to have methods for fine-grained
access to containers (for the O(1) ops that tend to appear inside loops).

I know that the more one uses the abstract API, the more likely the code
is going to be able to accept duck typed inputs.  Also, most things that
have tp_slots have a corresponding abstract method instead of tons
a concrete access points; hence, I would be supportive if you proposed a
PyObject_Clear(o) function (for calling tp_clear slots when they exist and
returning an error code when they don't).

For setobject.c, if I still have a say in the matter, my strong preference is to
keep the API minimal, expose fine-grained functions for efficiency, use
PyNumber methods for direct access to operator style set operations,
and use the abstract API for everything else.

Though you have a different world view about fat vs thin APIs and on
whether the C API should track the Python API, perhaps you can agree
with my reasoning in this case.  There is a semantic difference between
code like s+=t and s.update(t).  The former only works when t is a set
and the latter works for any iterable.  When the C code corresponds to
the Python code, that knowledge is kept intact and there is no confusion between
PyNumber_InPlaceAdd(s,t) vs PyObject_CallMethod(s, "update", "(O)", t).
I did not want to fill-up the C API with methods corresponding to every
named method in the Python API -- besides making the API unnecessarily fat,
IMO, it introduces ambiguity about whether PySet_Update would be the
set-only operation or the generator iteratable version.

With respect to clear(), I do not not want to further enshrine that method.
IMO, most uses of it are misguided and the code would be better-off allowing
the existing set to be decreffed away and using PySet_New() instead.
I realize that is a matter of taste, but it is a basic of API design that the
module author provide clues about how the module is intended to be used.

With respect to PySet_Next(), I do agree that it is both a bit faster and easier 
to use
than the iterator API.  That is more of a reflection of issues with the iterator 
API
than it is an indication that PySet_Next() would be worthwhile.

That being said, I stand by my rejection of that method.  For dictionaries, the
problems with the approach were some offset by the advantages of getting
a simultaneous key/value lookup and by direct access to the memory where the
value was stored.  Sets, of course, do not have these offseting benefits.

The problems with the approach however do apply to sets.   I do not want to be
passing around pointers to the private internal hash table -- programs have no 
business
altering that memory directly.  You have to be careful with even simple cases
such as using PyString_AS_STRING which returns a pointer and then the
underlying object can get decreffed away leaving a pointer to an invalid
memory location (that happened to me recently but I caught it during code 
review).

Also, there is the safety issue of having the table mutate during iteration.
While your note downplayed the issue noting that the risks are fully disclosed,
I speak from experience in saying that it is all too easy to accidentally allow
the table to mutate during iteration -- I learned this the hard-way and had
to undo a number of my uses of set_next() which had been used for speed
and for being simpler than the iteator api.  A single decref or call to a key's
PyObject_Hash() is enough to trigger arbitrary Python code running during
the middle of iteration, possibly resulting in a resize or value mutation.
Take a look at the code for set_clear() to see the dance required to avoid
this risk.

IOW, the safety considerations alone preclude PySet_Next().  Instead, use
PyObject_GetIter() and enjoy the benefits of safety, duck typing, and re-usable 
code.

Because of the safety issues and passing internal pointers, I prefer that the 
_Next()
api NOT get replicated throughout the language.  It was needed for dicts, but
shouldn't start sprouting up everywhere.

All that being said, I understand that some of it comes down to taste and
philosophical differences.  As the module author, the existing API of course 
reflects my
tastes and world-view.  I have given you simple, one line alternatives to each
proposal and listed the reasons for each choice, so it can't be argued that the
API is somehow crippling your work.

I'm sympathethic to your reluctance to use PyObject_CallMethod().  But, do
understand that it is simply an aversion.  It works just fine and makes no
speed difference on coarse-grained, O(n) methods.  I like its clarity and
direct correspondence to the pure Python api for sets.  Because the methods
in question would all be METH_O or METH_NOARGS, the only static
type checking you've lost is verifying the number of arguments.  I would suggest
that if you have the wrong number of arguments for s.update(t) or s.clear(),
then you have problems a C API can't solve ;-)

Cheers,

Raymond

P.S.  One other thought:  I don't want to crystalize the API in a way that 
precludes
future development of the module.  One possibility for the future is for updates 
to take
multiple arguments such as s.update(t,u,v)  causing three updates to be 
folded-in at once.