[Python-Dev] PySet API

Raymond Hettinger raymond.hettinger at verizon.net
Sun Mar 26 05:05:59 CEST 2006


[Barry]
> Maybe it will help you to understand why I want a richer concrete API.
> I work on an app that is deeply integrated with Python.  It's hard to
> say whether we embed or extend -- it's a lot of both.  We use Python
> data structures such as lists, dicts, and sets in many places as our
> fundamental tracking objects.

In such an app, it would be trivial to write a header:
    #define BarrySet_Clear(s)  PyObject_CallMethod(s, "clear", NULL)

Still, PyObject_Clear(s) would be better.  Better still would be to examine the 
actual uses in the app.  I suspect that most code that clears a set and then 
rebuilds it would be better off starting with a new empty set (and because of 
freelisting, that is a very fast operation).

Likewise, it only takes a one-line header to define BarrySet_Update(s).  I do 
not want that part of the C API exposed yet.  It is still under development and 
may eventually become a function with a variable length argument list.

It's bogus to say there is some app-critical need.  After all, these are both 
one-line defines if you personally crave them.  There's no speed argument here 
either -- you would only be saving an O(1) dispatch step in an O(n) operation.



> there's no question that snaking through the abstract API causes us
> real debugging pain

I honestly don't follow you here.  Doesn't your debugger have options for 
step-over and step-into?  Are you debugging the set module or your client code? 
Besides, these are all high volume functions -- do you really want to trace 
through the internal mechanics of set_clear?  Internally, this code has special 
cases for small and large table sizes, it does a pointer swap with an empty 
table to avoid mid-stream resize issues, it treats dummy entries and active 
entries as being the same, and it's not at all beautiful.  Ergo, it is not 
something you want to be tracing through.  The debugging argument is bogus.
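
The "pointer swap with an empty table" step can be illustrated with a toy 
model.  What follows is a minimal sketch under loud assumptions: ToyBag, 
toybag_add(), toybag_clear() and the release callback are all hypothetical 
names invented for illustration, and none of this is the actual set_clear 
source.  It only shows the general idea of detaching the old table first, so 
that code run while the old entries are released sees a consistent, empty 
container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy container, NOT CPython's set implementation. */
typedef struct ToyBag ToyBag;
typedef void (*ToyRelease)(ToyBag *owner, int item);

struct ToyBag {
    int *items;
    size_t used;
    size_t capacity;
};

static int toybag_add(ToyBag *b, int item)
{
    if (b->used == b->capacity) {
        size_t new_capacity = b->capacity ? b->capacity * 2 : 4;
        int *grown = realloc(b->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        b->items = grown;
        b->capacity = new_capacity;
    }
    b->items[b->used++] = item;
    return 0;
}

static void toybag_clear(ToyBag *b, ToyRelease release)
{
    int *old_items = b->items;
    size_t old_used = b->used;
    size_t i;

    assert(release != NULL);

    /* Step 1: swap in an empty table so the bag is consistent from here on. */
    b->items = NULL;
    b->used = 0;
    b->capacity = 0;

    /* Step 2: release the detached entries.  The callback may re-enter the
       bag (even add to it) without touching the storage being walked. */
    for (i = 0; i < old_used; i++)
        release(b, old_items[i]);
    free(old_items);
}

/* Stand-in for a destructor that calls back into the container. */
static void release_readds_once(ToyBag *owner, int item)
{
    if (item == 1)
        (void)toybag_add(owner, 42);
}

/* Returns 0 if the bag ends up holding only what the callback added. */
static int toybag_demo(void)
{
    ToyBag b = {NULL, 0, 0};
    int ok;

    if (toybag_add(&b, 1) != 0 || toybag_add(&b, 2) != 0 ||
        toybag_add(&b, 3) != 0) {
        free(b.items);
        return -1;
    }
    toybag_clear(&b, release_readds_once);
    ok = (b.used == 1 && b.items[0] == 42);
    free(b.items);
    return ok ? 0 : 1;
}
```

Calling toybag_demo() from a main() exercises the re-entrant case.  Had the 
loop walked b->items in place, the callback's add could have reallocated the 
storage in the middle of the walk.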



> You agree that PySet_Next() is easier to use than the iterator API.
> We will definitely not use the latter, and if your position stands, then
> we'll just have to hack Python to add it (or implement it in an auxiliary 
> module).

If you're dead-set against using the iterator API, then maybe there is something 
wrong with the API.  You should probably start a new thread on why you detest 
the iterator API and see if there are ways to improve it.

Avoidance of the iterator protocol is no reason to proliferate the _Next() API 
across other collections.  That would be a mistake.  It is a bug-factory.  Any 
operation which could potentially call back arbitrary Python code can also 
potentially trigger a resize or table update, leaving an invalid pointer. 
Something as simple as PyObject_Hash(k) can trigger a callback.  Usually with 
code like this, it would take Armin less than five minutes to write a pure 
Python crasher.
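
The hazard can be modeled without CPython at all.  The following is a minimal 
sketch, assuming a hypothetical ToySet (a growable array, not the real set 
object) and a hypothetical toyset_next() written in the style of a _Next() 
function.  The callback stands in for arbitrary code such as a user-defined 
__hash__.  The version counter is also an assumption of this sketch, added so 
the walk can notice the reallocation instead of dereferencing a dead pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy container: a growable array, NOT CPython's set. */
typedef struct {
    int *slots;        /* heap storage; moves whenever the table grows */
    size_t used;
    size_t capacity;
    unsigned version;  /* bumped each time the storage is reallocated */
} ToySet;

typedef void (*ToyCallback)(ToySet *set, int key);

static int toyset_add(ToySet *s, int key)
{
    if (s->used == s->capacity) {
        size_t new_capacity = s->capacity ? s->capacity * 2 : 2;
        int *grown = malloc(new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        if (s->used > 0)
            memcpy(grown, s->slots, s->used * sizeof *grown);
        free(s->slots);            /* every old entry pointer dies here */
        s->slots = grown;
        s->capacity = new_capacity;
        s->version++;
    }
    s->slots[s->used++] = key;
    return 0;
}

/* _Next()-style step: hands back a pointer INTO the table's storage. */
static int toyset_next(const ToySet *s, size_t *pos, const int **key)
{
    if (*pos >= s->used)
        return 0;
    *key = &s->slots[(*pos)++];
    return 1;
}

/* Returns the number of keys visited, or -1 if the callback caused a
   reallocation, in which case `key` would be dangling. */
static long toyset_walk(ToySet *s, ToyCallback cb)
{
    size_t pos = 0;
    const int *key;
    long visited = 0;
    unsigned start_version = s->version;

    assert(cb != NULL);
    while (toyset_next(s, &pos, &key)) {
        int copy = *key;           /* copy out BEFORE running foreign code */
        cb(s, copy);
        visited++;
        if (s->version != start_version)
            return -1;             /* storage moved: stop, do not touch key */
    }
    return visited;
}

static void cb_harmless(ToySet *s, int key) { (void)s; (void)key; }
static void cb_grows(ToySet *s, int key) { (void)toyset_add(s, key + 100); }

/* Returns 0 if the quiet walk completes and the mutating walk is caught. */
static int toyset_demo(void)
{
    ToySet s = {NULL, 0, 0, 0};
    long quiet, noisy;

    if (toyset_add(&s, 1) != 0 || toyset_add(&s, 2) != 0) {
        free(s.slots);
        return -1;
    }
    quiet = toyset_walk(&s, cb_harmless);  /* visits both keys */
    noisy = toyset_walk(&s, cb_grows);     /* table is full, so the add moves it */
    free(s.slots);
    return (quiet == 2 && noisy == -1) ? 0 : 1;
}
```

Calling toyset_demo() from a main() runs both walks.  Without the version 
check, the caller would be left holding a pointer into freed storage after the 
callback, which is exactly the kind of bug a _Next() style API invites.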

If you absolutely must go against my recommendation, can we compromise with a 
semi-private _PySet_Next() so that you have a hook but without mucking up the 
public API for the rest of the world?



> You talk about duck typing, but I don't care about that here.

It's one of the virtues of Python that gets reflected in the abstract API.  IMO, 
it's nice that PyObject_Dir(o) corresponds to "dir(o)" and the same for hash(o), 
repr(o), etc.  I just hope that by hardwiring data types, your app doesn't 
become rigid and impossible to change.  I certainly do not recommend 
that other people adopt this coding style (avoidance of iterators, duplication 
of abstract API functions in concrete form, etc.).  If you're experiencing 
debugging pain, it may be that avoidance of abstraction is the root cause.



>> I would be supportive if you proposed a PyObject_Clear(o) function
>> (for calling tp_clear slots when they exist and
>> returning an error code when they don't).
>
> I wouldn't object to that, but it wouldn't change my mind about
> PySet_Clear().

This is plain evidence that something is wrong with your approach.  While 
possibly necessary in your environment, the rest of mankind should not have to 
stomach this kind of API clutter. 


