[Python-Dev] PySet API
Raymond Hettinger
raymond.hettinger at verizon.net
Mon Mar 20 09:44:02 CET 2006
[Raymond]
>> What are you proposing to add to the PySet API?
[Barry]
> PySet_Clear(), PySet_Next(), PySet_Update(), and PySet_AsList().
PySet_Clear()
-------------
Use PyObject_CallMethod(s, "clear", NULL).
Or if you need to save a millisecond on an O(n) operation, use
PyNumber_InPlaceSubtract(s,s) as shown in the docs. If the name bugs you, it
only takes a one-line macro to define a wrapper. The set API should not be
cluttered with unnecessary and redundant functions.
PySet_Next()
------------
This is also redundant. The preferred way to iterate over a set should be
PyObject_GetIter(s). The iter api is generic and works for all containers. It
ought to be the one-way-to-do-it.
Further, it doesn't make sense to model this after the dictionary API where the
next function is needed to avoid double lookups by returning pointers to both
the key and value fields at the same time (allowing for modification of the
value field). In contrast, for sets, there is no value field to look-up or
mutate (the key should not be touched). So, we shouldn't be passing around
pointers to the internal structure. I want to keep the internal structure of
sets much more private than they were for dictionaries -- all access should be
through the provided C API functions -- that keeps the implementation flexible
and allows for future improvements without worrying that we've broken code for
someone who has touched the internal structure directly.
Also, the _Next() api is not as safe as the _GetIter api which checks for
mutation during iteration. The safety should not be tossed aside without good
reason.
PySet_Update()
---------------
Use PyObject_CallMethod(s, "update", "O", iterable). That is the preferred way
to access all of the high volume methods. Only the fine grained methods (like
contains, add, pop, or discard) have a need for a direct call. Adding
unnecessary functions for the many-at-once methods gains you nothing -- perhaps
saving a tiny O(1) look-up step in an O(n) operation.
FWIW, the same reasoning also applies to why the list API defines
PyList_Append() but not PyList_Extend().
PySet_AsList()
---------------
There is already a function expressly for this purpose, PySequence_List(s). It
is clear, readable, and is the one-way-to-do-it for turning arbitrary iterables
into a list. FWIW, that function already has an optimization to pre-size the
result list to the correct size, so it runs pretty fast (no over-allocate /
resize dance).
I had considered putting a further optimization inside PySequence_List to have a
special case path for sets (like it does for tuples); however, it occurred to me
that this can never be the time critical part of a program since the time to
convert a set to a list is small compared to the time to construct the set in
the first place (many times longer). IOW, further micro-optimization here is
pointless.
[Raymond]
>> IOW, if I have I still have a say in the matter, the patch will most likely
>> not
>> be accepted.
[Barry]
> Can you explain why the additions above would not be obvious improvements?
Yes. A fatter api is not a better api. The PyObject_CallMethod() approach is
highly readable and assures direct correspondence with a Python set method of
the same name. Trying to save the method name lookup time on O(n) methods is a
false optimization. The concrete API should avoid duplicating services provided
by the abstract API such as PySequence_List(). Also, the set API should not
model parts of the dict API that grant direct access to the internal structure
or were needed because dictionaries had a value field.
As it stands now, it is possible to use sets in C programs and access them in a
way that has a direct correspondence to pure Python code -- using
PyObject_CallMethod() for things we would usually access by name, using the
PyNumber API for things we would access using operators, using other parts of
the abstract API exactly as we would in Python (PyObject_Repr, PyObject_GetIter,
PySequence_List, PyObject_Print, etc.), and using a handful of direct access
functions for the fine grained methods like (add, pop, contains, etc.). IOW, I
like the way the C code looks now and I like the minimal, yet complete API.
Let's don't muck it up.
FWIW, the C implementation in Py2.5 already provides nice speed-ups for many
operations. Likewise, its memory requirements have been reduced by a third.
Try to enjoy the improvements without gilding the lily.
Cheers,
Raymond
More information about the Python-Dev
mailing list