
I posted a message to c.l.p about the upcoming alpha 1. Just in case anybody here's been snoozing, 2.5 alpha 1 is coming up real quick, hopefully within a couple of weeks. If you have any *major* features (particularly implemented in C) that you want to see in 2.5, bring it up now. I want to strive for feature completeness by alpha 1. I know we will have some .py modules that won't make it into alpha 1, but they really should make it in by alpha 2 or be deferred to 2.6. For any important bugs or patches, add comments to them. There's still a bunch that I want to see go in. If all goes well with alphas 1 & 2, maybe we could even skip an alpha 3? We can make a beta 3 if things don't go as well as I anticipate. I'm hoping the buildbot has helped shake out a lot of the bugs we might otherwise have seen in the alphas. We'll have to wait and see what makes sense. n

"Neal Norwitz" <nnorwitz@gmail.com> wrote:
I posted a message to c.l.p about the upcoming alpha 1.
Just in case anybody here's been snoozing, 2.5 alpha 1 is coming up real quick, hopefully within a couple of weeks. If you have any *major* features (particularly implemented in C) that you want to see in 2.5, bring it up now. I want to strive for feature completeness by alpha 1. I know we will have some .py modules that won't make it into alpha 1, but they really should make it in by alpha 2 or be deferred to 2.6.
(as previously requested August 12, 2005) (http://mail.python.org/pipermail/python-dev/2005-August/055356.html)
For 2.5a1...
Some exposure of _PyLong_AsByteArray() and _PyLong_FromByteArray() to Python. There was a discussion about this almost a year ago (http://python.org/sf/1023290), and no mechanism (struct format code addition, binascii.tolong/fromlong, long.tostring/fromstring, ...) actually made it into Python 2.4. At this point, I'd be happy to get /any/ mechanism, with a preference for struct and/or binascii (I'd put them in both, if only because different groups of people may look for them in both places, because people who use one tend to like to use that one for as much as possible, and because the code additions in both are minor).
Raymond followed up with the following: (http://mail.python.org/pipermail/python-dev/2005-August/055358.html)
Assign 1023290 to me and I'll get it done in the next month or so.
It was assigned, but he didn't get around to it at the time. I can easily update the patch/test/documentation for struct, but my CPython abilities are somewhat lacking, and I wouldn't be comfortable writing two new functions in the binascii module. - Josiah
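For readers following along, here is a rough sketch of the kind of binascii-level "bytes to long" wrapper Josiah is asking for, built on the private _PyLong_FromByteArray() declared in longobject.h. The function name, argument handling, and module wiring are illustrative assumptions only, not the patch under discussion:

#include "Python.h"

static PyObject *
binascii_bytes2long_sketch(PyObject *self, PyObject *args)
{
    const char *buf;
    int len;
    int little_endian = 0, is_signed = 0;

    /* bytes -> long: a thin wrapper over the private helper in longobject.c */
    if (!PyArg_ParseTuple(args, "s#|ii", &buf, &len,
                          &little_endian, &is_signed))
        return NULL;
    return _PyLong_FromByteArray((const unsigned char *)buf, (size_t)len,
                                 little_endian, is_signed);
}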

Neal Norwitz wrote:
Just in case anybody here's been snoozing, 2.5 alpha 1 is coming up real quick, hopefully within a couple of weeks. If you have any *major* features (particularly implemented in C) that you want to see in 2.5, bring it up now. I want to strive for feature completeness by alpha 1. I know we will have some .py modules that won't make it into alpha 1, but they really should make it in by alpha 2 or be deferred to 2.6.
the xmlplus/xmlcore situation needs to be sorted out. </F>

[Neal]
I posted a message to c.l.p about the upcoming alpha 1.
Just in case anybody here's been snoozing, 2.5 alpha 1 is coming up real quick, hopefully within a couple of weeks. If you have any *major* features (particularly implemented in C) that you want to see in 2.5, bring it up now. I want to strive for feature completeness by alpha 1. I know we will have some .py modules that won't make it into alpha 1, but they really should make it in by alpha 2 or be deferred to 2.6.
FYI, I have several non-major C components to go in but not in time for alpha 1. They include some minor fix-ups in the sets module, the str.partition function, add gc to itertools.tee, a couple of functions in binascii, add itertools.izip_longest, Crutcher's patch to make exec take dictionary arguments, move the peepholer to just before the assembler, and implement Alex's iterator copier for a number of iterables (xrange, repeat, count, reversed, list, tuple, deque, dict, and set). All of these have been previously discussed/approved and should go in to the second alpha. The only one that is borderline is Crutcher's patch. I will try to free up some time to get that into alpha 1. It touches critical parts of the interpreter and needs to be reviewed, tested, timed, and exercised thoroughly. Raymond

Raymond Hettinger <python@rcn.com> wrote:
They include [...] the str.partition function,
Where is the current version of this patch? After going through the archives, I have an additional suggestion which I didn't find already mentioned in the discussion. What about:

str.partition() -> (left, right or None)
str.rpartition() -> (left or None, right)

which means:

"foo:bar".partition(":") -> ("foo", "bar")
"foo:bar".rpartition(":") -> ("foo", "bar")
"foo:".partition(":") -> ("foo", "")
"foo:".rpartition(":") -> ("foo", "")
":foo".partition(":") -> ("", "foo")
":foo".rpartition(":") -> ("", "foo")
"foo".partition(":") -> ("foo", None)
"foo".rpartition(":") -> (None, "foo")

Notice that None-checking can be done as a way to know whether the separator was found. I mentally went through the diff here (http://mail.python.org/pipermail/python-dev/2005-August/055781.html) and found that most (all?) of the usages of '_' disappear with these semantics. -- Giovanni Bajo

On 3/18/06, Raymond Hettinger <python@rcn.com> wrote:
FYI, I have several non-major C components to go in but not in time for alpha 1. They include some minor fix-ups in the sets module, the str.partition function, add gc to itertools.tee, a couple of functions in binascii, add itertools.izip_longest, Crutcher's patch to make exec take dictionary arguments, move the peepholer to just before the assembler, and implement Alex's iterator copier for a number of iterables (xrange, repeat, count, reversed, list, tuple, deque, dict, and set). All of these have been previously discussed/approved and should go in to the second alpha.
The only one that is borderline is Crutcher's patch. I will try to free up some time to get that into alpha 1. It touches critical parts of the interpreter and needs to be reviewed, tested, timed, and exercised thoroughly.
All those seem reasonable. Do the functions you mention address Josiah's patch? n

[Raymond]
FYI, I have several non-major C components to go in but not in time for alpha 1. They include some minor fix-ups in the sets module, the str.partition function, add gc to itertools.tee, a couple of functions in binascii, add itertools.izip_longest, Crutcher's patch to make exec take dictionary arguments, move the peepholer to just before the assembler, and implement Alex's iterator copier for a number of iterables (xrange, repeat, count, reversed, list, tuple, deque, dict, and set). All of these have been previously discussed/approved and should go in to the second alpha.
The only one that is borderline is Crutcher's patch. I will try to free up some time to get that into alpha 1. It touches critical parts of the interpreter and needs to be reviewed, tested, timed, and exercised thoroughly.
[Neal]
All those seem reasonable. Do the functions you mention address Josiah's patch?
I believe so. They are binascii.b2long() and binascii.long2b(). And, no, the name of the latter wasn't taken from a song ;-) Raymond

On Fri, 2006-03-17 at 23:48 -0800, Neal Norwitz wrote:
Just in case anybody here's been snoozing, 2.5 alpha 1 is coming up real quick, hopefully within a couple of weeks. If you have any *major* features (particularly implemented in C) that you want to see in 2.5, bring it up now. I want to strive for feature completeness by alpha 1. I know we will have some .py modules that won't make it into alpha 1, but they really should make it in by alpha 2 or be deferred to 2.6.
I'd like to get some feedback on my PEP 352 comments, and if there's general agreement on the hierarchy I proposed (so far so good :), then I'd like to take a crack at implementing them. -Barry

On 3/18/06, Barry Warsaw <barry@python.org> wrote:
On Fri, 2006-03-17 at 23:48 -0800, Neal Norwitz wrote:
Just in case anybody here's been snoozing, 2.5 alpha 1 is coming up real quick, hopefully within a couple of weeks. If you have any *major* features (particularly implemented in C) that you want to see in 2.5, bring it up now. I want to strive for feature completeness by alpha 1. I know we will have some .py modules that won't make it into alpha 1, but they really should make it in by alpha 2 or be deferred to 2.6.
I'd like to get some feedback on my PEP 352 comments, and if there's general agreement on the hierarchy I proposed (so far so good :), then I'd like to take a crack at implementing them.
-1. See my response in the other thread. The focus on 'Error' is mistaken, and we have a large body of existing code that derives from Exception. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

On Sun, 2006-03-19 at 19:45 -0800, Guido van Rossum wrote:
-1. See my response in the other thread. The focus on 'Error' is mistaken, and we have a large body of existing code that derives from Exception.
Just to be clear, are you saying -1 only for Python 2.5 or -1 also for Python 3.0? If the former, as I mentioned before, that would be fine with me. If the latter too, then I won't waste my time following up on the thread or writing a PEP. -Barry

On 3/19/06, Barry Warsaw <barry@python.org> wrote:
On Sun, 2006-03-19 at 19:45 -0800, Guido van Rossum wrote:
-1. See my response in the other thread. The focus on 'Error' is mistaken, and we have a large body of existing code that derives from Exception.
Just to be clear, are you saying -1 only for Python 2.5 or -1 also for Python 3.0? If the former, as I mentioned before, that would be fine with me. If the latter too, then I won't waste my time following up on the thread or writing a PEP.
I'm -0 for changing this in 3.0; a larger-scale reorganization could be undertaken but it's not a big priority. Before you spend more energy on this, I'd like to address the process for Python 3000, which is too chaotic right now. See my new thread titled "Python 3000 Process". -- --Guido van Rossum (home page: http://www.python.org/~guido/)

On Sun, 2006-03-19 at 20:40 -0800, Guido van Rossum wrote:
I'm -0 for changing this in 3.0; a larger-scale reorganization could be undertaken but it's not a big priority.
Before you spend more energy on this, I'd like to address the process for Python 3000, which is too chaotic right now. See my new thread titled "Python 3000 Process".
Fair enough. -Barry

On Fri, 2006-03-17 at 23:48 -0800, Neal Norwitz wrote:
Just in case anybody here's been snoozing, 2.5 alpha 1 is coming up real quick, hopefully within a couple of weeks. If you have any *major* features (particularly implemented in C) that you want to see in 2.5, bring it up now. I want to strive for feature completeness by alpha 1. I know we will have some .py modules that won't make it into alpha 1, but they really should make it in by alpha 2 or be deferred to 2.6.
Oh, also, we have a couple of additions to the PySet C API. I'll work on putting together an SF patch for them over the weekend. -Barry

[Barry Warsaw]
Oh, also, we have a couple of additions to the PySet C API. I'll work on putting together an SF patch for them over the weekend.
What are you proposing to add to the PySet API? I designed an API that was both minimal and complete. The idea was to provide direct access to fine-grained functions and access the rest through the existing abstract API for PyObject and PyNumber as detailed in the Set API docs. I tried out the API to translate a number of set algorithms and found that the API was easy to use and sufficient as-is. There may be room for variants of the type checking macros, but I would like the rest of the C API to remain as-is unless some compelling deficiency can be shown. It is easy to expand the API later but almost impossible to take anything back once in the field. IOW, if I still have a say in the matter, the patch will most likely not be accepted. Raymond

On Sat, 2006-03-18 at 19:22 -0500, Raymond Hettinger wrote:
[Barry Warsaw]
Oh, also, we have a couple of additions to the PySet C API. I'll work on putting together an SF patch for them over the weekend.
What are you proposing to add to the PySet API?
PySet_Clear(), PySet_Next(), PySet_Update(), and PySet_AsList().
I designed an API that was both minimal and complete. The idea was to provide direct access to fine-grained functions and access the rest through the existing abstract API for PyObject and PyNumber as detailed in the Set API docs.
We use the above functions quite a bit in our embedded app, so we want them to be as efficient as possible. They should also be obvious to C programmers (e.g. using PyNumber_InPlaceSubtract() is much less obvious than PySet_Clear()).
I tried out the API to translate a number of set algorithms and found that the API was easy-to-use and sufficient as-is. There may be room for variants of the type checking macros, but I would like the rest of the C API to remain as-is unless some compelling deficiency can be shown. It is easy to expand the API later but almost impossible to take anything back once in the field.
The above mirrors what's available for dict objects, and if you are using sets for collections of objects, I believe they make the most sense.
IOW, if I still have a say in the matter, the patch will most likely not be accepted.
Can you explain why the additions above would not be obvious improvements? -Barry

[Raymond]
What are you proposing to add to the PySet API?
[Barry]
PySet_Clear(), PySet_Next(), PySet_Update(), and PySet_AsList().
PySet_Clear()
-------------
Use PyObject_CallMethod(s, "clear", NULL). Or if you need to save a millisecond on an O(n) operation, use PyNumber_InPlaceSubtract(s,s) as shown in the docs. If the name bugs you, it only takes a one-line macro to define a wrapper. The set API should not be cluttered with unnecessary and redundant functions.

PySet_Next()
------------
This is also redundant. The preferred way to iterate over a set should be PyObject_GetIter(s). The iter API is generic and works for all containers. It ought to be the one-way-to-do-it. Further, it doesn't make sense to model this after the dictionary API, where the next function is needed to avoid double lookups by returning pointers to both the key and value fields at the same time (allowing for modification of the value field). In contrast, for sets, there is no value field to look up or mutate (the key should not be touched). So, we shouldn't be passing around pointers to the internal structure. I want to keep the internal structure of sets much more private than it was for dictionaries -- all access should be through the provided C API functions -- that keeps the implementation flexible and allows for future improvements without worrying that we've broken code for someone who has touched the internal structure directly. Also, the _Next() API is not as safe as the _GetIter API, which checks for mutation during iteration. The safety should not be tossed aside without good reason.

PySet_Update()
--------------
Use PyObject_CallMethod(s, "update", "O", iterable). That is the preferred way to access all of the high volume methods. Only the fine-grained methods (like contains, add, pop, or discard) have a need for a direct call. Adding unnecessary functions for the many-at-once methods gains you nothing -- perhaps saving a tiny O(1) look-up step in an O(n) operation. FWIW, the same reasoning also applies to why the list API defines PyList_Append() but not PyList_Extend().

PySet_AsList()
--------------
There is already a function expressly for this purpose, PySequence_List(s). It is clear, readable, and is the one-way-to-do-it for turning arbitrary iterables into a list. FWIW, that function already has an optimization to pre-size the result list to the correct size, so it runs pretty fast (no over-allocate / resize dance). I had considered putting a further optimization inside PySequence_List to have a special case path for sets (like it does for tuples); however, it occurred to me that this can never be the time critical part of a program since the time to convert a set to a list is small compared to the time to construct the set in the first place (many times longer). IOW, further micro-optimization here is pointless.

[Raymond]
IOW, if I still have a say in the matter, the patch will most likely not be accepted.
[Barry]
Can you explain why the additions above would not be obvious improvements?
Yes. A fatter api is not a better api. The PyObject_CallMethod() approach is highly readable and assures direct correspondence with a Python set method of the same name. Trying to save the method name lookup time on O(n) methods is a false optimization. The concrete API should avoid duplicating services provided by the abstract API such as PySequence_List(). Also, the set API should not model parts of the dict API that grant direct access to the internal structure or were needed because dictionaries had a value field. As it stands now, it is possible to use sets in C programs and access them in a way that has a direct correspondence to pure Python code -- using PyObject_CallMethod() for things we would usually access by name, using the PyNumber API for things we would access using operators, using other parts of the abstract API exactly as we would in Python (PyObject_Repr, PyObject_GetIter, PySequence_List, PyObject_Print, etc.), and using a handful of direct access functions for the fine grained methods like (add, pop, contains, etc.). IOW, I like the way the C code looks now and I like the minimal, yet complete API. Let's don't muck it up. FWIW, the C implementation in Py2.5 already provides nice speed-ups for many operations. Likewise, its memory requirements have been reduced by a third. Try to enjoy the improvements without gilding the lily. Cheers, Raymond
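To make the alternatives Raymond recommends concrete, here is a minimal sketch of clearing a set through the abstract API alone; the wrapper names are invented for illustration and error handling is added for completeness:

static int
clear_set_by_method(PyObject *s)
{
    /* Equivalent of the pure Python call s.clear() */
    PyObject *res = PyObject_CallMethod(s, "clear", NULL);
    if (res == NULL)
        return -1;
    Py_DECREF(res);
    return 0;
}

static int
clear_set_in_place(PyObject *s)
{
    /* Equivalent of s -= s, the spelling shown in the set API docs */
    PyObject *res = PyNumber_InPlaceSubtract(s, s);
    if (res == NULL)
        return -1;
    Py_DECREF(res);
    return 0;
}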

Is it your intent to push for more use of the abstract API instead of the concrete APIs for all of Python's C data structures? Current API aside, are you advocating this approach for all new built-in types? Would you argue that Python 3.0's C API be stripped of everything but the abstract API and the bare essentials of the concrete API? If so, then I think this is extremely misguided. C is not Python, and while the abstract API is useful for some things, so is the concrete API. In fact, the Python C API's clarity, utility, completeness, and discoverability have made Python one of the nicest languages to embed and extend, and I see no reason to deviate from that for the sake of blind TOOWTDI worship. We have a rich tradition of providing both concrete and abstract APIs at the C layer, and I think that's a good thing that we should continue here. On Mon, 2006-03-20 at 03:44 -0500, Raymond Hettinger wrote:
PySet_Clear() ------------- Use PyObject_CallMethod(s, "clear", NULL).
Or if you need to save a millisecond on an O(n) operation, use PyNumber_InPlaceSubtract(s,s) as shown in the docs. If the name bugs you, it only takes a one-line macro to define a wrapper. The set API should not be cluttered with unnecessary and redundant functions.
This is a great example of what I'm talking about. You lose some static C compiler checks when you use either of these alternatives. C is not Python and we shouldn't try to make it so. The documentation is much less concise too, and if macros are encouraged, then every extension will invent their own name, further reducing readability, or use the obvious choice of PySet_Clear() and then question why Python doesn't provide this itself. This also has a detrimental effect on debugging. Macros suck for debugging and going through all the abstract API layers sucks. A nice, clean, direct call is so much more embedder-friendly. In addition, you essentially have all the pieces for PySet_Clear() right there in front of you, so why not expose them to embedders and make their lives easier? Forcing them to go through the abstract API or use obscure alternatives does not improve the API. It seems a false economy to not include concrete API calls just to end up back in setobject.c after layers of indirection.
PySet_Next() ------------ This is also redundant. The preferred way to iterate over a set should be PyObject_GetIter(s). The iter api is generic and works for all containers. It ought to be the one-way-to-do-it.
For the C API, I disagree for the reasons stated above. In this specific case, using the iterator API actually imposes more pain on embedders because there are more things you have to keep track of and that can go wrong. PyDict_Next() is a very nice and direct API, where you often don't have to worry about reference counting (borrowed refs in this case are the right thing to return). You also don't have to worry about error conditions, and both of these things reduce bugs because it usually means less code. PySet_Next() would provide the same benefits. I don't buy the safety argument against PyDict_Next()/PySet_Next() because they are clearly documented as requiring no modification during iteration. Again, this is what I mean by useful concrete vs. abstract APIs. When you /know/ you have a set and you /know/ you won't be modifying it, PySet_Next() is the perfect interface. If you will be modifying the set, or don't know what kind of sequence you have, then the abstract API is the right thing to use.
Further, it doesn't make sense to model this after the dictionary API where the next function is needed to avoid double lookups by returning pointers to both the key and value fields at the same time (allowing for modification of the value field). In contrast, for sets, there is no value field to look-up or mutate (the key should not be touched). So, we shouldn't be passing around pointers to the internal structure. I want to keep the internal structure of sets much more private than they were for dictionaries -- all access should be through the provided C API functions -- that keeps the implementation flexible and allows for future improvements without worrying that we've broken code for someone who has touched the internal structure directly.
The implementation of PySet_Next() would not return setentrys, it would return PyObjects. Yes, those would be borrowed refs to setentry.keys, but you still avoid direct access to internal structures.
Also, the _Next() api is not as safe as the _GetIter api which checks for mutation during iteration. The safety should not be tossed aside without good reason.
PySet_Update() --------------- Use PyObject_CallMethod(s, "update", "O", iterable). That is the preferred way to access all of the high volume methods.
Again, I disagree, but I don't think I need to restate my reasons.
Only the fine grained methods (like contains, add, pop, or discard) have a need for a direct call. Adding unnecessary functions for the many-at-once methods gains you nothing -- perhaps saving a tiny O(1) look-up step in an O(n) operation.
FWIW, the same reasoning also applies to why the list API defines PyList_Append() but not PyList_Extend().
Personally, I think that's a bug in the PyList C API. I haven't complained because I've rarely needed it, but it /is/ a deficiency.
PySet_AsList() --------------- There is already a function expressly for this purpose, PySequence_List(s).
I'll grant you this one. ;) Forget PySet_AsList(). I'll try to answer the rest of your message without repeating myself too much. ;)
As it stands now, it is possible to use sets in C programs and access them in a way that has a direct correspondence to pure Python code -- using PyObject_CallMethod() for things we would usually access by name, using the PyNumber API for things we would access using operators, using other parts of the abstract API exactly as we would in Python (PyObject_Repr, PyObject_GetIter, PySequence_List, PyObject_Print, etc.), and using a handful of direct access functions for the fine grained methods like (add, pop, contains, etc.). IOW, I like the way the C code looks now and I like the minimal, yet complete API. Let's don't muck it up.
This is where you and I disagree. Again, C is not Python. I actually greatly dislike having to use things like PyObject_Call() for concrete objects. First, the C code does not look like Python at all, and is actually /less/ readable because now you have to look in two places to understand what the code does. Second, it imposes much more pain when debugging because of all the extra layers you have to step through. But of course, with a rich concrete and abstract API, as most Python types have, we both get to appease our aesthetic demons and choose the right tool for the job.
FWIW, the C implementation in Py2.5 already provides nice speed-ups for many operations. Likewise, its memory requirements have been reduced by a third. Try to enjoy the improvements without gilding the lily.
Let's embrace C and continue to make life easier for the C coder. You can't argue that going through all the rigamarole of the iterator API would be faster than PySet_Next(), and it certainly won't be more readable or easier to debug. A foolish consistency, and all that... Cheers, -Barry

[Barry]
Is it your intent to push for more use of the abstract API instead of the concrete APIs for all of Python's C data structures? Current API aside, are you advocating this approach for all new built-in types? Would you argue that Python 3.0's C API be stripped of everything but the abstract API and the bare essentials of the concrete API?
It's not up to me. Perhaps someone else can chime in about the philosophy of how the C API is supposed to balance abstract and concrete APIs. For concrete APIs, it does make sense to have methods for fine-grained access to containers (for the O(1) ops that tend to appear inside loops). I know that the more one uses the abstract API, the more likely the code is going to be able to accept duck-typed inputs. Also, most things that have tp_slots have a corresponding abstract method instead of tons of concrete access points; hence, I would be supportive if you proposed a PyObject_Clear(o) function (for calling tp_clear slots when they exist and returning an error code when they don't).

For setobject.c, if I still have a say in the matter, my strong preference is to keep the API minimal, expose fine-grained functions for efficiency, use PyNumber methods for direct access to operator style set operations, and use the abstract API for everything else. Though you have a different world view about fat vs thin APIs and on whether the C API should track the Python API, perhaps you can agree with my reasoning in this case. There is a semantic difference between code like s+=t and s.update(t). The former only works when t is a set and the latter works for any iterable. When the C code corresponds to the Python code, that knowledge is kept intact and there is no confusion between PyNumber_InPlaceAdd(s,t) vs PyObject_CallMethod(s, "update", "(O)", t). I did not want to fill up the C API with methods corresponding to every named method in the Python API -- besides making the API unnecessarily fat, IMO, it introduces ambiguity about whether PySet_Update would be the set-only operation or the general iterable version.

With respect to clear(), I do not want to further enshrine that method. IMO, most uses of it are misguided and the code would be better off allowing the existing set to be decreffed away and using PySet_New() instead. I realize that is a matter of taste, but it is a basic principle of API design that the module author provide clues about how the module is intended to be used.

With respect to PySet_Next(), I do agree that it is both a bit faster and easier to use than the iterator API. That is more of a reflection of issues with the iterator API than it is an indication that PySet_Next() would be worthwhile. That being said, I stand by my rejection of that method. For dictionaries, the problems with the approach were somewhat offset by the advantages of getting a simultaneous key/value lookup and by direct access to the memory where the value was stored. Sets, of course, do not have these offsetting benefits. The problems with the approach, however, do apply to sets. I do not want to be passing around pointers to the private internal hash table -- programs have no business altering that memory directly. You have to be careful with even simple cases such as using PyString_AS_STRING, which returns a pointer, and then the underlying object can get decreffed away, leaving a pointer to an invalid memory location (that happened to me recently, but I caught it during code review). Also, there is the safety issue of having the table mutate during iteration. While your note downplayed the issue, noting that the risks are fully disclosed, I speak from experience in saying that it is all too easy to accidentally allow the table to mutate during iteration -- I learned this the hard way and had to undo a number of my uses of set_next() which had been used for speed and for being simpler than the iterator API.

A single decref or call to a key's PyObject_Hash() is enough to trigger arbitrary Python code running during the middle of iteration, possibly resulting in a resize or value mutation. Take a look at the code for set_clear() to see the dance required to avoid this risk. IOW, the safety considerations alone preclude PySet_Next(). Instead, use PyObject_GetIter() and enjoy the benefits of safety, duck typing, and re-usable code. Because of the safety issues and passing internal pointers, I prefer that the _Next() API NOT get replicated throughout the language. It was needed for dicts, but shouldn't start sprouting up everywhere.

All that being said, I understand that some of it comes down to taste and philosophical differences. As the module author, the existing API of course reflects my tastes and world-view. I have given you simple, one-line alternatives to each proposal and listed the reasons for each choice, so it can't be argued that the API is somehow crippling your work. I'm sympathetic to your reluctance to use PyObject_CallMethod(). But, do understand that it is simply an aversion. It works just fine and makes no speed difference on coarse-grained, O(n) methods. I like its clarity and direct correspondence to the pure Python API for sets. Because the methods in question would all be METH_O or METH_NOARGS, the only static type checking you've lost is verifying the number of arguments. I would suggest that if you have the wrong number of arguments for s.update(t) or s.clear(), then you have problems a C API can't solve ;-) Cheers, Raymond

P.S. One other thought: I don't want to crystallize the API in a way that precludes future development of the module. One possibility for the future is for updates to take multiple arguments such as s.update(t,u,v) causing three updates to be folded in at once.
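As an aside, a minimal sketch of the PyObject_Clear() idea Raymond floats above might look like the following; driving it off tp_clear comes from his parenthetical, while the helper name and the TypeError wording are invented for illustration (no such function exists in CPython):

static int
pyobject_clear_sketch(PyObject *o)
{
    /* Call the type's tp_clear slot when present; report an error otherwise. */
    inquiry clear = o->ob_type->tp_clear;

    if (clear == NULL) {
        PyErr_Format(PyExc_TypeError,
                     "'%.200s' object does not support clear",
                     o->ob_type->tp_name);
        return -1;
    }
    return clear(o);
}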

[Me]
There is a semantic difference between code like s+=t and s.update(t). The former only works when t is a set and the latter works for any iterable. When the C code corresponds to the Python code, that knowledge is kept intact and there is no confusion between PyNumber_InPlaceAdd(s,t) vs PyObject_CallMethod(s, "update", "(O)", t).
Of course, that should have been s|=t and PyNumber_InPlaceOr(). Raymond

On Tue, 2006-03-21 at 22:01 -0500, Raymond Hettinger wrote:
[Me]
There is a semantic difference between code like s+=t and s.update(t). The former only works when t is a set and the latter works for any iterable. When the C code corresponds to the Python code, that knowledge is kept intact and there is no confusion between PyNumber_InPlaceAdd(s,t) vs PyObject_CallMethod(s, "update", "(O)", t).
Of course, that should have been s|=t and PyNumber_InPlaceOr().
Heh, my point exactly. You wouldn't have gotten confused about PySet_Update(). :) -Barry

On Tue, 2006-03-21 at 21:31 -0500, Raymond Hettinger wrote:
[Barry]
Is it your intent to push for more use of the abstract API instead of the concrete APIs for all of Python's C data structures? Current API aside, are you advocating this approach for all new built-in types? Would you argue that Python 3.0's C API be stripped of everything but the abstract API and the bare essentials of the concrete API?
It's not up to me. Perhaps someone else can chime-in about the philosophy of how the C API is supposed to balance abstract and concrete APIs.
I think it's an important point to discuss, both for trying to resolve this impasse and for helping to direct future API designs, especially as we get into Python 3.0. Maybe it will help you to understand why I want a richer concrete API. I work on an app that is deeply integrated with Python. It's hard to say whether we embed or extend -- it's a lot of both. We use Python data structures such as lists, dicts, and sets in many places as our fundamental tracking objects. So we know what we have, i.e. it's definitely a set here and another set there, and we want to merge one into the other. Or we know we have a set of foo's here and we need to iterate over them quickly (yes, in a tight loop) to count things or whatever. So there's no question that a concrete API is very useful to us. And there's no question that snaking through the abstract API causes us real debugging pain (a point which you mostly glossed over). We understand the gotchas about reference counting and the possibilities and implications about calling back into Python. Remember, we're all consenting adults here. I don't think we're unique here, as the rich concrete API of other fundamental Python objects attests to. Your comments lead me to think that you aren't taking this important use case into account. You talk about duck typing, but I don't care about that here. I absolutely know I have a PySet, so why cause me pain to use it?
I know that the more one uses the abstract API, the more likely the code is going to be able to accept duck-typed inputs. Also, most things that have tp_slots have a corresponding abstract method instead of tons of concrete access points; hence, I would be supportive if you proposed a PyObject_Clear(o) function (for calling tp_clear slots when they exist and returning an error code when they don't).
I wouldn't object to that, but it wouldn't change my mind about PySet_Clear(). I'm not arguing against a rich abstract API, I'm arguing for having a richer concrete API too. And in this case, only slightly richer.
For setobject.c, if I still have a say in the matter, my strong preference is to keep the API minimal, expose fine-grained functions for efficiency, use PyNumber methods for direct access to operator style set operations, and use the abstract API for everything else.
I think this is a silly stance. You agree that PySet_Next() is easier to use than the iterator API. We will definitely not use the latter, and if your position stands, then we'll just have to hack Python to add it (or implement it in an auxiliary module). But I don't want to have to do that, so I really don't understand your reluctance to add three obviously useful functions. Another point: these don't expose internal bits of the set implementation. Well, except for the opaque position pointer, but that's still enough data hiding for me because you're never supposed to /do/ anything with that variable except pass it right back to PySet_Next(). PySet_Clear() and PySet_Update() don't expose any implementation details -- that's the whole point!
P.S. One other thought: I don't want to crystallize the API in a way that precludes future development of the module. One possibility for the future is for updates to take multiple arguments such as s.update(t,u,v) causing three updates to be folded in at once.
I don't see any way that my proposals preclude that. And besides, the three API calls I'm proposing are useful /today/. But just so we all know what we're talking about, I've uploaded the patch to SourceForge: http://sourceforge.net/tracker/index.php?func=detail&aid=1458476&group_id=5470&atid=305470 As with all good patches, there's (almost) more test code than implementation. Cheers, -Barry

[Barry]
Maybe it will help you to understand why I want a richer concrete API. I work on an app that is deeply integrated with Python. It's hard to say whether we embed or extend -- it's a lot of both. We use Python data structures such as lists, dicts, and sets in many places as our fundamental tracking objects.
In such an app, it would be trivial to write a header:

#define BarrySet_Clear(s) PyObject_CallMethod(s, "clear", NULL)

Still, PyObject_Clear(s) would be better. Better still would be to examine the actual uses in the app. I suspect that most code that clears a set and then rebuilds it would be better off starting with a new empty set (and because of freelisting, that is a very fast operation). Likewise, it only takes a one-line header to define BarrySet_Update(s). I do not want that part of the C API exposed yet. It is still under development and may eventually become a function with a variable length argument list. It's bogus to say there is some app critical need. After all, these are both one-line defines if you personally crave them. There's no speed argument here either -- saving an O(1) dispatch step in an O(n) operation.
there's no question that snaking through the abstract API causes us real debugging pain
I honestly don't follow you here. Doesn't your debugger have options for step-over and step-into? Are you debugging the set module or your client code? Besides, these are all high volume functions -- do you really want to trace through the internal mechanics of set_clear? Internally, this code has special cases for small and large table sizes, it does a pointer swap with an empty table to avoid mid-stream resize issues, it treats dummy entries and active entries as being the same, and it's not at all beautiful. Ergo, it is not something you want to be tracing through. The debugging argument is bogus.
You agree that PySet_Next() is easier to use than the iterator API. We will definitely not use the latter, and if your position stands, then we'll just have to hack Python to add it (or implement it in an auxiliary module).
If you're dead-set against using the iterator API, then maybe there is something wrong with the API. You should probably start a new thread on why you detest the iterator API and see if there are ways to improve it. Avoidance of the iterator protocol is no reason to proliferate the _Next() api across other collections. That would be a mistake. It is a bug-factory. Any operation which could potentially call back arbitrary Python code can also potentially trigger a resize or table update, leaving an invalid pointer. Something as simple as PyObject_Hash(k) can trigger a callback. Usually with code like this, it would take Armin less than five minutes to write a pure Python crasher. If you absolutely must go against my recommendation, can we compromise with a semi-private _PySet_Next() so that you have a hook but without mucking-up the public API for the rest of the world?
You talk about duck typing, but I don't care about that here.
It's one of the virtues of Python that gets reflected in the abstract API. IMO, it's nice that PyObject_Dir(o) corresponds to "dir(o)" and the same for hash(o), repr(o), etc. I just hope that by hardwiring data types in stone, your app doesn't become rigid and impossible to change. I certainly do not recommend that other people adopt this coding style (avoidance of iterators, duplication of abstract API functions in concrete form, etc.). If you're experiencing debugging pain, it may be that avoidance of abstraction is the root cause.
I would be supportive if you proposed a PyObject_Clear(o) function (for calling tp_clear slots when they exist and returning an error code when they don't).
I wouldn't object to that, but it wouldn't change my mind about PySet_Clear().
This is plain evidence that something is wrong with your approach. While possibly necessary in your environment, the rest of mankind should not have to stomach this kind of API clutter.

I'd really like to see someone else who understands the issues (i.e. using the Python C-API) weigh in. Both Barry and Raymond are clever programmers who generally understand what's Pythonic, and I find myself agreeing with whoever posted last. ;-) Having another perspective would probably shed some light here. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet

On Mar 25, 2006, at 9:57 PM, Aahz wrote:
I'd really like to see someone else who understands the issues (i.e. using the Python C-API) weigh in. Both Barry and Raymond are clever programmers who generally understand what's Pythonic, and I find myself agreeing with whoever posted last. ;-) Having another perspective would probably shed some light here.
My general preference is rather well-known, and I quote the advice I gave in "Python in a Nutshell"...:

"""
Some of the functions callable on specifically-typed objects [...] duplicate functionality that is also available from PyObject_ functions; in these cases, you should almost invariably use the more general PyObject_ function instead. I don't cover such almost-redundant functions in this book.
"""

However, I don't go as far as suggesting PyObject_CallMethod and the like... I'd much rather have abstract-layer PyObject_... functions, as long as they're applicable to two or more concrete built-in types (for example, IMHO adding PyObject_Clear is a no-brainer -- it's obviously right). And I'm on the fence regarding the specific issue of PySet_Next. So, having carefully staked out a position smack in the middle, I cheerfully now expect to be fired upon from both sides!-) Alex

[Alex]
And I'm on the fence regarding the specific issue of PySet_Next.
So, having carefully staked out a position smack in the middle, I cheerfully now expect to be fired upon from both sides!-)
Okay, here's the first cheap shot ;-) Which of the following pieces of code is preferable? The first loops with the iterator protocol and the second loops with the _next protocol.

static long
frozenset_hash(PyObject *self)
{
    PySetObject *so = (PySetObject *)self;
    long h, hash = 0;
    PyObject *it, *key;

    if (so->hash != -1)
        return so->hash;

    it = PyObject_GetIter(self);
    if (it == NULL)
        return -1;

    while ((key = PyIter_Next(it)) != NULL) {
        h = PyObject_Hash(key);
        Py_DECREF(key);
        if (h == -1) {
            Py_DECREF(it);
            return -1;
        }
        hash ^= h * 3644798167;
    }
    Py_DECREF(it);
    if (PyErr_Occurred())
        return -1;
    if (hash == -1)
        hash = 590923713L;
    so->hash = hash;
    return hash;
}

static long
frozenset_hash(PyObject *self)
{
    PySetObject *so = (PySetObject *)self;
    long h, hash = 0;
    PyObject *key;
    Py_ssize_t pos = 0;

    if (so->hash != -1)
        return so->hash;

    while (set_next(so, &pos, &key)) {
        h = PyObject_Hash(key);
        if (h == -1)
            return -1;
        hash ^= h * 3644798167;
    }
    if (hash == -1)
        hash = 590923713L;
    so->hash = hash;
    return hash;
}

On Sun, Mar 26, 2006, Raymond Hettinger wrote:
[Alex]
And I'm on the fence regarding the specific issue of PySet_Next.
So, having carefully staked out a position smack in the middle, I cheerfully now expect to be fired upon from both sides!-)
Okay, here's the first cheap shot ;-) Which of the following pieces of code is preferable? The first loops with the iterator protocol and the second loops with the _next protocol.
Speaking as a person who does relatively little C programming, I don't see much difference between them. The first example is more Pythonic -- for Python. I agree with Barry that it's not much of a virtue for C code. However, I do have one nitpick with both your examples; I don't know whether this is an artifact of them being examples:
hash ^= h * 3644798167;
Seems to me that magic numbers like this need to be made constants and explained with a comment. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet

[Aahz]
Speaking as a person who does relatively little C programming, I don't see much difference between them. The first example is more Pythonic -- for Python. I agree with Barry that it's not much of a virtue for C code.
It was a trick question. Everyone is supposed to be attracted to the _next version because it is shorter, faster, and takes less ref counting management. However, the _next version has a hard-to-find bug. The call to PyObject_Hash() can trigger arbitrary Python code and possibly mutate the table, leaving pointers to invalid memory addresses. It would likely take Armin less than five minutes to write a pure Python crasher for the code. And THAT is why PySet_Next() should never come into being. The iterator form is more duck-typable and re-usable than the set specific _next version, but the example was chosen to take that issue off of the table and just focus on mutation issues.
However, I do have one nitpick with both your examples; I don't know whether this is an artifact of them being examples:
hash ^= h * 3644798167;
Seems to me that magic numbers like this need to be made constants and explained with a comment
FWIW, the actual code does have comments. I stripped them out of the posting because they weren't relevant to the code comparison. Raymond

On Mar 26, 2006, at 8:43 AM, Raymond Hettinger wrote:
[Aahz]
Speaking as a person who does relatively little C programming, I don't see much difference between them. The first example is more Pythonic -- for Python. I agree with Barry that it's not much of a virtue for C code.
It was a trick question. Everyone is supposed to be attracted to the _next version because it is shorter, faster, and takes less ref counting management. However, the _next version has a hard-to-find bug. The call to PyObject_Hash() can trigger arbitrary Python code and possibly mutate the table, leaving pointers to invalid memory addresses. It would likely take Armin less than five minutes to write a pure Python crasher for the code. And THAT is why PySet_Next() should never come into being.
Sure, accidentally mutating underlying iterables is a subtle (but alas frequent) bug, but I don't see why it should be any harsher when the loop is using a hypothetical PySet_Next than when it is using PyIter_Next -- whatever precautions the latter takes to detect the bug and raise an exception instead of crashing, wouldn't it be at least as feasible for PySet_Next to take similar precautions (probably easier, since PySet_Next need only worry about one concrete case rather than an arbitrary variety)? What does PyDict_Next do in similar cases, and why couldn't PySet_Next behave similarly? (Yes, I could/should look it up myself, but I'm supposed to be working on the 2nd Ed of the Nutshell, whose deadline is getting worryingly close...;-). Alex

[Alex]
Sure, accidentally mutating underlying iterables is a subtle (but alas frequent) bug, but I don't see why it should be any harsher when the loop is using a hypothetical PySet_Next than when it is using PyIter_Next -- whatever precautions the latter takes to detect the bug and raise an exception instead of crashing, wouldn't it be at least as feasible for PySet_Next to take similar precautions
The difference is that the PySet_Next returns pointers to the table keys and that the mutation occurs AFTER the call to PySet_Next, leaving pointers to invalid addresses. IOW, the function cannot detect the mutation. PyIter_Next on the other hand returns an object (not a pointer to an object such as those in the hash table). If the table has mutated before the function is called, then it simply raises an exception instead of returning an object. If the table mutates afterwards, it is no big deal because the returned object is still valid.

FWIW, here's an easier to understand example of the same ilk (taken from real code):

    s = PyString_AS_STRING(item);
    Py_DECREF(item);
    if (s == NULL)
        break;
    x = strtol(s, &endptr, 10);

The problem, of course, is that the decref can render the string pointer invalid. The correct code moves the decref after the strtol() call and inside the conditional.

This is at the core of the issue. I don't want the set iteration API to return pointers inside the table. The PyIter_Next API takes a couple more lines but is easy to get correct and has nice duck-typing properties. For dicts, the _next API is worth the risk because it saves a double lookup and because there are legitimate use cases for changing the contents of the value field directly inside the hash table. For sets, those arguments don't apply. We have a safe way that takes a couple more lines and a proposed second-way-to-do-it that is dangerously attractive, yet somewhat unsafe. For that reason, I say no to PySet_Next(). Hopefully, as the module author and principal maintainer, I get some say in the matter.

Raymond

Nothing is more conducive to peace of mind than not having any opinions at all. -- Georg Christoph Lichtenberg
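For reference, a sketch of the corrected ordering Raymond describes; the surrounding loop and the names item, endptr, and x are assumed from his fragment. The reference to item is released only after strtol() has finished with the buffer that PyString_AS_STRING() points into, and the error branch releases it too:

    /* Corrected ordering (illustrative): keep `item` alive while its buffer is in use. */
    s = PyString_AS_STRING(item);
    if (s == NULL) {
        Py_DECREF(item);
        break;
    }
    x = strtol(s, &endptr, 10);
    Py_DECREF(item);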

Raymond Hettinger wrote:
The difference is that the PySet_Next returns pointers to the table keys and that the mutation occurs AFTER the call to PySet_Next, leaving pointers to invalid addresses. IOW, the function cannot detect the mutation.
I'm coming late to the discussion: where did anybody ever suggest that PySet_Next should return a pointer into the set? Looking over the entire discussion, I could not find any mention of a specific API. If it is similar to PyDict_Next, it will have PyObject** /input/ variables, which are really meant as PyObject* /output/ variables. But yes, PyDict_Next returns a borrowed reference, so if the dictionary mutates between calls, your borrowed reference might become stale.
PyIter_Next on the other hand returns an object (not a pointer to an object such as those in the hash table).
PyIter_Next behaves identically wrt. result types to PyDict_Next. The difference is that PyIter_Next always returns a new reference (or NULL in case of an exception). For the caller, a clear usage strategy follows from this: either discard the references before making a potentially-mutating call, or Py_INCREF the set element before making that mutating call. Of course, *after* you made the mutating call, your iteration position might be bogus, as the set might have been reorganized. If the position is represented as a Py_ssize_t (as it is for PyDict_Next), the only consequence of continuing the iteration is that you might see elements twice or not at all - you cannot cause a crash with that. Regards, Martin
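To illustrate the usage strategy Martin describes, here is a minimal sketch of the PyDict_Next idiom: a Py_ssize_t cursor, borrowed references, and an explicit Py_INCREF around any call that could run arbitrary Python code. The function name and the hash-summing body are invented for illustration:

#include "Python.h"

static int
sum_key_hashes(PyObject *dict, long *result)
{
    PyObject *key, *value;
    Py_ssize_t pos = 0;
    long total = 0;

    while (PyDict_Next(dict, &pos, &key, &value)) {
        long h;
        Py_INCREF(key);          /* keep the borrowed key alive across the call below */
        h = PyObject_Hash(key);  /* may run arbitrary Python code */
        Py_DECREF(key);
        if (h == -1 && PyErr_Occurred())
            return -1;
        total += h;
    }
    *result = total;
    return 0;
}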

The difference is that the PySet_Next returns pointers to the table keys and that the mutation occurs AFTER the call to PySet_Next, leaving pointers to invalid addresses. IOW, the function cannot detect the mutation.
I'm coming late to the discussion: where did anybody ever suggest that PySet_Next should return a pointer into the set? Looking over the entire discussion, I could not find any mentioning of a specific API.
Pardon, I bungled the terminology. PySet_Next returns a borrowed reference. That is problematic if arbitrary Python code can be run afterwards (such as PyObject_Hash in the example). We could make a version that returns a new reference or immediately Py_INCREF the reference, but then PySet_Next() loses its charm and you might as well be using PyIter_Next(). Aside from bad pointers, the issue of mid-stream table mutation has other reliability issues stemming from the contents of the table potentially changing in an arbitrary way as the iteration proceeds. That means you can make very few guarantees about the meaningfulness of the results even if you don't crash due to a bad pointer. We have a perfectly good way to iterate with PyIter_Next(). It may take a couple of extra lines, but it is easy to get correct and has no surprises. It seems that the only issue is that Barry says that he refuses to use the iterator protocol. Heck, just turn it into a list and index directly. There is no need to muck-up the set api for this. Raymond
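Here is a sketch of the "turn it into a list and index directly" alternative Raymond mentions; the function name and loop body are invented for illustration. PySequence_List() takes a snapshot, so later mutation of the set cannot invalidate the iteration:

static int
visit_set_elements(PyObject *set)
{
    PyObject *items = PySequence_List(set);  /* snapshot of the set's elements */
    Py_ssize_t i, n;

    if (items == NULL)
        return -1;
    n = PyList_GET_SIZE(items);
    for (i = 0; i < n; i++) {
        PyObject *key = PyList_GET_ITEM(items, i);  /* borrowed; kept alive by the list */
        /* ... use key here ... */
    }
    Py_DECREF(items);
    return 0;
}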

Raymond Hettinger wrote:
Pardon, I bungled the terminology. PySet_Next returns a borrowed reference. That is problematic if arbitrary Python code can be run afterwards (such as PyObject_Hash in the example).
Not really. It is under the control of the caller of PySet_Next what (if any) Python code is invoked, and getting this correct is straight-forward (once you know that it yields borrowed references). I don't know what specific application Barry has in mind, but I'm sure he can get it right (although it might be an interesting experiment to test that theory :-) In general, I would expect that people find it easier to get code involving PyDict_Next right than code dealing with iterators - primarily because of the error cases you have to consider.
We have a perfectly good way to iterate with PyIter_Next(). It may take a couple of extra lines, but it is easy to get correct and has no surprises. It seems that the only issue is that Barry says that he refuses to use the iterator protocol. Heck, just turn it into a list and index directly. There is no need to muck-up the set api for this.
I don't care that much either way, although I would prefer to see an actual, current use case for PySet_Next, rather than theoretical, made-up examples. I don't expect to use Python sets in C code at all, personally. Regards, Martin

On Sun, 2006-03-26 at 21:50 +0200, "Martin v. Löwis" wrote:
I don't know what specific application Barry has in mind, but I'm sure he can get it right (although it might be an interesting experiment to test that theory :-) In general, I would expect that people find it easier to get code involving PyDict_Next right than code dealing with iterators - primarily because of the error cases you have to consider.
I can't post the code because it's proprietary, but I gave a general feel to the types of things we do in a previous response. Imagine you have application objects that are also PyObjects. They have application specific state and behavior. They can be put in Python sets and they can be iterated over to check that state, invoke that behavior (which won't involve trips into Python), or perhaps add them to other collections. Do this 50 to 60 times in your application and I think you'll start to see why the iterator protocol is incredibly cumbersome to use.
I don't care that much either way, although I would prefer to see an actual, current use case for PySet_Next, rather than theoretical, made-up examples. I don't expect to use Python sets in C code at all, personally.
We really, honestly do use PySet_Next in many places. We implemented that API for Python 2.4 exactly because the iterator protocol was way too much overhead. The posted patch is a port to Python 2.5. We obviously can't add this to Python 2.4, but I had really hoped that we wouldn't have to maintain this extension for subsequent versions. I'm frankly astonished to get so much pushback from Raymond about it. -Barry

On Sun, 2006-03-26 at 13:24 -0500, Raymond Hettinger wrote:
We have a perfectly good way to iterate with PyIter_Next(). It may take a couple of extra lines, but it is easy to get correct and has no surprises. It seems that the only issue is that Barry says that he refuses to use the iterator protocol. Heck, just turn it into a list and index directly. There is no need to muck-up the set api for this.
I just think you have a narrow vision of how Python sets can be used in a C application. -Barry

On Sun, 2006-03-26 at 19:59 +0200, "Martin v. Löwis" wrote:
If it is similar to PyDict_Next, it will have PyObject** /input/ variables, which are really meant as PyObject* /output/ variables.
Yep, that's exactly what my posted patch does.
For the caller, a clear usage strategy follows from this: either discard the references before making a potentially-mutating call, or Py_INCREF the set element before making that mutating call.
Yep. Nice and simple. And if you're not making a potentially-mutating call, you don't have to worry about even that. These /are/ valid use cases. -Barry

On Sun, 2006-03-26 at 11:43 -0500, Raymond Hettinger wrote:
It was a trick question. Everyone is supposed to be attracted to the _next version because it is shorter, faster, and takes less ref counting management. However, the _next version has a hard-to-find bug. The call to PyObject_Hash() can trigger arbitrary Python code and possibly mutate the table, leaving pointers to invalid memory addresses. It would likely take Armin less than five minutes to write a pure Python crasher for the code. And THAT is why PySet_Next() should never come into being.
We're clearly going in circles here, and it's obvious we're not going to agree. The fact that PySet_Next() can be used incorrectly is no reason not to include it. There are /lots/ of things in Python that if you use incorrectly will screw you. So you document them, teach people when not to use them, and teach them how to use them correctly when they /are/ the right thing to use. I don't want to be babied into using inappropriate and cumbersome APIs which, yes, can be a source of their own subtle bugs. -Barry

Why don't we expose _PySet_Next() for Barry and leave it out of the public API for everyone else. Raymond

On Mar 27, 2006, at 7:20 AM, Raymond Hettinger wrote:
Why don't we expose _PySet_Next() for Barry and leave it out of the public API for everyone else.
There are precedents for adding some functionality to the C API but not documenting it to ensure "non advanced users" don't get hurt -- that's how we added the ability to raise exceptions in different threads, in particular. Not sure if this is the best solution here, but I'm just pointing out that it's definitely not "unthinkable", procedurally speaking. Alex

Why don't we expose _PySet_Next() for Barry and leave it out of the public API for everyone else.
[Alex]
There are precedents for adding some functionality to the C API but not documenting it to ensure "non advanced users" don't get hurt -- that's how we added the ability to raise exceptions in different threads, in particular. Not sure if this is the best solution here, but I'm just pointing out that it's definitely not "unthinkable", procedurally speaking.
That would be nice. It gives me the ability to keep a clean, sane API while at the same time making sure that my most important customer (Barry) gets his needs met. Raymond

Raymond Hettinger wrote:
Why don't we expose _PySet_Next() for Barry and leave it out of the public API for everyone else.
That is stupid. If Barry wants a "private" PySet_Next function, he can just implement it himself, no need to include it in the release. It should be included only if it is meant to be public. Regards, Martin

On Mon, 2006-03-27 at 23:21 +0200, "Martin v. Löwis" wrote:
Raymond Hettinger wrote:
Why don't we expose _PySet_Next() for Barry and leave it out of the public API for everyone else.
That is stupid. If Barry wants a "private" PySet_Next function, he can just implement it himself, no need to include it in the release. It should be included only if it is meant to be public.
The one thing I'm trying to avoid is requiring us to patch Python when we move to 2.5. The most obvious and straightforward implementations of the 3 API calls are very simple wrappers around static functions defined in setobject.c (see the patch I posted). So if the patch gets rejected, then implementing them as a hack on standard Python 2.5 will require patching setobject.c or cracking open the PySet structure and /really/ violating the principle of data hiding. Neither option is very appealing. That's why I said in a previous message that I could live with this as a compromise, although I agree with your (Martin's) sentiment. -Barry

On Mon, 2006-03-27 at 10:20 -0500, Raymond Hettinger wrote:
Why don't we expose _PySet_Next() for Barry and leave it out of the public API for everyone else.
Just so I understand exactly what you mean by "leave it out of the public API", let me ask: are you saying you don't want to document the function? Do you not want to include it in setobject.h? No tests for it? If all you're suggesting is to stick an underscore in front of the name to indicate "it's something special", I could live with that, although I don't see much benefit. -Barry

Barry Warsaw wrote:
We're clearly going in circles here, and it's obvious we're not going to agree.
Would it perhaps help if there were a better API for using the iterator protocol from C code? Something that's as easy to use as the proposed set iterating API, but which uses the general iterator protocol underneath. Greg

We're clearly going in circles here, and it's obvious we're not going to agree.
The fact that PySet_Next() can be used incorrectly is no reason not to include it. [etc]
For what it's worth[1], I think Raymond is absolutely on crack here.

[1] Not necessarily very much. There is none of my code in Python, so far as I know.

* Simple API: The complexity of an API is not determined by the number of methods in it but by the variety of different things you can ask it to do, and it's not any simpler to have

      PyObject_CallMethod(x, "foo")
      PyObject_CallMethod(x, "bar")
      PyObject_CallMethod(x, "baz")

  than to have

      PyObject_foo(x)
      PyObject_bar(x)
      PyObject_baz(x)

  API complexity is measured in brain cells, not in methods.

* Ease of making mistakes: The Python API is absolutely stuffed with places where you can go wrong by forgetting about subtle refcounting issues. Sure, it's nice to minimize that pain, but it's never going to be possible to write much code that uses the C API without being alert to such issues. (Incidentally, the more things you have that can only be done by invoking PyObject_CallMethod, the more places you have where you have to assume that arbitrary Python code may have been called and that reference counts may have changed behind your back.)

* Duck typing: Yup, supporting duck typing is good. That's why we have an abstract API. There are concrete APIs for all sorts of particular kinds of Python object; it seems pretty clear to me that this isn't a mistake, and that sets should be one such type. Clients get to choose how to trade off the benefits in efficiency, conciseness and clarity from using the concrete API against the benefits in generality from using the abstract one. And when PySet_Add is the obvious way to add items to sets, how much C code using sets is likely to work with things that merely walk and quack like sets, anyway?

* Efficiency: Anyone measured this? The mere fact that the overhead of (say) emptying a set using PyObject_CallMethod is O(1) doesn't mean it's insignificant. For many applications the size of your sets is O(1) too. (Often with quite a small implicit constant, too.)

-- Gareth McCaughan

Excerpting... On Tue, 2006-03-28 at 14:07 +0000, Gareth McCaughan wrote:
* Simple API:
API complexity is measured in brain cells, not in methods.
* Ease of making mistakes:
The Python API is absolutely stuffed with places where you can go wrong by forgetting about subtle refcounting issues. Sure, it's nice to minimize that pain, but it's never going to be possible to write much code that uses the C API without being alert to such issues.
* Duck typing:
Yup, supporting duck typing is good. That's why we have an abstract API. There are concrete APIs for all sorts of particular kinds of Python object; it seems pretty clear to me that this isn't a mistake, and that sets should be one such type. Clients get to choose how to trade off the benefits in efficiency, conciseness and clarity from using the concrete API against the benefits in generality from using the abstract one.
* Efficiency:
Anyone measured this? The mere fact that the overhead of (say) emptying a set using PyObject_CallMethod is O(1) doesn't mean it's insignificant. For many applications the size of your sets is O(1) too. (Often with quite a small implicit constant, too.)
My sentiments exactly Gareth. Thanks for putting it so much more eloquently than I have. :) -Barry

"Greg Ewing" <greg.ewing@canterbury.ac.nz> wrote in message news:4429F664.3080706@canterbury.ac.nz...
Gareth McCaughan wrote:
For what it's worth[1], I think Raymond is absolutely on crack here.
+1 on a good concrete set API from me, too.
For what it's worth, I think Gareth's crack at Raymond is childish and out of place here. tjr

Terry Reedy wrote: [me:]
For what it's worth[1], I think Raymond is absolutely on crack here.
[Greg Ewing:]
+1 on a good concrete set API from me, too.
[Terry:]
For what it's worth, I think Gareth's crack at Raymond is childish and out of place here.
Er, it wasn't a crack at Raymond, it was a crack at a particular position he's taking on a particular issue. What I intended (but may have failed) to convey was: "Raymond's a clever and sensible chap, and this is a very weird position for a clever and sensible person to be taking: must be the drugs." And, just in case it's still not clear, I wasn't in fact suggesting that Raymond *is* on drugs either. However: if Raymond, or anyone else, is offended, then I'm sorry. Now, what about the technical issues, as opposed to the way I happened to introduce my comments? -- g

Gareth McCaughan wrote:
However: if Raymond, or anyone else, is offended, then I'm sorry. Now, what about the technical issues, as opposed to the way I happened to introduce my comments?
Proposing that a certain API in an open source project be introduced for a single "customer" is indeed a surprising notion, and I don't think it should be done. Either there is a need for the API, in which case it should be added, or there isn't (and the user is mistaken in requesting it), in which case it shouldn't be added. Given that Barry insists so firmly that there is a need, and that this need arises from a significant code simplification that can be achieved through the API, the natural conclusion is to add the API. That, of course, assumes that you believe Barry's testimony. Regards, Martin

On Mar 29, 2006, at 1:38 PM, Martin v. Löwis wrote:
Given that Barry insists so firmly that there is a need, and that this need arises from a significant code simplification that can be achieved through the API, the natural conclusion is to add the API. That, of course, assumes that you believe Barry's testimony.
It doesn't seem to me that there really is a significant code simplification, looking at the diff between Raymond's code examples.

@@ -7,2 +7 @@
-    PyObject *key;
-    Py_ssize_t pos = 0;
+    PyObject *it, *key;
@@ -13 +12,5 @@
-    while (set_next(so, &pos, &key)) {
+    it = PyObject_GetIter(self);
+    if (it == NULL)
+        return -1;
+
+    while ((key = PyIter_Next(it)) != NULL) {
@@ -14,0 +18 @@
+        Py_DECREF(key);
@@ -16 +20,2 @@
-            return -1;
+            Py_DECREF(it);
+            return -1;
@@ -19,0 +25,3 @@
+    Py_DECREF(it);
+    if (PyErr_Occurred())
+        return -1;

James

[Gareth McCaughan]
For what it's worth[1], I think Raymond is absolutely on crack here.
Nope. No mind-altering drugs here. Based on real-world experience, I have found PySet_Next() to be a bug factory and do not want it included in the API. The story is different for PySet_Update(). Defining it now could get in the way of possible future development for the module (the function may end up taking a variable-length argument list instead of a single argument). Neither of these proposals is necessary. Both have safe, simple, workable alternatives. It is not my problem if those alternatives do not suit your tastes. A personal aversion to the abstract API is no reason to forgo safety or to interfere with future development of the module. Quality and flexibility considerations trump micro-optimizations and personal style biases. Most of the push has been predicated on being in a snit about the existing iterator API. However, in the course of writing itertools and other Python enhancements, I've had occasion to thoroughly exercise the iterator API and have not found it to be a problem in practice. Raymond

On Wed, 2006-03-29 at 16:29 -0500, Raymond Hettinger wrote:
The story is different for PySet_Update(). Defining it now could get in the way of possible future development for the module (the function may end-up taking a variable length argument list instead of a single argument).
So why not just go ahead and do that now? If you know that's what you want eventually, why wait? From my perspective, adding a NULL at the end of the argument list wouldn't be that big of a deal. -Barry
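For illustration only, a NULL-terminated varargs updater along the lines Barry mentions could be sketched as below. The signature and name set_update_many are hypothetical and not taken from the patch; it simply folds each argument in through the abstract API that already exists.

    #include <Python.h>
    #include <stdarg.h>

    /* Fold each argument into the set via its update() method; the argument
     * list must end with NULL, e.g. set_update_many(s, t1, t2, t3, NULL). */
    static int
    set_update_many(PyObject *set, ...)
    {
        va_list args;
        PyObject *iterable;
        int status = 0;

        va_start(args, set);
        while (status == 0 && (iterable = va_arg(args, PyObject *)) != NULL) {
            PyObject *res = PyObject_CallMethod(set, "update", "O", iterable);
            if (res == NULL)
                status = -1;
            else
                Py_DECREF(res);           /* discard the returned None */
        }
        va_end(args);
        return status;
    }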

On Wed, 2006-03-29 at 19:34 -0500, Barry Warsaw wrote:
On Wed, 2006-03-29 at 16:29 -0500, Raymond Hettinger wrote:
The story is different for PySet_Update(). Defining it now could get in the way of possible future development for the module (the function may end-up taking a variable length argument list instead of a single argument).
So why not just go ahead and do that now? If you know that's what you want eventually, why wait? From my perspective, adding a NULL at the end of the argument list wouldn't be that big of a deal.
BTW, I'm willing to do the work on this. I'm already going to update my patch anyway to reflect our current decisions, so I'm happy to do this while I'm at it. I'll try to get a new patch posted in a day or so. -Barry

Barry Warsaw wrote:
On Wed, 2006-03-29 at 16:29 -0500, Raymond Hettinger wrote:
The story is different for PySet_Update(). Defining it now could get in the way of possible future development for the module (the function may end-up taking a variable length argument list instead of a single argument).
Would that really buy you anything much over just making multiple PySet_Update() calls? Is it just syntactic sugar, or is there some optimisation you can do with multiple updates presented all at once?

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiam!                  |
Christchurch, New Zealand          | (I'm not a morning person.)           |
greg.ewing@canterbury.ac.nz        +--------------------------------------+

[Raymond Hettinger]
Barry, go ahead with PySet_Clear().
[Barry]
Cool, thanks. I think we've also compromised on _PySet_Next(), correct?

Yes, _PySet_Next() is a good compromise for you and me -- it saves you from writing a hack and saves my API from including a bug factory. The only issue is that Martin thinks it to be a crummy idea. Personally, I have no problem with putting in an undocumented hook for daring people who aspire to swim in quicksand ;-)

[Raymond Hettinger]
The story is different for PySet_Update(). Defining it now could get in the way of possible future development for the module (the function may end-up taking a variable length argument list instead of a single argument).
So why not just go ahead and do that now? If you know that's what you want eventually, why wait? From my perspective, adding a NULL at the end of the argument list wouldn't be that big of a deal.
[Barry]
BTW, I'm willing to do the work on this. I'm already going to update my patch anyway to reflect our current decisions, so I'm happy to do this while I'm at it. I'll try to get a new patch posted in a day or so.
The idea is not yet ready for prime time. If I do it for one of the named operations, I will do it for all (to keep the interface uniform). I haven't yet had the time to work out the math on whether it would be worthwhile and provide some differential advantage over simply repeating the same operation several times over. My research question is whether work can be saved by controlling the order of operations -- the concept is somewhat like optimizing multi-term matrix multiplication, where the total work effort can vary dramatically depending on which matrices are multiplied together first, A((BC)D) vs (AB)(CD) vs (A(BC))D etc. Put in business terms, the question is whether I'm able to leverage the associative and commutative properties of some chained set operations. FWIW, the module already has optimizations to take advantage of the commutative property of binary AND, OR, and SYMMETRIC_DIFFERENCE operations. However, the multi-term optimization will probably have to wait until Py2.6 -- it is too experimental for now. Raymond

Raymond Hettinger wrote:
Yes, _PySet_Next() is a good compromise for you and me -- it saves you from writing a hack and saves my API from including a bug factory. The only issue is that Martin thinks it to be a crummy idea.
If it makes everyone happy, I shouldn't be in the way. Of course, it might be that not only Barry will use it, but other people as well. Regards, Martin

On Wed, 2006-03-29 at 23:09 -0500, Raymond Hettinger wrote:
Yes, _PySet_Next() is a good compromise for you and me -- it saves you from writing a hack and saves my API from including a bug factory. The only issue is that Martin thinks it to be a crummy idea. Personally, I have no problem with putting-in an undocumented hook for daring people who aspire to swim in quicksand ;-)
Of course if it was "just" a bug factory I might agree. But since it's instead a powerful tool that can be misused if misunderstood, I'd tend to want to document it and explain to people where and why it might or might not be the right hammer for the nail you're trying to pound in. But that's just me. :)
The idea is not yet ready for prime-time. If I do it for one of the named operations, I will do it for all (to keep the interface uniform).
Which named operations are you thinking of?
I haven't yet had the time to work-out the math on whether it would be worthwhile and provide some differential advantage over simply repeating the same operation several times over. My research question is whether work can be saved by controlling the order of operations -- the concept is somewhat like optimizing multi-term matrix multiplication where the total work effort can vary dramatically depending on which matrices are multiplied together first, A((BC)D) vs (AB)(CD) vs (A(BC))D etc. Put in business terms, the question is whether I'm able to leverage the associative and commutative properties of some chained set operations. FWIW, the module already has optimizations to take advantage of the commutative property of binary AND, OR, and SYMMETRIC_DIFFERENCE operations. However, the multi-term optimization probably wait until Py2.6 -- it is too experimental for now.
Does that mean you want to make sure the function is insanely fast before you'll add it? Shouldn't you instead decide whether there's even a need for vararg update first and then figure out how to optimize it? IOW, if there's a need for vararg update, let's add the API now so that people can start using it, even if it's not as fast as it could be. Then they'll be especially grateful when you figure out how to make it insanely fast in Python 2.6. If vararg update isn't useful, then there's no point in adding the API, even if it can be made insanely fast. You'd just be wasting your time because no one would use it. It seems backwards to design the implementation first and then the API. An API represents how you want people to use your objects, what operations and semantics you want it to have, what contracts you're guaranteeing and so on. Optimization then is a very nice side benefit. Let me ask this: if you can't make vararg PySet_Update() insanely fast, does that mean you won't add a vararg version? Or you won't add the function at all? I'm all for making things fast, but I just don't believe that in general that should be the primary driver for how you want people to /use/ your objects. -Barry

The idea is not yet ready for prime-time. If I do it for one of the named operations, I will do it for all (to keep the interface uniform).
Which named operations are you thinking of?
s.union(t), s.intersection(t), s.difference(t), s.symmetric_difference(t), s.update(t), s.intersection_update(t), s.difference_update(t), s.symmetric_difference_update(t)
if you can't make vararg PySet_Update() insanely fast, does that mean you won't add a vararg version?
Right. Please leave this one alone. I still need to work on this part of the API and do not currently have the spare clock cycles to do it right now. You don't HAVE to jam something in right away. Please let it continue to cook and not muck it up through over-enthusiasm. If I had time to explain/debate every darned aspect of what is under consideration, then it would have been done already. The fierce insistence on the patch is premature and grotesquely out of proportion to any potential benefit. Please do not jam this one down my throat -- the function is not necessary to have right away -- you're talking about nanoseconds of efficiency and a microscopically shorter call. Sorry, I need to stop wasting time on this thread. It has consumed far too much development time already. Please write a one-line macro for yourself and leave this alone so I can continue the development efforts at a thoughtful pace. Raymond
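One plausible reading of the "one-line macro" suggestion is a small local helper that goes through the abstract API that already exists -- a sketch of mine, not anything posted in the thread (the name local_set_update is invented):

    #include <Python.h>

    /* Route through the object's own update() method instead of waiting
     * for a concrete PySet_Update to be added. */
    static int
    local_set_update(PyObject *set, PyObject *iterable)
    {
        PyObject *res = PyObject_CallMethod(set, "update", "O", iterable);
        if (res == NULL)
            return -1;
        Py_DECREF(res);                   /* discard the returned None */
        return 0;
    }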

On Thu, 2006-03-30 at 13:09 -0500, Raymond Hettinger wrote:
Please leave this one alone. I still need to work on this part of the API and do not currently have the spare clock cycles to do it right now. You don't HAVE to jam something in right away. Please let it continue to cook and not muck it up through over-enthusiasm. If I had time to explain/debate every darned aspect of what is under consideration, then it would have been done already. The fierce insistence for the patch is pre-mature and is grotesquely out-of-proportion to any potential benefit. Please do not jam this one down my throat -- the function is not necessary to have right away -- you're talking about nanoseconds of efficiency and a microscopically shorter call. Sorry, I need to stop wasting time on this thread. It has consumed far too much development time already. Please write a one-line macro for yourself and leave this alone so I can continue the development efforts at a thoughtful pace.
As per your comment in patch 1458476, I will add _PySet_Update() and consider this thread closed. -Barry

On Sat, 2006-03-25 at 22:05 -0500, Raymond Hettinger wrote:
Still, PyObject_Clear(s) would be better.
Although not ideal, in the interest of compromise, I could support this option. There's a problem with this though: I don't think you want to be able to clear a frozen set. My PySet_Clear() raises a SystemError and returns -1 when the object is a frozen set.

If PyObject_Clear() is implemented something like

    int
    PyObject_Clear(PyObject *o)
    {
        return (o->ob_type->tp_clear ? o->ob_type->tp_clear(o) : -1);
    }

then you /would/ be able to clear a frozen set. For that matter, it would be the case that any immutable collection would be clearable if it had a tp_clear (which it probably would). That isn't the semantics I'd expect though.

That may not be solvable unless you make PyObject_Clear() an alias for PyObject_CallMethod("clear"). Although I'm sure you'll disagree, I think this is less than ideal. For one thing, you're requiring objects that work with PyObject_Clear() to implement an exact Python-level protocol (it must have a method, it must be called "clear" and it must take zero arguments). You also have to implement PyObject_Clear() with a hasattr test, because I don't think you want PyObject_Clear() raising AttributeErrors. That raises the constant overhead cost, which can make clearing small sets more expensive.
Better still would be to examine the actual uses in the app. I suspect that most code that clears a set and then rebuilds it would be better-off starting with a new empty set (and because of freelisting, that is a very fast operation).
That may not be possible. Imagine a complex application where the set is passed through many layers of calls. The set hangs off of other application level objects which you don't have access to at the point where you're deciding whether to clear the set or not. You can't create a new set because you have no way to pass the new set back to the several application level objects that would need to get their pointers updated. So the most obvious, simple approach is to just clear the set you have right there.
Likewise, it only takes a one-line header to define BarrySet_Update(s). I do not want that part of the C API exposed yet. It is still under development and may eventually become a function with a variable length argument list.
Really? That would be odd and not at all parallel with established convention (e.g. PyDict_Update()). I would think that a vararg update should be named something different in order to preserve the principle of least surprise.
If you're dead-set against using the iterator API, then maybe there is something wrong with the API. You should probably start a new thread on why you detest the iterator API and see if there are ways to improve it.
I'm not saying there's anything wrong with the iterator API, I'm saying that it's not always appropriate. It's the nail/hammer argument. But I ran out of clever when I tried to propose the simplest, most direct fix for our most pressing issues, so I'm not going to take the bait.
You talk about duck typing, but I don't care about that here.
It's one of the virtues of Python that gets reflected in the abstract API. IMO, it's nice that PyObject_Dir(o) corresponds to "dir(o)" and the same for hash(o), repr(o), etc. I just hope that by hardwiring data types in stone, your app doesn't become rigid and impossible to change. I certainly do not recommend that other people adopt this coding style (avoidance of iterators, duplication of abstract API functions in concrete form, etc.) If you're experiencing debugging pain, it may be that avoidance of abstraction is the root cause.
Trust me Raymond, it's not the cause. I keep trying to explain this but I must be completely inept because you're just not getting it. Let me try this way: we're using Python's collection types (sets, lists, dicts) as our fundamental collection data structures internally in our application. There's no duck typing going on. There's no need for abstraction because we know exactly what we have and there's no chance we'll have something that smells like a set that isn't exactly a PySet. As I've said many times, I'm all for an abstract API because it's darn useful in many applications. It's the lack of a concrete API that is limiting.
I wouldn't object to that, but it wouldn't change my mind about PySet_Clear().
This is plain evidence that something is wrong with your approach. While possibly necessary in your environment, the rest of mankind should not have to stomach this kind of API clutter.
Please, that's a bit extreme. I haven't heard anybody scream about the PyDict's API clutter and I don't see my PySet proposal as being any different. -Barry
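For reference, the hasattr-guarded PyObject_CallMethod("clear") alternative Barry weighs above might look roughly like this -- an illustrative sketch only; generic_clear is not an actual CPython function:

    #include <Python.h>

    /* Call o.clear() if it exists; objects without a clear() method are
     * treated as "nothing to do" so no AttributeError escapes. */
    static int
    generic_clear(PyObject *o)
    {
        PyObject *res;

        if (!PyObject_HasAttrString(o, "clear"))
            return 0;
        res = PyObject_CallMethod(o, "clear", NULL);
        if (res == NULL)
            return -1;
        Py_DECREF(res);
        return 0;
    }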

Barry Warsaw wrote:
My PySet_Clear() raises a SystemError and returns -1 when the object is a frozen set.
Isn't SystemError a bit drastic? TypeError would be sufficient here, surely.
If PyObject_Clear() is implemented something like
int PyObject_Clear(PyObject *o) { return (o->ob_type->tp_clear ? o->ob_type->tp_clear(o) : -1); }
then you /would/ be able to clear a frozen set.
Hmmm, the problem here, I think, is that tp_clear is really only designed for use by the garbage collector. Giving anything else access to it is probably wrong. Clearability is not a general feature in Python land -- a few types have a clear() method, but this is an ad hoc feature of the type concerned. I don't think it makes sense to have a general PyObject_Clear function at all. -- Greg

Greg Ewing wrote:
Hmmm, the problem here, I think, is that tp_clear is really only designed for use by the garbage collector. Giving anything else access to it is probably wrong.
Clearability is not a general feature in Python land -- a few types have a clear() method, but this is an ad hoc feature of the type concerned. I don't think it makes sense to have a general PyObject_Clear function at all.
I agree. Barry's PySet_Clear and Raymond's PyObject_Clear would be two completely unrelated functions (one invoking the "clear" method, the other invoking tp_clear). Regards, Martin

On Tue, 2006-03-28 at 17:28 +1200, Greg Ewing wrote:
Barry Warsaw wrote:
My PySet_Clear() raises a SystemError and returns -1 when the object is a frozen set.
Isn't SystemError a bit drastic? TypeError would be sufficient here, surely.
Possibly, but all the other PySet_*() functions call PyErr_BadInternalCall() when they get a type they don't accept, so PySet_Clear() should be consistent.
If PyObject_Clear() is implemented something like
int PyObject_Clear(PyObject *o) { return (o->ob_type->tp_clear ? o->ob_type->tp_clear(o) : -1); }
then you /would/ be able to clear a frozen set.
Hmmm, the problem here, I think, is that tp_clear is really only designed for use by the garbage collector. Giving anything else access to it is probably wrong.
Exactly.
Clearability is not a general feature in Python land -- a few types have a clear() method, but this is an ad hoc feature of the type concerned. I don't think it makes sense to have a general PyObject_Clear function at all.
I'm thinking the same thing, which is why I'm now favoring PySet_Clear() again. -Barry
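Sketching Barry's consistency point from above (a hypothetical body of mine, not the actual patch): the concrete PySet_* functions signal a wrong argument type with PyErr_BadInternalCall(), which raises SystemError, so a clear function that refuses frozensets would do the same. The clearing itself is routed through the object's method here purely for illustration.

    #include <Python.h>

    static int
    checked_set_clear(PyObject *set)
    {
        PyObject *res;

        /* Reject frozensets and non-sets the way other PySet_* calls do. */
        if (!PyObject_TypeCheck(set, &PySet_Type)) {
            PyErr_BadInternalCall();      /* raises SystemError */
            return -1;
        }
        res = PyObject_CallMethod(set, "clear", NULL);
        if (res == NULL)
            return -1;
        Py_DECREF(res);
        return 0;
    }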

On Saturday 18 March 2006 18:48, Neal Norwitz wrote:
Just in case anybody here's been snoozing, 2.5 alpha 1 is coming up real quick, hopefully within a couple of weeks. If you have any *major* features (particularly implemented in C) that you want to see in 2.5, bring it up now. I want to strive for feature completeness by alpha 1. I know we will have some .py modules that won't make it into alpha 1, but they really should make it in by alpha 2 or be deferred to 2.6.
+1. We shouldn't be making feature changes once we hit beta. I'd still like to push 2.4.3rc1 out in a couple of days' time, with 2.4.3 final next week, and then maybe aim for 2.5a1 a week or two later? How does that work for everyone? Anthony

-- Anthony Baxter <anthony@interlink.com.au>
It's never too late to have a happy childhood.

On Monday 20 March 2006 00:49, Anthony Baxter wrote:
I'd still like to push 2.4.3rc1 out in a couple of days time, with 2.4.3 final next week, and then maybe aim for 2.5a1 a week or two later? How does that work for everyone?
I should be fine to build the documentation Wednesday night (US Eastern time). -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
participants (18)
- "Martin v. Löwis"
- Aahz
- Alex Martelli
- Alex Martelli
- Anthony Baxter
- Barry Warsaw
- Fred L. Drake, Jr.
- Fredrik Lundh
- Gareth McCaughan
- Giovanni Bajo
- Greg Ewing
- Guido van Rossum
- James Y Knight
- Josiah Carlson
- Neal Norwitz
- Raymond Hettinger
- Raymond Hettinger
- Terry Reedy