Re: [Python-Dev] Iterable String Redux (aka String ABC)

Bill Janssen wrote:
Look, even if there were *no* additional methods, it's worth adding the base class, just to differentiate the class from the Sequence, as a marker, so that those of us who want to ask "isinstance(o, String)" can do so.
Doesn't isinstance(x, basestring) already cover that? -- Greg

Greg Ewing wrote:
Bill Janssen wrote:
Look, even if there were *no* additional methods, it's worth adding the base class, just to differentiate the class from the Sequence, as a marker, so that those of us who want to ask "isinstance(o, String)" can do so.
Doesn't isinstance(x, basestring) already cover that?
That doesn't cover UserString, for example. Georg
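The gap Georg points at can be checked directly. The following is only a sketch and assumes Python 3, where basestring no longer exists and UserString lives in the collections module, so str stands in for basestring here.

```python
from collections import UserString

s = UserString("whiffleball")

# UserString keeps a real str in its .data attribute instead of inheriting
# from str, so a concrete type check rejects it ...
is_concrete_str = isinstance(s, str)

# ... even though the usual string methods are all there.
behaves_like_str = s.upper() == "WHIFFLEBALL" and s.startswith("whiffle")
```

Here is_concrete_str is False while behaves_like_str is True, which is the mismatch the thread is about.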

Greg Ewing wrote:
Georg Brandl wrote:
Greg Ewing wrote:
Doesn't isinstance(x, basestring) already cover that?
That doesn't cover UserString, for example.
A better solution to that might be to have UserString inherit from basestring.
But with that argument you could throw out the whole ABC machinery, just let all lists inherit from list, all dicts from dict, etc.
Georg
-- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

Georg Brandl wrote:
Greg Ewing wrote:
A better solution to that might be to have UserString inherit from basestring.
But with that argument you could throw out the whole ABC machinery, just let all lists inherit from list, all dicts from dict, etc.
Well, I'm skeptical about the whole ABC thing in the first place -- it all seems very unpythonic to me. But another way of thinking about it is that we already have an ABC of sorts for strings, and it's called basestring. It might be better to enhance that with whatever's considered missing than introducing another one. -- Greg

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
Well, I'm skeptical about the whole ABC thing in the first place -- it all seems very unpythonic to me.
I think it's very pythonic and the very best solution to interfaces *and* duck typing. Not only does it extend duck-typing in a very, very cool way but also does it provide a very cool way to get custom sets or lists going with little extra work. Subclassing builtins was always very painful in the past and many used the User* objects which however often broke because some code did something like isinstance(x, (tuple, list)). Of course one could argue that instance checking is the root of all evil but there are situations where you have to do instance checking. And ABCs are the perfect solution for that as they combine duck-typing and instance checking.
In my opinion ABCs are the best feature of 2.6 and 3.0.
But another way of thinking about it is that we already have an ABC of sorts for strings, and it's called basestring. It might be better to enhance that with whatever's considered missing than introducing another one.
basestring is not subclassable for example. Also it requires subclassing which ABCs do not.
Regards, Armin
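To make the "duck-typing plus instance checking" claim concrete, here is a minimal sketch using the ABCs as they exist in Python 3's collections.abc module. The Deck class is hypothetical and invented for illustration; it inherits from nothing but object.

```python
from collections.abc import Iterable, Sequence, Sized


class Deck:
    """Hypothetical container that does not inherit from list or tuple."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __iter__(self):
        return iter(self._cards)

    def __getitem__(self, index):
        return self._cards[index]


deck = Deck(["ace", "king"])

# One-method ABCs recognise any class that defines the method (duck typing).
is_sized = isinstance(deck, Sized)
is_iterable = isinstance(deck, Iterable)

# Sequence does no structural check, so membership has to be declared.
is_sequence_before = isinstance(deck, Sequence)
Sequence.register(Deck)
is_sequence_after = isinstance(deck, Sequence)
```

The first two checks succeed without any declaration; the Sequence check is False until Deck is registered and True afterwards, with no subclassing involved at any point.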

Armin Ronacher wrote:
Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
Well, I'm skeptical about the whole ABC thing in the first place -- it all seems very unpythonic to me.
I think it's very pythonic and the very best solution to interfaces *and* duck typing. Not only does it extend duck-typing in a very, very cool way but also does it provide a very cool way to get custom sets or lists going with little extra work. Subclassing builtins was always very painful in the past and many used the User* objects which however often broke because some code did something like isinstance(x, (tuple, list)). Of course one could argue that instance checking is the root of all evil but there are situations where you have to do instance checking. And ABCs are the perfect solution for that as they combine duck-typing and instance checking.
In my opinion ABCs are the best feature of 2.6 and 3.0.
But another way of thinking about it is that we already have an ABC of sorts for strings, and it's called basestring. It might be better to enhance that with whatever's considered missing than introducing another one.
basestring is not subclassable for example. Also it requires subclassing which ABCs do not.
I would be strongly +1 on a string ABC. Currently (to my knowledge) there is no way of using duck typing for built-in APIs that expect a string. How do I pass in an object to 'open' for example that isn't actually a string or subclass?
    >>> class X(object):
    ...     def __unicode__(self):
    ...         return 'fish'
    ...     __str__ = __repr__ = __unicode__
    ...
    >>> x = X()
    >>> open(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: coercing to Unicode: need string or buffer, X found
    >>> unicode(x)
    u'fish'
Michael Foord
Regards, Armin
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.u...
-- http://www.ironpythoninaction.com/ http://www.theotherdelia.co.uk/ http://www.voidspace.org.uk/ http://www.ironpython.info/ http://www.resolverhacks.net/

Michael Foord wrote:
I would be strongly +1 on a string ABC. Currently (to my knowledge) there is no way of using duck typing for built-in APIs that expect a string. How do I pass in an object to 'open' for example that isn't actually a string or subclass?
Implement the character buffer API, which you can't actually do from Python code :P
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
--------------------------------------------------------------- http://www.boredomandlaziness.org

On Sat, 31 May 2008 12:48:41 am Armin Ronacher wrote:
Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
Well, I'm skeptical about the whole ABC thing in the first place -- it all seems very unpythonic to me.
I think it's very pythonic and the very best solution to interfaces *and* duck typing. Not only does it extend duck-typing in a very, very cool way
I'm with Greg on this one: despite the assertions made in the PEP, I don't see how ABC can fail to be anything but anti-duck-typing. How does it extend duck-typing? Can you give an example?
but also does it provide a very cool way to get custom sets or lists going with little extra work. Subclassing builtins was always very painful in the past
"Always" very painful?
    class ListWithClear(list):
        def clear(self):
            self[:] = self.__class__()
Not so very painful to me. Maybe I just have more pain-tolerance than some people.
and many used the User* objects which however often broke because some code did something like isinstance(x, (tuple, list)). Of course one could argue that instance checking is the root of all evil
Perhaps not the root of *all* evil but it is certainly the root of much evil, and the treatment of delegation-based classes like UserString as second-class objects is a good example of why isinstance checking should be avoided as much as possible. -- Steven

Steven D'Aprano wrote:
but also does it provide a very cool way to get custom sets or lists going with little extra work. Subclassing builtins was always very painful in the past
"Always" very painful?
    class ListWithClear(list):
        def clear(self):
            self[:] = self.__class__()
Not so very painful to me. Maybe I just have more pain-tolerance than some people.
Sure, nobody said that adding another method is a problem. But overriding methods like __getitem__() and having them used in other methods that derive from it (like get()) is impossible without resorting to UserDict, which in turn doesn't inherit from dict. ABCs unify these possibilities.
Georg
-- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
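The __getitem__()/get() problem Georg describes can be reproduced in a few lines. This is a sketch assuming CPython and Python 3's collections.abc; UpperDict and UpperMapping are hypothetical names chosen for the example.

```python
from collections.abc import Mapping


class UpperDict(dict):
    """dict subclass: only the overridden method sees the change."""

    def __getitem__(self, key):
        return dict.__getitem__(self, key).upper()


class UpperMapping(Mapping):
    """ABC-based mapping: mixin methods such as get() are built on __getitem__."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key].upper()

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = UpperDict(a="x")
m = UpperMapping({"a": "x"})

subclass_item = d["a"]       # goes through the override
subclass_get = d.get("a")    # dict.get() bypasses the override in CPython
abc_item = m["a"]
abc_get = m.get("a")         # Mapping.get() calls self[key]
```

With the dict subclass, d["a"] gives 'X' but d.get("a") still gives 'x'. With the Mapping-based class both paths give 'X', and isinstance(m, Mapping) holds as well.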

ISTM, the whole reason people are asking for a String ABC is so you can write isinstance(obj, String) and allow registered string-like objects to be accepted.
The downside is that every time you want this for a concrete class or type, it is necessary to write a whole new ABC listing all of the required methods. Also, you have to change all of the client code's isinstance tests from concrete to abstract.
I propose a simpler approach. Provide an alternative registration function that overrides isinstance() so that objects can explicitly fake any concrete type:
    s = UserString('whiffleball')
    print isinstance(s, str)    --> False
    register_lookalike(UserString, str)
    print isinstance(s, str)    --> True
Besides saving us from writing tons of new ABCs, the approach works with existing code that already uses isinstance() with concrete types.
The ABCs that would remain are ones that are truly abstract, that define a generic interface (like mappings and sequences) and ones that offer some useful mixin behavior. The remaining ABCs are ones where you have a fighting chance of actually being able to implement the interface (unlike String where it would be darned tricky to fully emulate encode(), split(), etc.)
This would completely eliminate the need for numbers.Integral for example. AFAICT, its sole use case is to provide a way for numeric's integral types (like int8, int32) to pass themselves off as having the same API as regular ints. Unfortunately, the current approach requires all consumer code to switch from isinstance(x,int) to isinstance(x,Integral). It would be more useful if we could simply write register_lookalike(x,int) and be done with it (no need for numbers.py and its attendant abc machinery).
If we don't do this, then String won't be the last request. People will want Datetime for example. Pretty much any concrete type could have a look-a-like that wanted its own ABC and for all client code to switch from testing concrete types.
Raymond
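The registry idea behind register_lookalike() can be approximated as follows. This is only a sketch: register_lookalike exists nowhere in Python, and the version below does not change the builtin isinstance() as the proposal would. It relies on a separate helper, isinstance_or_lookalike, which is a hypothetical name introduced here. Python 3 is assumed.

```python
from collections import UserString

# Hypothetical registry: concrete type -> classes allowed to stand in for it.
_lookalikes = {}


def register_lookalike(lookalike, concrete):
    """Declare that instances of `lookalike` may pass as `concrete`."""
    _lookalikes.setdefault(concrete, set()).add(lookalike)


def isinstance_or_lookalike(obj, concrete):
    """Like isinstance(), but also accept registered look-alike classes."""
    if isinstance(obj, concrete):
        return True
    return isinstance(obj, tuple(_lookalikes.get(concrete, ())))


s = UserString("whiffleball")
before = isinstance_or_lookalike(s, str)
register_lookalike(UserString, str)
after = isinstance_or_lookalike(s, str)
```

Here before is False and after is True, mirroring the session in the proposal. The difference is that client code has to call the helper, which is the very rewrite the proposal wanted to avoid.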

I'm willing to meet you halfway. I really don't want isinstance(x, str) to return True for something that doesn't inherit from the concrete str type; this is bound to lead to too much confusion and breakage. But I'm fine with a String ABC (or any other ABC, e.g. Atomic?) that doesn't define any methods but can be used for type testing. How about that?
--Guido
On Sat, May 31, 2008 at 5:25 AM, Raymond Hettinger <python@rcn.com> wrote:
ISTM, the whole reason people are asking for a String ABC is so you can write isinstance(obj, String) and allow registered string-like objects to be accepted.
The downside is that every time you want this for a concrete class or type, it is necessary to write a whole new ABC listing all of the required methods. Also, you have to change all of the client code's isinstance tests from concrete to abstract.
I propose a simpler approach. Provide an alternative registration function that overrides isinstance() so that objects can explicitly fake any concrete type:
    s = UserString('whiffleball')
    print isinstance(s, str)    --> False
    register_lookalike(UserString, str)
    print isinstance(s, str)    --> True
Besides saving us from writing tons of new ABCs, the approach works with existing code that already uses isinstance() with concrete types.
The ABCs that would remain are ones that are truly abstract, that define a generic interface (like mappings and sequences) and ones that offer some useful mixin behavior. The remaining ABCs are ones where you have a fighting chance of actually being able to implement the interface (unlike String where it would be darned tricky to fully emulate encode(), split(), etc.)
This would completely eliminate the need for numbers.Integral for example. AFAICT, its sole use case is to provide a way for numeric's integral types (like int8, int32) to pass themselves off as having the same API as regular ints. Unfortunately, the current approach requires all consumer code to switch from isinstance(x,int) to isinstance(x,Integral). It would be more useful if we could simply write register_lookalike(x,int) and be done with it (no need for numbers.py and its attendant abc machinery).
If we don't do this, then String won't be the last request. People will want Datetime for example. Pretty much any concrete type could have a look-a-like that wanted its own ABC and for all client code to switch from testing concrete types.
Raymond
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
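A method-free marker ABC of the kind Guido is offering might look roughly like this. It is a sketch under assumptions: the name String is taken from the thread, but no such class exists in the standard library, and the code targets Python 3.

```python
from abc import ABC
from collections import UserString


class String(ABC):
    """Hypothetical marker ABC: no abstract methods, used only for type tests."""


# Concrete and look-alike types opt in by registration, not inheritance.
String.register(str)
String.register(UserString)

plain_ok = isinstance("whiffleball", String)
wrapped_ok = isinstance(UserString("whiffleball"), String)
bytes_ok = isinstance(b"whiffleball", String)

# The concrete test is left untouched, which is the compromise on offer.
still_not_str = not isinstance(UserString("whiffleball"), str)
```

Both the plain and the wrapped string pass the String test, bytes does not because it was never registered, and isinstance(x, str) keeps its strict meaning.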

From: "Guido van Rossum" <guido@python.org>
I'm willing to meet you halfway. I really don't want isinstance(x, str) to return True for something that doesn't inherit from the concrete str type; this is bound to lead to too much confusion and breakage.
Probably true. It was an attractive idea though. Unless all client code converts all its isinstance() tests from concrete to abstract, life is going to be tough for people writing look-alike classes which will have limited applicability.
But I'm fine with a String ABC (or any other ABC, e.g. Atomic?) that doesn't define any methods but can be used for type testing. How about that?
That's progress! It makes abstract substitution possible while still saving us a lot of code and avoiding over-specification. I propose the following empty abstract classes: String, Datetime, Deque, and Socket.
-1 on Atomic though. Earlier in the thread it was made clear that atomicity is not an intrinsic property of a type; instead it varies across applications (when flattening email folders, a multi-part mime message is atomic, but when flattening individual messages, a multi-part mime message is just another nested container, part of the tree, not one of the leaves).
Are you open to considering numbers.Integral to be one of the new empty abstract classes? That would make it much easier for objects wanting to pass themselves as integers. As it stands now, an aspiring Integral class is required to implement a number of arcana including: __rxor__, __rrshift__, __pow__, __invert__, __index__, and __long__.
Raymond
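The size of the Integral contract can be inspected at runtime. The sketch below assumes Python 3, where the exact set of abstract methods differs from the Py2.6 code discussed here (there is no __long__, for instance). Partial and Int8 are hypothetical classes made up for the example.

```python
import numbers

# Names a direct subclass must define before it can be instantiated.
required = sorted(numbers.Integral.__abstractmethods__)


class Partial(numbers.Integral):
    """Hypothetical subclass that implements only one required method."""

    def __int__(self):
        return 0


try:
    Partial()
    instantiated = True
except TypeError:
    # ABCMeta refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False


class Int8:
    """Hypothetical integer-like type that does not subclass Integral."""

    def __init__(self, value):
        self.value = value


# Registration skips the abstract-method check entirely.
numbers.Integral.register(Int8)
registered = isinstance(Int8(5), numbers.Integral)
```

The required list contains several of the names Raymond cites, such as __rxor__, __rrshift__, __pow__ and __invert__. Registration makes the isinstance test pass without any of them being implemented, so the test alone promises little about what the object can actually do.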

On Sat, May 31, 2008 at 6:41 PM, Raymond Hettinger <python@rcn.com> wrote:
From: "Guido van Rossum" <guido@python.org>
I'm willing to meet you halfway. I really don't want isinstance(x, str) to return True for something that doesn't inherit from the concrete str type; this is bound to lead to too much confusion and breakage.
Probably true. It was an attractive idea though. Unless all client code converts all its isinstance() tests from concrete to abstract, life is going to be tough for people writing look-alike classes which will have limited applicability.
I'd rather require that people rewrite their code to benefit from some new piece of functionality than foist it upon them regardless, breaking some perfectly fine working code in the process. This is how we've always done it.
But I'm fine with a String ABC (or any other ABC, e.g. Atomic?) that doesn't define any methods but can be used for type testing. How about that?
That's progress! It makes abstract substitution possible while still saving us a lot of code and avoiding over-specification. I propose the following empty abstract classes: String, Datetime, Deque, and Socket.
Sounds like a mini-PEP is in place. It should focus on the code to actually define these and the intended ways to use them.
-1 on Atomic though. Earlier in the thread it was made clear that atomicity is not an intrinsic property of a type; instead it varies across applications (when flattening email folders, a multi-part mime message is atomic, but when flattening individual messages, a multi-part mime message is just another nested container, part of the tree, not one of the leaves).
Fine, it was just an idle thought.
Are you open to considering numbers.Integral to be one of the new empty abstract classes? That would make it much easier for objects wanting to pass themselves as integers. As it stands now, an aspiring Integral class is required to implement a number of arcana including: __rxor__, __rrshift__, __pow__, __invert__, __index__, and __long__.
I don't think Integer should be completely abstract (what good is an int you can't add 1 to?) but I could be amenable to reducing the set of required operations (which could then resurface as a separate ABC). Please write another mini-PEP.
Where did you see __long__? That seems a mistake (at least in 3.0).
-- --Guido van Rossum (home page: http://www.python.org/~guido/)

[Raymond]
I propose the following empty abstract classes: String, Datetime, Deque, and Socket.
[GvR]
Sounds like a mini-PEP is in place. It should focus on the code to actually define these and the intended ways to use them.
Okay, will run a Google code search to see if real code exists that runs isinstance tests on the concrete types.
Since the new classes are very lightweight (completely empty), these probably only need minimal justification.
The case for String has already been made.
And the concept of a Socket is already fully abstract.
Not sure I really care about Deque. The Datetime class is tricky. The existence of many implementations of date/time routines indicates that there is a need; however, they don't share the API so they likely won't fit under a common umbrella.
Are you open to considering numbers.Integral to be one of the new empty abstract classes? That would make it much easier for objects wanting to pass themselves as integers. As it stands now, an aspiring Integral class is required to implement a number of arcana including: __rxor__, __rrshift__, __pow__, __invert__, __index__, and __long__.
I don't think Integer should be completely abstract (what good is an int you can't add 1 to?) but I could be amenable to reducing the set of required operations (which could then resurface as a separate ABC). Please write another mini-PEP.
Okay. Will propose to remove the bit flipping methods and anything else that doesn't seem essential to integeriness. Will take a look at the integral types in numeric to see what they actually implement.
Where did you see __long__? That seems a mistake (at least in 3.0).
It's the first listed abstract method in the Py2.6 code. Raymond

On Sat, May 31, 2008 at 8:09 PM, Raymond Hettinger <python@rcn.com> wrote:
[Raymond]
I propose the following empty abstract classes: String, Datetime, Deque, and Socket.
[GvR]
Sounds like a mini-PEP is in place. It should focus on the code to actually define these and the intended ways to use them.
Okay, will run a Google code search to see if real code exists that runs isinstance tests on the concrete types.
I wasn't asking for existing code -- I was asking for the code you propose to add to abc.py (or wherever).
Since the new classes are very lightweight (completely empty), these probably only need minimal justification.
Again, in a mini-PEP I'm not so much looking for justification but for a precise spec.
The case for String has already been made.
Actually I'm not sure. Once you know that isinstance(x, String) is true, what can you assume you can do with x?
And the concept of a Socket is already fully abstract.
Can you elaborate? There's a very specific API that is assumed of sockets. The code that creates sockets is usually pretty close to the code that consumes it. There are some major classes that cut right through the APIs though: connection or listening (the latter being something on which you call accept()), stream or datagram, and as a special case of stream SSL and the like.
Not sure I really care about Deque. The Datetime class is tricky. The existence of many implementations of date/time routines indicates that there is a need; however, they don't share the API so they likely won't fit under a common umbrella.
Right. I'm now beginning to wonder what exactly you're after here -- saying that something is an "X" without saying anything about an API isn't very useful. You need to have at least *some* API to be able to do anything with that knowledge.
Are you open to considering numbers.Integral to be one of the new empty abstract classes? That would make it much easier for objects wanting to pass themselves as integers. As it stands now, an aspiring Integral class is required to implement a number of arcana including: __rxor__, __rrshift__, __pow__, __invert__, __index__, and __long__.
I don't think Integer should be completely abstract (what good is an int you can't add 1 to?) but I could be amenable to reducing the set of required operations (which could then resurface as a separate ABC). Please write another mini-PEP.
Okay. Will propose to remove the bit flipping methods and anything else that doesn't seem essential to integeriness. Will take a look at the integral types in numeric to see what they actually implement.
Where did you see __long__? That seems a mistake (at least in 3.0).
It's the first listed abstract method in the Py2.6 code.
That actually makes sense -- correct interoperability with longs probably requires that. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

2008/6/1 Guido van Rossum <guido@python.org>:
The case for String has already been made.
Actually I'm not sure. Once you know that isinstance(x, String) is true, what can you assume you can do with x? [...] Right. I'm now beginning to wonder what exactly you're after here -- saying that something is an "X" without saying anything about an API isn't very useful. You need to have at least *some* API to be able to do anything with that knowledge.
Apologies to Raymond if I'm putting words into his mouth, but I think it's more about *not* doing things with the type - a String is a Sequence that we don't wish to iterate through (in the flatten case), so the code winds up looking like
    for elem in seq:
        if isinstance(elem, Sequence) and not isinstance(elem, String):
            recurse into the element
        else:
            deal with the element as atomic
This implies that other "empty" abstract types aren't useful, though, as they are not subclasses of anything else...
Paul.
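Paul's pseudocode can be turned into a runnable generator along these lines. It is a sketch assuming Python 3; String is again a hypothetical marker ABC, defined locally with only str registered.

```python
from abc import ABC
from collections.abc import Sequence


class String(ABC):
    """Hypothetical marker ABC for 'sequences we do not want to descend into'."""


String.register(str)


def flatten(seq):
    """Yield the leaves of a nested sequence, treating String instances as leaves."""
    for elem in seq:
        if isinstance(elem, Sequence) and not isinstance(elem, String):
            # Recurse into the element.
            yield from flatten(elem)
        else:
            # Deal with the element as atomic.
            yield elem


result = list(flatten([1, ["ab", (2, ["c"])], "de"]))
```

The result is [1, 'ab', 2, 'c', 'de']. Without the String test the recursion would never terminate, because each character of a str is itself a one-character str, and therefore a Sequence.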

On Sun, Jun 1, 2008 at 6:57 AM, Paul Moore <p.f.moore@gmail.com> wrote:
2008/6/1 Guido van Rossum <guido@python.org>:
The case for String has already been made.
Actually I'm not sure. Once you know that isinstance(x, String) is true, what can you assume you can do with x? [...] Right. I'm now beginning to wonder what exactly you're after here -- saying that something is an "X" without saying anything about an API isn't very useful. You need to have at least *some* API to be able to do anything with that knowledge.
Apologies to Raymond if I'm putting words into his mouth, but I think it's more about *not* doing things with the type - a String is a Sequence that we don't wish to iterate through (in the flatten case), so the code winds up looking like
    for elem in seq:
        if isinstance(elem, Sequence) and not isinstance(elem, String):
            recurse into the element
        else:
            deal with the element as atomic
I thought that was what he meant too, until he said he rejected my offhand suggestion of Atomic with these words: "Earlier in the thread it was made clear that that atomicity is not an intrinsic property of a type; instead it varies across applications [...]"
This implies that other "empty" abstract types aren't useful, though, as they are not subclasses of anything else...
There's a thread on this out now I believe. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
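Raymond's objection quoted above, that atomicity depends on the application rather than the type, suggests passing the decision in as a parameter instead of encoding it in an ABC. The following is only a sketch of that alternative, not something proposed in the thread. The is_atomic argument and the toy folder data are assumptions made for illustration, with tuples standing in for multi-part messages.

```python
def flatten(node, is_atomic):
    """Yield leaves of a nested structure; the caller decides what counts as a leaf."""
    if is_atomic(node):
        yield node
        return
    for child in node:
        yield from flatten(child, is_atomic)


# Toy data: folders are lists, messages are tuples of string parts.
folders = [[("hi", "attachment")], [("plain",)]]

# Flattening folders: a multi-part message is atomic.
messages = list(flatten(folders, lambda n: isinstance(n, tuple)))

# Flattening down to parts: only the individual strings are atomic.
parts = list(flatten(folders, lambda n: isinstance(n, str)))
```

The same data yields two whole messages in the first case and three separate parts in the second. The predicate must return True for every leaf, since anything else gets iterated.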

Raymond Hettinger wrote:
If we don't do this, then String won't be the last request. People will want Datetime for example. Pretty much any concrete type could have a look-a-like that wanted its own ABC and for all client code to switch from testing concrete types.
If I remember rightly, the machinery in the ABCs to support registration slows down some other operations with the types - do we want to pay that price all the time? If we do, then it may be a matter of moving some of the registration machinery from ABCMeta up into type itself.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
--------------------------------------------------------------- http://www.boredomandlaziness.org

Armin Ronacher wrote:
basestring is not subclassable for example. Also it requires subclassing which ABCs do not.
The use case that was cited was recognising subclasses of UserString, and that's what I was responding to. If basestring were made subclassable and UserString inherited from it, that use case would be covered. Recognising string-like objects *without* requiring subclassing is a hopeless morass to get into, in my opinion. You'll just have endless arguments about which of the zillion methods of str should be in the blessed set which confers string-ness. I also think that the ABC idea in general suffers from that problem, to one degree or another depending on the class involved. Strings are just an extreme case. -- Greg

On Sun, Jun 1, 2008 at 3:54 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
The use case that was cited was recognising subclasses of UserString, and that's what I was responding to. If basestring were made subclassable and UserString inherited from it, that use case would be covered.
UserString intentionally doesn't subclass basestring. When basestring was introduced, it was specifically meant to be the base class of *only* str and unicode. There are quite a few core APIs that accept no substitutes, and being an instance of basestring was intended to guarantee that a value is accepted by such APIs. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
There are quite a few core APIs that accept no substitutes, and being an instance of basestring was intended to guarantee that a value is accepted by such APIs.
In that case, the idea of a user-defined string class that doesn't inherit from str or unicode seems to be a lost cause, since it will never be acceptable in those places, whatever is done with ABCs. -- Greg
participants (9)
- Armin Ronacher
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Michael Foord
- Nick Coghlan
- Paul Moore
- Raymond Hettinger
- Steven D'Aprano