[Python-Dev] bytes / unicode

Nick Coghlan ncoghlan at gmail.com
Wed Jun 23 12:58:00 CEST 2010


On Wed, Jun 23, 2010 at 7:18 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> Note that the point of using a builtin method was to get
> better performance. Such type adaptions are often needed in
> loops, so adding a few extra Python function calls just to
> convert a str object to a bytes object or vice-versa is a
> bit much overhead.

I actually agree with that. I just think we need more real world
experience of what works with the Python 3 text model before we
start messing with the APIs for the builtin objects. (Fair point that
"coerce" is a loaded term given the existence of the old coercion
protocol. It's the right word for the task, though.)

One of the key points coming out of this thread (to my mind) is the
lack of a Text ABC or other way of making an object that can be passed
to functions expecting a str instance with a reasonable expectation of
having it work. Are there some core string capabilities that can be
identified and then expanded out to a full str-compatible API? (i.e.
something along the lines of what collections.MutableMapping now
provides for dict-alikes).
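
As a rough illustration of the "small core expanded to a full API"
idea, here is how the existing mapping ABC behaves. This is only a
sketch: LowerDict is a made-up name, and the import uses
collections.abc, which is where MutableMapping lives in current
Python 3 releases. Only five methods are written by hand; the ABC
supplies the rest.

```python
from collections.abc import MutableMapping


class LowerDict(MutableMapping):
    """Hypothetical dict-alike that lower-cases its keys."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        # update() is one of the mixin methods supplied by the ABC.
        self.update(*args, **kwargs)

    # The five abstract methods that make up the "core capabilities".
    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = LowerDict(Spam=1)
# setdefault(), pop(), get(), items(), "in" and friends all come from
# the ABC, built on top of the five methods above.
d.setdefault("EGGS", 2)
```

A Text ABC along these lines would have to pick a similar handful of
core methods for str, which is exactly the open question.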

However, even if something like that were added, PJE is correct in
pointing out that builtin strings still don't play well with others in
many cases (usually due to underlying optimisations or other sound
reasons, but perhaps sometimes gratuitously). Most of the string
binary operations can be dealt with through their reflected forms, but
str.__mod__ will never return NotImplemented, __contains__ has no
reflected form and the actual method calls are of course right out
(e.g. the arguments to str.join() or str.split() calls have no ability
to affect the type of the result).
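
A minimal sketch of those cases as CPython behaves, using a made-up
Text wrapper class (hypothetical, and deliberately not a str
subclass):

```python
class Text:
    """Hypothetical third-party string type wrapping a str."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value

    def __radd__(self, other):
        # Reflected form: reached for "builtin str + Text".
        return Text(str(other) + self.value)

    def __rmod__(self, other):
        # Not reached when the left operand is a plain str, because
        # str.__mod__ handles the operation itself.
        return Text("unreachable")


t = Text("world")

# Concatenation works via the reflected method and keeps our type.
concatenated = "hello " + t

# Formatting is handled entirely by str, so the result is a plain str.
formatted = "hello %s" % t

# __contains__ has no reflected form, so the builtin simply rejects us.
try:
    t in "hello world"
    contains_error = None
except TypeError as exc:
    contains_error = exc

# Method arguments cannot influence the result type either.
try:
    " ".join(["hello", t])
    join_error = None
except TypeError as exc:
    join_error = exc
```

Here concatenated is a Text, formatted is a plain str, and both the
containment test and the join() call raise TypeError.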

Third party number implementations couldn't provide comparable
functionality to builtin int and long objects until the __index__
protocol was added. Perhaps PJE is right that what this is really
crying out for is a way to have third party "real string"
implementations so that there can actually be genuine experimentation
in the Unicode handling space outside the language core (comparable to
the difference between the "you can turn me into an int" __int__
method and the "I am an int equivalent" __index__ method).
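
For anyone who hasn't run into the distinction, a small sketch (the
class names Convertible and IntLike are invented for the example):

```python
class Convertible:
    """Can be turned into an int, but only on explicit request."""

    def __int__(self):
        return 2


class IntLike:
    """Claims to be an int equivalent via __index__."""

    def __index__(self):
        return 2


items = ["a", "b", "c"]

# __index__ is accepted anywhere a real integer is required.
picked = items[IntLike()]
as_hex = hex(IntLike())

# __int__ only takes effect through an explicit int() call...
explicit = int(Convertible())

# ...and is not consulted for indexing.
try:
    items[Convertible()]
    index_error = None
except TypeError as exc:
    index_error = exc
```

The string analogue would be a protocol that lets an object say "I am
a str equivalent" rather than merely "you can call str() on me".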

That may be tapping in a nail with a sledgehammer (and would raise
significant moratorium questions if pursued further), but I think it's
a valid question to at least ask.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

