[Python-3000] PEP 3137 plan of attack

Wed Oct 10 20:08:20 CEST 2007

On 10/10/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
> > > The tasks I can think of are:
> [...]
>
> (Resend, the first mail didn't make it and I forgot a point)
>
> While I was working on a patch for the renaming of bytes and str8 I
> found some open issues that need to be discussed and addressed:
>
> - Create an iterator view for PyBytes. The buffer object doesn't have a
> view for iteration like bytes have with PyStringIter_Type. Guido said he
> wants a view to play nice with the Sequence ABC.

Right. Though it is a minor point and can be done later.
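At the Python level, the idea of pairing a sequence type with its own iterator type can be sketched as below. ByteView and ByteViewIterator are hypothetical names for illustration, not CPython types. The sketch assumes the collections.abc module of later Python 3 releases. Sequence only demands __getitem__ and __len__ and would otherwise supply an index-based __iter__; here a dedicated iterator object is returned instead, loosely analogous to what PyStringIter_Type does in C.

```python
from collections.abc import Iterator, Sequence


class ByteView(Sequence):
    """Hypothetical read-only byte sequence (not a real CPython type)."""

    def __init__(self, data):
        self._data = bytes(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # Slices return a new view; integer indexes return an int (0-255).
        if isinstance(index, slice):
            return ByteView(self._data[index])
        return self._data[index]

    def __iter__(self):
        # Hand out a dedicated iterator object instead of relying on the
        # index-based __iter__ mixin that Sequence would otherwise provide.
        return ByteViewIterator(self)


class ByteViewIterator(Iterator):
    """Hypothetical iterator type, loosely analogous to PyStringIter_Type."""

    def __init__(self, view):
        self._view = view
        self._pos = 0

    def __next__(self):
        if self._pos >= len(self._view):
            raise StopIteration
        item = self._view[self._pos]
        self._pos += 1
        return item
```

Because ByteView is a real Sequence subclass, mixin methods such as __contains__, index() and count() work on it automatically, and __contains__ goes through the custom iterator.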

> - Should bytes (PyString_Type) subclass from basestring? It doesn't feel
> quite right to me. I think we could remove basestring completely if
> bytes doesn't subclass from it.

Definitely not. basestring is for text strings. We could even decide
to remove it; we should instead have ABCs for this purpose.

> - Do we need a common base type for bytes and buffer like e.g. basebytes?

We can deal with that in abc.py as well, using virtual inheritance
(the .register() method).
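A minimal sketch of that approach, assuming the abc module as it exists in later Python 3 releases and the eventual type names bytes and bytearray (roughly where PyString and the mutable buffer type ended up). BaseBytes and BaseText are hypothetical names, not part of the standard library. register() makes isinstance() and issubclass() accept the existing types without changing their inheritance or their C implementations.

```python
from abc import ABC


class BaseBytes(ABC):
    """Hypothetical common ABC for binary sequence types."""


class BaseText(ABC):
    """Hypothetical ABC for text strings, covering basestring's old role."""


# Virtual inheritance: the concrete types are attached after the fact.
# They gain no methods from the ABC and their __mro__ is unchanged.
BaseBytes.register(bytes)
BaseBytes.register(bytearray)
BaseText.register(str)
```

Keeping the text ABC and the bytes ABC separate preserves the point made above: binary data is never treated as a kind of text string.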

> - The new bytes type (formally known as str8 / PyString_Type) still has

You mean 'formerly', not 'formally' :-) I prefer to just call these by
their C names (PyString) to be precise, as the C names aren't changing
(at least not yet ;-).

> a bunch of methods from its original Python 2.x parent:
>
> ['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
> '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
> '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__',
> '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__',
> '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__',
> '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count',
> 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index',
> 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle',
> 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace',
> 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
> 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
> 'upper', 'zfill']
>
> Should any of these methods be removed?

No, that's spelled out in the PEP. Those should all stay. (If you see
a method that's not listed in the PEP, ask me about it before deleting
it. :-)
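For comparison, a small check against Python 3 as eventually released (which postdates this thread). It shows a few of the listed str-like methods applied to a bytes object, where they operate on ASCII data. Getting text out is an explicit decode() with a named codec.

```python
data = b"hello world"

words = data.split()            # splits on ASCII whitespace
shouted = data.upper()          # only ASCII letters are changed
padded = b"42".zfill(5)         # pads with ASCII "0" bytes
found = data.find(b"world")     # search arguments must be bytes too

# Conversion to text always names the codec explicitly.
text = data.decode("ascii")
```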

> - PyString still accepts unicode in a lot of places and some important
> parts of Python still require it. The interpreter was f... up as I
> removed unicode support from functions like PyString_Size and
> PyString_AsString. I'm not sure which function is causing trouble. The
> error message was an exception bootstrapping error because
> PyImport_ImportModule("__builtin__") failed. Should these methods still
> accept unicode and convert it with the default encoding?

Several people have noted the same issue. My goal is to remove this
behavior completely. I don't know how much it will take; these
bootstrap issues are always hard to debug and sometimes hard to fix.

I am looking into this a bit right now; I suspect it's got to do with
some types that still return a PyString from their repr(). I noticed
that even removing .encode() from PyString breaks about 5 tests.
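The policy described here (no implicit conversion of unicode to bytes via a default encoding) can be illustrated at the Python level. require_bytes is a hypothetical helper, not a CPython API; the actual change concerns C functions such as PyString_Size and PyString_AsString. The final lines rely on Python 3 behavior as eventually released, where mixing bytes and str in concatenation raises TypeError.

```python
def require_bytes(obj):
    """Hypothetical helper: accept binary data only, never str.

    There is no fallback to a default encoding; callers must encode
    text explicitly before passing it in.
    """
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return bytes(obj)
    if isinstance(obj, str):
        raise TypeError("expected bytes, got str; call .encode() first")
    raise TypeError(f"expected bytes, got {type(obj).__name__}")


# Python 3 itself refuses to mix the two types implicitly.
try:
    b"abc" + "def"
except TypeError:
    mixed_ok = False
else:
    mixed_ok = True
```

Failing loudly on str, rather than guessing an encoding, moves the choice of codec to the caller, which is exactly where such bootstrap-time surprises get caught.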

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)