[Python-3000] PEP 3137 plan of attack
Guido van Rossum
guido at python.org
Wed Oct 10 20:08:20 CEST 2007
On 10/10/07, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
> > > The tasks I can think of are:
> (Resend, the first mail didn't make it and I forgot a point)
> While I was working on a patch for the renaming of bytes and str8 I
> found some open issues that need to be discussed and addressed:
> - Create an iterator view for PyBytes. The buffer object doesn't have a
> view for iteration like bytes have with PyStringIter_Type. Guido said he
> wants a view to play nice with the Sequence ABC.
Right. Though it is a minor point and can be done later.
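
For reference, a minimal sketch of what "playing nice with the Sequence ABC" looks like from Python. It assumes a current Python 3, where the mutable type that this thread calls buffer (PyBytes) is spelled bytearray and the ABCs live in collections.abc. It shows where things ended up, not the state of the tree when this mail was written.

```python
from collections.abc import MutableSequence, Sequence

# Assumption: bytearray is the present-day name of the mutable byte type.
buf = bytearray(b"ab")

# Iteration goes through a dedicated iterator object and yields
# one small integer per byte.
items = list(iter(buf))

# Both byte types are recognized by the Sequence ABC; only the
# mutable one is a MutableSequence.
is_sequence = isinstance(buf, Sequence)
is_mutable_sequence = isinstance(buf, MutableSequence)
bytes_is_sequence = isinstance(b"ab", Sequence)
```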
> - Should bytes (PyString_Type) subclass from basestring? It doesn't feel
> quite right to me. I think we could remove basestring completely if
> bytes doesn't subclass from it.
Definitely not. basestring is for text strings. We could even decide
to remove it; we should instead have ABCs for this purpose.
> - Do we need a common base type for bytes and buffer like e.g. basebytes?
We can deal with that in abc.py as well, using virtual inheritance
(the .register() method).
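
A minimal sketch of the virtual-inheritance idea, written against a current Python 3 rather than the tree under discussion. The class name BaseBytes and the helper is_byteslike are hypothetical, chosen only to mirror the "basebytes" question above; bytearray stands in for the type called buffer here.

```python
from abc import ABC


class BaseBytes(ABC):
    """Hypothetical common ABC for byte-oriented types."""


# Virtual inheritance: register() makes isinstance()/issubclass()
# succeed without touching the registered types' real base classes.
BaseBytes.register(bytes)
BaseBytes.register(bytearray)


def is_byteslike(obj):
    """Return True for any object whose type is registered with BaseBytes."""
    return isinstance(obj, BaseBytes)
```

Because registration does not alter the MRO, neither type gains a real C-level base class, which is why this can live entirely in Python code such as abc.py.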
> - The new bytes type (formally known as str8 / PyString_Type) still has
You mean 'formerly', not 'formally' :-) I prefer to just call these by
their C names (PyString) to be precise, as the C names aren't changing
(at least not yet ;-).
> a bunch of methods from its original Python 2.x parent:
> ['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
> '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
> '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__',
> '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__',
> '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__',
> '__rmul__', '__setattr__', '__str__', 'capitalize', 'center', 'count',
> 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index',
> 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle',
> 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace',
> 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
> 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
> 'upper', 'zfill']
> Should any of these methods be removed?
No, that's spelled out in the PEP. Those should all stay. (If you see
a method that's not listed in the PEP, ask me about it before deleting
it.)
> - PyString still accepts unicode in a lot of places and some important
> parts of Python still require it. The interpreter was f... up as I
> removed unicode support from functions like PyString_Size and
> PyString_AsString. I'm not sure which function is causing trouble. The
> error message was an exception bootstrapping error because
> PyImport_ImportModule("__builtin__") failed. Should these methods still
> accept unicode and convert it with the default encoding?
Several people have noted the same issue. My goal is to remove this
behavior completely. I don't know how much it will take; these
bootstrap issues are always hard to debug and sometimes hard to fix.
I am looking into this a bit right now; I suspect it's got to do with
some types that still return a PyString from their repr(). I noticed
that even removing .encode() from PyString breaks about 5 tests.
--Guido van Rossum (home page: http://www.python.org/~guido/)