[Python-3000] PEP 3137 plan of attack
lists at cheimes.de
Wed Oct 10 21:08:27 CEST 2007
Guido van Rossum wrote:
> Definitely not. basestring is for text strings. We could even decide
> to remove it; we should instead have ABCs for this purpose.
I'm going to provide a patch which rips basestring out, k? Somebody has
to write a fixer for 2to3 which replaces code like isinstance(egg,
basestring) with isinstance(egg, str).
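
To illustrate the kind of rewrite such a fixer would perform, here is a minimal sketch. It is not the 2to3 fixer itself: it uses the standard `ast` module (and `ast.unparse`, so Python 3.9 or later is assumed), and the names `BasestringToStr` and `rewrite` are made up for this example.

```python
import ast


class BasestringToStr(ast.NodeTransformer):
    """Rewrite isinstance(x, basestring) into isinstance(x, str)."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        is_isinstance = (
            isinstance(node.func, ast.Name) and node.func.id == "isinstance"
        )
        if is_isinstance and len(node.args) == 2:
            target = node.args[1]
            # Only a bare `basestring` name is handled; tuples are left alone.
            if isinstance(target, ast.Name) and target.id == "basestring":
                new_name = ast.Name(id="str", ctx=ast.Load())
                node.args[1] = ast.copy_location(new_name, target)
        return node


def rewrite(source):
    """Return `source` with basestring checks replaced by str checks."""
    tree = BasestringToStr().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(rewrite("ok = isinstance(egg, basestring)"))
```

Unlike a real source-to-source fixer, this sketch goes through the abstract syntax tree, so comments and the original formatting are lost on the way out.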
> You mean 'formerly', not 'formally' :-) I prefer to just call these by
> their C names (PyString) to be precise, as the C names aren't changing
> (at least not yet ;-).
Oh, formerly ... right. The current state of the names is very
confusing. It's going to cost me some cups of coffee.
str - PyUnicode
bytes - PyString
buffer - PyBytes
> No, that's spelled out in the PEP. Those should all stay. (If you see
> a method that's not listed in the PEP, ask me about it before deleting
> it. :-)
Doh, I should have read the PEP again before asking the question.
I've a question about one point. The PEP states "They accept anything
that implements the PEP 3118 buffer API for bytes arguments, and return
the same type as the object whose method is called ("self")". Which
types do implement the buffer API? PyString, PyBytes but not PyUnicode?
For now PyString takes PyUnicode objects as arguments and vice versa
but PyBytes doesn't take unicode. Do I understand correctly that
PyString must not accept PyUnicode?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: can't use str as char buffer
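
The rule quoted from the PEP can be sketched as follows. This assumes the type names as they ended up in released Python 3 (`bytes` and `bytearray`), not the in-progress names used in this thread, and the helper `accepts_buffer` is made up for illustration.

```python
sep = b"-"
# Items of three different types, all of which expose the buffer API.
parts = [b"a", bytearray(b"b"), memoryview(b"c")]

# The result type follows "self", not the arguments.
joined_bytes = sep.join(parts)
joined_array = bytearray(sep).join(parts)
print(type(joined_bytes).__name__, type(joined_array).__name__)


def accepts_buffer(obj):
    """Return True if obj exposes the buffer API (hypothetical helper)."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


print(accepts_buffer(b"x"), accepts_buffer(bytearray(b"x")), accepts_buffer("x"))

# A text string is rejected as an argument to a bytes method.
try:
    sep.join(["a", "b"])
except TypeError as exc:
    print("rejected:", exc)
```

In this sketch the text type does not pass the `memoryview` check, so the bytes methods refuse it with a TypeError instead of converting it silently.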
> Several people have noted the same issue. My goal is to remove this
> behavior completely. I don't know how much it will take; these
> bootstrap issues are always hard to debug and sometimes hard to fix.
I tried to debug and fix it but I gave up after half an hour.
> I am looking into this a bit right now; I suspect it's got to do with
> some types that still return a PyString from their repr(). I noticed
> that even removing .encode() from PyString breaks about 5 tests.
I have a patch that renames PyString -> bytes and PyBytes -> buffer while
keeping str8 as an alias for bytes until str8 is removed. It's based on
Alexandre's patch, which itself is partly based on my patch. It breaks a
hell of a lot but it could give you a head start.
>>> type(b'') is str8
>>> type(b'') is bytes
I'll keep working on the patch.