[Python-3000] PEP 3137 plan of attack

Christian Heimes lists at cheimes.de
Wed Oct 10 21:08:27 CEST 2007


Guido van Rossum wrote:
> Definitely not. basestring is for text strings. We could even decide
> to remove it; we should instead have ABCs for this purpose.

I'm going to provide a patch that rips basestring out, OK? Somebody will
have to write a fixer for 2to3 that replaces code like
isinstance(egg, basestring) with isinstance(egg, str).
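
For illustration only, here is a rough sketch of the rewrite such a fixer
would perform. It is a plain regex substitution, not the 2to3 fixer API, and
the helper name replace_basestring is made up for this example.

```python
import re

# Hypothetical helper (not part of 2to3): match the bare name `basestring`,
# but not longer identifiers that merely contain it.
_BASESTRING = re.compile(r"\bbasestring\b")


def replace_basestring(source):
    """Return source with every standalone `basestring` name replaced by `str`."""
    return _BASESTRING.sub("str", source)


before = "if isinstance(egg, basestring):\n    pass\n"
print(replace_basestring(before))
```

A regex also rewrites matches inside string literals and comments, so a real
fixer would have to work on the parse tree instead.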

> You mean 'formerly', not 'formally' :-) I prefer to just call these by
> their C names (PyString) to be precise, as the C names aren't changing
> (at least not yet ;-).

Oh, formerly ... right. The current state of the names is very
confusing. It's going to cost me some cups of coffee.

  str - PyUnicode
  bytes - PyString
  buffer - PyBytes

> No, that's spelled out in the PEP. Those should all stay. (If you see
> a method that's not listed in the PEP, ask me about it before deleting
> it. :-)

Doh, I should have read the PEP again before asking the question.

I've a question about one point. The PEP states "They accept anything
that implements the PEP 3118 buffer API for bytes arguments, and return
the same type as the object whose method is called ("self")". Which
types implement the buffer API? PyString and PyBytes, but not PyUnicode?

For now PyString takes PyUnicode objects as arguments and vice versa,
but PyBytes doesn't take unicode. Do I understand correctly that
PyString must not accept PyUnicode?

>>> b"abc".count("b")
1
>>> "abc".count(b"b")
1
>>> buffer(b"abc").count("b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: can't use str as char buffer
>>> buffer(b"abc").count(b"b")
1
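
As a sketch of the behavior the quoted PEP sentence describes, the checks
below run against the types as they later shipped in Python 3. That mapping is
an assumption made for this example: the mutable type is spelled bytearray
there rather than buffer, and the helper name accepts is made up.

```python
def accepts(obj, arg):
    """Return True if obj.count(arg) works, False if it raises TypeError."""
    try:
        obj.count(arg)
    except TypeError:
        return False
    return True


# Objects that implement the buffer API are accepted as bytes arguments.
assert accepts(b"abc", bytearray(b"b"))
assert accepts(bytearray(b"abc"), memoryview(b"b"))

# Text and bytes do not mix in either direction.
assert not accepts(b"abc", "b")
assert not accepts("abc", b"b")
assert not accepts(bytearray(b"abc"), "b")

# The result has the same type as "self", whatever the argument types were.
r1 = b"abc".replace(bytearray(b"b"), b"x")
r2 = bytearray(b"abc").replace(b"b", b"x")
print(type(r1).__name__, type(r2).__name__)
```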

> Several people have noted the same issue. My goal is to remove this
> behavior completely. I don't know how much it will take; these
> bootstrap issues are always hard to debug and sometimes hard to fix.

I tried to debug and fix it but I gave up after half an hour.

> I am looking into this a bit right now; I suspect it's got to do with
> some types that still return a PyString from their repr(). I noticed
> that even removing .encode() from PyString breaks about 5 tests.

Great!

I've a patch that renames PyString -> bytes and PyBytes -> buffer while
keeping str8 as an alias for bytes until str8 is removed. It's based on
Alexandre's patch, which itself is partly based on my patch. It breaks a
hell of a lot but it could give you a head start.

>>> b''
b''
>>> type(b'')
<type 'bytes'>
>>> type(b'') is str8
True
>>> type(b'') is bytes
True
>>> type(buffer(b''))
<type 'buffer'>

I'll keep working on the patch.

Crys



