[Python-Dev] thoughts on the bytes/string discussion

Wed Jul 7 17:35:37 CEST 2010

Greg Ewing writes:

 > The use cases I had in mind for a 1-byte build are those for
 > which the alternative would be keeping everything in bytes.
 > Applications using a 1-byte build would need to be aware of
 > the fact and take care to slice strings at valid places. If
 > they were using bytes, they would have to face exactly the
 > same issues.

In other words, the people who want to use bytes have no less pain,
and the people who want to use characters suffer much greater pain.
How can this be a win?  If you live in an ASCII-only world, there are
a few APIs where bytes aren't allowed, and indeed it would be a win to
use those APIs on ASCII-encoded bytestrings.  And I don't mean
ISO-8859-1-only, either; UTF-8 is not compatible with ISO-8859-1 at
the byte level.
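
To make the slicing hazard and the byte-level mismatch concrete, here
is a minimal sketch. The sample string and the helper name
decodes_as_utf8 are illustrative assumptions, not anything taken from
the thread.

```python
# Illustrative only: the sample text is an assumption, not from the thread.
text = "café"

utf8 = text.encode("utf-8")      # b'caf\xc3\xa9': 'é' takes two bytes
latin1 = text.encode("latin-1")  # b'caf\xe9': 'é' takes one byte


def decodes_as_utf8(data):
    """Hypothetical helper: True if data is complete, valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Slicing bytes at an arbitrary offset can cut a character in half:
# the lead byte 0xC3 is left without its continuation byte.
truncated = utf8[:4]

print(decodes_as_utf8(utf8))       # the full UTF-8 string
print(decodes_as_utf8(truncated))  # the string cut mid-character
print(decodes_as_utf8(latin1))     # ISO-8859-1 bytes read as UTF-8
print(decodes_as_utf8(b"cafe"))    # pure ASCII
```

Only the pure-ASCII bytes are safe under both encodings; anything
beyond ASCII needs care about where a slice falls, whichever type
holds the data.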

But the proposal Guido supports would address that by making those
APIs polymorphic.
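
The details of that proposal are not spelled out in this message, so
the following is only one reading of what "polymorphic" could mean: a
function that accepts either str or bytes and returns the same type it
was given. The name first_field and its behaviour are hypothetical.

```python
def first_field(line):
    """Hypothetical polymorphic helper.

    Return the part of `line` before the first colon, as the same
    type (str or bytes) that was passed in.
    """
    if isinstance(line, bytes):
        colon = b":"
    elif isinstance(line, str):
        colon = ":"
    else:
        raise TypeError("expected str or bytes, got %s" % type(line).__name__)
    return line.split(colon, 1)[0]


# Text in, text out.
as_text = first_field("Subject: hello")

# Bytes in, bytes out; no decoding step is needed.
as_bytes = first_field(b"Subject: hello")

# Splitting on an ASCII delimiter leaves non-ASCII bytes untouched,
# here with ISO-8859-1 input (an assumed example).
as_latin1 = first_field("clé: valeur".encode("latin-1"))

print(as_text, as_bytes, as_latin1)
```

The function only ever looks at an ASCII delimiter, so it never needs
to know which ASCII-compatible encoding the bytes are in.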

 > > And finally: RAM is cheap and today's CPUs work better with 16- or
 > > 32-bit values than 8-bit characters.
 > 
 > Yet some people have reported significant performance benefits
 > for some applications from using a 2-byte build instead of a
 > 4-byte build. I was just speculating whether a 1-byte build
 > might be of further advantage in a few specialised cases.

Of course it would be.  But as soon as you want to do *any* I/O in
text mode with non-ASCII characters, you're in real pain.  What do you
do if a user cuts and pastes some text containing proper quotation marks or
an en-dash at a prompt in a terminal?  So polymorphism is a far better
way to optimize those special cases, as it allows a byte string in any
encoding to be treated as text, not just UTF-8.