[Python-Dev] Maintenance burden of str.swapcase

Wed Sep 7 14:47:49 CEST 2011

On Wed, 07 Sep 2011 11:15:04 +0900
"Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> Antoine Pitrou writes:
> 
>  > Bytes objects are often used for partly ASCII strings,
> 
> All I can say to that phrase is, "urk, ISO 2022 anyone?"

You could also point out UTF-16 or EBCDIC, but I fail to see how that's
relevant. Do you have problems with ISO 2022 when parsing, say, e-mail
headers?

>  > not arbitrary "arrays of bytes". And making indexing of bytes
>  > objects return ints was IMHO a mistake.
> 
> Bytes objects are not ASCII strings, even though they can be used to
> represent them.

I'm talking about practice, not some idealistic view of the world.
In many use cases (XML, HTML, e-mail headers, many other text-based
protocols), you can get a mixture of ASCII "commands", and opaque
binary stuff (which will or will not, depending on these "commands",
have a meaningful Unicode decoding).

In the stdlib, bytes objects are accessed far more often to poke at
some text-like data, than to poke at arbitrary numbers.
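
As a rough illustration of that kind of mixed data, here is a minimal
sketch. The wire format and the variable names are made up for the
example; only the bytes methods themselves are real.

```python
# Hypothetical wire data: an ASCII header followed by an opaque payload.
raw = b"Content-Length: 4\r\n\r\n\xde\xad\xbe\xef"

# Split the text-like header from the binary payload.
head, _, payload = raw.partition(b"\r\n\r\n")
name, _, value = head.partition(b":")

# The ASCII "command" part is handled with text-like bytes methods.
header_name = name.strip().lower()
length = int(value.strip().decode("ascii"))

# Indexing a bytes object yields an int; slicing yields bytes.
first_as_int = raw[0]
first_as_bytes = raw[0:1]

print(header_name, length, payload, first_as_int, first_as_bytes)
```

The payload is never decoded here; only the header is treated as text.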

> With PEP 393,
> there isn't even really a space excuse.

Of course there is. Any single non-ASCII byte of data mingled with
aforementioned ASCII "commands" will make it switch to a less efficient
representation.
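
A small sketch of the effect, assuming CPython 3.3 or later (where PEP
393 is implemented) and assuming the data is decoded with
"surrogateescape" so that it can round-trip. Exact sizes are
implementation details, so the code only compares them.

```python
import sys

# 1000 bytes of pure ASCII versus the same data with one binary byte.
ascii_only = b"A" * 1000
one_binary_byte = b"A" * 999 + b"\xff"

s_ascii = ascii_only.decode("ascii")
# The stray byte 0xFF becomes the lone surrogate U+DCFF, which does not
# fit in one byte per character, so the whole string is stored wider.
s_escaped = one_binary_byte.decode("ascii", "surrogateescape")

size_ascii = sys.getsizeof(s_ascii)
size_escaped = sys.getsizeof(s_escaped)

print(len(s_ascii), size_ascii)
print(len(s_escaped), size_escaped)
```

Both strings have the same length, but the second one takes noticeably
more memory on CPython.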

And "surrogateescape" will be a performance problem in itself, when
used on large binary data; if you use "latin1" instead, you are risking
far greater confusion; ask David about that dilemma. :-)
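
To make the dilemma concrete, here is a minimal sketch with a made-up
two-byte input (UTF-8 data whose encoding the receiver does not know).
Both approaches round-trip the bytes, but they fail differently when
the result is treated as real text.

```python
data = "é".encode("utf-8")  # b"\xc3\xa9", encoding unknown to the receiver

# latin1: always succeeds and yields plausible-looking but wrong text.
as_latin1 = data.decode("latin-1")
# surrogateescape: undecodable bytes become lone surrogates.
as_escaped = data.decode("ascii", "surrogateescape")

# Both round-trip when encoded back the same way.
back_latin1 = as_latin1.encode("latin-1")
back_escaped = as_escaped.encode("ascii", "surrogateescape")

# Treated as ordinary text, latin1 silently double-encodes...
double_encoded = as_latin1.encode("utf-8")

# ...while the surrogates make a strict encode fail loudly.
try:
    as_escaped.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True

print(repr(as_latin1), double_encoded, strict_encode_failed)
```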

> AFAICS, anything that should be done with ASCII-punned magic numbers
> ("protocol tokens", if you prefer) can be done with slices and (ta-da!)
> case conversion.

So, basically, you're saying that we should remove useful functionality
and tell people to reimplement an ad hoc version of it when they need
it. That sounds obnoxious.
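
For comparison, a sketch of what such an ad hoc version might look
like next to the existing bytes methods. The helper name adhoc_lower
and the sample token are hypothetical.

```python
# Hypothetical ad hoc replacement for ASCII lowercasing on bytes.
_ASCII_LOWER = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    b"abcdefghijklmnopqrstuvwxyz",
)

def adhoc_lower(token: bytes) -> bytes:
    """Lowercase ASCII letters only; other bytes pass through unchanged."""
    return token.translate(_ASCII_LOWER)

token = b"MAIL FROM"

# The built-in methods already do the same thing, ASCII-only.
builtin_result = token.lower()
adhoc_result = adhoc_lower(token)
swapped = b"Hello".swapcase()
untouched = b"\xc9".lower()  # non-ASCII byte is left alone

print(builtin_result, adhoc_result, swapped, untouched)
```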

Regards

Antoine.