[Python-Dev] Re: pre-PEP: Unicode Security Considerations for Python

3 Nov 2021


      Chris Angelico writes:
...
> Ah, okay, so much for that, then. What about the weaker sense:
> Characters below 128 are always and only represented by those byte
> values? So if you find byte value 39, it might not actually be an
> apostrophe, but if you're looking for an apostrophe, you know for sure
> that it'll be represented by byte value 39?

1.  The apostrophe that Python considers a string delimiter is always
    represented by byte value 39 in the compiler input.  So the only
    time that wouldn't be true is if escape sequences are allowed to
    represent characters.  I believe unicode_escape is the only codec
    that does.

2.  There's always eval which will accept a string containing escape
    sequences.
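
A minimal sketch of the two points above, using only the standard
library. The sample strings and variable names are made up for
illustration, and the unicode_escape part exercises the codec directly
rather than a source-file encoding declaration, so it shows the
mechanism, not the compiler's exact behavior.

```python
# Byte value 39 is the ASCII apostrophe.
assert ord("'") == 39

# In UTF-8, non-ASCII characters are encoded using only bytes >= 128,
# so every byte 39 in the encoded text is a real apostrophe.
text = "na\u00efve 'quoted' \u65e5\u672c\u8a9e"
encoded = text.encode("utf-8")
apostrophe_bytes = encoded.count(39)   # count occurrences of byte value 39
apostrophe_chars = text.count("'")

# Point 1: with unicode_escape, an apostrophe can appear in the decoded
# text even though byte 39 never occurs in the input bytes.
raw = rb"it\x27s"                      # backslash, 'x', '2', '7': no byte 39
has_byte_39 = 39 in raw
decoded = raw.decode("unicode_escape")

# Point 2: eval accepts a string containing escape sequences.  The
# delimiters are real apostrophes, but the inner one is spelled \x27.
expr = r"'it\x27s'"
value = eval(expr)

print(apostrophe_bytes, apostrophe_chars, has_byte_39, decoded, value)
```

In the eval case the string delimiters are still byte 39; only the
apostrophe inside the literal is hidden behind an escape sequence.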
...
> Yes. I'm sure someone will come along and say "but I have to have an
> all-ASCII source file, directly runnable, with non-ASCII variable
> names", because XKCD 1172, but I don't have enough sympathy for that
> obscure situation to want the mess that unicode_escape can give.

It's not an obscure situation to me.  As I wrote earlier, been there,
done that, made my own T-shirt.  I don't *think* it matters today, but
the number of DOS machines and Windows 98 machines left in Japan is
not zero.  Probably they can't run Python 3, but that's not something
I can testify to.

Stephen J. Turnbull