[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Chris Angelico rosuav at gmail.com
Sun Jan 12 23:28:31 CET 2014


On Mon, Jan 13, 2014 at 4:57 AM, Juraj Sukop <juraj.sukop at gmail.com> wrote:
> On Sun, Jan 12, 2014 at 6:22 PM, Steven D'Aprano <steve at pearwood.info>
> wrote:
>> First, "utf16_string" confuses me. What is it? If it is a Unicode
>> string, i.e.:
>
> It is a Unicode string which happens to contain code points outside U+00FF
> (as with the TTF example above), so that it triggers the (at least) 2-byte
> memory representation in CPython 3.3+. I agree, I chose the variable name
> poorly, my bad.
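A quick way to see the storage widths Juraj mentions (CPython 3.3+ only, per PEP 393). The helper name `bytes_per_code_point` is hypothetical, and the trick of subtracting `sys.getsizeof` for two string lengths assumes CPython's compact string layout, where each extra character adds exactly one storage unit. Other implementations, such as PyPy, will report something else.

```python
import sys

def bytes_per_code_point(ch: str) -> int:
    """Hypothetical helper: estimate how many bytes CPython uses per code point
    for a string built from repeated `ch`. Assumes CPython's PEP 393 compact
    layout, where one extra character adds exactly one storage unit."""
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

print(bytes_per_code_point("a"))           # ASCII
print(bytes_per_code_point("\u00e9"))      # Latin-1 (U+00E9)
print(bytes_per_code_point("\u0100"))      # first code point past U+00FF
print(bytes_per_code_point("\U0001F600"))  # outside the BMP
```

On CPython this should print 1, 1, 2 and 4. That fourfold jump is why a single character beyond U+FFFF quadruples the memory used by an otherwise ASCII string.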

When I'm talking about Unicode strings based on their maximum
code point, I usually call them something like "ASCII string", "Latin-1
string", "BMP string", and "SMP string". Still not wholly accurate,
but less confusing than naming an encoding... oh wait, two of those
_are_ encodings :| But you could use "narrow string" for the first
two. Or "string(0..127)" for ASCII, "string(0..255)" for Latin-1, and
then for consistency "string(0..65535)" and "string(0..1114111)" for
the others, except that I doubt that'd be helpful :) At any rate,
"BMP" as a term for "includes characters outside of Latin-1 but all on
the Basic Multilingual Plane" would probably be close enough to get
away with.
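The naming scheme above amounts to bucketing a string by its largest code point. Below is a minimal sketch, assuming the function name `width_class` (hypothetical) and the boundaries 0x7F, 0xFF and 0xFFFF. Strictly speaking, "SMP" means plane 1 only, so lumping all of U+10000..U+10FFFF under that label is the same loose shorthand as "string(0..1114111)".

```python
def width_class(s: str) -> str:
    """Hypothetical helper: label a string by its maximum code point,
    using the informal names ASCII / Latin-1 / BMP / SMP."""
    top = max(map(ord, s), default=0)  # the empty string counts as ASCII
    if top <= 0x7F:
        return "ASCII"      # string(0..127)
    if top <= 0xFF:
        return "Latin-1"    # string(0..255)
    if top <= 0xFFFF:
        return "BMP"        # string(0..65535)
    return "SMP"            # string(0..1114111), loosely: any astral plane

print(width_class("abc"), width_class("caf\u00e9"),
      width_class("\u0100"), width_class("\U0001F600"))
```

In CPython 3.3+, these four buckets happen to line up with the 1-, 1-, 2- and 4-byte storage kinds, except that ASCII and Latin-1 share the 1-byte kind.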

ChrisA

