[Python-Dev] Re: Allowing u.encode() to return non-strings
"Martin v. Löwis"
martin at v.loewis.de
Wed Jun 30 13:10:39 EDT 2004
Terry Reedy wrote:
> Python strings are sequences of 0 to n chars from an abstract 256-char
> alphabet. This meets my understanding of the standard 20th century CS
> definition of string. Has there been a significant change in the last few
> years?
Yes. Abstract 256-char alphabets have been found useless for the
representation of natural-language text. You need concrete alphabets,
and having more than 256 characters is often important.
> The byte set is intentionally not any *particular* natural language char
> set, but a possible carrier for any of them. Perhaps unfortunately, it
> lacks a single standard glyph set or graphic representation, but I believe
> Unicode also differentiates between characters (code points?) and glyphs
> (which are also not standardized).
Yes. But Unicode does define concrete characters - even if it leaves
the choice of glyphs.
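
To illustrate the distinction, here is a minimal sketch. It assumes a modern Python 3 interpreter (not the Python of this thread's date), and the byte value and encodings are chosen only as examples. A value from the abstract 256-symbol alphabet has no fixed meaning until a character set is picked, whereas a Unicode code point names one concrete character:

```python
import unicodedata

# One symbol from the "abstract" 256-value alphabet: just the number 0xE4.
raw = b"\xe4"

# Which character it stands for depends on the character set the reader assumes.
as_latin1 = raw.decode("latin-1")     # Western European reading
as_greek = raw.decode("iso-8859-7")   # Greek reading

# A Unicode code point, by contrast, identifies a concrete, named character.
names = [unicodedata.name(ch) for ch in (as_latin1, as_greek)]
for ch, name in zip((as_latin1, as_greek), names):
    print(f"U+{ord(ch):04X} {name}")

# Plenty of characters do not fit in a 256-symbol alphabet at all.
euro = "\u20ac"
fits_in_one_byte = ord(euro) < 256
```

The same byte decodes to two unrelated characters, and the euro sign has no slot in the 256-value range at all; how either character is drawn is still left to the font.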
Regards,
Martin