[Python-Dev] Re: PEP 263 - Defining Python Source Code Encodings
Martin v. Loewis
martin@v.loewis.de
15 Jul 2002 23:16:52 +0200
Guido van Rossum <guido@python.org> writes:
> Yes, but all the non-ASCII has to be represented as Unicode strings.
> I.e. no Latin-1 in 8-bit strings!
Exactly. This might still cause problems for inspect and other
introspective tools.
For ASCII identifiers, I agree that using byte strings is sensible,
for best backwards compatibility.
> Really? I thought Unicode's isalpha() was built on the Unicode text
> database?
Not if the build has a "usable wchar_t"; see unicodeobject.h:
#if defined(HAVE_USABLE_WCHAR_T) && defined(WANT_WCTYPE_FUNCTIONS)
#include <wctype.h>
#define Py_UNICODE_ISSPACE(ch) iswspace(ch)
...
I had missed that it also requires actively selecting the wctype
functions (WANT_WCTYPE_FUNCTIONS), which is probably never done. So
it is better than I thought: isletter might vary across builds on the
same platform, but likely never does in practice.
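
One way to check which path a given build takes is to compare isalpha()
against the Unicode database directly. The following is only a sketch,
not taken from unicodeobject.h: it assumes a current CPython, where
str.isalpha() is documented to follow the general categories Lu, Ll, Lt,
Lm and Lo, and the helper name is_letter_by_database is hypothetical.

```python
import unicodedata

# General categories that the Unicode database classifies as "Letter".
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def is_letter_by_database(ch):
    """Classify one character using only the Unicode database."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


# A few ASCII and non-ASCII samples: letters, a digit, a space, an underscore.
samples = ["a", "\u00e9", "\u03a9", "1", " ", "_"]

# Map each sample to (isalpha() result, database result).
results = {ch: (ch.isalpha(), is_letter_by_database(ch)) for ch in samples}

# Any character where the two disagree would suggest that isalpha()
# is not backed by the Unicode database on this build.
mismatches = [ch for ch, (a, b) in results.items() if a != b]
```

An empty mismatches list for these samples is consistent with the
database-backed implementation; it does not prove it for every character.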
Regards,
Martin