Multibyte Character Support for Python

Thu May 9 09:58:11 EDT 2002

"Stephen J. Turnbull" <stephen at xemacs.org> writes:

>     Martin> You can't use UTF-8 to represent non-ASCII identifiers,
>     Martin> and Unicode objects later. Old byte code would not
>     Martin> interoperate with new byte code.
> 
> _This_ is a serious objection.  But if we're ever going to have
> non-ASCII identifiers with some sanity, that transition will have to
> be made.  So I guess it never will happen in PSF Python?

No. It just means that if the feature is implemented, it should be
done using Unicode objects right from the start. Unicode objects
interact favourably with byte strings if the byte strings are ASCII
only: they have the same hash values and compare equal, hence you can
mix ASCII strings and Unicode strings freely as dictionary keys.
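
As a rough illustration of the dictionary-key point above: in the Python
of this thread, an ASCII-only byte string and the corresponding Unicode
object compare equal and hash alike, so both reach the same dictionary
entry. Python 3 later separated the two types (b"spam" == "spam" is
False there), so the sketch below, written for Python 3, decodes
ASCII byte strings first to get the same effect. The helper name
as_identifier and the namespace dictionary are assumptions made up for
this example; they are not part of any patch discussed here.

```python
# Sketch for Python 3. In the Python 2 discussed in this thread the
# decoding step was unnecessary, because ASCII-only byte strings already
# compared equal to (and hashed like) the matching Unicode objects.

def as_identifier(name):
    """Hypothetical helper: return `name` as a Unicode string.

    Byte strings are accepted only if they are pure ASCII.
    """
    if isinstance(name, bytes):
        # Raises UnicodeDecodeError for any non-ASCII byte.
        return name.decode("ascii")
    return name

namespace = {}
namespace[as_identifier(b"spam")] = 1   # key stored by "old" byte-string code
namespace[as_identifier("spam")] += 1   # same entry reached by "new" Unicode code

# After normalisation, both spellings are the same dictionary key.
same_key = as_identifier(b"spam") == "spam"
same_hash = hash(as_identifier(b"spam")) == hash("spam")
```

Non-ASCII byte strings are rejected rather than guessed at, which mirrors
the restriction that the favourable interaction only holds for ASCII.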

> Maybe that's for the best.  Francois and my students can write in
> their preferred languages, and "official" Python will support Alex's
> "one world, one substrate for programming languages" campaign.

I'd encourage you to develop a separate patch for it (perhaps after
the PEP 263 patch gets integrated), and distribute it to users. Based
on user feedback, we would get a clearer view of whether people would
use this feature, and in what way.

I agree with Alex on the policy that every group of people developing
software should follow (i.e. all code in English); that policy will
certainly apply to the source code of Python itself. I disagree that
the language should prevent violations of the policy - I just think
there will be additional problems if the feature is implemented.

Regards,
Martin