Is there really a default source encoding?
brian at sweetapp.com
Thu Jan 23 23:13:53 CET 2003
> > I don't understand the goal of your proposal. The goal of the
> > plan is to support existing non-ASCII source files but warn the user
> > that they will need to add an encoding comment in the future.
> This is a laudable goal. I just didn't understand why one would want,
> after slowly and carefully moving away from eurocentric (latin-1), to go
> even further back to anglocentric (ascii) instead of opting for truly
> international (and anglo-neutral, utf-8).
UTF-8 is certainly not "anglo-neutral". It is often noticeably more
expensive to encode Japanese and Chinese text in UTF-8 (UTF-16 is much
more compact for that text).
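
A rough sketch of the size difference, assuming short sample strings made up
for illustration (they are not from the thread). Characters in the Basic
Multilingual Plane take 3 bytes each in UTF-8 for most CJK text but 2 bytes in
UTF-16, while ASCII text goes the other way. The helper name `encoded_sizes`
is hypothetical.

```python
# Sample strings are illustrative assumptions, not data from the discussion.
samples = {
    "japanese": "日本語のテキスト",
    "chinese": "中文文本",
    "english": "plain text",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, both without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: utf-8={utf8_len} bytes, utf-16={utf16_len} bytes")
```

For these samples the CJK strings come out half again as large in UTF-8 as in
UTF-16, and the English string twice as large in UTF-16 as in UTF-8.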
Defaulting to ASCII seems like a very reasonable solution because it is
symmetrical with the way string => unicode conversions are done.
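
The symmetry can be sketched as follows. This uses Python 3 syntax, where the
conversion has to be spelled out, so treat it as an approximation of the
implicit ASCII-based string => unicode conversion being discussed. `to_text`
is a hypothetical helper, not a library function.

```python
def to_text(raw, encoding="ascii"):
    """Decode bytes strictly; the ASCII default mirrors the conversion rule."""
    return raw.decode(encoding)


# Pure 7-bit data converts without any declaration.
print(to_text(b"hello"))

# Anything outside ASCII is rejected rather than guessed at.
latin1_bytes = "café".encode("latin-1")
try:
    to_text(latin1_bytes)
except UnicodeDecodeError as exc:
    print("refused to guess:", exc.reason)

# Naming the encoding explicitly makes the same bytes usable.
print(to_text(latin1_bytes, "latin-1"))
```

An ASCII default for source files would behave the same way: plain ASCII
source just works, and anything else needs an explicit encoding.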
Once the Python Unicode type is called "string", current strings are
called "byte strings", and implicit byte string => string conversions are
NEVER supported, then maybe we can make a Unicode encoding the default.
> Great. Only are you sure that BOMs are such a great idea?
I don't really care about how screwed-up Unix Unicode handling is. The
Unix people should be willing to change their text model every 3
decades or so.
> I don't pretend to be a great unicode expert and maybe the above is
> outdated, flawed, irrelevant or whatever, but it still isn't clear to
> me why .py files (with or without BOM) shouldn't just be assumed to
> be utf-8 (after the transitory latin-1 period), BOM or no BOM (and my
> cursory rereading of pep-263 didn't make it clear to me either).
I wouldn't object to this but I don't see it as a major issue.
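
For anyone who wants to see how a BOM and a PEP 263 encoding comment interact,
here is a minimal sketch. It assumes a Python 3 interpreter, whose standard
library exposes `tokenize.detect_encoding`; the behaviour shown is that of
Python 3, not of the interpreter under discussion in this thread. The helper
`source_encoding` and the sample sources are made up for illustration.

```python
import codecs
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding the tokenizer would choose for these source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _consumed_lines = tokenize.detect_encoding(readline)
    return encoding


plain = b"print('hi')\n"                              # no BOM, no comment
with_bom = codecs.BOM_UTF8 + plain                    # UTF-8 BOM only
with_cookie = b"# -*- coding: latin-1 -*-\n" + plain  # PEP 263 comment

for label, data in [("plain", plain), ("BOM", with_bom), ("cookie", with_cookie)]:
    print(label, "->", source_encoding(data))
```

In Python 3 a file with neither a BOM nor an encoding comment is reported as
utf-8, a BOM yields utf-8-sig, and the latin-1 comment is normalised to
iso-8859-1.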