Is there really a default source encoding?

Brian Quinlan brian at sweetapp.com
Thu Jan 23 23:13:53 CET 2003


> > I don't understand the goal of your proposal. The goal of the current
> > plan is to support existing non-ASCII source files but warn the user
> > that they will need to add an encoding comment in the future.
> 
> This is a laudable goal. I just didn't understand why one would want to,
> after slowly and carefully moving away from eurocentric (latin-1), revert
> even further back to anglocentric (ascii) instead of opting for truly
> international (and anglo-neutral, utf-8).

UTF-8 is certainly not "anglo-neutral". It is often prohibitively
expensive to encode Japanese and Chinese text in UTF-8, where most of
those characters take three bytes against two in UTF-16 (which is much
more popular).
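
As a rough check of the size argument, here is a minimal sketch in
present-day Python 3 (which postdates this thread). The helper name
encoded_sizes and the sample strings are made up for illustration, and
UTF-16 is measured without a BOM.

```python
# Compare the encoded size of the same text in UTF-8 and UTF-16.
samples = {
    "english": "Hello, world",
    "japanese": "こんにちは世界",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text; UTF-16 is counted without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: {len(text)} chars, UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

The ASCII sample doubles in size under UTF-16, while the Japanese sample
is half again as large under UTF-8, so neither encoding is neutral.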

Defaulting to ASCII seems like a very reasonable solution because it is
symmetrical with the way string => unicode conversions are done.
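
To illustrate the symmetry: in the Python 2 of this thread, implicit
string => unicode conversion assumes ASCII and fails on anything else.
Python 3 has no implicit conversion, so the sketch below imitates the
rule with an explicit strict ASCII decode. The helper to_unicode is
hypothetical, not an API from the PEP.

```python
def to_unicode(raw: bytes) -> str:
    """Hypothetical helper: strict ASCII decode, mimicking the implicit rule."""
    return raw.decode("ascii")


# Pure ASCII bytes convert without complaint.
print(to_unicode(b"plain"))

# A Latin-1 byte outside the ASCII range is rejected rather than guessed at.
try:
    to_unicode("caf\u00e9".encode("latin-1"))
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

An ASCII default for source files would behave the same way: non-ASCII
bytes without a declared encoding are an error, not a guess.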

Once the Python Unicode type is called "string", current strings are
called "byte strings", and implicit byte string => string conversions
are NEVER supported, then maybe we can make a Unicode encoding the
default (maybe UTF-32).

> Great. Only are you sure that BOMs are such a great idea?

I don't really care about how screwed-up Unix Unicode handling is. The
Unicode people should be willing to change their text model every 3
decades or so.
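
For readers who have not met the BOM in question: in UTF-8 it is the
three bytes EF BB BF at the start of a file. A minimal sketch using
present-day Python 3 follows; strip_utf8_bom is a hypothetical helper
and the sample source line is invented.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


source = codecs.BOM_UTF8 + b"print('hi')\n"

# Stripping by hand and decoding with the BOM-aware codec give the same text.
print(strip_utf8_bom(source).decode("utf-8") == source.decode("utf-8-sig"))
```

The "utf-8-sig" codec accepts input with or without the BOM, which is
one way a tool can tolerate both kinds of file.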
 
> I don't pretend to be a great unicode expert and maybe the above is
> outdated, flawed, irrelevant or whatever, but it still isn't clear to 
> me why .py files (with or without BOM) shouldn't just be assumed to 
> be utf-8 (after the transitory latin-1 period), BOM or no BOM (and my
> cursory rereading of pep-263 didn't make it clear to me either).

I wouldn't object to this but I don't see it as a major issue. 
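
For reference, the encoding comment under discussion can be inspected
with the standard tokenize module in present-day Python 3, which
postdates this thread and does assume UTF-8 when no comment is given.
This is only a sketch; source_encoding and the two sample sources are
invented for illustration.

```python
import io
import tokenize


def source_encoding(source_bytes: bytes) -> str:
    """Return the encoding Python's tokenizer would pick for these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A PEP 263 style encoding comment on the first line.
declared = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
# No encoding comment at all.
undeclared = b"name = 'cafe'\n"

print(source_encoding(declared))
print(source_encoding(undeclared))
```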

Cheers,
Brian





