[I18n-sig] Random thoughts on Unicode and Python

"Walter Dörwald" walter@livinglogic.de
Mon, 12 Feb 2001 12:27:16 +0100


On 10.02.01 at 22:56 M.-A. Lemburg wrote:

> [...]
> We are trying to tell people that storing text data is better
> done in Unicode than in a raw data buffer like Python's current
> string data type.

It's not enough to tell people. You actually have to make sure
that storing Unicode text data is better and more convenient
than plain old strings. This means that Unicode text must be
usable in:
	open(u"foo.txt")
	urllib.urlopen(u"foo.txt")
	s = eval(u"\"\\u3042\"")
	exec(u"s = \"\\u3042\"")
	os.stat(u"foo.txt")
	os.system(u"foo -x \u3042")
	os.popen2(u"foo -x \u3042",u"r")
and thousands of others.

I think that the first step should be to make Unicode usable
everywhere. Initially this can be done by converting to the
default encoding internally (as e.g. eval and exec do now).

There may be OS services (e.g. file I/O) that are not
Unicode-aware. For these services, converting to the default
encoding is all that can be done, but when the OS supports
Unicode, it should be used (for example Unicode filenames on
NT/2000).
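
To make the rule above concrete, here is a rough sketch of what such
a conversion layer might look like. The helper name prepare_argument
and its service_is_unicode_aware flag are made up for illustration
and are not an existing Python API, and the code uses present-day
Python syntax rather than the u"..." literals shown earlier:

```python
import sys


def prepare_argument(text, service_is_unicode_aware, encoding=None):
    """Hypothetical helper: decide what to hand to an OS-level service.

    If the service understands Unicode, the text is passed through
    unchanged. Otherwise it is converted to bytes using the default
    encoding (or an explicitly given one).
    """
    if service_is_unicode_aware:
        # The OS service accepts Unicode directly, so use it as is.
        return text
    if encoding is None:
        # Fall back to the interpreter's default encoding.
        encoding = sys.getdefaultencoding()
    # Raises UnicodeEncodeError if the text cannot be represented.
    return text.encode(encoding)
```

With a restrictive default encoding the fallback fails for text such
as u"\u3042", which is one more reason to use the Unicode interface
of the OS wherever one exists.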

The next step should be to switch to Unicode internally, i.e.
use Unicode for Python variable names, module names, source 
code, etc. 

>  [...]

Just my $0.02!



Bye,
   Walter Dörwald

-- 
Walter Dörwald · LivingLogic AG · Bayreuth, Germany ·
www.livinglogic.de