[Python-Dev] Re: [I18n-sig] Re: Unicode debate
M.-A. Lemburg
mal@lemburg.com
Tue, 02 May 2000 17:55:40 +0200
[Guido going ASCII]
Do you mean going ASCII all the way (using it everywhere
Unicode gets converted to a string and everywhere
strings get converted to Unicode), or just
for some aspects of conversion, e.g. only for the silent
conversions from strings to Unicode?
[BTW, I'm pretty sure that the Latin-1 folks won't like
ASCII for the same reason they don't like UTF-8: it's
simply an inconvenient way to write strings in their favorite
encoding directly in Python source code. My feeling in this
whole discussion is that it's more about convenience than
anything else. Still, it's very amusing ;-) ]
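To make the convenience point concrete, here is a minimal sketch. It is written in present-day Python 3 syntax, which is an assumption on my part: the thread itself would spell the conversion unicode(string, encname). The helper name try_decode is hypothetical. It shows that a byte string typed in Latin-1 decodes under neither an ASCII nor a UTF-8 default:

```python
# Bytes as a Latin-1 editor would store the word "cafe" with an accented e.
latin1_bytes = b"caf\xe9"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Try the three defaults discussed in the thread.
results = {enc: try_decode(latin1_bytes, enc) for enc in ("ascii", "utf-8", "latin-1")}
```

Only the Latin-1 entry holds text; the ASCII and UTF-8 entries are None because the byte 0xE9 is not valid input for either codec on its own.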
FYI, here's the conversion table of (potentially) all
conversions done by the implementation:
Python:
-------
string + unicode:       unicode(string,'utf-8') + unicode
string.method(unicode): unicode(string,'utf-8').method(unicode)
print unicode:          print unicode.encode('utf-8'); with stdout
                        redirection this can be changed to any
                        other encoding
str(unicode):           unicode.encode('utf-8')
repr(unicode):          repr(unicode.encode('unicode-escape'))
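The rows above can be mimicked with explicit calls. This is only a sketch in Python 3 syntax (an assumption; bytes stands in for the old string type and str for unicode), and the helper names concat, to_str and escaped are hypothetical, not part of the implementation described here:

```python
def concat(byte_string, text, default_encoding="utf-8"):
    """Mimic `string + unicode`: decode the byte string first, then concatenate."""
    return byte_string.decode(default_encoding) + text


def to_str(text, default_encoding="utf-8"):
    """Mimic `str(unicode)`: encode with the default encoding."""
    return text.encode(default_encoding)


def escaped(text):
    """Produce the 'unicode-escape' form that the repr() row builds on."""
    return text.encode("unicode-escape")


joined = concat(b"gr\xc3\xbc", "n")  # UTF-8 bytes for "gr" + u-umlaut, then "n"
encoded = to_str("\u20ac")           # the euro sign as UTF-8 bytes
escape_form = escaped("\u20ac")      # the euro sign as an ASCII escape sequence
```

Swapping the default_encoding argument for 'ascii' or 'latin-1' shows how each proposed default changes which inputs are accepted.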
C (PyArg_ParseTuple):
---------------------
"s" + unicode:          same as "s" + unicode.encode('utf-8')
"s#" + unicode:         same as "s#" + unicode.encode('unicode-internal')
"t" + unicode:          same as "t" + unicode.encode('utf-8')
"t#" + unicode:         same as "t#" + unicode.encode('utf-8')
This affects all C modules and builtins. In case a C module
wants to receive a certain predefined encoding, it can
use the new "es" and "es#" parser markers.
Ways to enter Unicode:
----------------------
u'' + string                    same as unicode(string,'utf-8')
unicode(string,encname)         any supported encoding
u'...unicode-escape...'         unicode-escape currently accepts
                                Latin-1 chars as single-char input; using
                                escape sequences any Unicode char can be
                                entered (*)
codecs.open(filename,mode,encname)
                                opens an encoded file for
                                reading and writing Unicode directly
raw_input() + stdin redirection (see one of my earlier posts for code)
                                returns UTF-8 strings based on the input
                                encoding
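As a rough illustration of the first three entry routes, here is a sketch using today's codecs in Python 3 syntax (an assumption; bytes.decode(encname) plays the role of unicode(string, encname)). The variable names are made up for the example:

```python
raw_latin1 = b"\xe9"       # a single Latin-1 byte (e with acute accent)
escaped_euro = b"\\u20ac"  # six ASCII bytes spelling an escape sequence

# The unicode-escape codec takes a bare Latin-1 byte as one character...
from_latin1 = raw_latin1.decode("unicode-escape")
# ...and expands escape sequences to any Unicode character.
from_escape = escaped_euro.decode("unicode-escape")

# The explicit route: name any supported encoding.
from_named = b"\xa4".decode("iso-8859-15")  # 0xA4 is the euro sign in ISO 8859-15
```

Both from_escape and from_named end up as the same one-character string, reached through two different input routes.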
IO:
---
open(file,'w').write(unicode)
        same as open(file,'w').write(unicode.encode('utf-8'))
open(file,'wb').write(unicode)
        same as open(file,'wb').write(unicode.encode('unicode-internal'))
codecs.open(file,'wb',encname).write(unicode)
        same as open(file,'wb').write(unicode.encode(encname))
codecs.open(file,'rb',encname).read()
        same as unicode(open(file,'rb').read(),encname)
stdin + stdout
        can be redirected using StreamRecoders to handle any
        of the supported encodings
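The codecs.open equivalences and the recoding idea can be checked with a short sketch. It uses Python 3 syntax and a temporary file, both of which are assumptions for illustration. Note that recent Python 3 releases favor the built-in open() with an encoding argument over codecs.open. The StreamRecoder here wraps an in-memory buffer rather than the real stdin or stdout:

```python
import codecs
import io
import os
import tempfile

text = "gr\u00fcn"
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Writing through codecs.open encodes on the way out...
with codecs.open(path, "wb", "latin-1") as f:
    f.write(text)
with open(path, "rb") as f:
    raw = f.read()

# ...and reading through it decodes on the way in.
with codecs.open(path, "rb", "latin-1") as f:
    round_trip = f.read()

# codecs.EncodedFile returns a StreamRecoder: the caller writes UTF-8 bytes,
# and the wrapped stream receives the same text as Latin-1 bytes.
sink = io.BytesIO()
recoder = codecs.EncodedFile(sink, data_encoding="utf-8", file_encoding="latin-1")
recoder.write(text.encode("utf-8"))
recoded = sink.getvalue()
```

The same wrapping applied to a standard stream's underlying binary file is how the redirection mentioned above would look in this sketch.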
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/