
Let me tell you why you would want to have an encoding which can be set:

(1) Say I am on a Japanese Windows box, I have a string called 'address' and I do 'print address'. If I see UTF-8, I see garbage. If I see Shift-JIS, I see the correct Japanese address. At this point in time, UTF-8 is an interchange format but 99% of the world's data is in various native encodings. Analogous problems occur on input.

(2) I'm using htmlgen, which 'prints' objects to standard output. My web site is supposed to be encoded in Shift-JIS (or EUC, or Big 5 for Taiwan, etc.). Yes, browsers CAN detect and display UTF-8, but you just don't find UTF-8 sites in the real world, and most users just don't know about the encoding menu and will get pissed off if they have to reach for it. Ditto for streaming output in some protocol.

Java solves this (and we could too by hacking stdout) using Writer classes which are created as wrappers around an output stream and can take an encoding, but you lose the flexibility to 'just print'.

I think being able to change encoding would be useful. What I do not want is to auto-detect it from the operating system when Python boots - that would be a portability nightmare.

Regards,

Andy

=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.
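To make point (1) concrete, here is a minimal sketch in present-day Python 3 (not the Python of this thread). The sample address string and the variable names are assumptions for illustration only. It shows that the same text yields different bytes under UTF-8 and Shift-JIS, and that UTF-8 bytes read by something expecting Shift-JIS do not come back as the original text.

```python
# Hypothetical sample value; any Japanese text would do.
address = "東京都"

# The same text, serialised under two different encodings.
utf8_bytes = address.encode("utf-8")
sjis_bytes = address.encode("shift_jis")

# What a consumer expecting Shift-JIS recovers from each byte string.
# errors="replace" keeps undecodable bytes from raising an exception.
seen_from_sjis = sjis_bytes.decode("shift_jis")
seen_from_utf8 = utf8_bytes.decode("shift_jis", errors="replace")

print(len(utf8_bytes), len(sjis_bytes))
print(seen_from_sjis == address)
print(seen_from_utf8 == address)
```

The first comparison holds and the second does not, which is the "garbage" case described above.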

You almost convinced me there, but I think this can still be done without changing the default encoding: simply reopen stdout with a different encoding. This is how Java does it. I/O streams with an encoding specified at open() are a very powerful feature. You can hide this in your $PYTHONSTARTUP.

François Pinard might not like it though...

BTW, someone asked what HP asked for: I can't reveal what exactly they asked for, basically because they don't seem to agree amongst themselves. The only firm statements I have are that they want i18n and that they want it fast (before the end of the year).

The desire for Perl-compatible regexps comes from me, and the only reason is compatibility with re.py. (HP did ask for regexps, but they wouldn't know the difference between POSIX and Perl if it poked them in the eye.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
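As a rough illustration of "reopen stdout with a different encoding", here is a sketch using present-day Python 3's io.TextIOWrapper; this is not the API that existed when this message was written. The helper name reopen_with_encoding is hypothetical, and an in-memory io.BytesIO stands in for the real output stream so the resulting bytes can be inspected.

```python
import io


def reopen_with_encoding(binary_stream, encoding):
    """Wrap a binary stream in a text layer that encodes on write.

    Hypothetical helper for illustration. newline="" disables newline
    translation so the bytes written are the same on every platform.
    """
    return io.TextIOWrapper(
        binary_stream, encoding=encoding, newline="", line_buffering=True
    )


# An in-memory buffer stands in for the underlying byte stream of stdout.
raw = io.BytesIO()
out = reopen_with_encoding(raw, "shift_jis")

# Ordinary print still works; the wrapper does the encoding.
print("東京都", file=out)
out.flush()

encoded = raw.getvalue()
```

For the real standard output, the same wrapper could be applied to sys.stdout.buffer, or, on Python 3.7 and later, sys.stdout.reconfigure(encoding=...) changes the encoding in place.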

Guido van Rossum wrote:
True and it probably covers all cases where setting the default encoding to something other than UTF-8 makes sense. I guess you've convinced me there ;-)

The current proposal has wrappers around streams for this purpose:

  For explicit handling of Unicode using files, the unicodec module could provide stream wrappers which provide transparent encoding/decoding for any open stream (file-like object):

    import unicodec
    file = open('mytext.txt','rb')
    ufile = unicodec.stream(file,'utf-16')
    u = ufile.read()
    ...
    ufile.close()

  XXX unicodec.file(<filename>,<mode>,<encname>) could be provided as short-hand for unicodec.file(open(<filename>,<mode>),<encname>) which also assures that <mode> contains the 'b' character when needed.

The above can be done using:

    import sys,unicodec
    sys.stdin = unicodec.stream(sys.stdin,'jis')
    sys.stdout = unicodec.stream(sys.stdout,'jis')

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 50 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
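The unicodec module above is a proposal, so its calls cannot be run as written. As a hedged sketch of the same idea, the standard library's codecs module in present-day Python offers stream wrappers of a similar shape via codecs.getreader and codecs.getwriter. The sample strings, the in-memory buffers, and the use of 'shift_jis' in place of the 'jis' name from the message are assumptions for illustration.

```python
import codecs
import io

# Reading: wrap a binary stream so that read() returns decoded text.
# An in-memory buffer stands in for open('mytext.txt', 'rb').
raw_in = io.BytesIO("héllo".encode("utf-16"))
ufile = codecs.getreader("utf-16")(raw_in)
u = ufile.read()
ufile.close()

# Writing: wrap a binary stream so that write() encodes text on the way out.
raw_out = io.BytesIO()
writer = codecs.getwriter("shift_jis")(raw_out)
writer.write("東京都")
written = raw_out.getvalue()

print(u)
print(written)
```

The wrapper objects pass encoded bytes straight through to the underlying stream, which is why the same pattern could in principle be applied to the byte streams behind stdin and stdout.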
