Right solution to unicode error?

Oscar Benjamin oscar.j.benjamin at gmail.com
Thu Nov 8 19:32:11 CET 2012


On 8 November 2012 15:05,  <wxjmfauth at gmail.com> wrote:
> Le jeudi 8 novembre 2012 15:07:23 UTC+1, Oscar Benjamin a écrit :
>> On 8 November 2012 00:44, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
>> > On 7 November 2012 23:51, Andrew Berg <bahamutzero8825 at gmail.com> wrote:
>> >> On 2012.11.07 17:27, Oscar Benjamin wrote:
>>
>> >>> Are you using cmd.exe (standard Windows terminal)? If so, it does not
>> >>> support unicode
>>
>> >> Actually, it does. Code page 65001 is UTF-8. I know that doesn't help
>> >> the OP since Python versions below 3.3 don't support cp65001, but I
>> >> think it's important to point out that the Windows command line system
>> >> (it is not unique to cmd) does in fact support Unicode.
>>
>> > I have tried code page 65001 and it didn't work for me, even when
>> > I used a version of Python (possibly a 3.3 alpha) that claimed to
>> > support it.
>>
>> I stand corrected. I've just checked and codepage 65001 does work in
>> cmd.exe (on this machine):
>>
>> O:\>chcp 65001
>> Active code page: 65001
>>
>> O:\>Q:\tools\Python33\python -c print('abc\u2013def')
>> abc–def
>>
>> O:\>Q:\tools\Python33\python -c print('\u03b1')
>> α
>>
>> It would be a lot better though if it just worked straight away
>> without me needing to set the code page (like the terminal in every
>> other OS I use).
>
> It *WORKS* straight away. The problem is that
> people do not wish to use unicode correctly
> (e.g. Mulder's example).
> Read points 1) and 4) in my previous post.
>
> Unicode, and character coding in general,
> has nothing to do with OSes or programming languages.

I don't know what you mean when you say it works "straight away".

The default code page on my machine is cp850.

O:\>chcp
Active code page: 850

cp850 doesn't understand UTF-8, so the UTF-8 bytes for α just print as garbage:

O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
╬▒
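That garbage is ordinary mojibake: α is the two bytes 0xCE 0xB1 in UTF-8, and cp850 maps those two bytes to box-drawing characters. A minimal sketch that reproduces the effect without a console, by decoding the UTF-8 bytes as cp850 (which is, in effect, what the cp850 console does with them):

```python
# U+03B1 is GREEK SMALL LETTER ALPHA.
alpha = '\u03b1'

# What the script writes to stdout.buffer: the UTF-8 encoding of alpha.
utf8_bytes = alpha.encode('utf-8')
print(utf8_bytes)        # b'\xce\xb1'

# What a cp850 console shows: the same bytes interpreted as cp850.
misread = utf8_bytes.decode('cp850')
print(ascii(misread))    # '\u256c\u2592', i.e. the two box characters above
```

Printing with `ascii()` keeps the demonstration independent of whatever code page the console happens to be using.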

Using the console's own encoding doesn't help either, because cp850 has no α:

O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('cp850'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "Q:\tools\Python33\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in position 0: character maps to <undefined>
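The UnicodeEncodeError is not a fault in the call itself: cp850 simply has no code point for α, although it does cover Western European letters such as é. A small helper (hypothetical, not part of the standard library) that checks whether a string is representable in a given encoding:

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(can_encode('\u03b1', 'cp850'))  # False: cp850 has no alpha
print(can_encode('\u00e9', 'cp850'))  # True: e-acute is byte 0x82 in cp850
print(can_encode('\u03b1', 'utf-8'))  # True: UTF-8 can encode any non-surrogate code point
```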

O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.encoding))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "Q:\tools\Python33\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in position 0: character maps to <undefined>
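If changing the code page isn't an option, one workaround is to encode with the stream's own encoding plus an error handler, so characters the code page lacks come out as escapes instead of raising. A sketch under that assumption; `write_line` is a hypothetical helper, and the in-memory cp850 stream only stands in for a real cp850 console:

```python
import io
import sys

def write_line(text, stream=None):
    """Write text and a newline as raw bytes in the stream's encoding.

    Characters the encoding cannot represent are written as backslash
    escapes (the 'backslashreplace' error handler) instead of raising
    UnicodeEncodeError. Returns the bytes written.
    """
    if stream is None:
        stream = sys.stdout
    # Assumption: fall back to UTF-8 if the stream reports no encoding.
    encoding = getattr(stream, 'encoding', None) or 'utf-8'
    data = (text + '\n').encode(encoding, errors='backslashreplace')
    stream.flush()            # push out any text already buffered
    stream.buffer.write(data)
    stream.buffer.flush()
    return data

# Simulate a cp850 console with an in-memory stream.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='cp850')
write_line('\u03b1 and \u00e9', console)
print(raw.getvalue())         # b'\\u03b1 and \x82\n'
```

The alpha survives only as the text `\u03b1`, so this keeps scripts from crashing but is no substitute for a console that can actually display the character.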

If I want characters outside cp850 to work, I need to change the code page:

O:\>chcp 65001
Active code page: 65001

O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode('utf-8'))"
α

O:\>Q:\tools\Python33\python -c "import sys; sys.stdout.buffer.write('\u03b1\n'.encode(sys.stdout.encoding))"
α
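This last example only works because the interpreter has a codec registered for code page 65001; as noted earlier in the thread, Python versions below 3.3 don't support cp65001, and there an encoding lookup for it fails with LookupError. A minimal way to check whether the running Python knows a given codec name (`codec_available` is a hypothetical helper):

```python
import codecs

def codec_available(name):
    """Return True if this Python build has a codec registered under name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

for name in ('utf-8', 'cp850', 'cp65001'):
    print(name, codec_available(name))
```

The result for 'cp65001' depends on the Python version and platform, which is the point of checking rather than assuming.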


Oscar

