cp936 uses gbk codec, doesn't decode `\x80` as U+20AC EURO SIGN

Ulrich Eckhardt eckhardt at satorlaser.com
Mon Oct 11 04:54:05 EDT 2010


John Machin wrote:
> |>>> '\x80'.decode('cp936')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0: incomplete multibyte sequence
[...]
> So Microsoft appears to think that cp936 includes the euro,
> and the ICU project seem to think that GBK and cp936 both include
> the euro.
> 
> A couple of questions:
> 
> Is this a bug or a shrug?

Bug, IMHO.
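
Until that is settled, one possible workaround is a custom decode error
handler. What follows is only a sketch under assumptions: it is written
for Python 3 (the session quoted above is Python 2), and the handler name
"cp936_euro" and the function decode_cp936_euro are made up for this
example, they are not part of Python. A plain bytes.replace() on 0x80
would not be safe, because 0x80 is also a legal trail byte in GBK; the
handler only fires where the gbk codec itself rejects the byte.

```python
import codecs

EURO = "\u20ac"  # U+20AC EURO SIGN


def _cp936_euro_handler(exc):
    """Hypothetical handler: map an undecodable 0x80 byte to the euro sign."""
    # Only handle decode errors whose offending byte is 0x80.
    if isinstance(exc, UnicodeDecodeError) and exc.object[exc.start] == 0x80:
        # Return the replacement text and the position to resume decoding at.
        return EURO, exc.start + 1
    # Anything else is a genuine error, so pass it on unchanged.
    raise exc


# "cp936_euro" is an invented name, registered here for this sketch only.
codecs.register_error("cp936_euro", _cp936_euro_handler)


def decode_cp936_euro(data):
    """Decode bytes as gbk, treating a stray 0x80 as the euro sign."""
    return data.decode("gbk", errors="cp936_euro")


if __name__ == "__main__":
    print(ascii(decode_cp936_euro(b"\x80")))
```

If the gbk codec ever starts decoding 0x80 by itself, the handler is
simply never called, so the function should keep returning the same
result.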

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932



