nntplib encoding problem

Thomas L. Shinnick tshinnic at io.com
Mon Feb 28 04:26:02 CET 2011


At 08:12 PM 2/27/2011, you wrote:
>On 28/02/2011 01:31, Laurent Duchesne wrote:
>>Hi,
>>
>>I'm using python 3.2 and got the following error:
>>
>>>>>nntpClient = nntplib.NNTP_SSL(...)
>>>>>nntpClient.group("alt.binaries.cd.lossless")
>>>>>nntpClient.over((534157,534157))
>>... 'subject': 'Myl\udce8ne Farmer - Anamorphosee (Japan Edition) 1995
>>[02/41] "Back.jpg" yEnc (1/3)' ...
>>>>>overview = nntpClient.over((534157,534157))
>>>>>print(overview[1][0][1]['subject'])
>>Traceback (most recent call last):
>>File "<stdin>", line 1, in <module>
>>UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in
>>position 3: surrogates not allowed
>>
>>I'm not sure if I should report this as a bug in nntplib or if I'm doing
>>something wrong.
>>
>>Note that I get the same error if I try to write this data to a file:
>>
>>>>>h = open("output.txt", "a")
>>>>>h.write(overview[1][0][1]['subject'])
>>Traceback (most recent call last):
>>File "<stdin>", line 1, in <module>
>>UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in
>>position 3: surrogates not allowed
>It looks like the subject was originally encoded as Latin-1 (or
>similar) (b'Myl\xe8ne Farmer - Anamorphosee (Japan Edition) 1995
>[02/41] "Back.jpg" yEnc (1/3)') but has been decoded as UTF-8 with
>"surrogateescape" passed as the "errors" parameter.

3.2 Docs
   6.6. codecs — Codec registry and base classes
     Possible values for errors are
       'surrogateescape': replace with surrogate U+DCxx, see PEP 383

Yes, the original byte would have been 0xE8, which is 'è' in Latin-1: Mylène
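
To make that concrete, here is a minimal sketch of what the decoding step presumably does, assuming the header bytes really were Latin-1 and were decoded as UTF-8 with errors="surrogateescape". The byte string is a shortened stand-in for the real subject line, not data fetched from the server:

```python
# Shortened stand-in for the raw header bytes (Latin-1; 0xE8 is 'è').
raw = b"Myl\xe8ne Farmer"

# 0xE8 is not valid UTF-8 in this position, so surrogateescape
# maps the undecodable byte to the lone surrogate U+DCE8.
decoded = raw.decode("utf-8", "surrogateescape")
print(ascii(decoded))

# A strict UTF-8 encode (which print() or file.write() performs on a
# UTF-8 terminal or file) refuses the lone surrogate.
try:
    decoded.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding with surrogateescape restores the original byte unchanged.
roundtrip = decoded.encode("utf-8", "surrogateescape")
```

So the string is a lossless carrier for the original bytes, but it cannot be printed or written with a strict codec until it is re-decoded with the right encoding.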

Googling on surrogateescape I can see lots of 
argument about unintended outcomes....  yikes!

>You can get the "correct" Unicode by encoding as UTF-8 with
>"surrogateescape" and then decoding as Latin-1:
>
> 
>overview[1][0][1]['subject'].encode("utf-8", "surrogateescape").decode("latin-1")
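
A small sketch of that workaround wrapped in a helper, for what it's worth. The name recover_text and the fallback parameter are made up for illustration and are not part of nntplib; Latin-1 as the fallback is only a guess about what the poster used:

```python
def recover_text(text, fallback="latin-1"):
    """Re-decode a surrogate-escaped string using a fallback encoding.

    Hypothetical helper: the fallback encoding is an assumption about
    what the original poster used, not something the header declares.
    """
    try:
        # No lone surrogates: the header was valid UTF-8, leave it alone.
        text.encode("utf-8")
        return text
    except UnicodeEncodeError:
        # Get the original bytes back, then decode them as the fallback.
        raw = text.encode("utf-8", "surrogateescape")
        return raw.decode(fallback)


subject = "Myl\udce8ne Farmer - Anamorphosee"
fixed = recover_text(subject)
print(fixed)
```

One caveat: if a header mixes valid UTF-8 sequences with stray 8-bit bytes, re-decoding the whole thing as Latin-1 will mangle the parts that were valid, so this is only reasonable when the whole header is believed to be in the fallback encoding.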