nntplib encoding problem

Laurent Duchesne l at urent.org
Mon Feb 28 18:49:18 CET 2011


 Hi,

 Thanks, it's working!
 But is it "normal" for a string coming out of a module (nntplib) to 
 crash when passed to print or write?

 I'm just asking to know if I should open a bug report or not :)

 I'm also wondering which strings should be re-encoded using the 
 surrogateescape parameter and which should not. I guess I could 
 re-encode them all and it wouldn't cause any problems?

 Laurent

 On Mon, 28 Feb 2011 02:12:20 +0000, MRAB wrote:
> On 28/02/2011 01:31, Laurent Duchesne wrote:
>> Hi,
>>
>> I'm using python 3.2 and got the following error:
>>
>>>>> nntpClient = nntplib.NNTP_SSL(...)
>>>>> nntpClient.group("alt.binaries.cd.lossless")
>>>>> nntpClient.over((534157,534157))
>> ... 'subject': 'Myl\udce8ne Farmer - Anamorphosee (Japan Edition) 1995 [02/41] "Back.jpg" yEnc (1/3)' ...
>>>>> overview = nntpClient.over((534157,534157))
>>>>> print(overview[1][0][1]['subject'])
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 3: surrogates not allowed
>>
>> I'm not sure if I should report this as a bug in nntplib or if I'm
>> doing something wrong.
>>
>> Note that I get the same error if I try to write this data to a file:
>>
>>>>> h = open("output.txt", "a")
>>>>> h.write(overview[1][0][1]['subject'])
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> UnicodeEncodeError: 'utf-8' codec can't encode character '\udce8' in position 3: surrogates not allowed
>>
> It looks like the subject was originally encoded as Latin-1 (or
> similar) (b'Myl\xe8ne Farmer - Anamorphosee (Japan Edition) 1995
> [02/41] "Back.jpg" yEnc (1/3)') but has been decoded as UTF-8 with
> "surrogateescape" passed as the "errors" parameter.
>
> You can get the "correct" Unicode by encoding as UTF-8 with
> "surrogateescape" and then decoding as Latin-1:
>
>     overview[1][0][1]['subject'].encode("utf-8", "surrogateescape").decode("latin-1")
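
The round trip described above can be sketched in Python 3 roughly as follows. The helper names recover_bytes and repair_header are hypothetical and not part of nntplib, and Latin-1 as the fallback charset is only an assumption about what the poster's client used. The sketch tries strict UTF-8 first, because blindly decoding every subject as Latin-1 would mangle subjects that were valid UTF-8 to begin with.

```python
def recover_bytes(text):
    """Undo a UTF-8 decode done with errors="surrogateescape".

    Lone surrogates such as '\\udce8' are turned back into the raw
    bytes (here b'\\xe8') they were standing in for.
    """
    return text.encode("utf-8", "surrogateescape")


def repair_header(text, fallback="latin-1"):
    """Return a printable str for a header value (hypothetical helper).

    If the original bytes were valid UTF-8, keep that interpretation.
    Otherwise decode them with the fallback charset, which is a guess.
    """
    raw = recover_bytes(text)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# Simulate the decoding step that produced the surrogate in the subject.
wire = b'Myl\xe8ne Farmer'
mangled = wire.decode("utf-8", "surrogateescape")
assert mangled == 'Myl\udce8ne Farmer'

# The repaired string contains U+00E8 and can be printed or written.
assert repair_header(mangled) == 'Myl\xe8ne Farmer'

# A subject that really was UTF-8 passes through unchanged...
good = 'Myl\xe8ne'.encode("utf-8").decode("utf-8", "surrogateescape")
assert repair_header(good) == 'Myl\xe8ne'

# ...whereas forcing Latin-1 on it would corrupt it.
assert recover_bytes(good).decode("latin-1") == 'Myl\xc3\xa8ne'
```

If the goal is only to store the data without losing bytes, another option is to open the output file with errors="surrogateescape", e.g. open("output.txt", "a", encoding="utf-8", errors="surrogateescape"), which writes the original bytes back out unchanged.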



