[Python-Dev] Logging, Unicode and sockets

Vinay Sajip vinay_sajip at yahoo.co.uk
Thu Oct 8 21:55:15 CEST 2009


Martin v. Löwis <martin <at> v.loewis.de> writes:

> I can't understand what the problem with SocketHandler/DatagramHandler
> is. As they use pickle, they should surely be able to send records with
> Unicode strings in them, no?

Of course you are right. When I posted that, it was a knee-jerk reaction to the
issue that was raised for SysLogHandler configured to use UDP. I did realise a
bit later that the issue didn't apply to the other two handlers, but I was hoping
nobody would notice ;-)

> OTOH, why is SMTPHandler not in your list?

I assumed smtp.sendmail() would deal with it, as it deals with the wire
protocol, but perhaps I was wrong to do so. I noticed that Issue 521270 (SMTP
does not handle Unicode) was closed, but I didn't look at it closely. I now see
it was perhaps only a partial solution. I did a bit of searching and found this
post by Marius Gedminas:

http://mg.pov.lt/blog/unicode-emails-in-python.html

Now if that's the right approach, shouldn't it be catered for in a more general
part of the stdlib than logging - perhaps in smtplib itself? Or, seeing that
Marius' post is five years old, is there a better way of doing it using the
stdlib as it is now?
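
For concreteness, here is a rough sketch of one way to build a Unicode-safe
message with the email package as it stands. It is not necessarily what Marius'
post does, it is written for 3.x, and build_unicode_message is a made-up name
used only for illustration:

```python
from email.header import Header
from email.mime.text import MIMEText


def build_unicode_message(sender, recipient, subject, body):
    """Hypothetical helper: build a text/plain message with UTF-8 body and subject."""
    # Declaring the charset makes the email package pick a suitable
    # Content-Transfer-Encoding for the non-ASCII body.
    msg = MIMEText(body, 'plain', 'utf-8')
    # Header() applies RFC 2047 encoding to the non-ASCII subject.
    msg['Subject'] = Header(subject, 'utf-8')
    # Assumption: the addresses themselves are plain ASCII.
    msg['From'] = sender
    msg['To'] = recipient
    return msg
```

The result of msg.as_string() is pure ASCII, so it could be handed to
smtp.sendmail() without the wire protocol ever seeing raw non-ASCII data.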

> For syslog, I don't think that's appropriate. I presume this is meant to
> follow RFC 5424? If so, it SHOULD send the data in UTF-8, in which case
> it MUST include a BOM also. A.8 then says that if you are not certain
> that it is UTF-8 (which you wouldn't be if the application passes a byte
> string), you MAY omit the BOM.

So ISTM that the right thing to do on 2.x would be: if a str is to be sent, send
it as is; if a unicode object is to be sent, encode it as UTF-8 and send it with
a BOM. For 3.x, just encode as UTF-8 and send with a BOM.
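
In code, the rule would be something like the sketch below. This is only an
illustration, not what SysLogHandler currently does; encode_syslog_msg is a
made-up name, and it is written for 3.x, where bytes plays the role that str
plays on 2.x:

```python
import codecs


def encode_syslog_msg(msg):
    """Hypothetical helper: return the bytes to send for a syslog message."""
    if isinstance(msg, bytes):
        # Byte string of unknown encoding: send as is, without a BOM.
        return msg
    # Text: encode as UTF-8 and prefix the UTF-8 BOM.
    return codecs.BOM_UTF8 + msg.encode('utf-8')
```

So a text message goes out as the three BOM bytes followed by its UTF-8
encoding, while a byte string passes through untouched.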

Does that seem right?

Thanks and regards,

Vinay Sajip


