[Python-Dev] Logging, Unicode and sockets
Vinay Sajip
vinay_sajip at yahoo.co.uk
Thu Oct 8 21:55:15 CEST 2009
Martin v. Löwis <martin <at> v.loewis.de> writes:
> I can't understand what the problem with SocketHandler/DatagramHandler
> is. As they use pickle, they should surely be able to send records with
> Unicode strings in them, no?
Of course you are right. When I posted that it was a knee-jerk reaction to the
issue that was raised for SysLogHandler configured to use UDP. I did realise a
bit later that the issue didn't apply to the other two handlers but I was hoping
nobody would notice ;-)
> OTOH, why is SMTPHandler not in your list?
I assumed smtp.sendmail() would deal with it, as it deals with the wire
protocol, but perhaps I was wrong to do so. I noticed that Issue 521270 (SMTP
does not handle Unicode) was closed, but I didn't look at it closely. I now see
it was perhaps only a partial solution. I did a bit of searching and found this
post by Marius Gedminas:
http://mg.pov.lt/blog/unicode-emails-in-python.html
Now if that's the right approach, shouldn't it be catered for in a more general
part of the stdlib than logging - perhaps in smtplib itself? Or, seeing that
Marius' post is five years old, is there a better way of doing it using the
stdlib as it is now?
> For syslog, I don't think that's appropriate. I presume this is meant to
> follow RFC 5424? If so, it SHOULD send the data in UTF-8, in which case
> it MUST include a BOM also. A.8 then says that if you are not certain
> that it is UTF-8 (which you wouldn't be if the application passes a byte
> string), you MAY omit the BOM.
So ISTM that the right thing to do on 2.x would be: if str to be sent, send as
is; if unicode to be sent, encode using utf-8 and send with a BOM. For 3.x, just
encode using utf-8 and send with a BOM.
Does that seem right?
Thanks and regards,
Vinay Sajip
More information about the Python-Dev
mailing list