[Twisted-Python] Bytes vs unicode in twisted.python.logfile's python3 porting
Hi everyone, I'm working on #6749 for porting t.p.logfile to python3. I'm dealing with some test failures, which you can see from the buildbot here: http://buildbot.twistedmatrix.com/builders/python-3.3-tests/builds/1602/step... I have pasted the relevant bit into a gist here: https://gist.github.com/lvh/7174766 I think what's happening is that LogFile.write should take native strings (since that's what log.msg takes). However, I'm opening all files in binary mode, since that's on the reviewer checklist (point 8) for the Python 3 porting plan. Should I not open the files in binary mode? cheers lvh
On 26 Oct, 09:49 pm, _@lvh.io wrote:
Hi everyone,
I'm working on #6749 for porting t.p.logfile to python3. I'm dealing with some test failures, which you can see from the buildbot here:
http://buildbot.twistedmatrix.com/builders/python-3.3-tests/builds/1602/step...
I have pasted the relevant bit into a gist here:
https://gist.github.com/lvh/7174766
I think what's happening is that LogFile.write should take native strings (since that's what log.msg takes). However, I'm opening all files in binary mode, since that's on the reviewer checklist (point 8) for the Python 3 porting plan.
Argh. I was all set up to object to the "...should take native strings" bit but then I reviewed some history. If you haven't already, read <https://twistedmatrix.com/trac/ticket/5932>. In hindsight, it would have been nice if this branch had come with documentation of some kind. From what I can tell, this means: 1. yep, `log.msg` takes native strings 2. LogPublisher doesn't do anything with encoding or decoding 3. a log observer can choose to do whatever it wants with what it gets (but it should be prepared to handle bytes on python 2 and unicode on python 3) So... invent some policy. What encoding does Twisted's built-in file log observer use for the files it writes? Up until now the answer has been ASCII because it doesn't try to handle unicode so anything non- ASCII results in exceptions (unless you're using PyGTK... let's not go there). UTF-8 is obviously the only possible correct answer, I guess. This needs to be documented, of course. And I predict that next someone will come along with a feature request for a command-line option to twistd to make it write logs with a different encoding. :( Whether you open the file in binary mode or not is up to you in this case, I think. You could do that and handle the encoding yourself or you could open it in text mode with the right encoding and let it handle the encoding. The blanket statement on https://twistedmatrix.com/trac/wiki/Plan/Python3 about open calls is perhaps slightly too general. It seems like that should apply to cases where the behavior is not supposed to be changing - which should be the case for the majority of porting work. However, `log.msg` has already changed behavior as part of the port. *Or*, it now occurs to me, just stick with the ASCII-only policy that's already in place. I'd even say this is more correct since porting isn't supposed to change behavior. Leave support for some other codec for another ticket (perhaps #989). Apart from being simpler (I hope) and avoiding breaching the documented porting guidelines, this also means someone will actually have to think about unicode support on Python 2 as well. Saying we support unicode in the logging system is a lot better than saying we support unicode in the logging system on Python 3 only. Jean-Paul
Hi JP, On Sun, Oct 27, 2013 at 2:19 AM, <exarkun@twistedmatrix.com> wrote:
*Or*, it now occurs to me, just stick with the ASCII-only policy that's already in place. I'd even say this is more correct since porting isn't supposed to change behavior. Leave support for some other codec for another ticket (perhaps #989). Apart from being simpler (I hope) and avoiding breaching the documented porting guidelines, this also means someone will actually have to think about unicode support on Python 2 as well. Saying we support unicode in the logging system is a lot better than saying we support unicode in the logging system on Python 3 only.
Did this. There's a few Windows-specific bugs left on the ticket that I don't know how to fix, unfortunately. cheers lvh
On Oct 27, 2013, at 6:57 AM, Laurens Van Houtven <_@lvh.io> wrote:
Hi JP,
On Sun, Oct 27, 2013 at 2:19 AM, <exarkun@twistedmatrix.com> wrote: *Or*, it now occurs to me, just stick with the ASCII-only policy that's already in place. I'd even say this is more correct since porting isn't supposed to change behavior. Leave support for some other codec for another ticket (perhaps #989). Apart from being simpler (I hope) and avoiding breaching the documented porting guidelines, this also means someone will actually have to think about unicode support on Python 2 as well. Saying we support unicode in the logging system is a lot better than saying we support unicode in the logging system on Python 3 only.
Did this.
There's a few Windows-specific bugs left on the ticket that I don't know how to fix, unfortunately.
The new logging branch (destined for review Real Soon Now) is a bit more rigorous about sensible behavior with respect to encoding, so we can tighten up this behavior and make it work in more cases without breaking compatibility. What you're doing here basically makes sense though. -glyph
Hi Glyph, On Wed, Oct 30, 2013 at 1:22 AM, Glyph <glyph@twistedmatrix.com> wrote:
The new logging branch (destined for review Real Soon Now) is a bit more rigorous about sensible behavior with respect to encoding, so we can tighten up this behavior and make it work in more cases without breaking compatibility. What you're doing here basically makes sense though.
Okay, great :-) I specialcased Windows so that the test would use NUL instead of /dev/null. That appears to make the win7 bot happy. I've resubmitted it for review, we'll see what happens :) cheers lvh
participants (3)
-
exarkun@twistedmatrix.com
-
Glyph
-
Laurens Van Houtven