On 26 Oct, 09:49 pm, _@lvh.io wrote:
Hi everyone,
I'm working on #6749 for porting t.p.logfile to python3. I'm dealing with some test failures, which you can see from the buildbot here:
http://buildbot.twistedmatrix.com/builders/python-3.3-tests/builds/1602/step...
I have pasted the relevant bit into a gist here:
https://gist.github.com/lvh/7174766
I think what's happening is that LogFile.write should take native strings (since that's what log.msg takes). However, I'm opening all files in binary mode, since that's on the reviewer checklist (point 8) for the Python 3 porting plan.
Argh. I was all set up to object to the "...should take native strings" bit but then I reviewed some history. If you haven't already, read <https://twistedmatrix.com/trac/ticket/5932>. In hindsight, it would have been nice if this branch had come with documentation of some kind. From what I can tell, this means: 1. yep, `log.msg` takes native strings 2. LogPublisher doesn't do anything with encoding or decoding 3. a log observer can choose to do whatever it wants with what it gets (but it should be prepared to handle bytes on python 2 and unicode on python 3) So... invent some policy. What encoding does Twisted's built-in file log observer use for the files it writes? Up until now the answer has been ASCII because it doesn't try to handle unicode so anything non- ASCII results in exceptions (unless you're using PyGTK... let's not go there). UTF-8 is obviously the only possible correct answer, I guess. This needs to be documented, of course. And I predict that next someone will come along with a feature request for a command-line option to twistd to make it write logs with a different encoding. :( Whether you open the file in binary mode or not is up to you in this case, I think. You could do that and handle the encoding yourself or you could open it in text mode with the right encoding and let it handle the encoding. The blanket statement on https://twistedmatrix.com/trac/wiki/Plan/Python3 about open calls is perhaps slightly too general. It seems like that should apply to cases where the behavior is not supposed to be changing - which should be the case for the majority of porting work. However, `log.msg` has already changed behavior as part of the port. *Or*, it now occurs to me, just stick with the ASCII-only policy that's already in place. I'd even say this is more correct since porting isn't supposed to change behavior. Leave support for some other codec for another ticket (perhaps #989). Apart from being simpler (I hope) and avoiding breaching the documented porting guidelines, this also means someone will actually have to think about unicode support on Python 2 as well. Saying we support unicode in the logging system is a lot better than saying we support unicode in the logging system on Python 3 only. Jean-Paul