[Twisted-Python] Twisted FTP: Data must not be unicode

Should I file a bug? If so, any guidelines what to do?

--- <exception caught here> ---
  File "/home/autobahn/python/lib/python2.7/site-packages/Twisted-11.1.0_r33225-py2.7-freebsd-8.2-RELEASE-p3-i386.egg/twisted/internet/defer.py", line 545, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/autobahn/python/lib/python2.7/site-packages/Twisted-11.1.0_r33225-py2.7-freebsd-8.2-RELEASE-p3-i386.egg/twisted/protocols/ftp.py", line 907, in gotListing
    self.dtpInstance.sendListResponse(name, attrs)
  File "/home/autobahn/python/lib/python2.7/site-packages/Twisted-11.1.0_r33225-py2.7-freebsd-8.2-RELEASE-p3-i386.egg/twisted/protocols/ftp.py", line 421, in sendListResponse
    self.sendLine(self._formatOneListResponse(name, *response))
  File "/home/autobahn/python/lib/python2.7/site-packages/Twisted-11.1.0_r33225-py2.7-freebsd-8.2-RELEASE-p3-i386.egg/twisted/protocols/ftp.py", line 385, in sendLine
    self.transport.write(line + '\r\n')
  File "/home/autobahn/python/lib/python2.7/site-packages/Twisted-11.1.0_r33225-py2.7-freebsd-8.2-RELEASE-p3-i386.egg/twisted/internet/_newtls.py", line 180, in write
    FileDescriptor.write(self, bytes)
  File "/home/autobahn/python/lib/python2.7/site-packages/Twisted-11.1.0_r33225-py2.7-freebsd-8.2-RELEASE-p3-i386.egg/twisted/internet/abstract.py", line 300, in write
    raise TypeError("Data must not be unicode")
exceptions.TypeError: Data must not be unicode

[autobahn@autobahnhub ~/Twisted]$ svn diff twisted/protocols/ftp.py
Index: twisted/protocols/ftp.py
===================================================================
--- twisted/protocols/ftp.py (revision 33225)
+++ twisted/protocols/ftp.py (working copy)
@@ -382,7 +382,7 @@
         self._onConnLost.callback(None)

     def sendLine(self, line):
-        self.transport.write(line + '\r\n')
+        self.transport.write(str(line) + '\r\n')

     def _formatOneListResponse(self, name, size, directory, permissions, hardlinks, modified, owner, group):

On Thu, 2011-11-24 at 05:49 -0800, Tobias Oberstein wrote:
Should I file a bug? If so, any guidelines what to do?
Rather than implementing a hard-coded conversion at a low level of the FTP protocol implementation, you could write explicit encoders from unicode to a byte stream and apply them before making a call to the FTP API.
By doing so you can have an FTP protocol that, besides UTF-8, could also handle other encodings.
Cheers,
--
Adi Roiban

Rather than implementing a hard-coded conversion at a low level of the FTP protocol implementation, you could write explicit encoders from unicode to a byte stream and apply them before making a call to the FTP API.
By doing so you can have an FTP protocol that, besides UTF-8, could also handle other encodings.
I'm not making calls to the FTP protocol ... this is an FTP server, and the error pops up when the client does a DIR.

On 01:49 pm, tobias.oberstein@tavendo.de wrote:
Should I file a bug? If so, any guidelines what to do?
This report isn't sufficiently complete to decide if this is a bug in Twisted or in something else. You really cannot send unicode over a socket without encoding it. The question to consider here is the question of whose responsibility it should be to do that encoding in this case.
[snip]
[autobahn@autobahnhub ~/Twisted]$ svn diff twisted/protocols/ftp.py
Index: twisted/protocols/ftp.py
===================================================================
--- twisted/protocols/ftp.py (revision 33225)
+++ twisted/protocols/ftp.py (working copy)
@@ -382,7 +382,7 @@
         self._onConnLost.callback(None)

     def sendLine(self, line):
-        self.transport.write(line + '\r\n')
+        self.transport.write(str(line) + '\r\n')
This isn't the correct fix, even if the bug is in Twisted's FTP support. `str(line)` is the least reliable way to encode a unicode string into a byte string. It has unpredictable behavior (it relies on the action-at-a-distance API, `sys.setdefaultencoding`, which doesn't even exist most of the time, but which can be used to completely change what `str(unicode)` does).

A more correct solution would be `line.encode(someencoding)`. However, looking at `sendLine`, it's clear that the value of `someencoding` is not easily decided upon. Should it be UTF-8? ASCII with an error replacement policy? cp1252? Does it depend on the client, or the server, or the filesystem encoding, or a user preference?

An even more correct solution would be for `line` to have been encoded properly already before it was passed to `sendLine`. Where did the data come from, and why wasn't it encoded already?

Jean-Paul
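
To illustrate why the choice of `someencoding` matters, here is a minimal sketch in Python 3 syntax (the thread itself concerns Python 2). The helper `encode_line` is hypothetical and not part of Twisted; it only shows that the same text produces different bytes on the wire depending on the codec and error policy that are named explicitly.

```python
def encode_line(line, encoding, errors="strict"):
    """Encode one text line for the wire and append the CRLF terminator.

    Hypothetical helper for illustration only; not part of Twisted.
    """
    return line.encode(encoding, errors) + b"\r\n"


name = "caf\u00e9.txt"

# The same text yields different bytes depending on the codec chosen.
utf8_bytes = encode_line(name, "utf-8")
cp1252_bytes = encode_line(name, "cp1252")
# ASCII cannot represent U+00E9, so an error policy has to be picked too.
ascii_bytes = encode_line(name, "ascii", errors="replace")
```

Because the encoding is an explicit argument, nothing here depends on a process-wide default.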

On Thu, 24 Nov 2011 14:07:07 -0000 exarkun@twistedmatrix.com wrote:
A more correct solution would be `line.encode(someencoding)`. However, looking at `sendLine`, it's clear that the value of `someencoding` is not easily decided upon. Should it be UTF-8? ASCII with an error replacement policy? cp1252? Does it depend on the client, or the server, or the filesystem encoding, or a user preference?
RFC 3659 specifies UTF-8 as the default encoding. The Python 3 port of Twisted uses this (with the "surrogateescape" error handler).
Regards
Antoine.
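
For readers unfamiliar with the "surrogateescape" error handler, the following is a small sketch of what it does in Python 3's standard codecs. It is not taken from any Twisted port; the function name `roundtrip` and the sample filename are made up for illustration.

```python
def roundtrip(raw_name):
    """Decode bytes as UTF-8 and encode them back, preserving invalid bytes.

    Bytes that are not valid UTF-8 are mapped to lone surrogate code points
    on decode and mapped back to the original bytes on encode.
    """
    text_name = raw_name.decode("utf-8", "surrogateescape")
    return text_name.encode("utf-8", "surrogateescape")


# A Latin-1 encoded filename: the byte 0xE9 is not valid UTF-8 here.
legacy_name = b"caf\xe9.txt"
restored = roundtrip(legacy_name)

# With the default "strict" handler the same decode would raise instead.
try:
    legacy_name.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

The point of the handler is that filenames in an unknown legacy encoding survive a decode/encode cycle unchanged, while valid UTF-8 names behave normally.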

On 02:23 pm, solipsis@pitrou.net wrote:
On Thu, 24 Nov 2011 14:07:07 -0000 exarkun@twistedmatrix.com wrote:
A more correct solution would be `line.encode(someencoding)`. However, looking at `sendLine`, it's clear that the value of `someencoding` is not easily decided upon. Should it be UTF-8? ASCII with an error replacement policy? cp1252? Does it depend on the client, or the server, or the filesystem encoding, or a user preference?
RFC 3659 specifies UTF-8 as the default encoding. The Python 3 port of Twisted uses this (with the "surrogateescape" error handler).
Please say "My Python 3 port of Twisted". The work you're doing is appreciated, but it's also important to avoid confusing people into thinking that there is an official version of Twisted for Python 3. Twisted is what's hosted on <http://twistedmatrix.com/> (11.1 just released!) and is still limited to Python 2.4 - 2.7.

It's very easy to confuse people and get them thinking that a piece of software is supported in a context where it's not, or by a team that it's not. I want everyone to remain clear on the current status of Python 3 support in *Twisted* (the thing we have a mailing list for, the thing we have an issue tracker for, the thing we have continuous integration for, the thing we periodically make releases of) as distinct from the progress of various porting efforts.

Thanks,
Jean-Paul

On Thu, 24 Nov 2011 14:58:14 -0000 exarkun@twistedmatrix.com wrote:
On 02:23 pm, solipsis@pitrou.net wrote:
On Thu, 24 Nov 2011 14:07:07 -0000 exarkun@twistedmatrix.com wrote:
A more correct solution would be `line.encode(someencoding)`. However, looking at `sendLine`, it's clear that the value of `someencoding` is not easily decided upon. Should it be UTF-8? ASCII with an error replacement policy? cp1252? Does it depend on the client, or the server, or the filesystem encoding, or a user preference?
RFC 3659 specifies UTF-8 as the default encoding. The Python 3 port of Twisted uses this (with the "surrogateescape" error handler).
Please say "My Python 3 port of Twisted". The work you're doing is appreciated, but it's also important to avoid confusing people into thinking that there is an official version of Twisted for Python 3. Twisted is what's hosted on <http://twistedmatrix.com/> (11.1 just released!) and is still limited to Python 2.4 - 2.7.
You're right. I've just added and bolded "unofficial" on https://bitbucket.org/pitrou/t3k/wiki/Home
Regards
Antoine.

On Nov 24, 2011, at 12:37 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
You're right. I've just added and bolded "unofficial" on https://bitbucket.org/pitrou/t3k/wiki/Home
Thanks for doing this! And thanks even more for your continued efforts on making most of this official, eventually :)

An even more correct solution would be for `line` to have been encoded properly already before it was passed to `sendLine`. Where did the data come from, and why wasn't it encoded already?
The data is coming from the FTP directory listing formatting function ftp.DTP._formatOneListResponse

Here is a more localized fix:

[autobahn@autobahnhub ~/Twisted]$ svn diff twisted/protocols/ftp.py
Index: twisted/protocols/ftp.py
===================================================================
--- twisted/protocols/ftp.py (revision 33225)
+++ twisted/protocols/ftp.py (working copy)
@@ -415,7 +415,7 @@
             'group': group[:8],
             'size': size,
             'date': formatDate(time.gmtime(modified)),
-            'name': name}
+            'name': name.encode("utf-8")}

     def sendListResponse(self, name, response):
         self.sendLine(self._formatOneListResponse(name, *response))
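
The idea behind this localized fix can be sketched outside Twisted as follows, in Python 3 syntax. This is not Twisted's code: `format_list_line`, its reduced set of fields, and the `encoding` parameter are assumptions made for illustration. The fixed-width fields are plain ASCII, and only the file name needs a real encoding decision, which is made at the point where the line is formatted.

```python
def format_list_line(name, size, owner, encoding="utf-8"):
    """Build one directory-listing line as bytes.

    Hypothetical, simplified stand-in for a listing formatter; the real
    Twisted function takes more fields (permissions, date, group, ...).
    """
    # Owner and size are ASCII-only here, so ASCII is a safe codec for them.
    prefix = "%-9s %15d " % (owner[:8], size)
    # The name is the only field that may contain non-ASCII text.
    return prefix.encode("ascii") + name.encode(encoding)


line = format_list_line("r\u00e9sum\u00e9.txt", 1024, "autobahn")
```

Since the result is already a byte string, a downstream `sendLine`-style function can append the line terminator and write it without any further conversion.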

On 02:26 pm, tobias.oberstein@tavendo.de wrote:
An even more correct solution would be for `line` to have been encoded properly already before it was passed to `sendLine`. Where did the data come from, and why wasn't it encoded already?
The data is coming from the FTP directory listing formatting function ftp.DTP._formatOneListResponse

Here is a more localized fix:

[autobahn@autobahnhub ~/Twisted]$ svn diff twisted/protocols/ftp.py
Index: twisted/protocols/ftp.py
===================================================================
--- twisted/protocols/ftp.py (revision 33225)
+++ twisted/protocols/ftp.py (working copy)
@@ -415,7 +415,7 @@
             'group': group[:8],
             'size': size,
             'date': formatDate(time.gmtime(modified)),
-            'name': name}
+            'name': name.encode("utf-8")}

     def sendListResponse(self, name, response):
         self.sendLine(self._formatOneListResponse(name, *response))

Cool, that seems much more reasonable. It would be great if you could file a ticket for this. Thanks!
Jean-Paul

On Thu, 2011-11-24 at 15:00 +0000, exarkun@twistedmatrix.com wrote:
On 02:26 pm, tobias.oberstein@tavendo.de wrote:
An even more correct solution would be for `line` to have been encoded properly already before it was passed to `sendLine`. Where did the data come from, and why wasn't it encoded already?
The data is coming from the FTP directory listing formatting function ftp.DTP._formatOneListResponse

Here is a more localized fix:

[autobahn@autobahnhub ~/Twisted]$ svn diff twisted/protocols/ftp.py
Index: twisted/protocols/ftp.py
===================================================================
--- twisted/protocols/ftp.py (revision 33225)
+++ twisted/protocols/ftp.py (working copy)
@@ -415,7 +415,7 @@
             'group': group[:8],
             'size': size,
             'date': formatDate(time.gmtime(modified)),
-            'name': name}
+            'name': name.encode("utf-8")}

     def sendListResponse(self, name, response):
         self.sendLine(self._formatOneListResponse(name, *response))

Cool, that seems much more reasonable. It would be great if you could file a ticket for this. Thanks!
The source of this evil is FTPAnonymousShell.list, as this is the entry point for the file name.
--
Adi Roiban

Cool, that seems much more reasonable. It would be great if you could file a ticket for this. Thanks!
http://twistedmatrix.com/trac/ticket/5411
The source of this evil is FTPAnonymousShell.list, as this is the entry point for the file name.
Yep, I see. The patch above applies the fix at the mediating function ftp.DTP._formatOneListResponse, which is not the entry point mentioned above, but has the charm of being minimally invasive.
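
For comparison, the alternative of normalizing names at the entry point could look roughly like the sketch below, in Python 3 syntax. It is an assumption-laden illustration, not a patch for `FTPAnonymousShell.list`: the function `encode_entries` and the `(name, attrs)` entry shape are hypothetical.

```python
def encode_entries(entries, encoding="utf-8"):
    """Yield (name, attrs) pairs with every text name encoded to bytes.

    Hypothetical helper: names that are already bytes pass through
    untouched, so the function is safe to apply to mixed input.
    """
    for name, attrs in entries:
        if isinstance(name, str):
            name = name.encode(encoding)
        yield name, attrs


listing = [("plain.txt", 1), (b"already-bytes.bin", 2), ("caf\u00e9.txt", 3)]
encoded = list(encode_entries(listing))
```

With this approach every later stage sees only byte strings, at the cost of touching the code that produces the listing rather than the code that formats it.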
participants (5)
- Adi Roiban
- Antoine Pitrou
- exarkun@twistedmatrix.com
- Glyph Lefkowitz
- Tobias Oberstein