[Twisted-Python] twisted.internet.abstract.FileDescriptor.write vs unicode

Is there a good reason for twisted.internet.abstract.FileDescriptor.write to require isinstance(data, str) rather than also allowing isinstance(data, unicode)?

On Jul 12, 2004, at 6:30 PM, Jeff Bowden wrote:
Yes. Sockets can only transmit, and files can only store, bytes, not characters. The Python 'str' unfortunately does double duty as a byte string and as a character string limited to ASCII and/or ISO-8859-1 characters, which often causes confusion among users. The Python 'unicode' is a character string, and thus has no place being written to a socket/file.

If you want to write unicode data, first convert it to the appropriate byte encoding, using .encode(encname). I'd suggest something like "UTF-8" or, perhaps you'd prefer "UTF-16LE". Or maybe something else...whatever your app requires.

James
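
A minimal sketch of the .encode(encname) step described above. It is written with current Python naming, which is an assumption relative to the thread: today's `str` plays the role of the 2004 `unicode`, and today's `bytes` plays the role of the 2004 `str`. The sample text is made up for illustration.

```python
# Text is a sequence of characters; a socket or file only accepts bytes.
text = "na\u00efve caf\u00e9"  # "naïve café", 10 characters

# The same characters become different byte sequences under different encodings,
# so the application has to pick one explicitly.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

print(len(text), len(utf8_bytes), len(utf16_bytes))

# Decoding with the matching encoding recovers the original text.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text
```

Whichever encoding is chosen, the receiving side has to know it too, which is why the choice belongs to the application or protocol rather than to the byte-level write call.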

On Mon, Jul 12, 2004, Jeff Bowden wrote:
This is one of those rare things, an FAQ that really is a "Frequently Asked Question" not a "Question We Wish You'd Ask". Unfortunately, I haven't actually added it to the FAQs yet :(

See the discussion at http://www.twistedmatrix.com/users/roundup.twistd/twisted/issue617, in particular Glyph Lefkowitz's comment:

  "Q. Why doesn't {API X} accept both string objects and unicode objects? Isn't it better to use unicode so you can support internationalization?

  A. Unicode is for talking about strings of human-readable text. String objects can also be used for this purpose, and when they are, it is better to use unicode, you are correct. However, {API X} is dealing with raw data, probably coming from a network connection, and is using String objects as containers of sequences of bytes. Unicode has no way of representing sequences of bytes and streams of 'raw', unparsed data. The data has to be translated at some level *above* that, in order to get things like unicode character alignment correct.

  For more information and some idea of the complexity involved, read http://www.sidhe.org/~dan/blog/archives/000255.html and http://www.joelonsoftware.com/articles/Unicode.html"

-Mary
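
One way to see the "character alignment" point in the quoted answer: bytes arrive from a connection in arbitrary chunks, and a chunk boundary can fall in the middle of a multi-byte character. The sketch below uses the standard library's incremental decoder and current Python naming (`bytes` for raw data, `str` for text); the sample data and the helper function names are hypothetical.

```python
import codecs

data = "h\u00e9llo".encode("utf-8")  # the é occupies two bytes
chunks = [data[:2], data[2:]]        # the split falls between those two bytes


def decode_naively(chunks):
    """Decode each chunk on its own, as a byte-level layer would have to."""
    return "".join(chunk.decode("utf-8") for chunk in chunks)


def decode_incrementally(chunks):
    """Decode at a higher layer that remembers partial characters between chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush; raises if bytes are left over
    return "".join(parts)


try:
    decode_naively(chunks)
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

result = decode_incrementally(chunks)
print(naive_failed, result)
```

The byte-level layer cannot know where characters begin and end, so decoding has to happen in a layer that sees the stream as a whole.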

Mary Gardiner wrote:
Right, so I worked around this problem by calling .encode('utf8') in all the places where strings go out (after making appropriate changes to the content-encoding). It wasn't that complicated but it was a PITA and it will be an ongoing maintenance headache. It would be a lot nicer if the framework dealt with it transparently. FileDescriptor.write does seem like the wrong place to handle it even though that's where the error message pops out. Apparently what's needed is another layer on top of the http layer. Has anyone attempted to write one?
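
A rough sketch of what such a layer could look like, under stated assumptions: `EncodingWriter` is a hypothetical name, not an existing Twisted class, and `io.BytesIO` stands in for whatever byte-only object (a transport, a request) sits underneath. It uses current Python naming, where `str` is text and `bytes` is raw data.

```python
import io


class EncodingWriter:
    """Hypothetical helper: accepts text and hands encoded bytes to a byte-only sink."""

    def __init__(self, sink, encoding="utf-8"):
        self._sink = sink        # anything with a write(bytes) method
        self.encoding = encoding

    def write(self, text):
        # Reject bytes here so that encoding happens in exactly one place.
        if not isinstance(text, str):
            raise TypeError("expected text, got %s" % type(text).__name__)
        self._sink.write(text.encode(self.encoding))


sink = io.BytesIO()              # stand-in for the real byte-level writer
out = EncodingWriter(sink)
out.write("<p>gr\u00fc\u00df dich</p>")
print(sink.getvalue())
```

With a wrapper like this the encoding is named once, next to the place where the content type is declared, instead of at every call site.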

On Jul 12, 2004, at 6:30 PM, Jeff Bowden wrote:
Yeah, unicode doesn't have a designated wire format, and Python's default encoding choice is generally VERY stupid (usually ascii or latin-1), which leads to hard-to-detect bugs.

-bob

Bob Ippolito <bob@redivi.com> writes:
Pff, Python's choice is very wimpy which means that bugs bite you earlier. The real problem is the double-duty thing others alluded to; there's no way changing the default encoding can make this go away.

Cheers,
mwh

--
  same software, different verbosity settings (this one goes to eleven)
                                   -- the effbot on the martellibot
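
The difference between a "wimpy" default and a permissive one can be sketched as follows. This uses explicit encodings in current Python rather than the 2004 interpreter's implicit default, so it illustrates the argument and does not reproduce the original behaviour; the sample text is made up.

```python
text = "caf\u00e9"  # "café"

# A strict ASCII encoding refuses non-ASCII text immediately: the bug bites early.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# latin-1 succeeds silently, but a peer expecting UTF-8 cannot read the result.
latin1_bytes = text.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    utf8_peer_failed = False
except UnicodeDecodeError:
    utf8_peer_failed = True

# The reverse mismatch does not raise at all; it quietly yields garbled text.
mojibake = text.encode("utf-8").decode("latin-1")
print(ascii_failed, utf8_peer_failed, mojibake)
```

No choice of default removes the need for the sender and receiver to agree on an encoding, which is the point about the default not making the problem go away.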

participants (5)
- Bob Ippolito
- James Y Knight
- Jeff Bowden
- Mary Gardiner
- Michael Hudson