Re: [Twisted-Python] Sending unicode strings

25 Apr 2005

On Apr 25, 2005, at 12:01 PM, Ken Kinder wrote:

> Tommi Virtanen wrote:
>> Personally, I think ass-u-ming Unicode is encoded as UTF-8 would have
>> been sane, but I can understand that not everyone agrees; e.g. Java
>> wants UTF-16 if I remember correctly. And not serializing to UTF-8
>> by default catches errors that would otherwise cause mysterious things
>> to happen.
>
> Most of the time, you should know the encoding. Instead of forcing the
> protocol to do the work, why not just have a way of setting the
> expected encoding for write() and similar methods? If the encoding is
> not set (i.e., None), then raise the exception. Otherwise, use the
> specified encoding. This would have the added readability advantage
> that Unicode encoding -- uhh code -- wouldn't have to be sprinkled
> throughout the protocol classes -- only in places where the encoding
> is actually set -- in HTTP's headers for example.
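
import codecs

class MyProtocol(....):
     def __init__(self, encoding='ascii'):
         self.encoding = encoding

     def connectionMade(self):
         # assumes a Twisted protocol: self.transport is only attached
         # once the connection is made, so build the writer here
         self.textwriter = codecs.getwriter(self.encoding)(self.transport)

     def write_text(self, s):
         # text: encoded with the chosen codec before it is sent
         self.textwriter.write(s)

     def write(self, s):
         # raw bytes: passed through untouched
         self.transport.write(s)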

This way write_text will verify that you are only sending valid strings 
in the chosen encoding.  If you call write_text() with a str then it 
will be decoded using sys.getdefaultencoding() and then encoded using 
the chosen encoding, so it really does guarantee that all strings sent 
with write_text are valid (at this level).
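
To make the check concrete, here is a minimal sketch of what the codecs writer does when the text does not fit the chosen encoding. It is written for Python 3 and runs without Twisted: FakeTransport is a hypothetical stand-in that only records what it is given, not a Twisted class. Note that the implicit decoding of str through sys.getdefaultencoding() described above is Python 2 behaviour; this sketch only exercises the encoding step.

```python
import codecs


class FakeTransport:
    """Hypothetical stand-in for a transport: it only records the bytes."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


transport = FakeTransport()

# A writer that only accepts text representable in ASCII.
ascii_writer = codecs.getwriter("ascii")(transport)
ascii_writer.write("hello")  # encodable, so the bytes reach the transport

try:
    ascii_writer.write("caf\u00e9")  # U+00E9 has no ASCII encoding
except UnicodeEncodeError:
    rejected = True  # encoding failed before anything was written
else:
    rejected = False

# The same text passes through a writer for an encoding that can hold it.
utf8_writer = codecs.getwriter("utf-8")(transport)
utf8_writer.write("caf\u00e9")
```

The failed write never reaches the transport, because the encode step raises before the underlying write() is called.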

You should really keep separate what you're doing with raw bytes 
(write) and what you're doing with text (write_text) as they are 
different beasts.

There is no need to sprinkle this everywhere, just make it a mix-in or 
whatever and use as appropriate.
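
One way the mix-in could look, as a rough sketch rather than anything Twisted provides: TextWriterMixin, set_encoding, RecordingTransport and DemoProtocol are all hypothetical names made up for illustration. It also follows the suggestion quoted earlier of raising when no encoding has been set. For brevity it calls str.encode() directly instead of going through codecs.getwriter(); the strictness is the same. Python 3 is assumed.

```python
import codecs


class TextWriterMixin:
    """Hypothetical mix-in: adds write_text() to a class with a .transport."""

    encoding = None  # None means "not set"; write_text() refuses to guess

    def set_encoding(self, encoding):
        codecs.lookup(encoding)  # raises LookupError for an unknown codec name
        self.encoding = encoding

    def write_text(self, text):
        if self.encoding is None:
            raise ValueError("no encoding set for write_text()")
        # Strict encoding: raises UnicodeEncodeError if the text does not fit.
        self.transport.write(text.encode(self.encoding))


class RecordingTransport:
    """Hypothetical stand-in for a transport: it accumulates the bytes."""

    def __init__(self):
        self.data = b""

    def write(self, data):
        self.data += data


class DemoProtocol(TextWriterMixin):
    """Hypothetical protocol-like class; a real one would get its transport
    from the framework instead of the constructor."""

    def __init__(self, transport):
        self.transport = transport

    def write(self, data):
        # raw bytes: passed through untouched
        self.transport.write(data)


proto = DemoProtocol(RecordingTransport())
proto.set_encoding("utf-8")
proto.write_text("caf\u00e9")  # text goes through the chosen encoding
proto.write(b"\r\n")           # bytes bypass it
```

Keeping write() and write_text() as separate entry points means a caller has to decide explicitly whether it is sending bytes or text.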

-bob

Bob Ippolito