[Python-Dev] codecs question

Martin von Loewis loewis@informatik.hu-berlin.de
Fri, 29 Sep 2000 19:16:25 +0200 (MET DST)


>   Unfortunately, I can't see what "encoding" I should use if I want
>   to read & write Unicode string objects to it.  ;( (Marc-Andre,
>   please tell me I've missed something!)

It depends on the output you want to have. One option would be

s=codecs.lookup('unicode-escape')[3](sys.stdout)

Then, s.write(u'\251') prints a string in Python quoting notation.
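For readers on a current interpreter, here is a minimal sketch of the same idea, assuming Python 3 rather than the Python 2.0 this message was written against. It uses codecs.getwriter(), which returns the same StreamWriter class as element [3] of the lookup() result, and an in-memory io.BytesIO in place of sys.stdout, since the wrapped stream has to accept bytes there. The names raw, writer and escaped are only illustrative.

```python
import codecs
import io

# Assumption: Python 3. The writer encodes text to bytes, so the
# underlying stream must accept bytes; BytesIO stands in for sys.stdout.
raw = io.BytesIO()

# Wrap the byte stream in the unicode-escape StreamWriter.
writer = codecs.getwriter('unicode-escape')(raw)

# '\251' is an octal escape for U+00A9 (the copyright sign).
writer.write('\251')

# The stream now holds the character in escape notation.
escaped = raw.getvalue()
print(escaped)
```

The escape style (octal or hex) chosen by the codec may differ between Python versions, so the exact bytes shown by a given interpreter should not be read as what 2.0 printed.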

Unfortunately,

print >>s,u'\251'

won't work, since print *first* tries to convert the argument to a
string, and then prints the string onto the stream.
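A rough model of why the conversion step gets in the way, assuming Python 3 and assuming that the implicit conversion behaves like an encode with the ASCII codec (that stand-in is an assumption, not something taken from the print implementation):

```python
# Assumption: str.encode('ascii') stands in for the implicit
# unicode-to-string conversion that print performs before writing.
text = '\251'  # U+00A9, the same character as in the example above

try:
    text.encode('ascii')
    converted = True
except UnicodeEncodeError:
    # The conversion fails before the stream's write() is ever reached,
    # so the codec wrapped around the stream never sees the character.
    converted = False

print(converted)
```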

>  On the other hand, it's annoying that I can't create a file-object
> that takes Unicode strings from "print", and doesn't seem intuitive.

Since you are asking for a hack :-) How about allowing an additional
letter 'u' in the "mode" attribute of a file object?

Then, print would be

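# Pseudocode for the logic of the print statement (print is a
# statement, so this cannot be defined as a real function).
def print(stream,string):
  if type(string) == UnicodeType:
    if 'u' in stream.mode:
      # Unicode-aware stream: pass the Unicode object through unchanged
      stream.write(string)
      return
  # Everything else is converted to a plain string first, as today
  stream.write(str(string))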

The Stream readers and writers would then need to have a mode of 'ru'
or 'wu', respectively.

Any other protocol to signal unicode-awareness in a stream might do as
well.
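
A minimal sketch of how the 'u'-in-mode protocol could behave, assuming Python 3. Everything named here is hypothetical and exists only for illustration: the classes UnicodeAwareStream and ByteStream, and the helper print_to, which stands in for the print statement. Since str is already Unicode in Python 3, an ASCII encode with backslashreplace plays the role of the old conversion to a plain string.

```python
class UnicodeAwareStream:
    """Hypothetical stream whose mode contains 'u': accepts text as-is."""
    mode = 'wu'

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


class ByteStream:
    """Hypothetical plain stream: mode has no 'u', so it gets bytes."""
    mode = 'w'

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


def print_to(stream, value):
    """Hypothetical stand-in for print, following the proposed protocol."""
    # Unicode-aware streams receive the text object unchanged.
    if isinstance(value, str) and 'u' in getattr(stream, 'mode', ''):
        stream.write(value)
        return
    # Assumption: an ASCII encode with backslashreplace stands in for
    # the conversion to a plain (byte) string.
    stream.write(str(value).encode('ascii', 'backslashreplace'))


aware = UnicodeAwareStream()
plain = ByteStream()
print_to(aware, '\251')   # passed through as text
print_to(plain, '\251')   # converted to an escaped byte string
```

Checking a letter in the mode string keeps the test cheap and needs no new attribute on file objects, but a separate marker attribute would work in the same way.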

Regards,
Martin

P.S. Is there some function to retrieve the UCN names from ucnhash.c?