A Unicode problem -HELP

Tim Roberts timr at probo.com
Wed May 17 08:12:29 CEST 2006

"manstey" <manstey at csu.edu.au> wrote:
>I have done more reading on unicode and then tried my code in IDLE
>rather than WING IDE, and discovered that it works fine in IDLE, so I
>think WING has a problem with unicode.

Rather, Wing's output defaults to ASCII.

>So, assuming I now work in IDLE, all I want help with is how to read in
>an ascii string and convert its letters to various unicode values and
>save the resulting 'string' to a utf-8 text file. Is this clear?
>so in pseudo code
>1.  F is converted to \u0254, $ is converted to \u0283, C is converted
>to \u02A6\02C1, etc.
>(i want to do this using a dictionary TRANSLATE={'F':u'\u0254', etc)
>2. I read in a file with lines like:
>$$C$ etc
>3. I convert this to
>\u0254\u02A6\02C1\u0254 etc
>4. i save the results in a new file
>when i read the new file in a unicode editor (EmEditor), i don't see
>\u0254\u02A6\02C1\u0254, but I see the actual characters (open o, esh,
>ts digraph, modified letter reversed glottal stop, etc.

Of course.  Isn't that exactly what you wanted?  The Python string
u"\u0254" contains one character (Latin small open o).  It does NOT contain
6 characters.  If you write that to a UTF-8 file, the file will contain 1
character, stored as 2 bytes.
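
You can check this at the interpreter.  A minimal sketch (the variable
names are just for illustration; the u prefix is kept so the same lines
should run on Python 2 and on Python 3.3 or later):

```python
# One Unicode character, written in source code with a \u escape.
open_o = u"\u0254"   # LATIN SMALL LETTER OPEN O

# The escape is resolved when the literal is parsed, so the length is 1.
char_count = len(open_o)

# Encoding to UTF-8 gives the bytes that would actually land in the file.
utf8_bytes = open_o.encode("utf-8")
byte_count = len(utf8_bytes)
```

Here char_count is 1 and byte_count is 2; the two bytes are 0xC9 0x94.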

If you actually want the 6-character string \u0254 written to a file, then
you need to escape the \u special code:  "\\u0254".  However, I don't see
what good that would do you.  The \u escape is only Python source code
notation; it is resolved when the literal is parsed and is not stored in
the string.
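
To see the difference between the two spellings, here is a small sketch
(again with illustrative names, and assuming the u prefix so it behaves
the same on Python 2 and 3):

```python
# Escaped backslash: the string holds the six characters \ u 0 2 5 4.
escaped = u"\\u0254"

# Real escape: the string holds the single character U+0254.
real = u"\u0254"

escaped_len = len(escaped)
real_len = len(real)

# The unicode_escape codec can turn the literal text back into the
# character, should you ever receive data in the escaped form.
decoded = escaped.encode("ascii").decode("unicode_escape")
```

escaped_len is 6, real_len is 1, and decoded compares equal to real.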

>I'm sure this is straightforward but I can't get it to work.

I think it is working exactly as you want.
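
For what it's worth, the four steps you describe could be sketched roughly
like this.  Treat it as one possible approach, not the only one.  The
file names and the translate_line helper are made up for the example, the
input file is created in a temporary directory only so the sketch is
self-contained, and I am assuming that "\02C1" in your message was meant
to be \u02C1.  io.open is used so the encoding argument should work on
both Python 2 and 3.

```python
import io
import os
import tempfile

# Step 1: the mapping from ASCII letters to Unicode strings.
# Assumption: "\02C1" in the question was a typo for \u02C1.
TRANSLATE = {
    u"F": u"\u0254",
    u"$": u"\u0283",
    u"C": u"\u02A6\u02C1",
}

def translate_line(line):
    # Hypothetical helper: characters without a mapping (such as the
    # newline) pass through unchanged.
    return u"".join(TRANSLATE.get(ch, ch) for ch in line)

# Hypothetical file names inside a scratch directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.txt")
dst_path = os.path.join(workdir, "output.txt")

# Create a sample ASCII input file with the line from the question.
with io.open(src_path, "w", encoding="ascii") as f:
    f.write(u"$$C$\n")

# Steps 2 to 4: read each line, translate it, write it out as UTF-8.
with io.open(src_path, encoding="ascii") as src:
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(translate_line(line))

# Read the result back to confirm what was written.
with io.open(dst_path, encoding="utf-8") as f:
    result = f.read()
```

With the mapping above, result is u"\u0283\u0283\u02A6\u02C1\u0283\n":
the real characters, which is what a Unicode editor will then display.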
- Tim Roberts, timr at probo.com
  Providenza & Boekelheide, Inc.
