Characters in Python

Paul Boddie paul at boddie.net
Fri Jun 6 10:26:37 EDT 2003


jansun at home.se (Jan Sundström) wrote in message news:<aaf09156.0306050423.21b45f6 at posting.google.com>...
> 
>    str = 'Åäö'
>    print str
> 
> Python, or perhaps IDLE rather, doesn't seem to accept characters with
> codes over 127.

This reference might be interesting:

  http://mail.python.org/pipermail/patches/2002-February/007368.html

However, I see the same problem with IDLE from the ActiveState Python
2.2 distribution. Using IDLEfork instead seems to be a good solution:

  http://idlefork.sourceforge.net

It turns out that you can do exactly what you wanted in IDLEfork.

> Is there a simple way to turn off this obsession with 7-bit ASCII?

The above reference would suggest that it isn't a point of obsession
but an oversight carried over from an earlier time. Anyway, as a point
of reference, it should be noted that ASCII is officially a 7-bit
standard: it only defines codes 0 to 127, so characters such as Å, ä
and ö are simply not part of it. I imagine that you actually mean,
"Can I work with strings in my own encoding?" With IDLEfork, the
answer is yes.
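A small sketch of the 7-bit point (not part of the original post): it
takes the three characters from the question, written as escapes so the
file itself stays ASCII, shows that each one is a single byte above 127
in Latin-1, and shows the ASCII codec rejecting them. It is written to
run on Python 2.6 or later as well as Python 3.

```python
# U+00C5, U+00E4, U+00F6: the characters from the original question.
text = u'\xc5\xe4\xf6'

# In Latin-1 each character becomes one byte; bytearray yields ints.
codes = list(bytearray(text.encode('latin-1')))
print(codes)  # [197, 228, 246]

# ASCII defines only codes 0-127, so this encoding must fail.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeError:
    ascii_ok = False
print(ascii_ok)  # False
```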

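Personally, I think that most people would be better off working with
Unicode. In IDLEfork you can actually print Unicode objects directly.
However, in console environments, Python gets rather upset at the mere
suggestion: printing a Unicode object containing non-ASCII characters
typically raises a UnicodeError, because Python tries to convert it
using its default ASCII encoding. See below for a workable approach.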

> And how can one easily change what Python considers to be
> default character encoding? 
> I couldn't find anything about that in the tutorial.

To find out your default encoding, start by setting the locale from
your environment:

  import locale
  locale.setlocale(locale.LC_ALL, "")

Apparently, you should only ever do this once in your program, and you
shouldn't do it in a library, since changing the locale affects the
whole program. Then try this:

  encoding = locale.getlocale()[1]

Sadly, in Windows environments, this may return something like "1252",
which isn't enough for my next suggestion. To convert strings to
Unicode, do this:

  u = unicode(s, encoding) # don't use 'str' for your variable since
                           # it's a built-in

On Windows, try this kind of thing instead:

  u = unicode(s, "cp1252")
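The Windows workaround can be folded into a helper that takes the
locale's encoding, tries a "cp" prefix when it gets a bare code page
number such as "1252", and checks each candidate with codecs.lookup().
This is only a sketch: the name console_encoding and the ASCII fallback
are hypothetical choices, not anything Python provides, and the code is
written for Python 2.6+ or Python 3 rather than the 2.2 release above.

```python
import codecs
import locale


def console_encoding(fallback='ascii'):
    """Hypothetical helper: return a codec name Python can actually use."""
    try:
        name = locale.getlocale()[1]
    except ValueError:
        # getlocale() can reject locale strings it cannot parse.
        name = None
    if not name:
        # No encoding in the current locale; ask the platform instead.
        name = locale.getpreferredencoding()
    candidates = [name]
    if name and name.isdigit():
        # Windows may report a bare code page number such as "1252".
        candidates.insert(0, 'cp' + name)
    for candidate in candidates:
        if not candidate:
            continue
        try:
            # lookup() raises LookupError for unknown codec names.
            return codecs.lookup(candidate).name
        except LookupError:
            pass
    return fallback


encoding = console_encoding()
u = b'\xc5\xe4\xf6'.decode('cp1252')  # Python 2.2 spelling: unicode(s, "cp1252")
```

In a real program the helper would be called after the single
locale.setlocale(locale.LC_ALL, "") call at startup.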

To write Unicode out to your console, try this:

  print u.encode(encoding)

This works on IDLEfork, too, so for success in both DOS boxes and
IDLEfork I'd suggest explicit encoding of Unicode objects when
printing stuff out.
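The same explicit-encoding advice as a sketch: decode input once, keep
Unicode internally, and encode only at the point of output. The helper
name encode_for_console is hypothetical, and passing 'replace' as the
error handler (so unrepresentable characters become '?' instead of
raising) is an added assumption, not something the post recommends.
Written for Python 2.6+ or Python 3.

```python
def encode_for_console(text, encoding):
    """Hypothetical helper: encode Unicode text for output.

    Characters the target encoding cannot represent become '?'
    instead of raising UnicodeError.
    """
    return text.encode(encoding, 'replace')


raw = b'\xc5\xe4\xf6'          # bytes as they might arrive in cp1252
text = raw.decode('cp1252')    # Python 2.2 spelling: unicode(raw, "cp1252")

cp1252_out = encode_for_console(text, 'cp1252')  # b'\xc5\xe4\xf6'
ascii_out = encode_for_console(text, 'ascii')    # b'???'
```

Python 2 can print the resulting byte strings directly; Python 3 would
write them with sys.stdout.buffer.write(cp1252_out) instead.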

Paul




More information about the Python-list mailing list