[Tutor] Using special characters in Python

Magnus Lycka magnus@thinkware.se
Fri Feb 21 23:19:01 2003


At Fri, 21 Feb 2003 19:08:01 +0100, Ole Jensen wrote:
>as from the unicode.org page i've found out that the å is written like
>this: "00E5" (without the quotes) in unicode, BUT when trying to
>create a string from that, a la:
>http://www.python.org/doc/2.2p1/ref/strings.html
>
> >>> s = "u\00E5"
> >>> print s
>u

I think you need to read more carefully. You want:

print u"\u00E5"

...
stringliteral   ::=   [stringprefix](shortstring | longstring)
stringprefix  ::=  "r" | "u" | "ur" | "R" | "U" | "UR" | "Ur" | "uR"
...
A prefix of 'u' or 'U' makes the string a Unicode string.
...
\uxxxx  Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx Character with 32-bit hex value xxxxxxxx (Unicode only)
...
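
To make the difference concrete, here is a small sketch (the variable
names are only for illustration). It compares the escape inside a
Unicode literal with what the original attempt actually builds. My
reading is that in "u\00E5" the u is an ordinary letter and \00 is
taken as an octal escape, which would explain why only "u" showed up.

```python
# The u prefix goes outside the quotes; the \uXXXX escape goes inside.
a_ring = u"\u00E5"
assert len(a_ring) == 1
assert ord(a_ring) == 0xE5

# The same character written with the 32-bit escape.
assert u"\U000000E5" == a_ring

# The original attempt: the letter u, then the octal escape \00
# (a NUL character), then the two plain letters E and 5.
wrong = "u\00E5"
assert len(wrong) == 4
assert wrong[1] == "\x00"

# The UTF-8 bytes of the real character, shown as a repr.
print(repr(a_ring.encode("utf-8")))
```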

See also http://www.reportlab.com/i18n/python_unicode_tutorial.html
(this is a bit old, you don't need to get Python from CVS of course.)

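I didn't follow this thread thoroughly, but plain 8-bit strings
pass Latin1 bytes through untouched, so for western European
encodings we don't usually need Unicode. (Python's default
encoding for converting between plain and Unicode strings is
ASCII, so those conversions have to be spelled out.)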

Also, have a look at the locale module. You typically use it
like this:

import locale
locale.setlocale(locale.LC_ALL, '')

Then locale-aware operations will follow your locale, for
instance sorting with locale.strcoll as the comparison function.
As far as I understand, the only European languages that behave
exactly the same as English when it comes to sorting etc. are
Portuguese and Italian, so it's not just in Scandinavia we are
different...
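
A minimal sketch of locale-aware sorting follows. It assumes an
interpreter where sorted() takes a key argument and where strings
are Unicode, and the word list is made up for the example. The
resulting order depends entirely on the locale your environment
selects, so I don't show any output.

```python
import locale

try:
    # Pick up the user's locale from the environment.
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # The environment names a locale this system lacks;
    # we simply stay in the default "C" locale.
    pass

words = [u"zebra", u"\u00E5l", u"apple"]   # hypothetical sample data

# Plain sorting compares code points, so the a-ring word lands last.
by_code_point = sorted(words)
assert by_code_point == [u"apple", u"zebra", u"\u00E5l"]

# strxfrm turns each word into a key that compares per the locale.
in_locale_order = sorted(words, key=locale.strxfrm)
```

In a Swedish locale the second list should also end with the a-ring
word, since that letter sorts after z there, while other locales
may place it next to a.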

Finally, if you want to print to a DOS box in Windows, you need
to know how to handle code pages:

print stringWithNtnlChars.decode('latin1').encode('cp437')
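
Here is a sketch of what that conversion does to the bytes, using a
made-up sample string. It also shows a caveat that I believe matters
for Danish and Norwegian text: cp437 has no o-slash, while cp850
(which, as far as I know, many western European DOS boxes use) does.

```python
text = u"\u00E5\u00E4\u00F6"          # a-ring, a-umlaut, o-umlaut

# The same text as Latin1 bytes, then re-encoded for a cp437 console.
latin1_bytes = text.encode("latin1")
dos_bytes = latin1_bytes.decode("latin1").encode("cp437")

assert list(bytearray(latin1_bytes)) == [0xE5, 0xE4, 0xF6]
assert list(bytearray(dos_bytes)) == [0x86, 0x84, 0x94]

# o-slash is missing from cp437, so a strict encode would raise
# UnicodeEncodeError; "replace" substitutes a question mark instead.
o_slash = u"\u00F8"
fallback = o_slash.encode("cp437", "replace")
assert fallback == "?".encode("ascii")

# cp850 does contain the character.
assert list(bytearray(o_slash.encode("cp850"))) == [0x9B]

print(repr(dos_bytes))
```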

Sometimes you need to go the other way:

import os
x = os.popen('DIR').read()
print x.decode('cp437').encode('latin1')

Actually, if you are printing in a Unicode-aware environment,
you only need:

import os
x = os.popen('DIR').read()
print x.decode('cp437')

but if you are for instance writing to a web page, Latin1 might
be better.
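
The decode step can be sketched without actually running DIR. The
bytes below are a hypothetical stand-in for command output, and the
use of numeric character references for the web page is my own
suggestion for characters that Latin1 lacks, such as the
line-drawing characters DOS programs like to print.

```python
# Stand-in for bytes read from a DOS command: three Scandinavian
# letters followed by a horizontal line-drawing character.
raw = bytes(bytearray([0x86, 0x84, 0x94, 0xC4]))

# Decoding from the console code page gives a Unicode string.
text = raw.decode("cp437")
assert text == u"\u00E5\u00E4\u00F6\u2500"

# For a Latin1 web page, characters outside Latin1 can be written
# as numeric character references instead of raising an error.
for_web = text.encode("latin1", "xmlcharrefreplace")
expected = u"\u00E5\u00E4\u00F6".encode("latin1") + "&#9472;".encode("ascii")
assert for_web == expected

print(repr(for_web))
```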


-- 
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/  mailto:magnus@thinkware.se