beginner - py unicode Q
John Machin
sjmachin at lexicon.net
Sat Apr 7 22:42:46 EDT 2007
On Apr 8, 9:51 am, enquiring mind <braind... at braindead.com> wrote:
> I read the posting by Rehceb Rotkiv and response but don't know if it
> relates to my problem in any way.
>
> I only want to write German to the screen/console for little German
> programs/exercises in python. No file w/r will be used.
>
> #! /usr/bin/env python
> # -*- coding: utf-8 -*-
> # Filename: 7P07png.py
> # SUSE Linux 10 Python 2.4.1 gedit 2.12.0
>
> print 'Ich zähle zwölf weiß Hüte.'
> print 'Wollen Sie'
> verbs = ( 'kömmen' , 'essen' , 'trinken' )
> print verbs[:3]
Note: the [:3] is redundant. "print verbs" would have the same effect.
When you do print list_tuple_dict_etc, Python prints the repr() of
each element. You are seeing repr('kömmen'). This is great for
debugging, to see exactly what you've got (\xc3\xb6 is the utf8
encoding for small o with diaeresis (aka umlaut)) but not so great for
presentation to the user.
To see the difference, insert here:
for v in verbs:
    print v
    print str(v)
    print repr(v)
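A rough sketch of the same point in Python 3 syntax (an assumption on my part; the script above is Python 2.4, where a plain 'kömmen' literal is already a byte string). The verbs are encoded to bytes by hand to mimic that, and the names container_view and element_view are made up for illustration:

```python
# Python 3 sketch. Escapes keep this source ASCII-only; u'\u00f6' is o with diaeresis.
verbs = (u"k\u00f6mmen", u"essen", u"trinken")

# Encode by hand so each element is a byte string, like a Python 2 str literal.
encoded = tuple(v.encode("utf-8") for v in verbs)

# Printing a container shows the repr() of each element: escapes and all.
container_view = repr(encoded)

# Printing element by element (here: decoded and joined) gives presentable text.
element_view = u", ".join(v.decode("utf-8") for v in encoded)

print(container_view)
print(element_view)
```

The first line printed shows the \xc3\xb6 escapes; the second shows the umlaut, provided the console can render utf8.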
>
> print ' program ends '
>
> console display is: Ich zähle zwölf weiß Hüte.
> Wollen Sie
> ('k\xc3\xb6mmen', 'essen', 'trinken')
> program ends
>
> The first 2 print statements in German print perfectly to screen/console
> but not the 3rd.
>
> I ran it with these lines below from Rehceb Rotkiv's code but it did not
> fix problem.
> import sys
> import codecs
Importing modules without using them is pointless.
>
> I also tried unicode string u'kömmen', but it did not fix problem.
> Any help/direction would be appreciated. Thanks in advance.
>
> I found this reference section but I am not sure it applies or how to
> use it to solve my problem:
It doesn't solve your problem. Forget you ever read it.
>
> This built in setdefaultencoding(name) sets the default codec used to
> encode and decode Unicode and string objects (normally ascii) and is
> meant to be called only from sitecustomize.py at program startup; the
> site module then removes this attribute from sys. You can call
> reload(sys) to make this attribute available again but this is not a good
> programming practice.
>
> I just thought of this. I suppose because this is py source code, it
> should not be German but a reference/key to u'strings' to print German
> text to the screen?
It's "German" only to a human who reads the console output and
recognizes the bunches of characters as representing German words/
phrases/sentences. Python and your computer see only utf8 encoding
(which can be used to represent multiple languages all at once on the
same screen or in the same paragraph of a document).
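To make the "multiple languages at once" remark concrete, here is a small sketch (Python 3 syntax, which is an assumption; the sample string and variable names are invented for illustration). It mixes German, Greek and Japanese characters in one string and shows that utf8 spends a different number of bytes on each:

```python
# One text string mixing three scripts; escapes keep the source ASCII-only.
mixed = u"H\u00fcte \u03b1\u03b2\u03b3 \u65e5\u672c\u8a9e"

# Encode the whole string to utf8 bytes in one go.
encoded = mixed.encode("utf-8")

# utf8 is variable-width: ASCII takes 1 byte, u-umlaut and Greek 2, the kanji 3.
bytes_per_char = [len(ch.encode("utf-8")) for ch in mixed]

# Decoding the bytes gives back exactly the original text.
round_trip = encoded.decode("utf-8")

print(bytes_per_char)
print(round_trip == mixed)
```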
Your console is quite happy rendering utf8 e.g. it printed "Ich zähle
zwölf weiß Hüte" OK, didn't it? Try this:
print "blahblah"
print u"blahblah".encode('utf8')
print u"blahblah"
and see what happens.
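What those three prints are getting at, sketched in Python 3 syntax (again an assumption; the sample word is mine). In 2.4 the first print sends a byte string as-is, the second encodes explicitly, and the third leaves the encoding to print, which as far as I know uses the console's encoding when it can determine one:

```python
import sys

text = u"z\u00e4hle"                # a text (unicode) string: z, a-umlaut, h, l, e
utf8_bytes = text.encode("utf8")    # explicit encoding to a byte string

# Five characters, but six bytes: the umlaut needs two bytes in utf8.
print(len(text), len(utf8_bytes))   # prints: 5 6
print(repr(utf8_bytes))             # prints: b'z\xc3\xa4hle'

# The encoding used when text is printed without an explicit .encode();
# its value depends on the environment, so no expected output is given.
print(sys.stdout.encoding)
```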
HTH,
John