string processing question

Fri May 1 10:02:37 EDT 2009

Sion Arrowsmith wrote:
> Kurt Mueller <mu at problemlos.ch> wrote:
>> :> python -c 'print unicode("ä", "utf8")'
>> ä
>> :> python -c 'print unicode("ä", "utf8")' | cat
>> Traceback (most recent call last):
>> File "<string>", line 1, in <module>
>> UnicodeEncodeError: 'ascii' codec can't encode characters in position
>> 0-1: ordinal not in range(128)
> $ python -c 'import sys; print sys.stdout.encoding'
> UTF-8
> $ python -c 'import sys; print sys.stdout.encoding' | cat
> None
>
> If print gets a Unicode string, it does an implicit
> .encode(sys.stdout.encoding or sys.getdefaultencoding()) on it.
> If you want your output to be guaranteed UTF-8, you'll need to
> explicitly .encode("utf8") it yourself.

This now works correctly, both with and without piping:

python -c 'a=unicode("ä", "utf8") ; print (a.encode("utf8"))'
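The same idea as a small helper, as a minimal sketch only. The name encode_for_output is hypothetical (it is not part of Python), and the fallback simply mirrors the sys.stdout.encoding or sys.getdefaultencoding() rule quoted above. It avoids non-ASCII characters in the source so it should behave the same under 2.x and 3.x:

```python
import sys


def encode_for_output(text, encoding=None):
    """Encode text explicitly instead of relying on print's implicit encode.

    encode_for_output is a hypothetical helper name used for illustration.
    With no encoding given, it falls back to sys.stdout.encoding if that is
    set, otherwise to the interpreter's default encoding.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or sys.getdefaultencoding()
    return text.encode(encoding)


a = u"\u00e4"  # a-umlaut, written as an escape so the source stays pure ASCII
data = encode_for_output(a, "utf8")  # the two bytes 0xC3 0xA4, piped or not
```

Passing "utf8" explicitly is what makes the result independent of whether stdout is a terminal or a pipe; leaving the encoding out reproduces the implicit behaviour, including the failure when the fallback is 'ascii'.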

My Python source files start with these two lines:
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :

So the source code itself, including its string literals,
is interpreted as UTF-8.

But when the code is given on the command line, Python
interprets it as 'latin_1', I presume. That is why I have
to convert the "ä" with unicode().
Am I right?
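A quick way to test this presumption, as a sketch: take the two bytes that a UTF-8 terminal sends for "ä" and decode them both ways. The byte values are written out by hand here, which assumes a UTF-8 terminal; with another terminal encoding the bytes would differ.

```python
raw = b"\xc3\xa4"  # the UTF-8 bytes for a-umlaut (assumes a UTF-8 terminal)

as_utf8 = raw.decode("utf8")       # one character: U+00E4
as_latin1 = raw.decode("latin_1")  # two characters: U+00C3 and U+00A4

# Decoding as UTF-8 gives a single character, decoding as latin_1 gives two.
lengths = (len(raw), len(as_utf8), len(as_latin1))
```

In 2.x a plain "ä" literal is a byte string, so it holds whatever bytes the terminal sent, and unicode("ä", "utf8") then decodes those bytes explicitly.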

> (I dare say this is slightly different in 3.x .)
I have heard about it, but I will wait to move to 3.x until it's time to...

Thanks
-- 
Kurt Mueller

-- 
Kurt Müller, mu at problemlos.ch