string processing question
mu at problemlos.ch
Fri May 1 10:02:37 EDT 2009
Sion Arrowsmith wrote:
> Kurt Mueller <mu at problemlos.ch> wrote:
>> :> python -c 'print unicode("ä", "utf8")'
>> :> python -c 'print unicode("ä", "utf8")' | cat
>> Traceback (most recent call last):
>> File "<string>", line 1, in <module>
>> UnicodeEncodeError: 'ascii' codec can't encode characters in position
>> 0-1: ordinal not in range(128)
> $ python -c 'import sys; print sys.stdout.encoding'
> $ python -c 'import sys; print sys.stdout.encoding' | cat
> If print gets a Unicode string, it does an implicit
> .encode(sys.stdout.encoding or sys.getdefaultencoding()) on it.
> If you want your output to be guaranteed UTF-8, you'll need to
> explicitly .encode("utf8") it yourself.
This now works correctly with and without piping:
python -c 'a=unicode("ä", "utf8") ; print (a.encode("utf8"))'
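For what it's worth, here is a minimal sketch of the behaviour Sion describes, as I understand it. encode_like_print is a hypothetical helper of my own, not a Python API, and the "UTF-8" terminal encoding and the "ascii" default are assumptions about a typical 2.x setup:

```python
# -*- coding: utf-8 -*-
# Sketch only: encode_like_print is a hypothetical helper, not a Python API.

def encode_like_print(text, stdout_encoding, default_encoding="ascii"):
    """Encode text the way the quoted explanation says print does."""
    return text.encode(stdout_encoding or default_encoding)

a = u"\xe4"  # the character "ä"

# Terminal case (assumed): stdout reports an encoding, so encoding succeeds.
terminal_bytes = encode_like_print(a, "UTF-8")

# Pipe case: the stdout encoding is None, so the ASCII default is used and fails.
try:
    encode_like_print(a, None)
    pipe_failed = False
except UnicodeEncodeError:
    pipe_failed = True

# Explicit encoding does not depend on where stdout points.
explicit_bytes = a.encode("utf8")
```

The explicit .encode("utf8") gives the same bytes in both cases because it never consults the stdout encoding.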
In my Python source files, these two lines come first:
# vim: set fileencoding=utf-8 :
So the source code itself, including the strings in it,
is interpreted as UTF-8.
On the command line, however, I presume Python interprets
the code as 'latin_1'. That is why I have to convert
the "ä" with unicode().
Am I right?
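To make the question concrete, here is a small sketch of the two interpretations. It assumes the shell hands "ä" to Python as the UTF-8 bytes 0xC3 0xA4, which is an assumption about the terminal and not something shown above:

```python
# -*- coding: utf-8 -*-
# Assumed input: the two bytes a UTF-8 terminal would send for "ä".
raw = b"\xc3\xa4"

# Decoded as UTF-8, the two bytes form one character, U+00E4.
as_utf8 = raw.decode("utf8")

# Decoded as latin_1, each byte becomes its own character (U+00C3, U+00A4).
as_latin1 = raw.decode("latin_1")

utf8_length = len(as_utf8)
latin1_length = len(as_latin1)
```

If the bytes really were taken as latin_1, the result would be two characters rather than one.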
> (I dare say this is slightly different in 3.x .)
I have heard about it, but I will wait to move to 3.x until it's time to...
Kurt Müller, mu at problemlos.ch