string processing question
norseman
norseman at hughes.net
Thu Apr 30 13:51:23 EDT 2009
Kurt Mueller wrote:
> Hi,
>
>
> on a Linux system and python 2.5.1 I have the
> following behaviour which I do not understand:
>
>
>
> case 1
>> python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a, "utf8"); print b.center(6,"-")'
> ä
> --ä--
> --ä---
>
>
> case 2
> ----- a UnicodeEncodeError in this case:
>> python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print b.center(20,"-")' | cat
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 9: ordinal not in range(128)
> ä
> --ä--
>
>
> The behaviour changes if I pipe the output to another program or to a file,
> and centering is not correct with the string a, but it is with the string b.
>
>
>
> Could somebody please explain this to me?
>
>
>
>
> Thanks in advance
========================================================
Let me add to the confusion:
=======================================================================
stevet:> python -c 'a="ä"; print a ; print a.center(6,"-") ;
b=unicode(a, "utf8"); print b.center(6,"-")'
ä
--ä---
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0:
unexpected end of data
stevet:>
stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ;
b=unicode(a, "utf8"); print b.center(20,"-")' | cat
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0:
unexpected end of data
ä
---------ä----------
stevet:>
stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ;
b=unicode(a, "utf8"); print b.center(20,"-")'
ä
---------ä----------
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0:
unexpected end of data
stevet:>
=======================================================================
I'm using Python 2.5.2 on Slackware Linux 10.2.
Line wraps (if showing) are email-induced.
Your first line bombs for me at the unicode() call.
a is centered (even field length)
The second line bombs for me at the unicode() call. (Has a pipe)
a is centered (even field length)
The third line bombs for me at the unicode() call. (No pipe)
a is centered (even field length)
In no case does 'b' print.
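The tracebacks above suggest that on this machine the shell hands Python a single byte 0xe4 for "ä" (as a Latin-1 style terminal would), while Kurt's terminal appears to send the two-byte UTF-8 form. Here is a minimal sketch of that reading, written with escapes so it does not depend on any terminal or file encoding. The variable names are illustrative only, and the u"..." literals assume Python 2 or Python 3.3 and later:

```python
# u"\xe4" is the character a-umlaut, written as an escape on purpose.
char = u"\xe4"
utf8_bytes = char.encode("utf-8")       # two bytes: 0xc3 0xa4
latin1_bytes = char.encode("latin-1")   # one byte: 0xe4
dash = u"-".encode("ascii")             # fill character as a byte string

# center() on a byte string pads by byte count, not by character count.
centered_utf8 = utf8_bytes.center(6, dash)      # 4 dashes around 2 bytes
centered_latin1 = latin1_bytes.center(6, dash)  # 5 dashes around 1 byte
centered_text = char.center(6, u"-")            # 5 dashes around 1 character

# A lone 0xe4 byte is not valid UTF-8, so decoding it as UTF-8 fails.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the codec that matches the bytes works.
recovered = latin1_bytes.decode("latin-1")
```

Under that assumption, unicode(a, "latin-1") rather than unicode(a, "utf8") would be the call that matches the bytes on this system.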
If I put the code into a file (zz, a dummy file) and run python on it,
I get:
File zz:
-------
a="ä"
print a
print a.center(20,"-")
b=unicode(a, "utf8")
print b.center(20,"-")
----
Output:
------
stevet:> py zz
File "zz", line 2
SyntaxError: Non-ASCII character '\xe4' in file zz on line 2, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for details
stevet:>
------
It doesn't like "ä".
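The SyntaxError points at PEP 263: a source file containing non-ASCII bytes needs an encoding declaration on its first or second line. Below is a sketch of zz with that declaration added, assuming the editor saves the file as UTF-8 (if it saves Latin-1, the declaration would need to say latin-1 instead). The u"..." literal stands in for the unicode() call and is a substitution for illustration, not part of the original file; the name centered is also illustrative:

```python
# -*- coding: utf-8 -*-
# The line above is the PEP 263 encoding declaration; it must match
# the encoding the file is actually saved in (assumed UTF-8 here).

b = u"ä"                       # one character, decoded using the declared encoding
centered = b.center(20, u"-")  # pads by character count, not by byte count

# Print the length only, so the output does not depend on the terminal encoding.
print(len(centered))
```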
Python is cooking the output in print. It is disallowing the full 8-bit
ASCII set in the print routine. (Yes, yes: ASCII proper is 128 characters
(0->127), high bit off. But the full set allows the high bits to be used
as 'undefined', AKA use at your own risk.)
Look for special handling routines, docs, whatever.
Search on 8-bit binary, 8-bit ASCII, etc.
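Another reading of Kurt's case 2, offered as an assumption rather than a diagnosis: in Python 2, sys.stdout.encoding is None when output goes to a pipe or file, so printing a unicode object falls back to the ASCII codec and fails on "ä". Encoding explicitly before writing sidesteps that. In this sketch, encode_for_stream and PipedStream are hypothetical names made up for illustration, and UTF-8 is an assumed fallback:

```python
import sys


class PipedStream(object):
    """Hypothetical stand-in for sys.stdout when it is redirected under Python 2."""
    encoding = None


def encode_for_stream(text, stream=None, fallback="utf-8"):
    """Encode text with the stream's encoding, or with fallback if it has none."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding)


text = u"\xe4".center(20, u"-")  # a-umlaut centered in 20 characters

# The ASCII codec cannot represent a-umlaut, which matches the error in case 2.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# With no stream encoding available, the helper falls back to UTF-8.
piped_bytes = encode_for_stream(text, PipedStream())
```

In Python 2 the resulting byte string can be handed to print or sys.stdout.write as is.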
Steve