string processing question

Thu Apr 30 13:51:23 EDT 2009

Kurt Mueller wrote:
> Hi,
> 
> 
> on a Linux system and python 2.5.1 I have the
> following behaviour which I do not understand:
> 
> 
> 
> case 1
>> python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a, "utf8"); print b.center(6,"-")'
> ä
> --ä--
> --ä---
> 
> 
> case 2
> ----- an UnicodeEncodeError in this case:
>> python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print b.center(20,"-")' | cat
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 9: ordinal not in range(128)
> ä
> --ä--
> 
> 
> The behaviour changes if I pipe the output to another prog or to a file.
> and
> centering with the string a is not correct, but with string b.
> 
> 
> 
> Could somebody please explain this to me?
> 
> 
> 
> 
> Thanks in advance
========================================================

Let me add to the confusion:
=======================================================================
stevet:> python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a, "utf8"); print b.center(6,"-")'
ä
--ä---
Traceback (most recent call last):
   File "<string>", line 1, in <module>
   File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data
stevet:>

stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print b.center(20,"-")' | cat
Traceback (most recent call last):
   File "<string>", line 1, in <module>
   File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data
ä
---------ä----------
stevet:>

stevet:> python -c 'a="ä"; print a ; print a.center(20,"-") ; b=unicode(a, "utf8"); print b.center(20,"-")'
ä
---------ä----------
Traceback (most recent call last):
   File "<string>", line 1, in <module>
   File "/usr/local/lib/python2.5/encodings/utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 0: unexpected end of data
stevet:>

=======================================================================

I'm using Python 2.5.2 on Slackware Linux 10.2.
Line wraps in the transcripts (if any are showing) are email-induced.

Your first command fails for me at the unicode() call.
   a is centered (as far as an even field width allows)
The second command (with the pipe) also fails at the unicode() call.
   a is centered (as far as an even field width allows)
The third command (same as the second, without the pipe) fails there too.
   a is centered (as far as an even field width allows)

In no case does 'b' get printed.
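
One possible reading of the transcripts above, offered as an assumption
rather than a diagnosis: my shell hands Python the "ä" as the single byte
0xe4 (Latin-1), while Kurt's hands it two UTF-8 bytes. The sketch below is
written in Python 3 syntax, where byte strings and text are separate types
(in 2.5 a plain "..." literal plays the byte-string role). try_decode is a
made-up helper name, not anything from the standard library.

```python
# Assumption: these byte values stand in for what each shell passes to Python.
latin1_bytes = b"\xe4"      # "ä" as one Latin-1 byte
utf8_bytes = b"\xc3\xa4"    # "ä" as two UTF-8 bytes


def try_decode(raw, encoding="utf8"):
    """Return the decoded text, or None if raw is not valid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# A lone 0xe4 is an incomplete sequence in UTF-8, so decoding fails.
print(try_decode(latin1_bytes))                    # None
print(ascii(try_decode(latin1_bytes, "latin-1")))  # '\xe4'
print(ascii(try_decode(utf8_bytes)))               # '\xe4'

# center() pads according to len(). For a byte string that is the byte
# count, so the two-byte "ä" gets one pad character fewer than the text.
print(utf8_bytes.center(6, b"-"))                  # b'--\xc3\xa4--'
print(ascii(try_decode(utf8_bytes).center(6, "-")))  # '--\xe4---'
```

If that reading is right, it would account both for the unicode() failure
here and for Kurt's five-column "--ä--" next to his six-column "--ä---".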

If I put the code into a file (zz, a dummy file) and run   python zz
I get:

File zz:
-------
a="ä"
print a
print a.center(20,"-")
b=unicode(a, "utf8")
print b.center(20,"-")
----

Output:
------
stevet:> py zz
   File "zz", line 2
SyntaxError: Non-ASCII character '\xe4' in file zz on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
stevet:>
------
It doesn't like the "ä" in a file that has no encoding declaration.
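
For what it's worth, here is a minimal sketch of zz with the PEP 263
declaration the error message asks for. It assumes the file is saved as
UTF-8, and it swaps the unicode(a, "utf8") call for a u"..." literal so the
decoding is done by the declaration. That swap is my own variation, not the
original code.

```python
# -*- coding: utf-8 -*-
# The line above is the PEP 263 declaration; it has to be on line 1 or 2.
# Assumption: this file really is saved as UTF-8.

a = u"ä"                        # one character of text, whatever its byte length
centered = a.center(20, u"-")   # padding is counted in characters

print(len(centered))            # 20
print(centered.index(a))        # 9
```

The u"..." literal is accepted by 2.x and by 3.3 and later, so the same
file should run on either.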

Python seems to be cooking the output in print. It refuses anything
outside 7-bit ASCII, which is what the 'ascii' codec in Kurt's traceback
is complaining about. (Yes, yes: ASCII proper is 128 codes, 0 through 127,
high bit off. But the full 8-bit set allows the high-bit values to be used
'undefined', AKA use at your own risk.)

Look for special handling routines, docs, whatever.
Search for 8-bit binary, 8-bit ASCII, etc.
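
One kind of special handling to look at, sketched under assumptions: encode
the text yourself before it reaches the output stream, instead of leaving
the choice of codec to print. As far as I know, 2.x leaves
sys.stdout.encoding as None when stdout is a pipe and then falls back to
ASCII. The helper names and the UTF-8 fallback below are hypothetical
choices of mine.

```python
import sys


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def encode_for_stream(text, stream_encoding, fallback="utf-8"):
    """Encode text for a stream, using fallback if the stream reports no encoding."""
    return text.encode(stream_encoding or fallback)


text = u"\xe4"  # the "ä" from the examples, written as an escape

# The 'ascii' codec rejects it, as in Kurt's traceback.
print(can_encode(text, "ascii"))      # False
print(can_encode(text, "latin-1"))    # True

# What the stream reports; may be None, e.g. under 2.x when piped.
stream_encoding = getattr(sys.stdout, "encoding", None)

# With no reported encoding, the fallback decides the bytes.
print(repr(encode_for_stream(text, None)))
```

Note that encode_for_stream still raises UnicodeEncodeError if the stream
reports an encoding, such as ASCII, that cannot represent the text.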

Steve