unicode mystery/problem
Petr Jakeš
petr.jakes at tpc.cz
Fri Sep 22 07:53:42 EDT 2006
John, thanks for your extensive answer.
>> Hi,
>> I am using Python 2.4.3 on Fedora Core4 and the "Eric3" Python IDE.
>> The code below works fine in the Eric3 environment. When I try
>> to start it from the command line, it returns:
>>
>> Traceback (most recent call last):
>>   File "pokus_1.py", line 5, in ?
>>     print str(a)
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in
>> position 6: ordinal not in range(128)
JM> So print a works, but print str(a) crashes.
JM> Instead, insert this:
JM> import sys
JM> print "default", sys.getdefaultencoding()
JM> print "stdout", sys.stdout.encoding
JM> and run your script at the command line. It should print:
JM> default ascii
JM> stdout x
**** on the command line it prints: ****
default ascii
stdout UTF-8
JM> here, and crash at the later use of str(a).
JM> Step 2: run your script under Eric3. It will print:
JM> default y
JM> stdout z
**** in Eric3 it prints: ****
if # -*- Encoding: utf_8 -*- is set, then:
default utf_8
stdout
unhandled AttributeError, "AsyncFile instance has no attribute
'encoding' "
if the encoding is not set, then it prints:
DeprecationWarning: Non-ASCII character '\xc3' in file
/root/eric/analyza_dat_TPC/pokus_1.py on line 26, but no encoding
declared; see http://www.python.org/peps/pep-0263.html for details
  execfile(sys.argv[0], self.debugMod.__dict__)
default latin-1
stdout
unhandled AttributeError, "AsyncFile instance has no attribute
'encoding' "
JM> and then should work properly. It is probable that x == y == z ==
JM> 'utf-8'
JM> Step 3: see below.
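The AttributeError reported under Eric3 suggests that the IDE replaces sys.stdout with an object that has no "encoding" attribute. A minimal sketch of a diagnostic that tolerates this, assuming only that the replacement stream lacks the attribute; the names stream_encoding and NoEncodingStream are hypothetical and are not part of Eric3:

```python
import sys

def stream_encoding(stream):
    """Hypothetical helper: return stream.encoding, or None if it has none."""
    return getattr(stream, 'encoding', None)

class NoEncodingStream(object):
    """Stand-in for a replaced stdout such as Eric3's AsyncFile (assumed shape)."""
    def write(self, data):
        pass

# Same two diagnostics as above, but without the unhandled AttributeError.
print("default " + sys.getdefaultencoding())
print("stdout " + str(stream_encoding(sys.stdout)))
```

With a stream like NoEncodingStream the helper returns None instead of raising, so the script reaches the lines that follow the diagnostics.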
>>
>> ========== 8< =============
>> #!/usr/bin/env python
>> # -*- Encoding: utf_8 -*-
JM> There is no UTF8-encoded text in this short test script. Is the above
JM> encoding comment merely a carry-over from your real script, or do you
JM> believe it is necessary or useful in this test script?
Generally, I am working with strings like u'DISKOV\xc1 POLE' (I get
them from the database).
My intention in using # -*- Encoding: utf_8 -*- was to suppress the
DeprecationWarnings when I use utf_8 in the code (like u'DISKOV\xc1 POLE').
>>
>> a= u'DISKOV\xc1 POLE'
>> print a
>> print str(a)
>> ========== 8< =============
>>
>> Even though it looks strange, I have to use the str(a) syntax even
>> though I know the "a" variable is a string.
JM> Some concepts you need to understand:
JM> (a) "a" is not a string, it is a reference to a string.
JM> (b) It is a reference to a unicode object (an implementation of a
JM> conceptual Unicode string) ...
JM> (c) which must be distinguished from a str object, which represents a
JM> conceptual string of bytes.
JM> (d) str(a) is trying to produce a str object from a unicode object. Not
JM> being told what encoding to use, it uses the default encoding
JM> (typically ascii) and naturally this will crash if there are non-ascii
JM> characters in the unicode object.
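Point (d) can be sketched as follows. This is only an illustration: it spells out the encoding step that str(a) performs implicitly on Python 2, with 'ascii' written explicitly so that the lines behave the same on Python 2 and Python 3. The helper name to_bytes is hypothetical.

```python
# Sketch: what str(a) attempts on Python 2, spelled out explicitly.
a = u'DISKOV\xc1 POLE'  # text (a unicode object) containing U+00C1

def to_bytes(text, encoding):
    """Hypothetical helper: encode text, or return None if the codec cannot."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

ascii_result = to_bytes(a, 'ascii')  # the ascii codec cannot represent U+00C1
utf8_result = to_bytes(a, 'utf-8')   # an explicitly named, capable codec succeeds

print(repr(ascii_result))
print(repr(utf8_result))
```

The point is that the failure comes from the choice of codec, not from the text itself.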
>> I am trying to use ChartDirector for Python (charts for Python), and the
>> method "layer.addDataSet()" needs the above-mentioned syntax; otherwise it
>> returns an error.
JM> Care to tell us which error???
You can see the error description and the author's comments here:
http://tinyurl.com/ezohe
>>
>> layer.addDataSet(data, colour, str(dataName))
I have tried experimenting with the code a bit.
Here is the simplest code that demonstrates my problem:
#!/usr/bin/env python
import sys
print "default", sys.getdefaultencoding()
print "stdout", sys.stdout.encoding
a=['P\xc5\x99\xc3\xad','Petr Jake\xc5\xa1']
b="my nice try %s" % ''.join(a).encode("utf-8")
print b
When I run it from the command line I get:
sys:1: DeprecationWarning: Non-ASCII character '\xc3' in file pokus_1.py on line 26, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
default ascii
stdout UTF-8
Traceback (most recent call last):
  File "pokus_1.py", line 8, in ?
    b="my nice try %s" % ''.join(a).encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1: ordinal not in range(128)
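A possible reading of this traceback: the items in the list are already UTF-8 encoded byte strings, and on Python 2 calling .encode() on a byte string first decodes it with the default (ascii) codec, which is the step that fails at byte 0xc5. A minimal sketch under that assumption, written so the same lines run on Python 2 and Python 3; the variable names are illustrative only:

```python
# Sketch: the byte strings are already UTF-8; they need decoding, not encoding.
# Built via .encode() so the same lines run on Python 2 and Python 3.
parts = [u'P\u0159\xed'.encode('utf-8'),      # same bytes as 'P\xc5\x99\xc3\xad'
         u'Petr Jake\u0161'.encode('utf-8')]  # same bytes as 'Petr Jake\xc5\xa1'
joined = parts[0] + parts[1]

# The implicit step behind ''.join(a).encode("utf-8") on Python 2:
try:
    joined.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # byte 0xc5 at position 1 is outside range(128)

# Decoding with the codec the bytes were written in yields text.
text = joined.decode('utf-8')
message = u"my nice try %s" % text
print(repr(message))
```

Whether the final output should then be encoded again depends on what consumes it, which this sketch does not try to answer.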
JM> The method presumably expects a str object (8-bit string). What does
JM> its documentation say? Again, what error message do you get if you feed
JM> it a unicode object with non-ascii characters?
JM> [Step 3] For foo in set(['x', 'y', 'z']):
JM> Change str(dataName) to dataName.encode(foo). Change any debugging
JM> display to use repr(a) instead of str(a). Test it with both Eric3 and
JM> the command line.
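Step 3 could be sketched like this. The candidate list is an assumption taken from the encodings reported earlier in this message (ascii, latin-1, utf-8), and data_name is a stand-in value for dataName; which result ChartDirector actually accepts is not known here.

```python
# Sketch of Step 3: try each candidate encoding and inspect the bytes with repr().
data_name = u'DISKOV\xc1 POLE'              # stand-in for dataName (assumed value)
candidates = ['ascii', 'latin-1', 'utf-8']  # encodings reported in this thread

results = {}
for enc in candidates:
    try:
        results[enc] = data_name.encode(enc)
    except UnicodeEncodeError:
        results[enc] = None  # this codec cannot represent the text
    print(enc + ' -> ' + repr(results[enc]))
```

Using repr() for the debugging display shows the exact bytes produced, independent of how the terminal or IDE renders them.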
JM> [Aside: it's entirely possible that your problem will go away if you
JM> remove the letter u from the line a= u'DISKOV\xc1 POLE' -- however if
JM> you want to understand what is happening generally, I suggest you don't
JM> do that]
JM> HTH,
JM> John