unicode mystery/problem

John Machin sjmachin at lexicon.net
Wed Sep 20 18:04:47 EDT 2006


Petr Jakes wrote:
> Hi,
> I am using Python 2.4.3 on Fedora Core4 and the "Eric3" Python IDE.
> Below mentioned code works fine in the Eric3 environment. While trying
> to start it from the command line, it returns:
>
> Traceback (most recent call last):
>   File "pokus_1.py", line 5, in ?
>     print str(a)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in
> position 6: ordinal not in range(128)

So print a works, but print str(a) crashes.

Step 1: just before the print statements, insert this:
    import sys
    print "default", sys.getdefaultencoding()
    print "stdout", sys.stdout.encoding
and run your script at the command line. It should print:
    default ascii
    stdout x
where x is whatever encoding your terminal reports, and then crash at
the later use of str(a).
Step 2: run your script under Eric3. It will print:
    default y
    stdout z
and then should work properly. It is probable that x == y == z ==
'utf-8'.
Step 3: see below.
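
For anyone who wants to paste the check into several scripts, here is a minimal sketch of the same two lookups wrapped in a function. The name report_encodings is made up for illustration. It uses only sys.getdefaultencoding() and sys.stdout.encoding, and is written so that it should run on both Python 2 and Python 3.

```python
import sys

def report_encodings():
    """Return the two encodings that the steps above ask you to compare."""
    return {
        "default": sys.getdefaultencoding(),
        # The attribute may be missing or None if stdout has been
        # replaced or redirected, so look it up defensively.
        "stdout": getattr(sys.stdout, "encoding", None),
    }

info = report_encodings()
for key in ("default", "stdout"):
    print("%s %s" % (key, info[key]))
```

Run it once from the command line and once inside the IDE, and compare the two pairs of values.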

>
> ========== 8< =============
> #!/usr/bin python
> # -*- Encoding: utf_8 -*-

There is no UTF8-encoded text in this short test script. Is the above
encoding comment merely a carry-over from your real script, or do you
believe it is necessary or useful in this test script?

>
> a= u'DISKOV\xc1 POLE'
> print a
> print str(a)
> ========== 8< =============
>
> Even though it looks strange, I have to use the str(a) syntax even
> though I know the "a" variable is a string.

Some concepts you need to understand:
(a) "a" is not a string, it is a reference to a string.
(b) It is a reference to a unicode object (an implementation of a
conceptual Unicode string) ...
(c) which must be distinguished from a str object, which represents a
conceptual string of bytes.
(d) str(a) is trying to produce a str object from a unicode object. Not
being told what encoding to use, it uses the default encoding
(typically ascii), and this raises UnicodeEncodeError if there are
non-ascii characters in the unicode object.
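
Points (b) to (d) can be seen in a few lines. This is only a sketch: it calls encode('ascii') explicitly to stand in for what str(a) does when the default encoding is ascii, and it uses the "except ... as" spelling, so it assumes Python 2.6 or later (it also runs on Python 3, where the text type is simply called str).

```python
a = u'DISKOV\xc1 POLE'   # text: 12 characters, one of them non-ascii

# Encoding with the ascii codec fails at the first non-ascii character.
try:
    a.encode('ascii')
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start   # index of the offending character

# Naming an encoding explicitly succeeds and produces bytes.
as_utf8 = a.encode('utf-8')
as_latin1 = a.encode('latin-1')

print(failed_at)
print("%d %d %d" % (len(a), len(as_utf8), len(as_latin1)))
```

The failing index is the same "position 6" that the traceback reports, and the utf-8 result is one byte longer than the text because that one character needs two bytes.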

> I am trying to use ChartDirector for Python (charts for Python) and the
> method "layer.addDataSet()" needs above mentioned syntax otherwise it
> returns an Error.

Care to tell us which error???

>
> layer.addDataSet(data, colour, str(dataName))

The method presumably expects a str object (8-bit string). What does
its documentation say? Again, what error message do you get if you feed
it a unicode object with non-ascii characters?

[Step 3] For foo in set([x, y, z]), i.e. for each distinct encoding
printed in the steps above:
    Change str(dataName) to dataName.encode(foo). Change any debugging
display to use repr(a) instead of str(a). Test it with both Eric3 and
the command line.
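
A rough sketch of that loop, outside ChartDirector, might look like the following. The helper name encode_candidates and the list of candidate encodings are placeholders of mine; substitute the values you actually observed. It shows each result with repr-style formatting so that nothing depends on the terminal's encoding. The "except ... as" spelling assumes Python 2.6 or later.

```python
def encode_candidates(text, encodings):
    """Map each distinct candidate encoding to the encoded bytes,
    or to the exception raised while encoding."""
    results = {}
    for enc in sorted(set(encodings)):
        try:
            results[enc] = text.encode(enc)
        except (UnicodeError, LookupError) as exc:
            results[enc] = exc
    return results

data_name = u'DISKOV\xc1 POLE'
# Placeholder values: replace with the x, y, z you saw in your own runs.
candidates = ['ascii', 'utf-8', 'utf-8', 'latin-1']
results = encode_candidates(data_name, candidates)
for enc in sorted(results):
    print("%s %r" % (enc, results[enc]))
```

Whichever encoded value the charting method accepts and renders correctly is the one to pass in place of str(dataName).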

[Aside: it's entirely possible that your problem will go away if you
remove the letter u from the line a= u'DISKOV\xc1 POLE' -- however if
you want to understand what is happening generally, I suggest you don't
do that]
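
To see what that aside would leave you with: on Python 2, the literal without the u is a str object containing the single byte 0xC1. The sketch below spells it with a b prefix so it behaves the same on Python 2.6+ and Python 3. Treating the byte as latin-1 is my assumption about what was intended, not something the original script states.

```python
raw = b'DISKOV\xc1 POLE'   # what the literal without u'' is on Python 2

# Interpreted as latin-1, the byte 0xC1 is the intended character.
as_text = raw.decode('latin-1')

# Interpreted as utf-8, the same bytes are not valid at all.
try:
    raw.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(as_text))
print(valid_utf8)
```

So the error goes away only because no encoding step happens any more; whether the result is displayed correctly then depends on the receiver expecting latin-1 bytes.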

HTH,
John



