wxPython and Croatian characters

J. Cliff Dyer jcd at sdf.lonestar.org
Mon Feb 16 17:06:21 EST 2009


On Mon, 2009-02-16 at 20:06 +0100, Diez B. Roggisch wrote:
> vedrandekovic at gmail.com schrieb:
> > Hello,
> > 
> > I have problem with configuring my wxPython script to work with
> > Croatian characters like:  đ,š,ž,č,ć.
> > Here is my simple script without wxPython (this script works):
> > 
> >       # -*- coding: utf-8 -*-
> >       s = "hello normal string đšžćč"
> >       print s
> > 
> > ..here is my snippet with wxPython:
> > 
> >     text = wx.StaticText(self, -1,"Matični broj",(0,100)) # in this
> > example,we have character "č"
> > 
> > ...when I run this text, it looks something like:  "Mati",some weird
> > characters ,and "ni"
> 
> Unless you are using python 3.0 (which I doubt, afaik no wx available), 
> your above coding declaration is useless for the shown piece of code, as 
> it only applies to unicode literals, which are written with a preceding u.
> 

No.  The coding declaration is not specific to unicode literals.  It
tells Python's source code parser how to decode the bytes of the source
file, and non-ascii characters in unicode literals are decoded along
with everything else.
Without it, your source code will be parsed (or is it lexed?) by python
as an ascii document.  So if your document is UTF-8, it will choke as
soon as it reaches a non ascii character.  If it's encoded in UTF-16,
however, it will choke right away, as it will immediately come across a
\x00 byte, which is treated as the ascii NULL character, which is not
legal in python source code.
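
To make the byte-level picture concrete, here is a small sketch of my own
(it only inspects bytes, it does not reproduce what the interpreter does
internally).  It encodes the OP's label as UTF-8 and as UTF-16 and checks
the two failure modes described above.  The helper name decodes_as_ascii
is made up for this example.

```python
# -*- coding: utf-8 -*-
# Illustration only: look at the raw bytes a parser would have to read.
text = u"Matični broj"

utf8_bytes = text.encode("utf-8")       # "č" becomes the two bytes C4 8D
utf16_bytes = text.encode("utf-16-le")  # every ascii letter gains a 00 byte


def decodes_as_ascii(data):
    """Hypothetical helper: True if the bytes are pure ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


ascii_ok = decodes_as_ascii(utf8_bytes)  # the non-ascii bytes break this
has_nul = b"\x00" in utf16_bytes         # NUL bytes appear right away
```

An ASCII-only reader trips over the UTF-8 bytes at "č", while the UTF-16
bytes contain a \x00 after the very first letter.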

But print will still try to encode all unicode objects to the encoding
of your terminal, and any file writing operations will try to encode
them as ASCII, unless explicitly told otherwise.  That is the same as
without the -*- coding -*- declaration.
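
As a rough sketch of "explicitly told otherwise" for file output: the
snippet below assumes you want UTF-8 on disk and uses io.open, which
takes an encoding argument.  The file name and temporary directory are
placeholders of mine, not anything from the OP's script.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u"đšžćč"

# Encoding to ASCII cannot represent these characters.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Name the output encoding explicitly instead of relying on a default.
path = os.path.join(tempfile.mkdtemp(), "croatian.txt")  # placeholder path
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back, once as raw bytes and once decoded as UTF-8.
with io.open(path, "rb") as f:
    raw = f.read()
with io.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()
```

Here raw holds the UTF-8 bytes that landed on disk and round_trip is the
same unicode text read back with the matching encoding.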

For the OP:  

When dealing with potentially non-ascii text, always use unicode objects
instead of byte strings, and explicitly encode them to the encoding you
want on output, unless your output facility (here wx.StaticText) handles
that for you.  I don't know how wxPython behaves in this regard.

So:

s = u"Matični broj"             # instead of "Matični broj"
text = wx.StaticText(self, -1, s, (0, 100))
# or if that doesn't work try this:
#text = wx.StaticText(self, -1, s.encode('utf-8'), (0,100))
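
If some of your labels arrive as byte strings (read from a file, say),
one possible approach is to decode them once at the boundary so that
everything you hand to the widget is a unicode object.  The helper below
is hypothetical, assumes the bytes are UTF-8, and deliberately does not
call wx at all, so it says nothing about what wx.StaticText itself
accepts.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as a unicode object.

    Byte strings are decoded with the given encoding (UTF-8 is an
    assumption here); unicode objects are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


label = to_text(b"Mati\xc4\x8dni broj")  # bytes in, unicode out
same = to_text(u"Matični broj")          # already unicode, left alone
```

Both label and same end up as the same unicode object contents, which
could then be passed on as the label text.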

Cheers,
Cliff




