[Tutor] Encode problem

Kent Johnson kent37 at tds.net
Mon May 4 21:08:55 CEST 2009


On Mon, May 4, 2009 at 10:09 AM, Pablo P. F. de Faria
<pablofaria at gmail.com> wrote:
> Thanks, Kent, but that doesn't solve my problem. In fact, I need
> ConfigParser to work with non-ascii characters, since my App may run
> in "latin-1" environments (folder and file names).

Yes, I understand that.

Python has two different kinds of strings: byte strings, which are
instances of class str, and unicode strings, which are instances of
class unicode. str objects are sequences of bytes. They are not
limited to ascii characters; they can hold text encoded in any
supported encoding. In particular, UTF-8 data is stored in str
objects.

Unicode objects hold "unencoded" unicode data. (I know, strictly
speaking Unicode is a character set with several encodings, such as
UTF-8, but it is useful to think of it this way in this context.)
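
A small sketch of the distinction, in case it helps. The sample word is
my own choice, and the u"" literal is used so that the snippet should
also run on Python 3, where the two types are called bytes and str
instead of str and unicode:

```python
# One unicode string: 4 characters, no encoding attached to it.
text = u"caf\xe9"  # "cafe" with an acute accent; sample value for illustration

# The same text as byte strings in two different encodings.
utf8_bytes = text.encode("utf-8")      # the accented letter takes two bytes
latin1_bytes = text.encode("latin-1")  # the accented letter takes one byte

print(len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding with the matching encoding gives back the same unicode string.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text
```

The byte strings differ even though they represent the same text, which
is why you always have to know which encoding a byte string is in.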

str.decode() converts a byte string to a unicode object.
unicode.encode() converts a unicode object to a byte string. Both
methods take the encoding as a parameter. When Python is given a byte
string but needs a unicode object, or vice versa, it will encode or
decode implicitly, using the system default encoding, which as you
have discovered is ascii. If the data being converted contains
non-ascii characters, you get the UnicodeDecodeError or
UnicodeEncodeError that you are familiar with. These errors indicate
that you are not handling encoded data correctly.
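
Here is a rough sketch of what goes wrong. Python 2 performs the ascii
conversion implicitly; the hypothetical helper convert_with_ascii()
below does the same conversion explicitly, so the effect is visible
(and so the snippet should also run on Python 3). The sample text is an
assumption of mine:

```python
text = u"S\xe3o Paulo"             # unicode string with one non-ascii character
utf8_bytes = text.encode("utf-8")  # explicit encode with a named encoding works


def convert_with_ascii(value):
    """Do explicitly what an implicit conversion does with the ascii default.

    Returns the converted value, or the name of the error that was raised.
    """
    try:
        if isinstance(value, bytes):
            return value.decode("ascii")   # byte string -> unicode
        return value.encode("ascii")       # unicode -> byte string
    except UnicodeError as exc:
        return type(exc).__name__


decode_result = convert_with_ascii(utf8_bytes)
encode_result = convert_with_ascii(text)
print(decode_result, encode_result)
```

Both directions fail as soon as a non-ascii character is involved; the
fix is to name the encoding yourself at the point where data enters or
leaves your program.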

See the references at the end of this essay for more background information:
http://personalpages.tds.net/~kent37/stories/00018.html

> I must find out why
> the str() function in the module ConfigParser doesn't use the encoding
> defined for the application (# -*- coding: utf-8 -*-).

Because the encoding declaration doesn't define an encoding for the
application. It defines the encoding of the text of the source file
containing the declaration, that's all.
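
To illustrate that the declaration only tells Python how to decode the
bytes of the source file, here is a sketch that compiles two tiny
in-memory "source files". The helper name run_source() and the sample
sources are assumptions of mine, and it is written with Python 3 in
mind:

```python
def run_source(source_bytes):
    """Compile and run source given as raw bytes; return its 'text' variable."""
    namespace = {}
    exec(compile(source_bytes, "<example>", "exec"), namespace)
    return namespace["text"]


# The same one-character literal, stored in two differently encoded "files".
latin1_source = b"# -*- coding: latin-1 -*-\ntext = u'\xe9'\n"
utf8_source = b"# -*- coding: utf-8 -*-\ntext = u'\xc3\xa9'\n"

latin1_text = run_source(latin1_source)
utf8_text = run_source(utf8_source)

# Different bytes on disk, identical unicode string after compilation.
assert latin1_text == utf8_text == u"\xe9"
```

Nothing here changes how the running program encodes or decodes data;
the declaration is consumed when the file is compiled.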

> The rest of the
> application works properly with utf-8, except for ConfigParser.

I guess you have been lucky.

> What I
> found out is that ConfigParser seems to make use of the configuration
> in Site.py (which is set to 'ascii'), instead of the configuration
> defined for the App (if I change . But this is very problematic to
> have to change Site.py in every computer... So I wonder if there is a
> way to replace the settings in Site.py only for my App.

That is the wrong solution. What you should do is
- understand why you have a problem. Hint: it's not a ConfigParser bug
- give only utf-8-encoded byte strings to ConfigParser
- don't use the codecs module to write the file, because the data you
are writing will already be encoded.
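
One way to follow the second point is to encode at a single, well-defined
place before the values reach ConfigParser. This is only a sketch:
to_utf8() is a hypothetical helper, the sample value is made up, and it
assumes that any byte string it receives is already UTF-8:

```python
def to_utf8(value):
    """Return value as a UTF-8 encoded byte string (hypothetical helper)."""
    if isinstance(value, bytes):
        # Assumption: byte strings reaching this point are already UTF-8.
        return value
    return value.encode("utf-8")


folder = u"configura\xe7\xe3o"  # sample unicode value, chosen for illustration
encoded = to_utf8(folder)

# Encoding happens exactly once; passing the result through again is a no-op.
assert to_utf8(encoded) == encoded
assert encoded.decode("utf-8") == folder
```

In Python 2 the encoded value is what you would pass to ConfigParser's
set(), and the file would then be written through a plain open() rather
than codecs.open(), since the bytes need no further encoding.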

Kent

