[Python-Dev] Re: [Patches] Patch for xmlrpc encoding

Martin v. Löwis
Tue, 10 Dec 2002 09:56:17 +0100


> > There isn't a reason to assume that strings use any other encoding,
> > either.
>
> Isn't that what sys.{get,set}defaultencoding is for?

No. If it were up to me, those functions would not exist in the first
place, and the Python system encoding would be ASCII. I consider it an
advantage that setdefaultencoding isn't really there: site.py deletes
it from sys at startup.

Byte strings are used in many places in Python, e.g. when writing to
files. You cannot know in advance the encoding of every file you may
encounter, so the notion of a single default encoding is too limited
in practice, as you have just discovered.
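A minimal sketch of the alternative, written in Python 3 for illustration: the caller supplies the encoding for each file instead of relying on one process-wide default. The helper read_text and the file names are hypothetical.

```python
import os
import tempfile

NAME = "L\u00f6wis"  # "Löwis", written with an escape to keep the source ASCII


def read_text(path: str, encoding: str) -> str:
    """Return the contents of *path*, decoded with the caller-supplied encoding."""
    with open(path, "rb") as f:
        raw = f.read()  # plain bytes: the file itself records no encoding
    return raw.decode(encoding)


with tempfile.TemporaryDirectory() as tmpdir:
    # The same name, saved by two programs that chose different encodings.
    latin1_path = os.path.join(tmpdir, "latin1.txt")
    utf8_path = os.path.join(tmpdir, "utf8.txt")
    with open(latin1_path, "wb") as f:
        f.write(NAME.encode("latin-1"))  # b'L\xf6wis'
    with open(utf8_path, "wb") as f:
        f.write(NAME.encode("utf-8"))    # b'L\xc3\xb6wis'

    # Decoding each file with its own encoding recovers the same text.
    from_latin1 = read_text(latin1_path, "latin-1")
    from_utf8 = read_text(utf8_path, "utf-8")

    # Applying one "default" (here Latin-1) to both silently garbles one of them.
    utf8_misread = read_text(utf8_path, "latin-1")
```

The misread value is "LÃ¶wis": no error is raised, so a single default encoding can corrupt data without any warning.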

The purpose of the default encoding is to define automatic coercion
between byte strings and Unicode strings, e.g. when comparing or
concatenating them. This coercion is necessary primarily because of
string literals; if they were Unicode strings instead of byte strings,
coercion and the default encoding could be deprecated.
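To make the coercion concrete: Python 3 removed it (string literals are Unicode there, and mixing bytes with str raises TypeError), so the following is only a sketch that emulates the old rule with a hypothetical helper, coerce_concat, assuming an ASCII default encoding.

```python
DEFAULT_ENCODING = "ascii"  # assumption: the ASCII system encoding argued for above


def coerce_concat(left, right, encoding=DEFAULT_ENCODING):
    """Concatenate two strings, emulating implicit byte/Unicode coercion.

    Hypothetical helper: if exactly one operand is a byte string, it is
    decoded with the default encoding before concatenating. Non-ASCII
    bytes therefore raise UnicodeDecodeError under an ASCII default.
    """
    if isinstance(left, bytes) and isinstance(right, str):
        left = left.decode(encoding)
    elif isinstance(left, str) and isinstance(right, bytes):
        right = right.decode(encoding)
    return left + right


print(coerce_concat("abc", b"def"))  # abcdef (byte string coerced to Unicode)

try:
    coerce_concat("abc", b"d\xe9f")  # 0xE9 is not ASCII
except UnicodeDecodeError as exc:
    print("coercion failed:", exc.reason)
```

With an ASCII default, coercion only ever succeeds for data whose meaning is unambiguous; anything else fails loudly instead of being guessed.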

> So perhaps it would be better to change the postgresql-modules to return
> unicode-objects instead of strings?

I lack the context to answer that question definitively. If you are
talking about a character string type, and if PostgreSQL has a
well-defined notion of the encoding of character strings, then yes
(apart from backwards-compatibility concerns). If you are talking about
BLOBs, then no: these are byte strings and have no inherent encoding.

I don't know how SQL deals with non-ASCII strings. If, as I assume,
there is no clear specification, it might be best to introduce the
notion of a "table encoding" or "driver encoding" in the DB API. If this
is not set, the driver would return byte strings. If it is set, the
driver would return Unicode strings, decoding the data it receives from
the underlying database with that encoding.
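A minimal sketch of how such a driver encoding could behave, assuming a hypothetical convert_row step inside the driver and hypothetical TEXT/BLOB column-type tags; neither is part of any existing DB API. The standard sqlite3 module, with text_factory set to bytes, merely stands in for a driver that hands back undecoded bytes.

```python
import sqlite3
from typing import Optional, Sequence

TEXT, BLOB = "text", "blob"  # hypothetical column-type tags


def convert_row(row: Sequence, column_types: Sequence[str],
                driver_encoding: Optional[str] = None) -> tuple:
    """Apply a hypothetical "driver encoding" to one raw row.

    Without a driver encoding, the row is returned unchanged, so character
    columns stay byte strings. With one, character columns are decoded to
    Unicode; BLOB columns stay bytes either way, since binary data has no
    inherent encoding.
    """
    if driver_encoding is None:
        return tuple(row)
    converted = []
    for value, col_type in zip(row, column_types):
        if col_type == TEXT and isinstance(value, bytes):
            value = value.decode(driver_encoding)
        converted.append(value)
    return tuple(converted)


conn = sqlite3.connect(":memory:")
conn.text_factory = bytes  # make sqlite3 return TEXT values as raw bytes
conn.execute("CREATE TABLE people (name TEXT, photo BLOB)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("L\u00f6wis", b"\x89PNG"))
raw = conn.execute("SELECT name, photo FROM people").fetchone()
conn.close()

no_encoding = convert_row(raw, [TEXT, BLOB])             # name stays bytes
with_encoding = convert_row(raw, [TEXT, BLOB], "utf-8")  # name becomes str
```

sqlite3 stores text as UTF-8, which is why "utf-8" is the right driver encoding in this toy setup; for PostgreSQL the value would have to come from the database's own configuration.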

I don't use relational databases or SQL regularly, so take this advice
with a grain of salt.

My 0.02 €,
Martin