unicode confusing

Paul Boddie paul at boddie.org.uk
Tue May 26 04:29:28 EDT 2009


On 26 May, 10:09, Pet <petshm... at googlemail.com> wrote:
>
> After some time, I tried converting the result with unicode(result,
> 'ISO-8859-15') and that was it :)

I haven't really investigated having unicode_results set to false (or
the default) with a database containing UTF-8 (or any non-ASCII
encoded) text, since it's always desirable to manipulate Unicode
internally in one's programs: I don't want plain strings containing
various encoded sequences of bytes when I'm dealing with characters.
That said, if one were consuming XML/HTML and then putting it in raw
form into a database (including the tags), I could understand that
Unicode objects might then seem like a distraction.
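
To illustrate the difference between encoded bytes and characters, here is a minimal sketch. It assumes Python 3, where bytes.decode() plays the role of the unicode(result, 'ISO-8859-15') call quoted above; the sample text is made up and stands in for whatever a database driver hands back when it is not returning Unicode objects.

```python
# Hypothetical sample: text stored as ISO-8859-15 bytes, as a driver
# might return it when it does not decode results itself.
raw = "Gr\u00fc\u00dfe, 10 \u20ac".encode("iso-8859-15")

# Decoding with the encoding the bytes were actually written in
# recovers the characters.
text = raw.decode("iso-8859-15")

# Assuming UTF-8 instead fails for this sample, because the byte for
# the u-umlaut (0xFC) is never valid in UTF-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(raw))
print(text)
print("valid UTF-8:", utf8_ok)
```

Once the text is a Unicode string, operations such as len() or slicing work on characters rather than on whatever byte sequences a particular encoding happened to produce.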

> I thought it was already UTF-8, because of the charset declared in the
> <meta> of the web page I'm fetching

There are lots of caveats about Web page encodings, particularly around
which metadata actually indicates the encoding, but I still think the
best approach is to convert text to Unicode as soon as possible and then
present Unicode objects to the database. This way, you can separate
the decision about which encoding the Web pages are using from the
decision about which encoding the database is using.
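
The separation described above might be sketched as follows. This is an illustration under stated assumptions, not the setup from this thread: it uses Python 3, the standard sqlite3 module as a stand-in for the actual database, and a hypothetical helper decode_page() whose fallback encoding is simply a guess.

```python
import sqlite3


def decode_page(raw_bytes, declared_encoding, fallback="iso-8859-15"):
    """Hypothetical helper: decode fetched bytes to text at the boundary.

    Tries the encoding the page declares and uses the fallback if the
    declaration is wrong or names an unknown codec.
    """
    try:
        return raw_bytes.decode(declared_encoding)
    except (UnicodeDecodeError, LookupError):
        return raw_bytes.decode(fallback)


# Made-up page content that claims to be UTF-8 but is really ISO-8859-15.
page_bytes = "caf\u00e9 5 \u20ac".encode("iso-8859-15")
text = decode_page(page_bytes, "utf-8")

# From here on the program only handles text; the database layer
# decides how to encode it for storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (body TEXT)")
conn.execute("INSERT INTO pages (body) VALUES (?)", (text,))
stored = conn.execute("SELECT body FROM pages").fetchone()[0]
conn.close()

print(stored == text)
```

Note that an 8-bit fallback such as ISO-8859-15 accepts any byte sequence, so a wrong guess produces garbled characters rather than an error; the guess about the page's encoding still has to be made somewhere, but only in one place.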

Paul


