Unicode (UTF8) in dbhash on 2.5

Paul Boddie paul at boddie.org.uk
Wed Oct 22 12:36:43 CEST 2008

On 21 Oct, 22:39, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> It's not possible to "fix" this - it isn't even broken. The *db modules,
> by design, support storing of arbitrary bytes, not just character data.
> You can put images into them, or sound files, java byte code files, etc.
> So if Python would assume they have to be UTF-8 encoded character
> strings, it would severely limit the usability of these modules.

If the inquirer was aware of the Unicode/UTF-8 distinction, then he
apparently wanted a conversion from Unicode to UTF-8 for the purpose
of storing text in the database. I don't really see a problem with a
module like this handling Unicode values in a reasonable fashion
whilst still letting the user supply plain byte strings. The remaining
issues would perhaps be whether retrieved values should be Unicode or
something else, how the user gets to override the default behaviour,
and how this fits in with the existing API. Various DB-API modules
already support Unicode, so this isn't a completely new phenomenon,
and a connection parameter for alternative encodings would be adequate
if people wanted to use something other than UTF-8 to represent
textual values within the database.
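A rough sketch of that idea, written for Python 3 (where dbhash is gone and the `dbm` modules still store bytes). The `TextDB` class, its `encoding` parameter and its `decode` flag are hypothetical, made up to illustrate the approach rather than any existing API: text is encoded on the way in, byte strings pass through untouched, and decoding on retrieval is a default the caller can switch off.

```python
import dbm
import os
import tempfile


class TextDB:
    """Hypothetical wrapper around a dbm database (not a real library API).

    str keys/values are encoded with `encoding` (UTF-8 by default) before
    storage; bytes are stored unchanged, so binary data still works.
    Retrieved values are decoded to str unless decode=False is passed.
    """

    def __init__(self, path, flag="c", encoding="utf-8"):
        self._db = dbm.open(path, flag)
        self.encoding = encoding  # the "connection parameter" for other encodings

    def _to_bytes(self, value):
        # Text is encoded explicitly; byte strings pass through untouched.
        if isinstance(value, str):
            return value.encode(self.encoding)
        if isinstance(value, (bytes, bytearray)):
            return bytes(value)
        raise TypeError("expected str or bytes, got %s" % type(value).__name__)

    def __setitem__(self, key, value):
        self._db[self._to_bytes(key)] = self._to_bytes(value)

    def get(self, key, decode=True):
        raw = self._db[self._to_bytes(key)]  # the database always returns bytes
        # Decoding is only a default; decode=False hands back the raw bytes.
        return raw.decode(self.encoding) if decode else raw

    def __getitem__(self, key):
        return self.get(key)

    def close(self):
        self._db.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


def demo():
    # Store one text value and one binary value in a throwaway database.
    with tempfile.TemporaryDirectory() as tmp:
        with TextDB(os.path.join(tmp, "example")) as db:
            db["greeting"] = "Grüß Gott"
            db["blob"] = b"\x89PNG\r\n"
            return db["greeting"], db.get("blob", decode=False)
```

Binary values such as images still go in as bytes, as Martin points out; the `decode=False` override is one possible answer to the question of what retrieved values should be when the data isn't text.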


More information about the Python-list mailing list