is there a library/program that converts sqlite database from windows-1252 to utf-8 ?
robert.kern at gmail.com
Sun Sep 12 00:07:52 CEST 2010
On 9/11/10 4:45 PM, Stef Mientki wrote:
> On 11-09-2010 21:11, Robert Kern wrote:
>> SQLite internally stores its strings as UTF-8 or UTF-16 encoded Unicode. So it's not clear what
>> you mean when you say the database is "windows-1252". Can you be more specific?
> I doubt that, but I'm not sure ...
From the documentation, it looks like SQLite does not attempt to validate the
input as UTF-8 encoded, so it is possible that someone pushed in raw bytes. See
"Support for UTF-8 and UTF-16" in the following page:
> For some databases written by other programs and
> written with Python, with
> cursor = self.conn.cursor ()
> self.conn.text_factory = str
> Can only be read back with with text_factory = str
> then the resulting string columns contains normal strings with windows 1252 coding, like character 0xC3
You can probably use
self.conn.text_factory = lambda x: x.decode('windows-1252')
to read the data, though I've never tried to use that API myself.
You will need to write a program yourself that opens one connection to your
existing database for reading and another connection to another database (using
the defaults) for writing. Then iterate over your tables and copy data from one
database to the other.
You may also be able to simply dump the database to a text file using "sqlite3
bad-database.db .dump > bad-sql.sql", read the text file into Python as a
string, decode it from windows-1252 to unicode and then encode it as utf-8 and
write it back out. Then use "sqlite3 good-database.db .read good-sql.sql" to
create the new database. I've never tried such a thing, so it may not work.
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
More information about the Python-list