is there a library/program that converts sqlite database from windows-1252 to utf-8 ?

Robert Kern robert.kern at gmail.com
Sun Sep 12 00:07:52 CEST 2010


On 9/11/10 4:45 PM, Stef Mientki wrote:
>   On 11-09-2010 21:11, Robert Kern wrote:
>> SQLite internally stores its strings as UTF-8 or UTF-16 encoded Unicode. So it's not clear what
>> you mean when you say the database is "windows-1252". Can you be more specific?
> I doubt that, but I'm not sure ...

 From the documentation, it looks like SQLite does not attempt to validate the 
input as UTF-8 encoded, so it is possible that someone pushed in raw bytes. See 
"Support for UTF-8 and UTF-16" in the following page:

   http://www.sqlite.org/version3.html

> For some databases written by other programs and
> written with Python, with
>      cursor = self.conn.cursor ()
>      self.conn.text_factory = str
>
> Can only be read back with with text_factory = str
> then the resulting string columns contains normal strings with windows 1252 coding, like character 0xC3

You can probably use

   self.conn.text_factory = lambda x: x.decode('windows-1252')

to read the data, though I've never tried to use that API myself.

You will need to write a program yourself that opens one connection to your 
existing database for reading and another connection to another database (using 
the defaults) for writing. Then iterate over your tables and copy data from one 
database to the other.

You may also be able to simply dump the database to a text file using "sqlite3 
bad-database.db .dump > bad-sql.sql", read the text file into Python as a 
string, decode it from windows-1252 to unicode and then encode it as utf-8 and 
write it back out. Then use "sqlite3 good-database.db .read good-sql.sql" to 
create the new database. I've never tried such a thing, so it may not work.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco




More information about the Python-list mailing list