Python 3.x stuffing utf-8 into SQLite db
mm0fmf
none at mailinator.com
Mon Feb 9 14:41:20 EST 2015
On 09/02/2015 03:44, Skip Montanaro wrote:
> I am trying to process a CSV file using Python 3.5 (CPython tip as of a
> week or so ago). According to chardet[1], the file is encoded as utf-8:
>
> >>> s = open("data/meets-usms.csv", "rb").read()
> >>> len(s)
> 562272
> >>> import chardet
> >>> chardet.detect(s)
> {'encoding': 'utf-8', 'confidence': 0.99}
>
> so I created the reader like so:
>
> rdr = csv.DictReader(open(csvfile, encoding="utf-8"))
>
> This seems to work. The rows are read and records added to a SQLite3
> database. When I go into sqlite3, I get what looks to be raw utf-8 on
> output:
>
> % LANG=en_US.UTF-8 sqlite3 topten.db
> SQLite version 3.8.5 2014-08-15 22:37:57
> Enter ".help" for usage hints.
> sqlite> select * from swimmeet where meetname like '%Barracuda%';
> sqlite> select count(*) from swimmeet;
> 0
> sqlite> select count(*) from swimmeet;
> 4171
> sqlite> select meetname from swimmeet where meetname like
> '%Barracuda%Patrick%';
> Anderson Barracudas St. Patrick's Day Swim Meet
> Anderson Barracuda Masters - 2010 St. Patrick’s Day Swim Meet
> Anderson Barracuda Masters 2011 St. Patrick’s Day Swim Meet
> Anderson Barracuda Masters St. Patrick's Day Meet
> Anderson Barracuda Masters St. Patrick's Day Meet 2014
> Anderson Barracuda Masters 2015 St. Patrick’s Day Swim Meet
>
How is meetname defined? Is it a varchar or nvarchar?
My only experience is with MS-SQL and C# but reading from a utf-8
encoded file with a StreamReader set to utf-8 and trying to insert that
into varchar fields results in similar issues to what you are showing. I
changed to using nvarchar and it all started working as expected.
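
If it helps, here is a rough sketch of how you could check the declared column type from Python and see whether the text survives a round trip. This is not your actual schema: the in-memory database, the VARCHAR(100) declaration and the single sample row are assumptions made up for illustration. As far as I understand it, SQLite gives both VARCHAR and NVARCHAR columns TEXT affinity, so the MS-SQL distinction may not carry over.

```python
import sqlite3

# Hypothetical stand-in for topten.db; the real schema is not shown in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swimmeet (meetname VARCHAR(100))")

# One sample row containing a non-ASCII apostrophe (U+2019).
name = "Anderson Barracuda Masters 2011 St. Patrick\u2019s Day Swim Meet"
conn.execute("INSERT INTO swimmeet (meetname) VALUES (?)", (name,))

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
declared = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(swimmeet)")}

# typeof() reports the storage class SQLite actually used for the value.
stored, storage_class = conn.execute(
    "SELECT meetname, typeof(meetname) FROM swimmeet"
).fetchone()

round_trip_ok = stored == name
print(declared, storage_class, round_trip_ok)
```

If the round trip comes back equal in Python, the data in the database is probably fine and the odd characters are more likely a display issue in the terminal or the sqlite3 shell.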