Unicode
Leam Hall
leamhall at gmail.com
Sun Sep 17 08:44:24 EDT 2017
On 09/17/2017 08:30 AM, Chris Angelico wrote:
> On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall <leamhall at gmail.com> wrote:
>> Still trying to keep this Py2 and Py3 compatible.
>>
>> The Py2 error is:
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6'
>> in position 8: ordinal not in range(128)
>>
>> even when the string is manually converted:
>> name = unicode(self.name)
>>
>> Same sort of issue with:
>> name = self.name.decode('utf-8')
>>
>>
>> Py3 doesn't like either version.
>
> You got a Unicode *EN*code error when you tried to *DE*code. That's a
> quirk of Py2's coercion behaviours, so the error's a bit obscure, but
> it means that you (most likely) actually have a Unicode string
> already. Check what type(self.name) is, and see if the problem is
> actually somewhere else.
>
> (It's hard to give more specific advice based on this tiny snippet, sorry.)
>
> ChrisA
>
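Chris's diagnosis can be made concrete in Python 3, where Python 2's hidden step has to be written out. In Python 2, calling `.decode('utf-8')` on a unicode object behaves roughly like `text.encode('ascii').decode('utf-8')`: the implicit ASCII encode is what fails on U+00F6 ('ö'). Python 3 removed both pieces the original code relied on, since `unicode` no longer exists and `str` has no `.decode()`. The sketch below reproduces the failure with a hypothetical name (chosen so 'ö' sits at position 8, as in the traceback) and adds a small helper, `to_text`, that decodes only byte strings. The helper name is illustrative and not taken from the original script.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text on both Python 2 and 3 (hypothetical helper).

    Byte strings are decoded; text (unicode on Py2, str on Py3) and any
    other value are returned unchanged, so no implicit ASCII step happens.
    """
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


name = u"Ingrid L\u00f6w"  # hypothetical name; 'ö' is at index 8

# Spell out what Python 2 does implicitly for name.decode('utf-8'):
failure = None
try:
    name.encode("ascii").decode("utf-8")
except UnicodeEncodeError as exc:
    failure = (exc.start, exc.reason)
print(failure)  # (8, 'ordinal not in range(128)')

print(to_text(name) is name)                  # True: already text, untouched
print(to_text(name.encode("utf-8")) == name)  # True: UTF-8 bytes decoded
```

Checking `type(self.name)` first, as Chris suggests, tells you which branch of the helper a given value would take.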
Chris, thanks! I see what you mean.
The string source is an SQLite3 database full of names, some of which
contain non-ASCII characters. The columns are varchar, and SQLite can
store text as UTF-8, UTF-16be or UTF-16le, so it could be any of those.
I probably need to purge the data.
What I find interesting is that UTF-8 works in the Ruby script that
reads from the same database. That's what makes me think it's UTF-8.
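Rather than inferring the encoding from the Ruby script, SQLite can be asked directly: `PRAGMA encoding` reports one of 'UTF-8', 'UTF-16le' or 'UTF-16be'. The sketch below uses an in-memory database with a hypothetical `people` table standing in for the real file. It also shows that Python's `sqlite3` module hands back TEXT values already decoded (str on Python 3, unicode on Python 2 with the default `text_factory`), which would fit Chris's guess that `self.name` is a Unicode string already.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the real names database.
conn.execute("CREATE TABLE people (name varchar(64))")
conn.execute("INSERT INTO people (name) VALUES (?)", (u"Ingrid L\u00f6w",))

# SQLite reports how it stores text: 'UTF-8', 'UTF-16le' or 'UTF-16be'.
encoding = conn.execute("PRAGMA encoding").fetchone()[0]

# With the default text_factory, TEXT values come back already decoded:
# str on Python 3, unicode on Python 2. No .decode() call is needed.
name = conn.execute("SELECT name FROM people").fetchone()[0]
conn.close()

print(encoding)             # UTF-8
print(type(name).__name__)  # str
```

Running the same `PRAGMA encoding` query against the real database file would answer the UTF-8 question without relying on the Ruby script.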
I've tried different approaches at lines 45 and 61 of the script:
https://gist.github.com/LeamHall/054f9915af17dc1b1a33597b9b45d2da
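The lines in the gist aren't reproduced here, so this is an assumption: a UnicodeEncodeError on the 'ascii' codec often surfaces when text is written out, because Python 2 falls back to ASCII when printing unicode to a pipe or writing it to a file opened with plain `open()`. If that is where it happens, one Py2/3-compatible option is `io.open` with an explicit encoding. The `write_names` function, sample names and output path below are hypothetical.

```python
import io
import os
import tempfile


def write_names(path, names):
    """Write one name per line, encoded explicitly as UTF-8.

    io.open accepts text (unicode) and encodes it the same way on
    Python 2 and Python 3, instead of relying on an ASCII default.
    """
    with io.open(path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + u"\n")


names = [u"Leam Hall", u"Ingrid L\u00f6w"]  # hypothetical sample data
path = os.path.join(tempfile.mkdtemp(), "names.txt")  # hypothetical output file
write_names(path, names)

with io.open(path, "rb") as handle:
    raw_bytes = handle.read()  # the UTF-8 bytes actually on disk
with io.open(path, encoding="utf-8") as handle:
    read_back = handle.read().splitlines()

print(read_back == names)          # True
print(b"L\xc3\xb6w" in raw_bytes)  # True: 'ö' stored as C3 B6
```

Keeping names as text inside the program and encoding only at the boundary (the file handle) avoids mixing bytes and text in the middle of the script.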
Leam