Is there a way to get utf-8 out of a Unicode string?
thebjorn
BjornSteinarFjeldPettersen at gmail.com
Mon Oct 30 02:24:48 EST 2006
I've got a database (MS SQL Server) that's (way) out of my control,
where someone has stored utf-8 encoded Unicode data in regular varchar
fields, so that e.g. the string 'Blåbærsyltetøy' is in the database
as 'Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y' :-/
Then I read the data out using adodbapi (which returns all strings as
Unicode) and I get u'Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y'. I couldn't
find any way to get back to the original short of:
    def unfk(s):
        return eval(repr(s)[1:]).decode('utf-8')
i.e. chopping off the u in the repr of a unicode string, and relying on
eval to interpret the \xHH sequences.
Is there a less hack'ish way to do this?
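One candidate that avoids eval, offered as a sketch rather than a confirmed answer: latin-1 maps code points 0-255 one-to-one onto byte values, so encoding the garbled Unicode string as latin-1 should give back the raw UTF-8 bytes, which can then be decoded properly. This assumes the driver widened each byte to the code point of the same value (true for the example above); if it decoded with some other code page, such as cp1252, that codec would have to be used instead, and any character above U+00FF makes the encode step raise UnicodeEncodeError.

```python
def unfk(s):
    # Assumption: every character in s is a byte value (U+0000..U+00FF)
    # that was widened to Unicode. latin-1 reverses that step exactly,
    # giving back the original UTF-8 byte string.
    raw = s.encode('latin-1')
    # Decode the recovered bytes as the UTF-8 they really are.
    return raw.decode('utf-8')

garbled = u'Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y'
fixed = unfk(garbled)
```

Here fixed compares equal to u'Bl\xe5b\xe6rsyltet\xf8y', i.e. the original 'Blåbærsyltetøy'.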
-- bjorn