Convert a list with wrong encoding to utf8
Gregory Ewing
greg.ewing at canterbury.ac.nz
Fri Feb 15 02:27:34 EST 2019
vergos.nikolas at gmail.com wrote:
> I just tried:
>
> names = tuple( [s.encode('latin1').decode('utf8') for s in names] )
>
> but i get
> UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')
This suggests that the string you're getting from the database *has*
already been correctly decoded, and there is no need to go through the
latin1 re-coding step.
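A minimal sketch of why the re-coding step fails here, using the name from the traceback as sample data. The `.encode('latin1').decode('utf8')` trick only undoes mojibake, meaning UTF-8 bytes that were wrongly decoded as Latin-1. Applied to text that is already correct Greek, the `encode('latin1')` step has no byte to map those letters to, so it raises.

```python
# Sample name taken from the traceback above
correct = 'Άκης Τσιάμης'

# Simulate mojibake: UTF-8 bytes mistakenly decoded as Latin-1
mojibake = correct.encode('utf8').decode('latin1')

# The latin1 -> utf8 round trip repairs the mojibake...
repaired = mojibake.encode('latin1').decode('utf8')
print(repaired == correct)  # True

# ...but fails on text that was already decoded correctly,
# because Greek letters have no Latin-1 representation.
try:
    correct.encode('latin1')
except UnicodeEncodeError as exc:
    print(exc.reason)  # ordinal not in range(256)
```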
What do you get if you do
print(names)
immediately *before* trying to re-code them?
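One way to do that inspection is to print each name with `repr()` together with a check of whether every character falls in the Latin-1 range (code point below 256). The helper `describe_name` and the sample tuple are hypothetical. Correctly decoded Greek contains code points above 255; a name that fits entirely in Latin-1 but looks garbled is a candidate for UTF-8 mojibake.

```python
def describe_name(name):
    """Return the repr of a name and whether all its characters fit in Latin-1.

    Hypothetical helper for inspecting data straight from the database.
    """
    fits_latin1 = all(ord(ch) < 256 for ch in name)
    return repr(name), fits_latin1

# Hypothetical sample: one correct name, one UTF-8-read-as-Latin-1 mojibake
names = ('Άκης Τσιάμης', 'Άκης Τσιάμης'.encode('utf8').decode('latin1'))
for name in names:
    print(*describe_name(name))
```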
What *may* be happening is that most of your data is stored in the
database encoded as utf-8, but some of it is actually using a different
encoding, and you're getting confused by the resulting inconsistencies.
I suggest you look carefully at *all* the names in the list, straight
after getting them from the database. If some of them look okay and
some of them look like mojibake, then you have bad data in the database
in the form of inconsistent encodings.
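If the data does turn out to be inconsistent, a per-item version of the original list comprehension avoids failing on the names that are already correct. This is a heuristic sketch, not a guaranteed fix: the function name `fix_mojibake` is hypothetical, and it assumes the bad entries are UTF-8 bytes that were decoded as Latin-1. Cleaning up the data in the database itself is the more durable solution.

```python
def fix_mojibake(s):
    """Undo UTF-8-decoded-as-Latin-1 mojibake, leaving other strings unchanged.

    Heuristic: assumes bad entries are UTF-8 bytes wrongly decoded as Latin-1.
    """
    try:
        return s.encode('latin1').decode('utf8')
    except UnicodeEncodeError:
        # Contains characters outside Latin-1: most likely decoded correctly already
        return s
    except UnicodeDecodeError:
        # Fits in Latin-1 but its bytes are not valid UTF-8: leave it alone
        return s

correct = 'Άκης Τσιάμης'
names = [correct, correct.encode('utf8').decode('latin1'), 'café', 'Greg']
names = tuple(fix_mojibake(s) for s in names)
print(names)
```

Pure-ASCII names pass through unchanged. One caveat: a genuine Latin-1 string whose bytes happen to form valid UTF-8 would be "repaired" incorrectly, which is rare for natural text but possible.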
--
Greg
More information about the Python-list mailing list