Changing filenames from Greeklish => Greek (subprocess complain)
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun Jun 9 09:13:39 EDT 2013
On Sun, 09 Jun 2013 02:38:13 -0700, Νικόλαος Κούρας wrote:
> s = 'α'
> s = s.encode('iso-8859-7').decode('utf-8')
>
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 0:
> unexpected end of data
>
> Why this error? because 'a' ordinal value > 127 ?
Look at it this way... consider encoding and decoding to be like
translating from one language to another.
Suppose you start with the English word "street". You encode it to German
by looking it up in an English-To-German dictionary:
street -> Straße
Then you decode the German by looking "Straße" up in a German-To-English
dictionary:
Straße -> street
and everything is good. But suppose that after encoding the English to
German, you get confused, and think that it is Italian, not German. So
when it comes to decoding, you try to look up 'Straße' in an Italian-To-
English dictionary, and discover that there is no such letter as ß
in Italian. So you cannot look the word up, and you get frustrated and
shout "this is rubbish, there's no such thing as ß, that's not a letter!"
Not in Italian, but it is a perfectly good letter in German. But you're
looking it up in the wrong dictionary.
Same thing with UTF-8. You encoded the string 'α' by looking it up in the
"Unicode To ISO-8859-7 bytes" dictionary. Then you try to decode it by
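looking for those bytes in the "UTF-8 bytes To Unicode" dictionary. But
in UTF-8, byte 0xe1 can only be the first byte of a three-byte sequence,
so you can't find it on its own. Python shouts "this is rubbish, there's
no such thing as 0xe1 on its own in UTF-8!" and raises UnicodeDecodeError.
That is why the message says "unexpected end of data": the two bytes that
should follow 0xe1 are missing.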
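Here is a minimal sketch of the same thing at the interpreter, assuming Python 3. The names raw, error and round_trip are only for illustration. It shows the single byte that ISO-8859-7 produces for 'α', the failure when that byte is looked up in the wrong "dictionary", and the clean round trip when the same codec is used in both directions:

```python
s = 'α'

# Encode with the "Unicode To ISO-8859-7 bytes" dictionary.
raw = s.encode('iso-8859-7')
print(raw)  # a single byte, 0xe1

# Decode with the wrong dictionary: UTF-8 cannot make sense of a lone 0xe1.
error = None
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    error = exc
    print("wrong dictionary:", exc)

# Decode with the matching dictionary: the original string comes back.
round_trip = raw.decode('iso-8859-7')
print(round_trip == s)
```

The rule of thumb is to decode with the same encoding that was used to encode.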
Sometimes you don't get an exception. Suppose that you are encoding from
French to German:
qui -> die (both words mean "who" in English)
Now if you get confused, and decode the word 'die' by looking it up in an
English-To-French dictionary, instead of German-To-French, you get:
die -> mourir
So instead of getting 'qui' back again, you get 'mourir'. This is like
mojibake: the results are garbage, but there is no exception raised to
warn you.
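The silent failure can be sketched the same way, again assuming Python 3 and with illustrative names. Latin-1 (ISO-8859-1) assigns a character to every possible byte, so decoding with it never raises, even when it is the wrong dictionary:

```python
s = 'α'

# Encode with ISO-8859-7, as before: one byte, 0xe1.
raw = s.encode('iso-8859-7')

# Decode with Latin-1 by mistake. No exception is raised, because every
# byte value is a valid Latin-1 character, but the result is not 'α'.
wrong = raw.decode('latin-1')
print(wrong)       # 'á', which is byte 0xe1 in Latin-1
print(wrong == s)  # False
```

No error, no warning, just the wrong character.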
--
Steven