Encoding sniffer?

Diez B. Roggisch deets at nospam.web.de
Thu Jan 5 15:56:43 EST 2006


> print try_encodings(text, ['ascii', 'utf-8', 'iso8859_1', 'cp1252', 'macroman'])

I've fallen into that trap before - it will never get past iso8859_1.
The reason is that an eight-bit encoding usually has all 256 codepoints
assigned. There are exceptions, but you have to be lucky to have a
string that contains a value not assigned in one of them, which is
highly unlikely.

AFAIK iso-8859-1 has all codepoints taken - so you won't go beyond that 
in your example.
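
To illustrate, here is a minimal sketch in Python 3. The original
try_encodings was not posted, so the version below is only my guess at
what it does (return the first encoding that decodes without an error);
the test data is made up as well.

```python
def try_encodings(data, encodings):
    """Hypothetical stand-in: return (name, text) for the first
    encoding in the list that decodes data without an error."""
    for name in encodings:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    return None, None


candidates = ['ascii', 'utf-8', 'iso8859_1', 'cp1252', 'macroman']

# Every possible byte value once: not ASCII, not valid UTF-8.
all_bytes = bytes(range(256))
winner, _ = try_encodings(all_bytes, candidates)

# 0x81 is one of the few bytes Python's cp1252 codec leaves unassigned,
# so cp1252 can reject input, but iso8859_1 still accepts it.
fallback, _ = try_encodings(b'\x81', ['cp1252', 'iso8859_1'])

print(winner, fallback)
```

Since iso8859_1 accepts any byte string, the entries listed after it
are never tried. If you want cp1252 or macroman to be considered at
all, they have to come before it, and even then the first one will
almost always win.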


Regards,

Diez


