sjmachin at lexicon.net
Mon Jul 11 02:29:40 CEST 2005
Ivan Van Laningham wrote:
> It seems to me that if I want to try to read an unknown file
> using an exhaustive list of possible encodings ...
Supposing such a list existed:
What do you mean by "unknown file"? That the encoding is unknown?
You are going to try to decode the file from a "legacy" encoding to
Unicode, stopping at the first 'success' (defined how?)? But the file
could be decoded by *several* codecs into Unicode without an exception
being raised. Just a simple example: each of the encodings
['iso-8859-' + x for x in '12459'] assigns a character to *all* 256
possible byte values, so none of them can ever fail on decode.
There are various language-guessing algorithms based on e.g. frequency
of ngrams ... try Google.
Or do you mean this: you "know" the file is in a Unicode encoding, e.g.
utf-8, have successfully decoded it to Unicode, and are going to try to
encode it in a "legacy" encoding, but you don't know which one is
appropriate? Sorry, same "But": several codecs may encode the text
without raising an exception.
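
The same kind of sketch for the encoding direction, again assuming
Python 3; the sample text and variable names are invented for the
illustration.

```python
# Text already decoded to Unicode (hypothetical sample).
text = 'caf\u00e9'  # "cafe" with an acute accent on the e

candidates = ['iso-8859-' + x for x in '12459']

encoded = {}
for name in candidates:
    try:
        encoded[name] = text.encode(name)
    except UnicodeEncodeError:
        # This codec has no byte for at least one character in the text.
        encoded[name] = None

# Codecs that could represent the whole text.
ok = [name for name in candidates if encoded[name] is not None]
print(ok)
```

Only the Cyrillic iso-8859-5 is ruled out here; the other four all
accept the text, so a lack of exceptions still doesn't pick one target
encoding for you.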