Where to locate existing standard encodings in python
Tim Chase
python.list at tim.thechases.com
Tue Nov 11 19:06:03 EST 2008
>> Content-Type: text/html; charset=utf-8lias
>>
>> For Python to parse this, I had to use Python's list of known encodings
>> in order to determine whether I could even parse the site (for passing
>> it to a string's .encode() method).
>
> You haven't said why you think you need a list of known encodings!
>
> I would have thought that just trying it on some dummy data will let you
> determine very quickly whether the alleged encoding is supported by the
> Python version etc that you are using.
>
> E.g.
>
> | >>> alleged_encoding = "utf-8lias"
> | >>> "any old ascii".decode(alleged_encoding)
> | Traceback (most recent call last):
> |   File "<stdin>", line 1, in <module>
> | LookupError: unknown encoding: utf-8lias
I then try to remap the bogus encoding to the one it most
resembles (in this case, utf-8) and retry. Having a list of
encodings lets me either eyeball the closest match or define a
heuristic that says "this is the closest match...try this one
instead". That remapping can then be saved to a mapping file so
I don't have to think about it the next time I encounter the
same bogus encoding.
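
A minimal sketch of that remap-and-retry heuristic, written for
Python 3 (the session quoted above is Python 2). It assumes the
names in the standard library's encodings.aliases table are an
adequate "list of known encodings" and uses difflib as the
closeness measure. resolve_encoding(), normalize(), the remap
dict and the 0.6 cutoff are illustrative choices, not anything
from the thread, and saving the dict to a mapping file is left
out.

```python
import codecs
import difflib
from encodings.aliases import aliases


def known_encoding_names():
    # Alias keys plus canonical names, spelled the way the stdlib
    # table spells them (lowercase, underscores).
    return sorted(set(aliases) | set(aliases.values()))


def normalize(name):
    # Bring a header value closer to the table's spelling.
    return name.strip().lower().replace("-", "_")


def resolve_encoding(alleged, remap=None, cutoff=0.6):
    """Return a codec name Python accepts, or None if nothing is close."""
    remap = {} if remap is None else remap
    if alleged in remap:
        # Seen this bogus name before: reuse the earlier decision.
        return remap[alleged]
    try:
        return codecs.lookup(alleged).name
    except LookupError:
        pass
    candidates = difflib.get_close_matches(
        normalize(alleged), known_encoding_names(), n=3, cutoff=cutoff
    )
    for candidate in candidates:
        try:
            guess = codecs.lookup(candidate).name
        except LookupError:
            # Listed in the table but unavailable on this platform.
            continue
        remap[alleged] = guess  # remember it for next time
        return guess
    return None


remap = {}
print(resolve_encoding("utf-8lias", remap))
print(remap)
```

The aliases table also lists non-text codecs (base64_codec and
friends), so a guess is worth eyeballing before it goes into a
permanent mapping.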
-tkc