help with (x)html / xml encoding...

Steven Taschuk staschuk at
Fri Mar 21 03:51:11 CET 2003

Quoth lt:
> i'm looking for a way to extract encoding from a file retrieved by urllib,
> i'm planning of creating a "restricted" parser which will only examine <?
> and <meta tags, to check for :
> <meta http-equiv="content-type" content="text/html; charset=xxxencodingxxx">
> or
> <?xml version="1.0" encoding="'xxxencodingxxx'"?>
> do you think that is enough ? how should you do it ?

You should also check the data in urlopen(foo).info() for a
Content-Type header; if that header carries a charset parameter,
its value is supposed to take precedence over either of the above.
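
For illustration, here is a rough sketch of that lookup order in current
Python 3 (where urlopen lives in urllib.request). The helper names
charset_from_header, sniff_encoding and fetch_encoding are made up for
this example, and the two regular expressions are deliberately loose
assumptions rather than a real HTML or XML parser, so treat it as a
starting point only.

```python
import re
from email.message import Message
from urllib.request import urlopen

# Loose, illustrative patterns (assumptions, not a full parser).
XML_DECL_RE = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
META_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def sniff_encoding(content_type, body, default=None):
    """Pick an encoding: HTTP header first, then <?xml ...?>, then <meta>."""
    charset = charset_from_header(content_type)
    if charset:
        return charset
    head = body[:4096]  # declarations are expected near the top of the file
    for pattern in (XML_DECL_RE, META_RE):
        match = pattern.search(head)
        if match:
            return match.group(1).decode("ascii").lower()
    return default


def fetch_encoding(url):
    """Fetch url and return (encoding or None, raw body bytes)."""
    with urlopen(url) as response:
        body = response.read()
        return sniff_encoding(response.headers.get("Content-Type"), body), body
```

If none of the three sources names an encoding, sniff_encoding returns
the default you pass in (None unless you say otherwise), so the caller
has to decide on a fallback.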

Steven Taschuk                  staschuk at
"Telekinesis would be worth patenting."  -- James Gleick
