ANN: encutils 0.2
csad7 at t-online.de
Sun Jul 3 14:56:22 CEST 2005
Some basic helper functions to deal with encodings of files retrieved
Download from http://cthedot.de/encutils/
Changes in 0.2:
Mainly some documentation and internal name changes, some parameter
names have changed as well.
Currently contained functions:
Returns a default encoding for the given Media-Type, e.g. 'utf-8'
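As a rough illustration of such a lookup, here is a minimal sketch of the RFC 3023 defaults. The function name default_encoding_for is hypothetical and is not the encutils API. For application/* XML types the RFC actually defers to XML's own detection rules (BOM, XML declaration), so 'utf-8' is only the fallback when neither is present.

```python
def default_encoding_for(media_type):
    """Hypothetical helper (not part of encutils): RFC 3023 style defaults."""
    if media_type is None:
        return None
    mt = media_type.strip().lower()
    # text/xml and friends default to US-ASCII when no charset is given
    if mt in ("text/xml", "text/xml-external-parsed-entity") or (
        mt.startswith("text/") and mt.endswith("+xml")
    ):
        return "ascii"
    # application/xml and friends fall back to UTF-8 (absent BOM/declaration)
    if mt in (
        "application/xml",
        "application/xml-dtd",
        "application/xml-external-parsed-entity",
    ) or (mt.startswith("application/") and mt.endswith("+xml")):
        return "utf-8"
    # anything else (e.g. text/html) has no default here
    return None
```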
Returns (media_type, encoding) information from the Content-Type
header of an HTTP header dictionary. May be (None, None), e.g. if
no Content-Type header is available.
According to RFC 3023, XML documents have a default encoding for
various media types if no explicit charset information is given, which
may be "ascii" or "utf-8"; see "encodingByMediaType".
HTML documents have no default encoding.
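To make the Content-Type handling concrete, here is a minimal sketch of pulling (media_type, encoding) out of a header dictionary. The name content_type_info is hypothetical and not the encutils function. It assumes the dictionary maps header names to plain string values.

```python
def content_type_info(headers):
    """Hypothetical helper: (media_type, encoding) from a header dict."""
    value = None
    for name, v in headers.items():
        # header names are case-insensitive
        if name.lower() == "content-type":
            value = v
            break
    if value is None:
        return (None, None)
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower() or None
    encoding = None
    for param in parts[1:]:
        key, sep, val = param.partition("=")
        if sep and key.strip().lower() == "charset":
            # drop optional quotes around the charset value
            encoding = val.strip().strip("\"'").lower() or None
            break
    return (media_type, encoding)
```

For example, content_type_info({"Content-Type": "text/html; charset=ISO-8859-1"}) gives ("text/html", "iso-8859-1"), and an empty dictionary gives (None, None).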
Returns (media_type, encoding) information from the (last) X/HTML
Content-Type meta element.
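A minimal sketch of the meta element lookup, using the standard library HTMLParser. The names _MetaContentType and meta_content_type_info are hypothetical and not the encutils implementation. It assumes the text is already a decoded string, and a later meta element overrides an earlier one.

```python
from html.parser import HTMLParser


class _MetaContentType(HTMLParser):
    """Remembers the content attribute of the last Content-Type meta element."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = {k.lower(): (v or "") for k, v in attrs}
        if d.get("http-equiv", "").lower() == "content-type":
            self.content = d.get("content", "")  # later elements overwrite


def meta_content_type_info(text):
    """Hypothetical helper: (media_type, encoding) from the last meta element."""
    parser = _MetaContentType()
    parser.feed(text)
    parser.close()
    if not parser.content:
        return (None, None)
    parts = [p.strip() for p in parser.content.split(";")]
    media_type = parts[0].lower() or None
    encoding = None
    for param in parts[1:]:
        key, sep, val = param.partition("=")
        if sep and key.strip().lower() == "charset":
            encoding = val.strip().lower() or None
            break
    return (media_type, encoding)
```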
guessEncoding(httpheaders, text, log=None)
Tries to find the encoding of the given text, using information in
httpheaders and in the text content, such as HTML meta elements or the
XML declaration (the latter is not implemented yet). Returns the explicit
or implicit encoding, or None. Mismatch reports are written to the log.
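The combining step can be sketched roughly as follows. This is not the guessEncoding implementation: the name reconcile_encodings is hypothetical, it takes already extracted candidates as arguments, and the precedence (HTTP header first, then meta element, then media-type default) is an assumption.

```python
import codecs
import logging


def _same_codec(a, b):
    # compare by canonical codec name so "UTF8" and "utf-8" count as equal
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a.lower() == b.lower()


def reconcile_encodings(http_encoding, meta_encoding,
                        default_encoding=None, log=None):
    """Hypothetical helper: pick one encoding and log any mismatch."""
    if log is None:
        log = logging.getLogger("encoding-sketch")
    if (http_encoding and meta_encoding
            and not _same_codec(http_encoding, meta_encoding)):
        log.warning("HTTP header says %r but meta element says %r",
                    http_encoding, meta_encoding)
    # assumed precedence: explicit HTTP, explicit meta, implicit default
    return http_encoding or meta_encoding or default_encoding
```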
If something similar already exists, please let me know (I know the Cookbook
XML autodetection script, which I'd like to integrate in a future version).
And I would very much appreciate any feedback about spec compliance,
errors or other problems with the functions too. (Please use
Thanks a lot!