Some basic helper functions to deal with encodings of text files (like HTML, XHTML, XML) via HTTP.

Developed for cssutils but looked worth an independent release.

Download from http://cthedot.de/encutils/

Included are some unittests.

License
    Creative Commons License
    http://creativecommons.org/licenses/by/2.0/

Functions:
Note: All encodings returned are uppercase.

encodingByMediaType(media_type, log=None)
    Returns a default encoding for the given Media-Type, e.g. 'UTF-8' for media-type='application/xml'. If no default encoding is available, returns None.

getHTTPInfo(HTTPResponse, log=None)
    Returns (media_type, encoding) information from the response's Content-Type HTTP header (case of headers is ignored). May be (None, None), e.g. if no Content-Type header is available.

getMetaInfo(text, log=None)
    Returns (media_type, encoding) information from the (first) X/HTML Content-Type <meta> element, if available.

getXMLEncoding(text, log=None)
    Parses the XML declaration of a document, if present (simplified). Returns (encoding, explicit). No autodetection of BOM is done yet. If no explicit encoding is found, returns ('UTF-8', False).

guessEncoding(HTTPResponse, text, log=None)
    Tries to find the encoding of the given text. Uses information in the headers of the supplied HTTPResponse, a possible XML declaration and X/HTML <meta> elements. Returns (encoding, mismatch). Encoding is the explicit or implicit encoding or None, and is always returned uppercase. Mismatch is True if any mismatches between media_type, XML declaration or text content are found. More detailed mismatch reports are written to the optional log. Mismatches are not necessarily errors! For details see the specifications.

The plan is to integrate XML autodetection (of BOM) in the next release.

I would very much welcome any feedback about spec compliance, errors or other problems with the functions (or the tests!). Please use http://cthedot.de/blog/?cat=14 or http://cthedot.de/contact/.

Thanks a lot!
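To make the documented return values concrete, here is a rough sketch of the kind of parsing described for getXMLEncoding and getHTTPInfo. It is not the encutils source: the names sketch_xml_encoding and sketch_content_type are made up for illustration, the second one takes a plain header string instead of an HTTPResponse, and the regular expression is a simplification that, like the released code, ignores any BOM.

```python
import re

# Hypothetical stand-ins for the behaviour described above;
# NOT the encutils implementation.
_XML_DECL = re.compile(
    r"""^<\?xml\s+[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sketch_xml_encoding(text):
    """Return (encoding, explicit) from a leading XML declaration.

    Falls back to ('UTF-8', False) when no encoding is declared.
    """
    match = _XML_DECL.match(text)
    if match:
        return match.group(1).upper(), True
    return "UTF-8", False


def sketch_content_type(header_value):
    """Split a Content-Type header value into (media_type, encoding).

    Assumption: the caller has already looked the header up
    case-insensitively and passes its value, or None if it is missing.
    """
    if not header_value:
        return None, None
    parts = [p.strip() for p in header_value.split(";")]
    media_type = parts[0] or None
    encoding = None
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value:
            # Encodings are reported uppercase, as in the note above.
            encoding = value.strip().strip("\"'").upper()
    return media_type, encoding
```

A guessEncoding-style check could then compare the two results and flag a mismatch when both are explicit but differ, which is roughly what the mismatch flag described above reports.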
chris

encutils 0.4 (http://cthedot.de/encutils/) - basic helper functions to deal with encodings of text files (17-Aug-05)
Christof