ANN: encutils 0.4

Christof csad7 at t-online.de
Wed Aug 17 22:09:26 CEST 2005


Some basic helper functions for determining the encoding of text files
(such as HTML, XHTML or XML) retrieved via HTTP. They were developed for
cssutils but seemed worth an independent release.

Download from http://cthedot.de/encutils/
Some unit tests are included.


License
	Creative Commons License
	http://creativecommons.org/licenses/by/2.0/


Functions:
Note: All encodings returned are uppercase.


encodingByMediaType(media_type, log=None)

     Returns a default encoding for the given media type, e.g. 'UTF-8'
for media_type='application/xml'. Returns None if no default encoding
is available.
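
To illustrate the idea, such a lookup could be sketched as a plain
dictionary. This is not the encutils source: the name default_encoding
is hypothetical, and every table entry other than 'application/xml' is
an assumption taken from the defaults in RFC 2616 and RFC 3023.

```python
# Hypothetical sketch, not the encutils implementation.
_DEFAULTS = {
    "application/xml": "UTF-8",   # the example given in this announcement
    "text/xml": "US-ASCII",       # assumption: default from RFC 3023
    "text/html": "ISO-8859-1",    # assumption: RFC 2616 default for text/*
}


def default_encoding(media_type):
    """Return an uppercase default encoding, or None if none is known."""
    if media_type is None:
        return None
    return _DEFAULTS.get(media_type.strip().lower())
```

With this table default_encoding('application/xml') gives 'UTF-8' and an
unknown type such as 'image/png' gives None.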


getHTTPInfo(HTTPResponse, log=None)

     Returns (media_type, encoding) information from the response's
Content-Type HTTP header (the case of headers is ignored). May be (None,
None), e.g. if no Content-Type header is available.
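
A rough sketch of this kind of header parsing, using only the standard
library. It is an illustration, not the encutils code: the name
http_info is hypothetical, and it takes the raw header value as a string
rather than an HTTPResponse object.

```python
from email.message import Message


def http_info(content_type_header):
    """Split a Content-Type header value into (media_type, encoding)."""
    if not content_type_header:
        return (None, None)
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()   # lowercased by the email package
    charset = msg.get_content_charset()   # None if no charset parameter
    return (media_type, charset.upper() if charset else None)
```

For example, http_info('text/html; charset=utf-8') yields
('text/html', 'UTF-8'), and a missing header yields (None, None).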

getMetaInfo(text, log=None)

     Returns (media_type, encoding) information from the first X/HTML
Content-Type <meta> element, if available.
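
One possible way to find such a <meta> element with the standard
library is sketched here. The names _MetaFinder and meta_info are
hypothetical, and encutils itself may well work differently (e.g. with
a regular expression).

```python
from email.message import Message
from html.parser import HTMLParser


class _MetaFinder(HTMLParser):
    """Keeps the content attribute of the first Content-Type <meta>."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Also called for XHTML-style <meta ... /> elements.
        if tag != "meta" or self.content is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.content = attributes.get("content")


def meta_info(text):
    """Return (media_type, encoding) from the <meta> element, if any."""
    finder = _MetaFinder()
    finder.feed(text)
    if not finder.content:
        return (None, None)
    msg = Message()
    msg["Content-Type"] = finder.content
    charset = msg.get_content_charset()
    return (msg.get_content_type(), charset.upper() if charset else None)
```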


getXMLEncoding(text, log=None)

     Parses the XML declaration of a document, if present (simplified
parsing). Returns (encoding, explicit).
     No autodetection of the BOM is done yet. If no explicit encoding is
found, returns ('UTF-8', False).
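
A simplified declaration parser along these lines might look as follows.
The regular expression and the name xml_encoding are assumptions made
for this sketch, not the code shipped in encutils.

```python
import re

# Simplified: expects the declaration at the very start of the text and
# a version pseudo-attribute before the optional encoding.
_XML_DECL = re.compile(
    r"""^<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*')"""
    r"""(?:\s+encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["'])?"""
)


def xml_encoding(text):
    """Return (encoding, explicit) for an XML document given as text."""
    match = _XML_DECL.match(text)
    if match and match.group(1):
        return (match.group(1).upper(), True)
    return ("UTF-8", False)   # implicit default, nothing declared
```

A document starting with <?xml version="1.0" encoding="iso-8859-1"?>
gives ('ISO-8859-1', True); one without a declaration gives
('UTF-8', False).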


guessEncoding(HTTPResponse, text, log=None)

     Tries to find the encoding of the given text. Uses information in
the headers of the supplied HTTPResponse, a possible XML declaration and
X/HTML <meta> elements.
     Returns (encoding, mismatch). Encoding is the explicit or implicit
encoding, or None, and is always returned in uppercase. Mismatch is True
if any mismatches between media_type, XML declaration or text content
are found. More detailed mismatch reports are written to the optional
log.
     Mismatches are not necessarily errors! For details see the
specifications.
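
How the individual results could be combined is sketched below. The
precedence used (HTTP header, then XML declaration, then <meta>) and the
assumption that log behaves like a logging.Logger are guesses for the
sake of the example; the name combine is hypothetical and the real
guessEncoding may decide differently.

```python
def combine(http_enc, xml_enc, meta_enc, log=None):
    """Return (encoding, mismatch) from up to three declared encodings."""
    sources = (("HTTP", http_enc), ("XML", xml_enc), ("meta", meta_enc))
    found = [(name, enc.upper()) for name, enc in sources if enc]
    if not found:
        return (None, False)
    # Assumed precedence: the first source that declared anything wins.
    first_name, encoding = found[0]
    mismatch = False
    for name, enc in found[1:]:
        if enc != encoding:
            mismatch = True
            if log is not None:
                log.warning("%s declares %s but %s declares %s",
                            first_name, encoding, name, enc)
    return (encoding, mismatch)
```

Here combine('utf-8', None, 'ISO-8859-1') returns ('UTF-8', True): the
header wins and the differing <meta> value is only reported.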


The plan is to integrate XML autodetection (of the BOM) in the next
release.
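
BOM detection of this kind can be sketched with the constants from the
standard codecs module. This only shows the general technique on raw
bytes; it says nothing about how a later encutils release will do it,
and the name encoding_from_bom is hypothetical.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32-LE BOM starts with
# the same two bytes as the UTF-16-LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
)


def encoding_from_bom(data):
    """Return the encoding implied by a leading BOM in bytes, or None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```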


I would very much welcome any feedback about spec compliance, errors or 
other problems with the functions (or the tests!).
Please use http://cthedot.de/blog/?cat=14 or http://cthedot.de/contact/.

Thanks a lot!
chris


<P><A HREF="http://cthedot.de/encutils/">encutils 0.4</A> - basic helper 
functions to deal with encodings of text files (17-Aug-05)

