clean up html document created by Word

Claudio Grondi claudio.grondi at
Sat Mar 31 01:53:18 CEST 2007

jd wrote:
> I am looking for python code (working or sample code) that can take an
> html document created by Microsoft Word and clean it up (if you've
> never had to look at a Word-generated html document, consider yourself
> lucky ;-)  Alternatively, if you know of a non-python solution, I'd
> like to hear about it.
> Thanks...
> -- jeff
There is a Microsoft add-on for Word called 'HTML filter' that helps to
reduce the mess. Get it here:

Run it, and afterwards use the other 'cleaning' methods suggested in
this thread.
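
As a rough illustration of what such a follow-up cleaning pass could look like, here is a minimal sketch using only the re module from the Python standard library. The function name strip_word_markup and the patterns are assumptions made for illustration, not one of the methods suggested in this thread. Regular expressions are fragile on real-world HTML, so treat it as a starting point only.

```python
import re

def strip_word_markup(html):
    """Remove some common Word-specific markup from an HTML string.

    Hypothetical helper: the patterns below cover only a few typical
    Word artifacts and are not a complete cleaner.
    """
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html, flags=re.DOTALL)
    # Drop Office namespace tags such as <o:p> and </o:p>, keeping any text between them
    html = re.sub(r"</?[ovw]:[^>]*>", "", html)
    # Drop class attributes whose value starts with Mso (e.g. class=MsoNormal)
    html = re.sub(r'\s+class="?Mso\w*"?', "", html)
    # Drop double-quoted style attributes that contain mso- properties
    html = re.sub(r'\s+style="[^"]*mso-[^"]*"', "", html)
    return html
```

Called on the text of a saved Word document, e.g. strip_word_markup(open("report.htm").read()) with a file name of your own, it returns the same HTML with those artifacts removed. A real HTML parser or a tool such as HTML Tidy will be more robust.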

