clean up html document created by Word

Peter Otten __peter__ at
Fri Mar 30 19:40:57 CEST 2007

jkn wrote:

> IIUC, the original poster is asking about 'cleaning up' in the sense
> of removing the swathes of unnecessary and/or redundant 'cruft' that
> Word puts in there, rather than making valid HTML out of invalid HTML.
> Again, IIUC, HTMLtidy does not do this.

From that very page I linked to:

Tidy can now perform wonders on HTML saved from Microsoft Word 2000! Word
bulks out HTML files with stuff for round-tripping presentation between
HTML and Word. If you are more concerned about using HTML on the Web, check
out Tidy's "Word-2000" config option! Of course Tidy does a good job on
Word'97 files as well!
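
For what it's worth, here is a minimal sketch of driving that option from Python. It assumes a `tidy` executable is installed and on your PATH, and that the option is passed on the command line as `--word-2000 yes`. The helper names `build_tidy_command` and `clean_word_html` are made up for illustration and are not part of any library.

```python
import shutil
import subprocess


def build_tidy_command(src, dst, executable="tidy"):
    """Return the argument list for a tidy run with Word 2000 cleanup on.

    Hypothetical helper: only assembles the command, does not run it.
    """
    return [
        executable,
        "-q",                  # suppress non-essential output
        "--word-2000", "yes",  # the "Word-2000" config option from the quote
        "-o", dst,             # write the cleaned markup to dst
        src,
    ]


def clean_word_html(src, dst):
    """Run tidy on src and write the result to dst (hypothetical helper)."""
    if shutil.which("tidy") is None:
        raise RuntimeError("tidy executable not found on PATH")
    result = subprocess.run(build_tidy_command(src, dst))
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    if result.returncode >= 2:
        raise RuntimeError("tidy reported errors for %s" % src)
    return result.returncode
```

Calling `clean_word_html("report.htm", "report_clean.htm")` should leave the original file alone and put the stripped-down version next to it. How much cruft actually disappears depends on your tidy version, so compare the two files before relying on it.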

