clean up html document created by Word
Claudio Grondi
claudio.grondi at freenet.de
Fri Mar 30 19:53:18 EDT 2007
jd wrote:
> I am looking for python code (working or sample code) that can take an
> html document created by Microsoft Word and clean it up (if you've
> never had to look at a Word-generated html document, consider yourself
> lucky ;-) Alternatively, if you know of a non-python solution, I'd
> like to hear about it.
>
> Thanks...
>
> -- jeff
>
Microsoft offers an add-on for Word called 'HTML filter' that helps to
reduce the mess. You can get it here:
http://www.microsoft.com/downloads/details.aspx?FamilyID=209ADBEE-3FBD-482C-83B0-96FB79B74DED&displaylang=EN
Run it first, and afterwards apply the other 'cleaning' methods
suggested in this thread.
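
As a rough illustration of what such a 'cleaning' pass in Python could
look like, here is a minimal sketch using only the standard library re
module. The function name clean_word_html and the choice of patterns are
assumptions made for this example; they are not part of the HTML filter
tool, and regular expressions are a fragile way to handle HTML in
general, so treat this as a starting point rather than a complete
solution.

```python
import re


def clean_word_html(html):
    """Strip some common Word-specific leftovers from an HTML string.

    Hypothetical helper for illustration; the patterns are assumptions
    about typical Word output, not an exhaustive list.
    """
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r'<!--\[if.*?<!\[endif\]-->', '', html, flags=re.DOTALL)
    # Drop namespaced Office tags such as <o:p> and </o:p>, keeping their text
    html = re.sub(r'</?[a-zA-Z]+:[^>]*>', '', html)
    # Drop class attributes whose value starts with "Mso" (e.g. MsoNormal)
    html = re.sub(r'\s+class="?Mso[^\s">]*"?', '', html)
    # Drop double-quoted inline style attributes
    html = re.sub(r'\s+style="[^"]*"', '', html)
    return html


if __name__ == '__main__':
    sample = ('<p class="MsoNormal" style="margin:0cm">Hello<o:p></o:p></p>'
              '<!--[if gte mso 9]><xml>x</xml><![endif]-->')
    print(clean_word_html(sample))
```

Note that the last substitution removes every inline style attribute,
not only the Word-specific ones; drop that line if you want to keep
your own styling.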
Claudio