Looking Python script to compare two files
Tim Golden
tim.golden at viacom-outdoor.co.uk
Wed Nov 9 04:19:24 EST 2005
[yys2000]
> I want to compare two PDF or WORD files.
Could you be more precise, please?
+ Do you only want to compare PDF-PDF or Word-Word? Or do
you want to be able to do PDF-Word?
+ In either case, are you only bothered about the text, or
is the formatting significant?
+ If it's only text, then use whatever method you want to
extract the text (antiword, ghostscript, COM automation,
xpdf, etc.) and then use the difflib module, or some external
diff tool.
+ If you want a structure/format comparison, you're into quite
difficult territory, I believe. It's easy enough to convert a
Word Doc to PDF if that were needed but PDFs are notoriously
difficult to disentangle, altho' relatively straightforward to
build. There's pdftools
(http://www.boddie.org.uk/david/Projects/Python/pdftools/)
which I can't say I've tried, but even once you've got the document
object into Python, I don't imagine it'll be easy to compare.
+ To do Word-Word comparison, there's more hope on the horizon
(if that's the metaphor I want). Word has built-in comparison
functionality, and recent versions of TortoiseSVN, for example,
include a script which will automate Word to do the right thing.
Which is, essentially, to open one doc and call its .Compare method
against the other.
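
For the text-only case above, here is a minimal sketch of the difflib
step. It assumes the text has already been pulled out of each document
by whichever extractor you prefer; the function name diff_text and the
sample strings are made up for illustration, not taken from any tool.

```python
import difflib


def diff_text(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff (as a list of lines) of two text strings.

    The strings are assumed to be text already extracted from the
    documents; this function does no PDF or Word parsing itself.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # lineterm="" because splitlines() has stripped the newlines
    return list(
        difflib.unified_diff(
            old_lines, new_lines,
            fromfile=old_name, tofile=new_name,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical extracted text standing in for two document versions
    before = "Title\nFirst paragraph.\nSecond paragraph.\n"
    after = "Title\nFirst paragraph, revised.\nSecond paragraph.\n"
    for line in diff_text(before, after, "report_v1", "report_v2"):
        print(line)
```

An empty list means the extracted text is identical, whatever
differences there may be in formatting.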
TJG