Looking Python script to compare two files
tim.golden at viacom-outdoor.co.uk
Wed Nov 9 10:19:24 CET 2005
> I want to compare two PDF or WORD files.
Could you be more precise, please?
+ Do you only want to compare PDF-PDF or Word-Word? Or do
you want to be able to do PDF-Word?
+ In either case, are you only bothered about the text, or
is the formatting significant?
+ If it's only text, then use whatever method you want to
extract the text (antiword, ghostscript, COM automation,
xpdf, etc.) and then use the difflib module, or some external
diff tool, to compare the results.
+ If you want a structure/format comparison, you're into quite
difficult territory, I believe. It's easy enough to convert a
Word Doc to PDF if that were needed but PDFs are notoriously
difficult to disentangle, altho' relatively straightforward to
build. There's pdftools
which I can't say I've tried, but even once you've got the document
object into Python, I don't imagine it'll be easy to compare.
+ To do Word-Word comparison, there's more hope on the horizon
(if that's the metaphor I want). Word has built-in comparison
functionality, and recent versions of TortoiseSVN, for example,
include a script which will automate Word to do the right thing.
Which is, essentially, to open one doc and call its .Compare method
against the other.
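
For the text-only case, something along these lines might do as a starting point. It's only a rough sketch: it assumes you've already extracted the text of each document into a string by whichever tool you prefer, and the function name diff_text and the labels "old" and "new" are made up for illustration.

```python
import difflib


def diff_text(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff of two already-extracted text strings.

    An empty string means difflib found no differing lines.
    """
    # Split into lines without their line endings so that a missing
    # final newline in either input doesn't garble the output.
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # the lines carry no endings; we join with "\n" below
    )
    return "\n".join(diff)


if __name__ == "__main__":
    # Stand-in strings; in practice these would come from the extraction step.
    print(diff_text("first line\nsecond line", "first line\n2nd line"))
```

This only tells you which lines of text differ; it says nothing about formatting. Extraction tools tend to wrap lines differently, so you may want to normalise whitespace before comparing.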