Looking Python script to compare two files

David Boddie davidb at mcs.st-and.ac.uk
Thu Nov 10 12:43:17 CET 2005

Tim Golden wrote:

> + PDF: David Boddie's pdftools looks like about the only possibility:
> (ducks as a thousand people jump on him and point out the alternatives)

I might as well do that! Here are a couple of alternatives:


Both of these are arguably more "Pythonic" than my solution, and
the first is also able to write out modified files.

Cameron Laird also maintains a page about PDF conversion tools:


> http://www.boddie.org.uk/david/Projects/Python/pdftools/
> Something like this might do the business. I'm afraid I've
> no idea how to determine where the line-breaks are. This
> was the first time I'd used pdftools, and the fact that
> I could do this much is a credit to its usability!

Thanks for the compliment! The read_text method in the PDFContents
class also lets you extract text from a given page in a document, but
you have to remember that text in PDF files isn't always composed as
a series of lines or paragraphs, and often doesn't even contain
whitespace characters.
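
To illustrate why line breaks are hard to recover, here is a minimal
sketch of one common heuristic: group positioned text fragments by
their vertical coordinate, then insert a space wherever the horizontal
gap between neighbours is large enough. It does not use the pdftools
API. The (x, y, width, text) tuples, the rebuild_lines function and
both thresholds are hypothetical, chosen only for this example, and
real documents would need more care with fonts, rotation and columns.

```python
from collections import defaultdict


def rebuild_lines(fragments, y_tolerance=2.0, gap_threshold=1.0):
    """Guess lines of text from positioned fragments.

    fragments is an iterable of (x, y, width, text) tuples, a
    hypothetical format used only for this sketch.
    """
    rows = defaultdict(list)
    for x, y, width, text in fragments:
        # Fragments whose y values fall in the same bucket are
        # treated as belonging to the same line.
        rows[round(y / y_tolerance)].append((x, width, text))

    lines = []
    # PDF y coordinates normally grow upwards, so the largest y
    # value is the first line on the page.
    for key in sorted(rows, reverse=True):
        line = ""
        prev_end = None
        for x, width, text in sorted(rows[key]):
            # A visible gap between fragments is taken to be a space,
            # since the file may not contain one explicitly.
            if prev_end is not None and x - prev_end > gap_threshold:
                line += " "
            line += text
            prev_end = x + width
        lines.append(line)
    return lines


fragments = [
    (105.0, 700.0, 32.0, "world"),
    (72.0, 700.0, 30.0, "Hello"),
    (72.0, 686.0, 20.0, "PDF"),
    (92.0, 686.0, 10.0, "s"),
]
print(rebuild_lines(fragments))
```

The fragments are deliberately given out of order, since nothing
requires a PDF file to store text in reading order. Bucketing on y is
crude: two fragments on the same visual line can land in different
buckets if they straddle a boundary, so treat this as a starting
point rather than a solution.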

