Working with PDFs?
tim.arnold at sas.com
Tue Aug 24 18:32:31 CEST 2010
<jyoung79 at kc.rr.com> wrote in message
news:mailman.2465.1282591017.1673.python-list at python.org...
>> <jyoung79 at kc.rr.com> writes:
>>> - Pull out text from each PDF page (to search for specific words)
>>> - Combine separate pdf documents into one document
>>> - Add bookmarks (with destination settings)
>> PDF Shuffler is a Python app which does PDF merging and splitting very
>> well. I don't think it does anything else, though, but maybe that's
>> where your code comes in?
> Thank you Anssi, MRAB, Terry and Geremy for your replies. I've been
> researching the apps you have recommended. Just curious if anyone has
> used pyPdf? While testing this, it seems to work pretty well for
> combining pdf files (seems to keep the annotation notes nicely also)
> and pulling out the text contents. I'm not sure I'm going to be able
> to find anything that can add bookmarks though. If you have used pyPdf,
> would you mind sharing your thoughts about it?
I use pyPdf, and I seem to remember having to patch it so it didn't crash when
a PDF dictionary contained duplicate keys (the part that holds the document
properties, I think).
Anyway, I use the package to get info from that document properties
dictionary, the page count, and so on, for displaying a build report to users
of a customized LaTeX system. So I'm using LaTeX to generate the PDFs and pyPdf
to glean data about the PDFs after the builds.
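
A minimal sketch of that kind of report, not the actual build code. It
assumes pyPdf's PdfFileReader with its getDocumentInfo() and getNumPages()
methods; the helper names load_reader and build_report, and the choice of
info keys, are made up for illustration.

```python
def load_reader(path):
    # Imported here so the rest of the sketch does not need pyPdf installed.
    from pyPdf import PdfFileReader
    # pyPdf reads lazily, so the file has to stay open while the reader is used.
    return PdfFileReader(open(path, "rb"))


def build_report(reader):
    """Collect page count and a few document properties from a reader."""
    # getDocumentInfo() may return None when the PDF has no info dictionary.
    info = reader.getDocumentInfo() or {}
    report = {"pages": reader.getNumPages()}
    for key in ("/Title", "/Author", "/Creator", "/Producer"):
        # Every entry in the info dictionary is optional, hence .get().
        report[key.lstrip("/").lower()] = info.get(key)
    return report
```

Usage would be something like build_report(load_reader("manual.pdf")),
where the file name is only a placeholder.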
I'd like to be able to do more with it, like finding out whether any fonts in
the doc are not embedded, for example.
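
One way the font check might be approached, as an untested-against-real-files
sketch: in the PDF format an embedded font normally has a /FontDescriptor
carrying one of /FontFile, /FontFile2 or /FontFile3, and composite (/Type0)
fonts keep the descriptor on their /DescendantFonts. The function names below
are hypothetical, the code only looks at each page's own /Resources (not
fonts used inside form XObjects or annotations), and it treats /Type3 fonts
as embedded because their glyphs are defined in the PDF itself.

```python
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def _resolve(obj):
    # pyPdf objects expose getObject() to follow indirect references;
    # plain dicts and lists are passed through unchanged.
    getter = getattr(obj, "getObject", None)
    return getter() if getter is not None else obj


def _is_embedded(font):
    font = _resolve(font)
    subtype = _resolve(font.get("/Subtype"))
    if subtype == "/Type3":
        return True  # glyph procedures live inside the PDF
    if subtype == "/Type0":
        # Composite font: the descriptor sits on the descendant font(s).
        descendants = _resolve(font.get("/DescendantFonts")) or []
        return bool(descendants) and all(_is_embedded(d) for d in descendants)
    descriptor = _resolve(font.get("/FontDescriptor"))
    if descriptor is None:
        return False  # e.g. a standard font referenced by name only
    return any(key in descriptor for key in FONT_FILE_KEYS)


def unembedded_fonts(pages):
    """Return sorted names of fonts that appear to lack an embedded font file."""
    missing = set()
    for page in pages:
        resources = _resolve(_resolve(page).get("/Resources")) or {}
        fonts = _resolve(resources.get("/Font")) or {}
        for name, font in fonts.items():
            font = _resolve(font)
            if not _is_embedded(font):
                # Fall back to the resource name if /BaseFont is absent.
                missing.add(str(_resolve(font.get("/BaseFont", name))))
    return sorted(missing)
```

With pyPdf the pages argument could be built as
[reader.getPage(i) for i in range(reader.getNumPages())], assuming the page
objects behave like dictionaries as they do in the version I have.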