Fw: PDF library for reading PDF files

Peter Galfi galfip at freestart.hu
Tue Jan 20 08:59:03 CET 2004

Thanks. I am studying the PDF spec, but it does not seem that easy, having
to implement all the decompression filters and so on. The "information" I
am trying to extract from the PDF file is the text, specifically in a way
that keeps the original paragraphs of the text. So far I have seen one
standalone shareware tool that extracts the text (and a lot of other
formatting garbage) into an RTF document, keeping the paragraphs as well.
I would need only the text.

Any suggestions?


----- Original Message -----
From: "Andreas Lobinger" <andreas.lobinger at netsurf.de>
Newsgroups: comp.lang.python
To: <python-list at python.org>
Sent: Monday, January 19, 2004 5:02 PM
Subject: Re: Fw: PDF library for reading PDF files


> Peter Galfi wrote:
> I am looking for a library in Python that would read PDF files and I
> could extract information from the PDF with it. I have searched with
> google, but only found libraries that can be used to write PDF files.
> Any ideas?

Use file, split, zlib and a broad knowledge of the PDF-spec...
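
A minimal sketch of the zlib part, assuming the streams use the
/FlateDecode filter and that splitting on the stream/endstream keywords
is good enough (a real reader should honor the /Length entry instead).
The names inflate_streams and make_sample are made up for illustration.

```python
import re
import zlib

# Crude split on the "stream ... endstream" keywords. This is an
# assumption for illustration; the spec says to use /Length.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed bytes of every stream zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            results.append(zlib.decompress(raw))
        except zlib.error:
            # Not Flate-compressed, or it uses another filter: skip it.
            continue
    return results


def make_sample():
    """Build a tiny PDF-like object with one Flate-compressed stream."""
    content = b"BT /F1 12 Tf (Hello PDF) Tj ET"
    packed = zlib.compress(content)
    header = b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\n" % len(packed)
    return header + b"stream\n" + packed + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    for data in inflate_streams(make_sample()):
        print(data)
```

On a real file you would pass open(path, "rb").read() instead of
make_sample(). What comes out is page-description operators, not plain
text, so this is only the first step.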

Accessing certain objects in the .pdf is not that complicated if you,
for example, try to read the /Info dictionary. Getting text from the
actual page content could be very complicated.
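
A rough sketch of reading the /Info dictionary with nothing but regular
expressions. It assumes an unencrypted file with a classic trailer, an
uncompressed /Info object, and simple literal strings without nested
parentheses or escapes. The name read_info and the SAMPLE data are
hypothetical, for illustration only.

```python
import re


def read_info(pdf_bytes):
    """Return the literal-string entries of the /Info dictionary."""
    # The trailer points at the Info object, e.g. "/Info 1 0 R".
    ref = re.search(rb"/Info\s+(\d+)\s+(\d+)\s+R", pdf_bytes)
    if ref is None:
        return {}
    num, gen = ref.group(1), ref.group(2)

    # Locate "N G obj << ... >>" for that object number and generation.
    obj = re.search(
        rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj\s*<<(.*?)>>",
        pdf_bytes,
        re.DOTALL,
    )
    if obj is None:
        return {}

    # Pick up "/Key (value)" pairs; other value types are ignored.
    entries = re.findall(rb"/(\w+)\s*\(([^()]*)\)", obj.group(1))
    return {k.decode("ascii"): v.decode("latin-1") for k, v in entries}


SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Title (Test Document) /Author (Jane Doe) >>\nendobj\n"
    b"trailer\n<< /Size 2 /Info 1 0 R >>\n"
)

if __name__ == "__main__":
    print(read_info(SAMPLE))
```

Hex strings, escaped parentheses, and files that keep their objects in
compressed object streams would all need more work than this.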

Can you explain your 'information' further?

Wishing a happy day
