Fw: PDF library for reading PDF files
galfip at freestart.hu
Tue Jan 20 08:59:03 CET 2004
Thanks. I am studying the PDF spec, but it does not seem to be that easy,
having to implement all the decompression filters, etc. The "information" I
am trying to extract from the PDF file is the text, specifically in a way
that keeps the original paragraphs of the text. So far I have seen one
shareware standalone tool that extracts the text (and a lot of other
formatting garbage) into an RTF document, keeping the paragraphs as well. I
would need only the text.
----- Original Message -----
From: "Andreas Lobinger" <andreas.lobinger at netsurf.de>
To: <python-list at python.org>
Sent: Monday, January 19, 2004 5:02 PM
Subject: Re: Fw: PDF library for reading PDF files
> Peter Galfi wrote:
> > I am looking for a library in Python that would read PDF files and I
> > could extract information from the PDF with it. I have searched with
> > google, but only found libraries that can be used to write PDF files.
> > Any ideas?
>
> Use file, split, zlib and a broad knowledge of the PDF-spec...
> Accessing certain objects in the .pdf is not that complicated if
> you, for example, try to read the /Info dictionary. Getting text from
> actual page content could be very complicated.
>
> Can you explain your 'information' further?
>
> Wishing a happy day
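
For reference, the "file, split, zlib" approach suggested above might look
roughly like the sketch below. It is only a minimal illustration, not a real
PDF parser: it assumes streams use the FlateDecode filter with no predictor,
finds them with a naive regular expression instead of walking the cross-reference
table, and only picks up simple (...) Tj text operands without escaped
parentheses. The names inflate_streams, literal_strings and the hand-built
fragment are hypothetical and exist only for this example.

```python
import re
import zlib

# Naive match for "stream<EOL> ... endstream"; a real reader would use /Length.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed body of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        # decompressobj tolerates the end-of-line bytes left after the data.
        inflater = zlib.decompressobj()
        try:
            results.append(inflater.decompress(match.group(1)))
        except zlib.error:
            # Not zlib data (another filter, or an uncompressed stream): skip.
            continue
    return results


def literal_strings(content):
    """Collect the (...) operands passed to the Tj (show text) operator."""
    return [m.group(1) for m in re.finditer(rb"\((.*?)\)\s*Tj", content)]


# Hypothetical hand-built object standing in for part of a real file.
page_ops = b"BT /F1 12 Tf 72 720 Td (Hello PDF) Tj ET"
packed = zlib.compress(page_ops)
fragment = (
    b"4 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
    + packed
    + b"\nendstream\nendobj\n"
)

for body in inflate_streams(fragment):
    print(literal_strings(body))
```

This only gets at the raw strings in drawing order. Recovering paragraphs
would additionally require interpreting the positioning operators (Td, Tm, T*
and friends) and the font encodings, which is the "very complicated" part.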