Fw: PDF library for reading PDF files
Peter Galfi
galfip at freestart.hu
Tue Jan 20 02:59:03 EST 2004
Thanks. I am studying the PDF spec, but it does not seem that easy, since
I would have to implement all the decompression filters, etc. The
"information" I am trying to extract from the PDF file is the text,
specifically in a way that keeps the original paragraphs of the text. So far
I have seen one shareware standalone tool that extracts the text (and a lot
of other formatting garbage) into an RTF document, keeping the paragraphs as
well. I need only the text.
Any suggestions?
Peter
----- Original Message -----
From: "Andreas Lobinger" <andreas.lobinger at netsurf.de>
Newsgroups: comp.lang.python
To: <python-list at python.org>
Sent: Monday, January 19, 2004 5:02 PM
Subject: Re: Fw: PDF library for reading PDF files
Aloha,
> Peter Galfi wrote:
> I am looking for a library in Python that would read PDF files and I
> could extract information from the PDF with it. I have searched with
> google, but only found libraries that can be used to write PDF files.
> Any ideas?
Use file, split, zlib and a broad knowledge of the PDF spec...
Accessing certain objects in the .pdf is not that complicated if
you, for example, try to read the /Info dictionary. Getting text from
actual page content could be very complicated.
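
A minimal sketch of that approach, under several assumptions: it uses the
re module where the advice above says split, it only handles streams
compressed with /FlateDecode, and it reads /Title with a naive pattern
instead of following the trailer to the /Info object. The helper names
(inflate_streams, info_title, build_sample) are hypothetical, and the sample
bytes are a hand-made fragment, not a complete PDF file.

```python
import re
import zlib

# Naive pattern for the raw bytes between "stream" and "endstream".
# A robust reader would use the /Length entry of the stream dictionary.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the contents of every stream that zlib can decompress."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            results.append(zlib.decompress(match.group(1)))
        except zlib.error:
            # Not Flate-compressed, or damaged by the naive pattern: skip it.
            pass
    return results


def info_title(pdf_bytes):
    """Find the first literal /Title (...) string, or None (hypothetical helper)."""
    match = re.search(rb"/Title\s*\(([^)]*)\)", pdf_bytes)
    return match.group(1).decode("latin-1") if match else None


def build_sample():
    """Build a tiny PDF-like fragment in memory for demonstration only."""
    content = zlib.compress(b"BT /F1 12 Tf (Hello) Tj ET")
    return (
        b"1 0 obj\n<< /Title (Demo) >>\nendobj\n"
        b"2 0 obj\n<< /Length " + str(len(content)).encode()
        + b" /Filter /FlateDecode >>\n"
        b"stream\n" + content + b"\nendstream\nendobj\n"
    )


if __name__ == "__main__":
    sample = build_sample()
    print(info_title(sample))
    for stream in inflate_streams(sample):
        print(stream)
```

For a real file, read it with open(path, "rb").read() and pass the bytes to
the same functions. The decompressed page content is still a sequence of
drawing operators, so recovering paragraphs from it takes further work.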
Can you explain your 'information' further?
Wishing a happy day
LOBI