Reading PDF files.
igor.stroh at wohnheim.uni-ulm.de
Tue Nov 6 22:50:04 CET 2001
On Tue, 06 Nov 2001 20:05:56 +0100, "Martin von Loewis"
<loewis at informatik.hu-berlin.de> wrote:
> Amit Weisman <weismann at netvision.net.il> writes:
>> Is there a module for reading PDF files ?
> Please have a look at www.reportlab.com
The reportlab module doesn't support reading PDFs; it's rather a PDF
generator, and I don't think that Amit is willing to pay $1000,- for the
Amit, you might want to try extracting the text with
pdftotext (it's distributed with the Xpdf package):
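>>> from os import popen
>>> pdf_file_name = 'document.pdf'  # placeholder: path to your PDF file
>>> # the trailing '-' makes pdftotext write to stdout instead of a file
>>> text = popen('pdftotext "%s" -' % pdf_file_name).read()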
'text' now contains only the raw text data from the specified PDF file,
with a whole bunch of control chars though... but since all you want is
to look up a word, this should be enough :)
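If you want something a bit more robust, here is a rough sketch of the same idea for current Python versions, using the subprocess module instead of os.popen. The helper names (pdf_to_text, strip_control_chars, contains_word) are made up for illustration, and decoding the output as UTF-8 is an assumption, since the encoding pdftotext emits depends on how it is built and configured.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text as a string.

    The trailing '-' tells pdftotext to write to stdout instead of a file.
    Passing the arguments as a list avoids shell quoting problems with
    file names that contain spaces.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Assumption: treat the output as UTF-8 and replace undecodable bytes
    # rather than failing on them.
    return result.stdout.decode("utf-8", errors="replace")


def strip_control_chars(text):
    """Replace control characters with spaces, keeping tabs and newlines."""
    return re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", text)


def contains_word(text, word):
    """Return True if 'word' occurs as a whole word, ignoring case."""
    cleaned = strip_control_chars(text)
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, cleaned, re.IGNORECASE) is not None
```

With pdftotext on your PATH, contains_word(pdf_to_text('document.pdf'), 'python') would tell you whether the word shows up anywhere in the file ('document.pdf' is again just a placeholder).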