Reading PDF files

Igor Stroh igor.stroh at
Tue Nov 6 22:50:04 CET 2001

On Tue, 06 Nov 2001 20:05:56 +0100, "Martin von Loewis"
<loewis at> wrote:

> Amit Weisman <weismann at> writes:
>> Is there a module for reading PDF files ?
> Please have a look at

The reportlab module doesn't support reading PDFs; it's a PDF
generator, and I don't think Amit is willing to pay $1000 for
PageCatcher :)

Amit, you might want to try extracting the text with pdftotext
(it's distributed with the Xpdf package):

>>> from os import popen
>>> pdf_file_name = 'document.pdf'  # placeholder: path to your PDF file
>>> text = popen('pdftotext "%s" -' % pdf_file_name).read()

'text' now contains only the raw text of the specified PDF file, though
with a whole bunch of control chars in it... but since all you want is
to look up a word, this should be enough :)
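
If you want something a bit more robust than popen, here is a minimal
sketch of the same idea using the subprocess module. It assumes
pdftotext is on your PATH and that its output can be decoded as UTF-8
(depending on your Xpdf configuration the encoding may differ, hence
errors="replace"). The function names pdf_to_text, strip_control_chars
and contains_word are my own, made up for illustration:

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run pdftotext and return the extracted text as a string.

    The trailing '-' tells pdftotext to write to stdout instead of a file.
    Assumes the pdftotext executable is available on PATH.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Assumption: output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def strip_control_chars(text):
    """Replace control characters (e.g. form feeds) with spaces.

    Tabs and newlines are kept so the line structure survives.
    """
    return re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", " ", text)


def contains_word(text, word):
    """Case-insensitive whole-word lookup in the cleaned text."""
    cleaned = strip_control_chars(text)
    pattern = r"\b%s\b" % re.escape(word)
    return re.search(pattern, cleaned, re.IGNORECASE) is not None
```

With that, contains_word(pdf_to_text('document.pdf'), 'python') tells
you whether the word shows up anywhere in the file ('document.pdf' is
just a placeholder name). Passing the arguments as a list also avoids
trouble with spaces or quotes in the file name.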

