Reading Adobe PDF File

Adam Tauno Williams awilliam at
Mon Jan 30 08:22:13 EST 2012

On Sat, 2012-01-28 at 21:59 -0800, Chris Rebert wrote:
> On Sat, Jan 28, 2012 at 9:52 PM, Shrewd Investor <cltung at> wrote:
> > I have a very large Adobe PDF file.  I was hoping to use a script to
> > extract the information for it.  Is there a way to loop through a PDF
> > file using Python?
> Haven't used it myself, but:

It is very prone to hanging and/or crashing.  I haven't yet found a
really reliable way to read text from a PDF.

PyPDF provides a PdfFileReader class with an extractText method.  The
output is indeed the text although it can be a bit thorny to look at.
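
For what it's worth, looping over the pages might look roughly like the
sketch below.  It assumes the legacy pyPdf API (PdfFileReader,
getNumPages, getPage, extractText), which PyPDF2 releases before 3.0
also expose.  The helper names iter_page_text and pdf_to_text are made
up for illustration, and treating a failed page as empty text is my own
choice, not something the library does for you.

```python
def iter_page_text(reader):
    """Yield (page_number, text) for each page of a PdfFileReader-like object.

    `reader` only needs getNumPages() and getPage(i); each page needs
    extractText().  Page numbers start at 1.
    """
    for index in range(reader.getNumPages()):
        try:
            text = reader.getPage(index).extractText()
        except Exception:
            # Assumption: a page that fails to extract is reported as
            # empty text so the loop can carry on with the next page.
            text = ""
        yield index + 1, text


def pdf_to_text(path):
    """Return a list with the extracted text of every page in `path`."""
    # Imported here so iter_page_text can be used without pyPdf installed.
    from pyPdf import PdfFileReader  # legacy API; assumed to be available

    # The file must stay open while pages are being read.
    with open(path, "rb") as handle:
        reader = PdfFileReader(handle)
        return [text for _, text in iter_page_text(reader)]
```

Writing "\n".join(pdf_to_text("big.pdf")) to a file would give a crude
PDF-to-text conversion; "big.pdf" is just a placeholder name.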

> > Or do I need to find a way to convert a PDF file into a text file?  If
> > so how?
> The script from the same package happens to do exactly this.

System & Network Administrator [ LPI & NCLA ]
OpenGroupware Developer <>
Adam Tauno Williams
