Script to extract text from PDF files

Lawrence D'Oliveiro
Wed Sep 26 04:19:00 CEST 2007

In message <1190747931.415834.75670 at>, 
byte8bits wrote:

> On Sep 25, 3:02 pm, Paul Hankin wrote:
>> Googling for 'pdf to text python' and following the first link
>> gives
> Doesn't work that well...

This is inherent in the nature of PDF: it's a page-description language, not
a document-interchange language. Each text-drawing command can put a block
of text anywhere on the page, so you have no idea, just from parsing the
PDF content, how to join these blocks up into lines, paragraphs, or columns.
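
To make the point concrete, here is a rough Python sketch. It does not parse a real PDF: the `blocks` list is made-up data standing in for the text-drawing commands in a page's content stream, and `group_into_lines` and its `y_tolerance` parameter are hypothetical names for one possible guessing heuristic, not anything a PDF library provides.

```python
# Hypothetical text-drawing operations, in the order they appear in the
# content stream: (x, y, text). PDF y coordinates grow upward, so a larger
# y is higher on the page. Note the stream order is not the reading order.
blocks = [
    (300.0, 700.0, "world"),
    (72.0, 680.0, "second line"),
    (72.0, 700.0, "Hello"),
]


def group_into_lines(blocks, y_tolerance=2.0):
    """Guess lines by grouping blocks whose baselines are close together.

    This is only a heuristic: y_tolerance is an assumed value, and nothing
    in the PDF itself says that these blocks belong to the same line.
    """
    lines = []  # each entry is [baseline_y, [(x, text), ...]]
    # Walk the blocks from the top of the page to the bottom.
    for x, y, text in sorted(blocks, key=lambda b: -b[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each guessed line, read left to right.
    return [" ".join(t for _, t in sorted(items)) for _, items in lines]


# Naive extraction: just concatenate in content-stream order.
stream_order = " ".join(text for _, _, text in blocks)

# Heuristic extraction: guess the lines from the coordinates.
guessed_lines = group_into_lines(blocks)

print(stream_order)
print(guessed_lines)
```

Even this toy heuristic only works for a single column. Put two columns side by side and it will happily glue the left and right halves of each row into one "line", which is exactly the kind of guesswork that makes PDF text extraction unreliable.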

