Script to extract text from PDF files

Lawrence D'Oliveiro ldo at geek-central.gen.new_zealand
Wed Sep 26 04:19:00 CEST 2007

In message <1190747931.415834.75670 at>, 
byte8bits at wrote:

> On Sep 25, 3:02 pm, Paul Hankin <paul.han... at> wrote:
>> Googling for 'pdf to text python' and following the first link
>> gives
> Doesn't work that well...

This is inherent in the nature of PDF: it's a page-description language, not
a document-interchange language. Each text-drawing command can put a block
of text anywhere on the page, so you have no idea, just from parsing the
PDF content, how to join these blocks up into lines, paragraphs or columns.
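
To make the problem concrete, here is a rough sketch of the kind of guesswork an extractor ends up doing. It assumes the text-drawing commands have already been parsed into (x, y, text) tuples; that representation, the name join_blocks and the y_tolerance parameter are all made up for illustration and do not come from any particular PDF library.

```python
from typing import List, Tuple

# Hypothetical representation of one text-drawing command: (x, y, text).
Block = Tuple[float, float, str]


def join_blocks(blocks: List[Block], y_tolerance: float = 2.0) -> List[str]:
    """Guess at lines by grouping blocks whose baselines are close together."""
    lines = []  # each entry is [baseline_y, [(x, text), ...]]
    # PDF y coordinates grow upward, so visit the top of the page first.
    for x, y, text in sorted(blocks, key=lambda b: (-b[1], b[0])):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            # Close enough to the current line's first baseline: same line.
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each guessed line, read the blocks left to right.
    return [" ".join(t for _, t in sorted(parts)) for _, parts in lines]


if __name__ == "__main__":
    # Blocks in arbitrary drawing order, as a PDF is free to emit them.
    sample = [
        (100, 700, "world"),
        (50, 700.5, "Hello"),
        (50, 680, "Second"),
        (95, 680, "line"),
    ]
    for line in join_blocks(sample):
        print(line)
```

This only works for a simple single-column page. With two columns, blocks from both columns share the same baselines and get merged into one line, which is exactly the sort of thing the PDF itself gives you no way to tell apart.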
