Help in reading the pdf file
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Sat Mar 28 02:17:40 EDT 2009
On Thu, 26 Mar 2009 18:31:31 -0300, M Kumar <tomanishkb at gmail.com>
wrote:
> I need to read PDF files and extract data from them. Is there any way
> to do it through Python?
If you are interested in the text, I'd use Ghostscript pdf2text (you may
invoke it from inside Python).
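A minimal sketch of invoking an external converter from Python follows. It assumes the `pdftotext` command-line tool (from Poppler/Xpdf) is installed and on the PATH; that particular tool, the `-layout` option chosen here, and the function name `extract_text` are assumptions for illustration, not something named above. Any converter that writes text to standard output could be substituted through the `command` parameter.

```python
import subprocess


def extract_text(pdf_path, command=("pdftotext", "-layout")):
    """Run an external converter on pdf_path and return the text it prints.

    Assumption: the converter accepts "<input file> -" and writes the
    extracted text to standard output, as pdftotext does.
    """
    # "-" as the output file name asks pdftotext to write to stdout.
    args = list(command) + [pdf_path, "-"]
    # check=True raises CalledProcessError if the tool exits with an error.
    result = subprocess.run(args, capture_output=True, check=True)
    # The output encoding is assumed to be UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would look like `text = extract_text("report.pdf")`, where `report.pdf` is a hypothetical file name. If the tool is not installed, `subprocess.run` raises `FileNotFoundError`.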
Actually, extracting text from a PDF is rather difficult. It's a
"presentation" format (or "display" format): every word in the document
might be absolutely positioned, and there is no paragraph structure you
can rely on.
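To illustrate what absolute positioning means for extraction, here is a rough sketch that rebuilds lines of text from individually positioned words. The input format (tuples of x, y, text), the tolerance value, and the function name `words_to_lines` are all hypothetical; real extractors have to deal with fonts, rotation, columns and much more, so this only shows the basic idea.

```python
def words_to_lines(words, y_tolerance=2.0):
    """Group (x, y, text) tuples into lines of text, top to bottom.

    Words whose y coordinate is within y_tolerance of the first word of
    the current line are treated as belonging to that line.
    """
    lines = []  # each entry is [reference_y, [(x, text), ...]]
    # PDF y coordinates grow upwards, so descending y gives reading order.
    for x, y, text in sorted(words, key=lambda w: -w[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order the words from left to right by x.
    return [" ".join(t for _, t in sorted(row)) for _, row in lines]


# Hypothetical positioned words, deliberately out of reading order.
sample = [
    (120, 700.0, "world"),
    (72, 700.5, "Hello"),
    (72, 686.0, "second"),
    (110, 686.0, "line"),
]
print(words_to_lines(sample))
```

Note that nothing here recovers paragraphs: that would need further guesses based on line spacing or indentation.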
--
Gabriel Genellina