pdf2txt
Aurelio Martin
amartin at wpsnetwork.com
Fri May 28 03:21:07 EDT 2004
B P wrote:
> Is there a way via Python or even Perl to capture records from a pdf and
> output a delimited text file? My work has a situation with a trunk
> load of data forms that were scanned as pdfs.
>
> The data needs to be taken from the forms and moved into a database, so
> I figure that comma-delimited format will work fine. The amount of
> man-hours it would take to manually do this is very cost-prohibitive for
> what we have to work with.
>
> I know that a txt2pdf exists, was checking to see if the opposite would
> as well.
>
> BP
You may try XPDF
http://www.foolabs.com/xpdf/
It includes source code and some command-line utilities such as pdfimages and pdftotext.
Maybe you can call these from Python, or link via a C extension.
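
As a rough sketch of the "call these from Python" route, something like the following might work. It assumes pdftotext is installed and on the PATH, and that the PDFs actually contain a text layer (pdftotext does not do OCR, so pure image scans would come back empty). The helper names build_command, pdf_to_text and text_to_csv are made up for illustration, and splitting fields on runs of two or more spaces is only a guess at how the forms are laid out.

```python
import csv
import io
import re
import subprocess


def build_command(pdf_path):
    """Return the argument list for a pdftotext call that writes to stdout."""
    # -layout keeps the physical column layout of the page,
    # -enc UTF-8 selects the output encoding,
    # and "-" as the output file name means standard output.
    return ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"]


def pdf_to_text(pdf_path):
    """Run pdftotext on one file and return the extracted text."""
    result = subprocess.run(build_command(pdf_path),
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def text_to_csv(text):
    """Turn extracted text into comma-delimited lines.

    Assumption: fields on a line are separated by two or more spaces.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = re.split(r"\s{2,}", line.strip())
        if fields != [""]:  # skip blank lines
            writer.writerow(fields)
    return out.getvalue()
```

Usage would then be along the lines of text_to_csv(pdf_to_text("form.pdf")), where form.pdf is a placeholder file name. Real forms will probably need field parsing tailored to their layout.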
Hope this helps
Aurelio