igor.stroh at wohnheim.uni-ulm.de
Mon Nov 5 23:14:29 CET 2001
On Mon, 05 Nov 2001 22:52:57 +0100, "Bruno Liénard"
<lienard.bruno at free.fr> wrote:
> I wrote a script some time ago to extract text directly from PDF files;
> it's quite easy. As I had a very large volume of text to extract (several
> gigabytes), I now use pdftotext, which comes with Xpdf. I modified it
> slightly for my needs. If you are interested, I will look for the script
> in my archives.
I'd greatly appreciate it :)
See, I can't use pdftotext, since I have several thousand PDFs to process
in a short amount of time, and I think invoking pdftotext once per file
would be pretty slow. By the way, the PDF files are _not_ in the
filesystem: everything is stored in a DB (ZopeDB), so I have data objects
rather than real files.
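
For what it's worth, a data object can still be handed to pdftotext by
going through a temporary file. The following is only a minimal sketch in
present-day Python, assuming the PDF is available as a byte string and that
pdftotext is on the PATH; the function name extract_text and its command
parameter are made up for illustration. It still starts one process per
document, so it does nothing about the startup cost mentioned above.

```python
import os
import subprocess
import tempfile


def extract_text(pdf_bytes, command=("pdftotext",)):
    """Run a converter on in-memory PDF data and return its stdout as bytes.

    `command` is a hypothetical parameter: the converter's argv prefix.
    The converter is called as `<command> <input-path> -`; for pdftotext,
    an output name of "-" sends the extracted text to stdout.
    """
    # The data lives in a database, not on disk, so write it to a temp file.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(pdf_bytes)
        # The temp file is closed before the converter opens it.
        result = subprocess.run(
            [*command, path, "-"],
            stdout=subprocess.PIPE,
            check=True,  # raise CalledProcessError if the converter fails
        )
        return result.stdout
    finally:
        # Always clean up, even when the converter fails.
        os.remove(path)
```

Calling extract_text(data) for each object fetched from the database would
return the extracted text as bytes; how the objects are fetched from ZopeDB
is left out here on purpose.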