[BangPypers] Retrieving images from PDFs

Noufal Ibrahim noufal at gmail.com
Tue Dec 29 13:23:08 CET 2009

On Tue, Dec 29, 2009 at 5:49 PM, Shashwat Anand <anand.shashwat at gmail.com> wrote:

> How can we retrieve images from PDFs? I need both the images and the text
> beneath each image to build a database. I was able to parse the text with
> PDFMiner but got stuck when it came to images.

Searching my apt cache for "python pdf" turns up a lot of libraries, some of
which claim to handle the entire contents of a PDF file. I have also come
across a tool that breaks a PDF down into HTML + image files (I don't
remember its name anymore). It was free software, so I'm sure it's doable.
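
To give a rough idea of why it is doable: a PDF often stores a JPEG image
verbatim inside a stream object (the DCTDecode filter), so the bytes between
the "stream" and "endstream" keywords are already a complete JPEG file. The
sketch below uses only the standard library and rests on that assumption. The
function name extract_jpegs is made up for illustration. It does not handle
images stored with other filters (FlateDecode and friends) or encrypted files,
and it does nothing about pairing an image with the text beneath it.

```python
import re

# Captures the raw bytes between the "stream" and "endstream" keywords
# of each PDF stream object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# Every JPEG file begins with the SOI marker FF D8, followed by another FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def extract_jpegs(pdf_bytes):
    """Return the payload of every stream that looks like a raw JPEG.

    Assumption: the image was embedded with only the DCTDecode filter,
    so the stream content is the JPEG file itself.
    """
    images = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        if data.startswith(JPEG_MAGIC):
            images.append(data)
    return images
```

To use it, read the PDF in binary mode, pass the bytes to extract_jpegs(), and
write each returned item to its own .jpg file. For anything beyond this simple
case a proper PDF library is the better route.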

