[BangPypers] Retrieving images from PDFs
noufal at gmail.com
Tue Dec 29 13:23:08 CET 2009
On Tue, Dec 29, 2009 at 5:49 PM, Shashwat Anand <anand.shashwat at gmail.com> wrote:
> How can we retrieve images from PDFs? I need both the images and the text
> beneath each image to build a database. I was able to parse the text with
> PDFMiner but got stuck when it came to the images.
Searching my apt cache for "python pdf" shows a lot of libraries, some of which
claim to be able to handle the entire contents of a PDF file.
I have also come across a tool that breaks a PDF down into HTML + image
files (I don't remember its name anymore), which was free software, so I'm sure
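As a rough illustration only (not one of the libraries or tools mentioned above): many PDFs store photographs as JPEG data (the DCTDecode filter) directly inside stream objects, so a crude stdlib-only sketch can scan for the JPEG start/end markers between `stream` and `endstream`. This is a heuristic under stated assumptions, not how PDFMiner or any particular library works. It misses Flate-compressed bitmaps, images inside compressed object streams, and other image formats, and it does nothing for the text beneath each image. The function name `extract_jpegs` is hypothetical.

```python
from typing import List

JPEG_SOI = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpegs(pdf: bytes) -> List[bytes]:
    """Return raw JPEG byte strings found directly inside PDF streams.

    Hypothetical heuristic sketch: assumes images are stored with the
    DCTDecode filter only (no extra compression) and are not inside
    compressed object streams. Anything else is silently skipped.
    """
    jpegs = []
    pos = 0
    while True:
        s = pdf.find(b"stream", pos)
        if s < 0:
            break
        # Skip the "stream" that is part of an "endstream" keyword.
        if s >= 3 and pdf[s - 3:s] == b"end":
            pos = s + len(b"stream")
            continue
        body = s + len(b"stream")
        # The keyword is followed by LF or CRLF, so a JPEG body should
        # begin within the next few bytes.
        start = pdf.find(JPEG_SOI, body, body + 4)
        if start < 0:
            pos = body
            continue
        end_stream = pdf.find(b"endstream", start)
        if end_stream < 0:
            break
        # Take the last EOI before "endstream" so an embedded thumbnail,
        # which has its own EOI, does not cut the image short.
        eoi = pdf.rfind(JPEG_EOI, start, end_stream)
        if eoi >= 0:
            jpegs.append(pdf[start:eoi + len(JPEG_EOI)])
        pos = end_stream + len(b"endstream")
    return jpegs
```

Usage, with a hypothetical file name: read the bytes with `open("doc.pdf", "rb").read()`, pass them to `extract_jpegs`, and write each returned byte string to its own `.jpg` file. For anything beyond plain JPEGs, a real PDF library that decodes the stream filters is the better route.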