[Tutor] extracting information (images and text) from a PDF and creating a database from it

Tue Dec 29 08:33:29 CET 2009

I need to build a database from some PDFs. I need to extract the logos as
well as the information (i.e. name, address) beneath each logo and store it
in the database. A logo can be either text or a picture, as shown in these
two screenshots from one of the sample PDF files:
http://imagebin.org/77378
http://imagebin.org/77379
Would converting the PDFs to HTML be a good option? Later on I also need to
apply some image processing. What would be the ideal way to approach this?
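
To make the goal more concrete, here is a rough sketch of the kind of database
I have in mind, using the sqlite3 module from the standard library. The table
name, the column names and the two helper functions are only assumptions for
illustration; the PDF extraction step itself is not shown. Since a logo can be
either text or a picture, the sketch keeps one column for each.

```python
import sqlite3


def create_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               address TEXT,
               logo_text TEXT,   -- filled when the logo is plain text
               logo_image BLOB   -- filled when the logo is a picture
           )"""
    )
    return conn


def add_entry(conn, name, address, logo_text=None, logo_image=None):
    """Insert one extracted record and return its row id."""
    cur = conn.execute(
        "INSERT INTO entries (name, address, logo_text, logo_image) "
        "VALUES (?, ?, ?, ?)",
        (name, address, logo_text, logo_image),
    )
    conn.commit()
    return cur.lastrowid
```

Each record pulled out of a PDF would then be stored with one call such as
add_entry(conn, name, address, logo_image=image_bytes), where image_bytes is
whatever raw image data the extraction step produces.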