[Half-off] How to get textboxes (text blocks) from ps/pdf files?
durumdara
durumdara at gmail.com
Wed Jan 3 07:30:22 EST 2007
Hi!
I need to extract textboxes (text blocks) from PDF files. I can also
convert them to PS.
Does anyone know of a method, trick, or routine for getting the textboxes
from PS or PDF?
(Pythonic, COM, or command line solutions needed.)
I need to redraw them in my application so that the user can reorder them,
and then I concatenate all the texts to process them.
I need this information for each textbox:
x, y, w, h, text
Example:
page1
textbox1{x:100,y:100;w:600;h:27;text:"TextBox1 /xfc /xfa"}
textbox2{x:100,y:180;w:600;h:27;text:"TextBox2"}
page2
textbox1{x:100,y:100;w:600;h:27;text:"TextBox1"}
textbox2{x:100,y:180;w:600;h:27;text:"TextBox2"}
...
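To make the target concrete, here is a minimal Python sketch of the record described above and of the "reorder, then concatenate" step. The names TextBox and concat_in_order are hypothetical, chosen only for this illustration, and the sample values are taken from the example.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """One text block: position, size, and content (hypothetical record)."""
    x: int
    y: int
    w: int
    h: int
    text: str


def concat_in_order(boxes, order):
    """Join the box texts in the order chosen by the user.

    `order` is a list of indexes into `boxes`.
    """
    return "\n".join(boxes[i].text for i in order)


# Page number -> list of boxes, mirroring the example layout.
pages = {
    1: [TextBox(100, 100, 600, 27, "TextBox1"),
        TextBox(100, 180, 600, 27, "TextBox2")],
    2: [TextBox(100, 100, 600, 27, "TextBox1"),
        TextBox(100, 180, 600, 27, "TextBox2")],
}

# The user moved the second box of page 1 in front of the first one.
print(concat_in_order(pages[1], [1, 0]))
```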
Any solution?
Thanks for any help!
dd
ps1:
I tried every pdf2text and pdf2html application. All of them failed the
test.
Only one gives good information: pdftohtml, because it
creates divs with absolute position and size along with the texts.
But this program does not handle the iso-8859-2 characters, so I lose them.
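One possible route, sketched under assumptions: pdftohtml also has an XML mode (the -xml switch) that writes one element per text run with its position and size. The sample below is hand-written to imitate that output, so the element and attribute names (page, text, top, left, width, height) are an assumption about what a real file contains, and extract_boxes is a hypothetical helper. It does not address the character-set problem by itself.

```python
import xml.etree.ElementTree as ET

# Hand-written sample imitating `pdftohtml -xml` output (assumed layout;
# real output may differ between versions).
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="100" width="600" height="27" font="0">TextBox1 <b>\u00fc\u00fa</b></text>
<text top="180" left="100" width="600" height="27" font="0">TextBox2</text>
</page>
<page number="2" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="100" width="600" height="27" font="0">TextBox1</text>
</page>
</pdf2xml>
"""


def extract_boxes(xml_text):
    """Return {page number: [dict with x, y, w, h, text]} from the XML."""
    root = ET.fromstring(xml_text)
    pages = {}
    for page in root.iter("page"):
        boxes = []
        for node in page.iter("text"):
            boxes.append({
                "x": int(node.get("left")),
                "y": int(node.get("top")),
                "w": int(node.get("width")),
                "h": int(node.get("height")),
                # itertext() also collects text inside inline tags like <b>.
                "text": "".join(node.itertext()),
            })
        pages[int(page.get("number"))] = boxes
    return pages


for number, boxes in extract_boxes(SAMPLE_XML).items():
    print("page%d" % number)
    for box in boxes:
        print(box)
```

For a real file, the string passed to extract_boxes would be the contents of the XML file written by the tool rather than SAMPLE_XML.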
ps2:
The program must run under Windows XP, so the solution is OS-specific.