[BangPypers] extracting unicode text from pdfs

Gora Mohanty gora at srijan.in
Mon May 24 17:08:04 CEST 2010


On Mon, 24 May 2010 19:13:26 +0530
Eknath Venkataramani <eknath.iyer at gmail.com> wrote:

> I have around 45 pdfs to convert into raw text containing text in
> _HINDI_ . When I use the xpdf package, the generated text is very
> weird, so I'd like to write a program which would convert the pdf
> text into Unicode text as it is.

That is probably because xpdf does not have the fonts available. Are
they installed on your system?

> The fonts used in the pdfs:
[...]
> APKAPP+Usha-Bold
[...]

Are you sure that these are indeed Unicode fonts, as your table
seems to suggest? Would it be possible to share a PDF, or a page
or two from one?
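
In the meantime, one rough way to check is to run the text that xpdf
produces through something like the sketch below. It only measures
what share of the letters fall in the Unicode Devanagari block
(U+0900 to U+097F); the function name devanagari_ratio is made up for
illustration, and this is a heuristic, not a definitive test. With
legacy non-Unicode Hindi fonts the extracted text typically comes out
as Latin letters and symbols, so the ratio would be close to zero.

```python
import unicodedata


def devanagari_ratio(text):
    """Return the fraction of letters and marks in text that lie in
    the Devanagari block (U+0900 to U+097F)."""
    # Look only at letters (L*) and combining marks (M*); whitespace,
    # digits and punctuation say nothing about the script.
    relevant = [ch for ch in text
                if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    devanagari = [ch for ch in relevant if "\u0900" <= ch <= "\u097f"]
    return len(devanagari) / len(relevant)
```

A value near 1.0 suggests the text really is Unicode Hindi; a value
near 0.0 suggests the fonts use some other encoding and the text will
need a font-specific mapping.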

Regards,
Gora

