[BangPypers] extracting unicode text from pdfs
Gora Mohanty
gora at srijan.in
Mon May 24 17:08:04 CEST 2010
On Mon, 24 May 2010 19:13:26 +0530
Eknath Venkataramani <eknath.iyer at gmail.com> wrote:
> I have around 45 pdfs to convert into raw text containing text in
> _HINDI_ . When I use the xpdf package, the generated text is very
> weird, so I'd like to write a program which would convert the pdf
> text into Unicode text as it is.
That is probably because xpdf does not have the fonts available. Are
they installed on your system?
> The fonts used in the pdfs:
[...]
APKAPP+Usha-Bold
[...]
Are you sure that these are indeed Unicode fonts, as your table
seems to suggest? Would it be possible to share a PDF, or a page
or two from one?
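
One rough way to check this yourself is sketched below. It assumes you
have already dumped one page to a UTF-8 text file (for example with
pdftotext from xpdf) and are running Python 3. The function name
devanagari_ratio and the file name page.txt are made up for
illustration. If the fonts really are Unicode, most of the
non-whitespace characters should fall in the Devanagari block
(U+0900 to U+097F). A ratio near zero would suggest a font with its
own custom encoding, though this is only a heuristic.

```python
def devanagari_ratio(text):
    """Return the fraction of non-whitespace characters that lie in
    the Unicode Devanagari block (U+0900 to U+097F)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if '\u0900' <= c <= '\u097F')
    return hits / len(chars)


def ratio_for_file(path):
    """Read a UTF-8 text dump (hypothetical path) and score it."""
    with open(path, encoding="utf-8") as f:
        return devanagari_ratio(f.read())


# Usage, assuming page.txt holds the extracted text of one page:
#     print(ratio_for_file("page.txt"))
```

Punctuation, digits and any English words in the page will pull the
ratio below 1.0, so treat the number as a hint and not as a verdict.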
Regards,
Gora