PDF library for reading PDF files
Josiah Carlson
jcarlson at uci.edu
Tue Jan 20 03:13:52 EST 2004
> Thanks. I am studying the PDF spec, but it does not look that easy, since
> I would have to implement all the decompression schemes, etc. The
> "information" I am trying to extract from the PDF file is the text,
> specifically in a way that keeps the original paragraphs. So far I have
> seen one standalone shareware tool that extracts the text (and a lot of
> other formatting garbage) into an RTF document, keeping the paragraphs as
> well. I would need only the text.
>
> Any suggestions?
Peter,
Suggestion: extract the document to RTF using that other tool, then use
any one of the few dozen RTF parsers to convert it into plain text.
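
To give a rough idea of what the second step involves, here is a minimal
sketch of an RTF-to-text pass in plain Python. It is not any particular
parser's API: the function name rtf_to_text, the SKIP_DESTINATIONS set and
the cp1252 decoding of \'hh escapes are all assumptions made for
illustration. It keeps paragraph breaks (\par) as newlines and drops the
formatting; it ignores \u Unicode escapes and many other RTF features.

```python
import re

# Destinations whose contents are not document text.
# Assumption: a small, non-exhaustive set chosen for this sketch.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"   # hex-escaped byte such as \'e9
    r"|\\(.)"                 # control symbol such as \* or \{
    r"|([{}])"                # group delimiters
    r"|([^\\{}]+)",           # run of plain text
    re.DOTALL,
)


def rtf_to_text(rtf):
    """Return the visible text of an RTF string, one newline per \\par."""
    out = []
    stack = []        # "skipping" state saved at each open group
    skipping = False  # True while inside a non-text destination
    for match in TOKEN.finditer(rtf):
        word, _arg, hexbyte, symbol, brace, text = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif hexbyte:
            if not skipping:
                # Assumption: the document uses the Windows-1252 code page.
                out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif symbol:
            if symbol == "*":
                skipping = True          # \* marks an ignorable destination
            elif not skipping and symbol in "\\{}":
                out.append(symbol)       # escaped literal character
        elif text and not skipping:
            # Raw line breaks in RTF source carry no meaning.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)
```

Calling rtf_to_text() on the contents of the exported file should give the
text with one line per paragraph. For real documents, an existing RTF
parser will cope with far more of the format than this sketch does.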
- Josiah