reading text in pdf, some working sample code
Jon Ribbens
jon+usenet at unequivocal.eu
Wed Nov 22 19:39:48 EST 2017
On 2017-11-21, Daniel Gross <grossd18 at gmail.com> wrote:
> I am new to python and jumped right into trying to read out (english) text
> from PDF files.
That's not a trivial task. However, I just released pycpdf, which might
help you out. Check out https://github.com/jribbens/pycpdf, which shows
an example of extracting text from PDFs. It may or may not cope with
the particular PDFs you have, as there's quite a lot of variety within
the format.
Example:
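    import pycpdf

    # Read the whole file as bytes and hand it to the parser
    with open("file.pdf", "rb") as f:
        pdf = pycpdf.PDF(f.read())
    # The info dictionary is optional, so check before using it
    if pdf.info and pdf.info.get('Title'):
        print('Title:', pdf.info['Title'])
    # Print the extracted text of each page, numbered from 1
    for pageno, page in enumerate(pdf.pages):
        print('Page', pageno + 1)
        print(page.text)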
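Since the module may not cope with every PDF, it can help to wrap the
extraction so that a file it cannot handle is reported rather than
crashing the script. The following is only a rough sketch: the helper
names extract_pages and try_extract are made up for illustration, and
catching a bare Exception is an assumption, as I am not listing here
which specific errors the parser raises.

```python
def extract_pages(pdf):
    """Return a list of (page_number, text) pairs, numbered from 1.

    `pdf` is assumed to expose `.pages`, each page having a `.text`
    attribute, as in the example above.
    """
    return [(pageno, page.text)
            for pageno, page in enumerate(pdf.pages, start=1)]


def try_extract(path):
    """Parse the PDF at `path`; return its pages, or None on failure."""
    # Imported here so that extract_pages stays usable (and testable)
    # even where pycpdf is not installed.
    import pycpdf

    try:
        with open(path, "rb") as f:
            pdf = pycpdf.PDF(f.read())
        return extract_pages(pdf)
    except Exception as exc:  # assumption: exact error types not known
        print("Could not extract text from", path, "-", exc)
        return None

# Hypothetical usage:
#     pages = try_extract("file.pdf")
#     if pages is not None:
#         for pageno, text in pages:
#             print("Page", pageno)
#             print(text)
```

Returning None on failure lets a caller processing many files skip the
ones that cannot be read and carry on with the rest.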