Read the table data from PDF files in Python

Rhodri James rhodri at kynesim.co.uk
Wed Apr 24 07:17:18 EDT 2019


On 24/04/2019 10:36, mrawat213 at gmail.com wrote:
> Does anyone know how to fetch data in Python from a PDF file that has tables mixed with other text? I need to fetch some cell values from that table based on a condition.

Hi there!

If you have any alternatives to doing this, use them.  Extracting data 
from PDFs like this is hugely unreliable because the order in which page 
elements show up in a PDF varies enormously.  What works for one PDF may 
give you complete nonsense for the next.

If you must do it this way, the PyPDF and PyPDF2 modules on PyPI will
let you extract the text from the PDF.  You are on your own for working
out how to parse the tables out of that text, though; the table
structure you are hoping for simply doesn't exist in the data.
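
For what it's worth, here is a rough sketch of the "extract the text, then
parse it yourself" approach.  The helper names (extract_page_texts,
split_rows, select_cells) are made up for illustration, and the parsing
rests on a big assumption: that each table row comes out as one line of
text with cells separated by runs of two or more spaces.  Plenty of PDFs
won't oblige.  PdfReader and extract_text() are the names used by PyPDF2
2.0 and later; older releases spell them PdfFileReader and extractText().

```python
import re


def extract_page_texts(path):
    """Return the extracted text of each page in the PDF at `path`."""
    # Imported here so the parsing helpers below work without PyPDF2.
    from PyPDF2 import PdfReader  # PyPDF2 >= 2.0

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def split_rows(text, min_columns=2):
    """Guess at table rows: keep lines that split into several cells.

    Assumes cells are separated by two or more whitespace characters.
    """
    rows = []
    for line in text.splitlines():
        cells = [c for c in re.split(r"\s{2,}", line.strip()) if c]
        if len(cells) >= min_columns:
            rows.append(cells)
    return rows


def select_cells(rows, match_column, predicate, value_column):
    """Return row[value_column] for rows where predicate(row[match_column])."""
    needed = max(match_column, value_column)
    return [
        row[value_column]
        for row in rows
        if len(row) > needed and predicate(row[match_column])
    ]
```

You would feed each string from extract_page_texts() to split_rows(), drop
the header row, and then call something like
select_cells(rows, 1, lambda qty: int(qty) > 5, 2) to pick out the cells
you care about.  Expect to adjust the splitting rule for every new PDF
layout you meet.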

-- 
Rhodri James *-* Kynesim Ltd

