Read the table data from PDF files in Python
Rhodri James
rhodri at kynesim.co.uk
Wed Apr 24 07:17:18 EDT 2019
On 24/04/2019 10:36, mrawat213 at gmail.com wrote:
> Does anyone know how to fetch data in Python from a PDF file that has tables mixed with other text? I need to fetch some cell values from that table based on a condition.
Hi there!
If you have any alternatives to doing this, use them. Extracting data
from PDFs like this is hugely unreliable because the order in which page
elements show up in a PDF varies enormously. What works for one PDF may
give you complete nonsense for the next.
If you must do it this way, there are modules called PyPDF and PyPDF2 on
PyPI which will let you extract the text from the PDF. You are on your
own for working out how to parse the tables out of that text, though;
the table structure you are hoping for simply doesn't exist in the data.
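
For what it's worth, here is a rough sketch of the sort of thing you would
end up writing. It assumes a recent PyPDF2 (the PdfReader class and
extract_text() method; older releases spelled these differently), and it
assumes, purely for illustration, that the extracted text happens to keep
each table row on one line with columns separated by two or more spaces.
Many PDFs will not oblige. The function names are my own invention, not
part of any library.

```python
import re


def extract_text(pdf_path):
    """Return the text of every page in the PDF as one string."""
    # Imported here so the parsing helpers below work without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumes PyPDF2 2.x or later

    reader = PdfReader(pdf_path)
    # Guard against pages that yield no text at all.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_rows(text, min_columns=2):
    """Keep lines that split into at least min_columns cells.

    ASSUMPTION: columns are separated by runs of two or more spaces.
    """
    rows = []
    for line in text.splitlines():
        cells = [c for c in re.split(r"\s{2,}", line.strip()) if c]
        if len(cells) >= min_columns:
            rows.append(cells)
    return rows


def select_cells(rows, match_column, predicate, value_column):
    """Return row[value_column] for each row whose match_column passes predicate."""
    widest = max(match_column, value_column)
    return [
        row[value_column]
        for row in rows
        if len(row) > widest and predicate(row[match_column])
    ]
```

You would call parse_rows(extract_text("report.pdf")) (the file name is
made up) and then pick out cells with select_cells(). Expect to rewrite
parse_rows() for every new PDF layout you meet.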
--
Rhodri James *-* Kynesim Ltd