The library's API is pretty simple and intuitive too! You can check it out in the README :)
On Sat, Sep 29, 2018 at 1:06 AM Vinayak Mehta <vmehta94@gmail.com> wrote:
Hello David!
Yes, I've created a wiki page comparing Camelot with other open source tools and libraries. tabula-py is a wrapper over tabula-java, which is used by Tabula. You can check out the comparison of Camelot with Tabula here <https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools#tabula>. As you can see in the comparison, it outperforms Tabula in almost all cases!
While Tabula either gives good output or fails miserably, Camelot gives you complete control over the extraction process through various configuration parameters! You can check out this section of the README <https://github.com/socialcopsdev/camelot#why-camelot> for more information. Camelot also lets you plot geometries such as detected lines, intersections, and tables in the PDF to debug and improve table extraction! You can check out this part of the documentation <https://camelot-py.readthedocs.io/en/latest/user/advanced.html#plot-geometry> for more information on that.
Try it out!
Vinayak
On Sat, Sep 29, 2018 at 12:34 AM David Mertz <mertz@gnosis.cx> wrote:
Have you compared your tool with existing ones, such as https://blog.chezo.uno/tabula-py-extract-table-from-pdf-into-python-datafram... ?
What notable difference in API and/or accuracy do you have?
On Fri, Sep 28, 2018 at 2:32 PM Vinayak Mehta <vmehta94@gmail.com> wrote:
I've created a Jupyter notebook which shows an example of how Camelot makes it easy to extract tables out of PDFs.
In the example, I scrape a PDF from an Indian disease outbreaks data source[1] using requests, extract tables from each page of the PDF using Camelot, and then concatenate those tables. Here's the gist! https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873 :)
[1] http://idsp.nic.in/index4.php?lang=1&level=0&linkid=406&lid=3689
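For anyone who would rather not open the gist, the workflow described above might look roughly like the sketch below. This is not the notebook's actual code. It assumes `camelot.read_pdf` accepts `pages="all"` and that each returned table exposes a `.df` DataFrame, as the README describes. The function names and the `reader` argument are my own, added only so the extraction step can be tried without a real PDF.

```python
from pathlib import Path


def download_pdf(url, dest):
    """Save the PDF at `url` to `dest` and return the local path."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    dest = Path(dest)
    dest.write_bytes(response.content)
    return dest


def extract_frames(pdf_path, pages="all", reader=None):
    """Return one DataFrame per table found on the requested pages.

    `reader` is a hypothetical hook (not part of Camelot); it defaults
    to camelot.read_pdf and only exists to make testing easier.
    """
    if reader is None:
        import camelot  # third-party: pip install camelot-py

        reader = camelot.read_pdf
    tables = reader(str(pdf_path), pages=pages)
    return [table.df for table in tables]


def combine(frames):
    """Stack the per-table DataFrames into a single DataFrame."""
    import pandas as pd

    return pd.concat(frames, ignore_index=True)
```

With a real file this would be called as `combine(extract_frames(download_pdf(url, "outbreaks.pdf")))`, where `url` has to point at the PDF itself rather than at the index page in [1], and `"outbreaks.pdf"` is just a placeholder filename.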
On Fri, Sep 28, 2018 at 12:01 PM Vinayak Mehta <vmehta94@gmail.com> wrote:
Hello everyone!
I recently released a Python library which lets users extract data tables out of PDF files, my first open source library! Here's the link: https://github.com/socialcopsdev/camelot
I've created a wiki page <https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools> comparing it to other open source PDF table extraction tools. I'm currently working on porting it to Python 3!
I would be really grateful if you could check it out, see if it's useful to you, and give me any feedback that may help me improve it, by replying here or by opening an issue or a pull request!
Looking forward to hearing from you all!
Thanks for your time!
Vinayak
_______________________________________________ PSF-Community mailing list PSF-Community@python.org https://mail.python.org/mailman/listinfo/psf-community
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.