Python library to extract data tables from PDF files
Hello everyone!
I recently released a Python library which lets users extract data tables out of PDF files, my first open source library! Here's the link: https://github.com/socialcopsdev/camelot
I've created a wiki page <https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools> comparing it to other open source PDF table extraction tools. I'm currently working on porting it to Python 3!
I would be really grateful if you could check it out, see if it's useful to you and give me any feedback that may help me improve it, by replying here, opening an issue or opening a pull request!
Looking forward to hearing from you all!
Thanks for your time!
Vinayak
I've created a Jupyter notebook which shows an example of how Camelot makes it easy to extract tables out of PDFs.
In the example, I scrape a PDF from an Indian disease outbreaks data source [1] using requests, extract tables from each page of the PDF using Camelot and then concatenate those tables. Here's the gist! https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873 :)
[1] http://idsp.nic.in/index4.php?lang=1&level=0&linkid=406&lid=3689
(In reply to Vinayak Mehta <vmehta94@gmail.com>, Fri, Sep 28, 2018 at 12:01 PM)
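The notebook is only linked from the message. As a rough illustration of its last step, here is a minimal, dependency-free sketch of concatenating per-page tables into one. It assumes each page's table is already a list of rows of strings, although Camelot itself returns pandas DataFrames through each table's `df` attribute. It also assumes that every page repeats the same header row. `concat_page_tables` and the toy rows are hypothetical.

```python
from typing import List, Sequence

Row = List[str]


def concat_page_tables(tables: Sequence[List[Row]], header_rows: int = 1) -> List[Row]:
    """Stack tables extracted from consecutive pages into one table.

    Assumption: each page may start with the same `header_rows` header rows.
    The header is kept from the first page and dropped from later pages
    only when it matches exactly.
    """
    if not tables:
        return []
    header = [list(r) for r in tables[0][:header_rows]]
    combined: List[Row] = header + [list(r) for r in tables[0][header_rows:]]
    for table in tables[1:]:
        # Strip leading rows only if they really are a repeated header.
        if [list(r) for r in table[:header_rows]] == header:
            body = table[header_rows:]
        else:
            body = table
        combined.extend(list(r) for r in body)
    return combined


# Toy data, for illustration only.
page1 = [["District", "Cases"], ["A", "3"]]
page2 = [["District", "Cases"], ["B", "5"]]
print(concat_page_tables([page1, page2]))
# [['District', 'Cases'], ['A', '3'], ['B', '5']]
```

With DataFrames, the equivalent step is typically `pandas.concat(frames, ignore_index=True)` once any repeated header rows have been dropped.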
Have you compared your tool with existing ones, such as https://blog.chezo.uno/tabula-py-extract-table-from-pdf-into-python-datafram... ?
What notable differences in API and/or accuracy does it have?
(In reply to Vinayak Mehta <vmehta94@gmail.com>, Fri, Sep 28, 2018 at 2:32 PM)
_______________________________________________ PSF-Community mailing list PSF-Community@python.org https://mail.python.org/mailman/listinfo/psf-community
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
Hello David!
Yes, I've created a wiki page comparing Camelot with other open source tools and libraries. tabula-py is a wrapper over tabula-java, which is also used by Tabula. You can check out the comparison of Camelot with Tabula here <https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools#tabula>. As you can see in the comparison, it outperforms Tabula in almost all of the cases compared there!
While Tabula either gives good output or fails miserably, Camelot gives you complete control over the extraction process through various configuration parameters! You can check out this section of the README <https://github.com/socialcopsdev/camelot#why-camelot> for more information.
Camelot also lets you plot various geometries, such as the lines, intersections and tables detected in the PDF, to debug and improve table extraction! You can check out this part of the documentation <https://camelot-py.readthedocs.io/en/latest/user/advanced.html#plot-geometry> for more information on that.
Try it out!
Vinayak
(In reply to David Mertz <mertz@gnosis.cx>, Sat, Sep 29, 2018 at 12:34 AM)
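The detected lines and intersections that Camelot can plot apply to tables drawn with ruling lines. The following is a hypothetical sketch of that geometry only. It is not Camelot's code, which, according to its documentation, detects the lines from an image of the page. It shows how intersections between already-detected horizontal and vertical segments could be found. The `HLine`/`VLine` types and the `tol` tolerance are assumptions.

```python
from typing import List, NamedTuple, Tuple


class HLine(NamedTuple):
    """Hypothetical horizontal segment: constant y, spanning x0..x1."""
    y: float
    x0: float
    x1: float


class VLine(NamedTuple):
    """Hypothetical vertical segment: constant x, spanning y0..y1."""
    x: float
    y0: float
    y1: float


def intersections(h_lines: List[HLine], v_lines: List[VLine], tol: float = 1.0) -> List[Tuple[float, float]]:
    """Return sorted (x, y) points where a horizontal and a vertical segment cross.

    `tol` lets segments that stop slightly short of each other still count,
    since ruling lines in real documents rarely meet exactly.
    """
    points = []
    for h in h_lines:
        hx0, hx1 = sorted((h.x0, h.x1))
        for v in v_lines:
            vy0, vy1 = sorted((v.y0, v.y1))
            if hx0 - tol <= v.x <= hx1 + tol and vy0 - tol <= h.y <= vy1 + tol:
                points.append((v.x, h.y))
    return sorted(points)


# A single rectangular cell outlined by four segments.
h = [HLine(0, 0, 20), HLine(10, 0, 20)]
v = [VLine(0, 0, 10), VLine(20, 0, 10)]
print(intersections(h, v))
# [(0, 0), (0, 10), (20, 0), (20, 10)]
```

One way to recover a grid of cells is to sort the distinct x and y values of these points. Those values give candidate column and row boundaries.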
The library's API is pretty simple and intuitive too! You can check it out in the README :)
(In reply to Vinayak Mehta <vmehta94@gmail.com>, Sat, Sep 29, 2018 at 1:06 AM)
Very interesting, and congrats, Vinayak.
As a person interested in both PDF generation [1] and PDF text extraction [2], I'm interested to know what issues you faced w.r.t. accuracy of text extraction and also formatting.
[1] I'm the creator of xtopdf, a Python toolkit for PDF generation from other file formats:
http://slides.com/vasudevram/xtopdf
http://bitbucket.org/vasudevram/xtopdf
[2] I worked on a project to extract text from PDF files. It was done using a C library (xpdf), though, not a Python one. However, the text extraction accuracy issues (some of which are technical issues inherent in the PDF format, according to the vendor of xpdf, Glyph and Cog) are language-independent. There were things like characters getting transposed, missing characters, occasional junk characters, etc. (I also wrote a heuristics program to detect some such issues, but that too could only reject the bad extracts, not make them correct.)
So the extraction was not 100% accurate, at least in my project. Also, like I said, that vendor said the issues are inherent in PDF, partly related to it being a canvas-based model, not a text-based one.
I'll try to check out your project some time later.
Cheers,
Vasudev
-- vi quickstart: https://gumroad.com/l/vi_quick Web site: https://vasudevram.github.io Blog: https://jugad2.blogspot.com Products: https://gumroad.com/vasudevram
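Vasudev's heuristics program isn't shown, so the following is only a guess at the general idea. It scores each extracted string by the share of characters that are unlikely in real text: the U+FFFD replacement character, control characters, and private-use or unassigned code points. Any string that scores above a threshold is rejected. The function names and the 5% threshold are hypothetical. As he notes, a check like this can only flag bad extracts, not repair them.

```python
import unicodedata

# Unicode general categories that rarely appear in genuine document text:
# control, private use, unassigned, surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def junk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like extraction debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        # U+FFFD has category "So", so it is checked explicitly.
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.05) -> bool:
    """Flag (but not fix) an extract whose junk ratio exceeds `threshold`."""
    return junk_ratio(text) > threshold
```

A ratio is used rather than a raw count so that a single stray character in a long, otherwise clean extract does not get the whole extract rejected.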
Thanks Vasudev!
[1] xtopdf looks great! Will check it out.
[2] I've faced similar issues w.r.t. junk characters, which may happen when the PDF contains an incorrect ToUnicode map, though I still have to dig deeper and I'm not 100% sure. I've also faced an issue where duplicate strings are assigned to the same cell. You can check it out on GitHub <https://github.com/socialcopsdev/camelot/issues/103>. I suspect that since PDF is a canvas-based model and not a text-based one, like you said, the same text is just drawn a second time, slightly offset, to make it look bold.
I'll probably write a detailed blog post about the issues I faced during development :)
Thanks for checking it out!
(In reply to Vasudev Ram <vasudevram@gmail.com>, Sat, Sep 29, 2018 at 1:26 AM)
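The duplicate-string issue (#103) fits the "fake bold" explanation Vinayak suspects, in which the same text is drawn twice with a small offset. A hypothetical pre-processing step could drop a text object when an identical string has already been seen within a small distance. The `(text, x, y)` tuples and the 2-point tolerance are assumptions. This is a sketch, not Camelot's actual fix.

```python
from typing import List, Tuple

TextObject = Tuple[str, float, float]  # hypothetical (text, x, y) in PDF points


def drop_overprinted(objects: List[TextObject], tol: float = 2.0) -> List[TextObject]:
    """Remove repeated strings drawn at nearly the same position.

    Keeps the first occurrence. A later object counts as a duplicate when
    its text is identical and both coordinates are within `tol` points of
    an object already kept.
    """
    kept: List[TextObject] = []
    for text, x, y in objects:
        is_dup = any(
            text == k_text and abs(x - k_x) <= tol and abs(y - k_y) <= tol
            for k_text, k_x, k_y in kept
        )
        if not is_dup:
            kept.append((text, x, y))
    return kept
```

Comparing every object against every kept object is quadratic. On very dense pages, objects could first be bucketed by rounded coordinates.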
Thanks Vasudev!
NP.
[1] xtopdf looks great! Will check it out.
Cool! Thanks.
[2] I've faced similar issues w.r.t. junk characters, which may happen when the PDF contains an incorrect ToUnicode map, though I still have to dig deeper and I'm not 100% sure. I've also faced an issue where duplicate strings are assigned to the same cell. You can check it out on GitHub. I suspect that since PDF is a canvas-based model and not a text-based one, like you said, the same text is just drawn a second time, slightly offset, to make it look bold. I'll probably write a detailed blog post about the issues I faced during development :)
Good idea :)
Thanks for checking it out!
NP. -- vi quickstart: https://gumroad.com/l/vi_quick Web site: https://vasudevram.github.io Blog: https://jugad2.blogspot.com Products: https://gumroad.com/vasudevram
Hi, Vinayak! Good work, thanks for sharing. :)
I'm the creator of the rows library [http://turicas.info/rows] and implemented PDF support early this year (with 3 different strategies). It's not released on PyPI yet since I'm fixing some bugs before releasing the next version, but you can try it out by installing:
`pip install git+https://github.com/turicas/rows.git@feature/plugin-pdf#egg=rows pdfminer.six cached-property`
It's 100% written in Python and also has a command-line interface (so you can run `rows convert http://example.com/file.pdf newfile.(csv|xls|xlsx|html|sqlite)` or even `rows query "SELECT * FROM table1 WHERE some_condition" http://example.com/file.pdf --output=result.xls`).
The idea behind the extraction algorithms is to be flexible, so you can plug in your own if you want (depending on how the PDF was created, the objects will be very different and you cannot use the same ordering/grouping strategy).
I'm now implementing support for extracting tables from images (and also from PDFs with images), but it probably won't make it into the next version since I need a better OCR tool. What do you think about joining efforts so we can have better libraries? I'm going to test the PDFs you've cited with my code so we can compare better. Feel free to contact me directly or join the chat at https://gitter.im/turicas/rows
Cheers,
Álvaro Justen "Turicas"
turicas.info / @turicas (twitter, github, youtube)
+55 41 999 311 221
(In reply to Vinayak Mehta <vmehta94@gmail.com>, Fri, Sep 28, 2018 at 11:43 AM)
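Álvaro's point that the ordering/grouping step should be pluggable can be illustrated with a small strategy-style sketch. In it, the table builder accepts any function that groups positioned text objects into rows. The names (`group_by_y`, `build_table`) and the `(text, x, y)` tuple format are hypothetical and are not the rows library's API. The sketch also assumes y is measured from the top of the page, whereas raw PDF coordinates grow upwards.

```python
from typing import Callable, List, Tuple

TextObject = Tuple[str, float, float]  # hypothetical (text, x, y); y measured from page top
RowGrouper = Callable[[List[TextObject]], List[List[TextObject]]]


def group_by_y(objects: List[TextObject], tol: float = 3.0) -> List[List[TextObject]]:
    """Default strategy: scan objects top to bottom and start a new row whenever
    an object's y is more than `tol` away from the first object of the current row."""
    rows: List[List[TextObject]] = []
    for obj in sorted(objects, key=lambda o: (o[2], o[1])):
        if rows and abs(obj[2] - rows[-1][0][2]) <= tol:
            rows[-1].append(obj)
        else:
            rows.append([obj])
    return rows


def build_table(objects: List[TextObject], grouper: RowGrouper = group_by_y) -> List[List[str]]:
    """Group objects into rows with `grouper`, then order each row left to right."""
    return [[text for text, _, _ in sorted(row, key=lambda o: o[1])] for row in grouper(objects)]


objs = [("Item", 10, 100), ("Qty", 80, 101), ("Pen", 10, 120), ("3", 80, 119.5)]
print(build_table(objs))
# [['Item', 'Qty'], ['Pen', '3']]
```

You can swap in a different `grouper`, such as one tuned to how a particular PDF generator places its text. This changes the result without touching `build_table`, which is the kind of flexibility the message describes.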
Thanks Álvaro! rows looks top-notch, I'll check it out! Support for extracting tables from images is on my roadmap too; I'll drop by the rows Gitter channel to discuss this further! :)
(In reply to Álvaro Justen [Turicas] <alvarojusten@gmail.com>, Sat, Sep 29, 2018 at 1:40 AM)
Have you published it on PyPI? That makes it really easy for people to install it if they need it.
(In reply to Vinayak Mehta, 28/09/2018 07:31)
-- Anthony Flury *Email* : anthony.flury@btinternet.com <mailto:Anthony.flury@btinternet.com> *Twitter* : @TonyFlury <https://twitter.com/TonyFlury/>
Hi Anthony!
Yes, it's on PyPI! You can install it using "pip install camelot-py".
Also, I just released a web interface for the library! You can check it out here: https://github.com/camelot-dev/excalibur
You can install it using "pip install excalibur-py" or download the Windows/Linux executable from the releases page.
Keep looking up!
Vinayak
(In reply to Anthony Flury via PSF-Community <psf-community@python.org>, Sat, Oct 20, 2018 at 5:36 PM)
Hey, Vinayak,
Cool product names, man!
You have my moral support.
I vote for the next one being called Merlin - a wizard - get it?
I must think up better names for my own products :)
Cheers ...
(In reply to Vinayak Mehta <vmehta94@gmail.com>, 10/22/18)
-- vi quickstart: https://gumroad.com/l/vi_quick Web site: https://vasudevram.github.io Blog: https://jugad2.blogspot.com Products: https://gumroad.com/vasudevram
Thanks for the support, Vasudev! They're both themed after the Arthurian legend: https://excalibur-py.readthedocs.io/en/master/user/intro.html#what-s-in-a-na...
(In reply to Vasudev Ram <vasudevram@gmail.com>, Mon, Oct 22, 2018 at 11:02 PM)
participants (5)
- Anthony Flury
- David Mertz
- Vasudev Ram
- Vinayak Mehta
- Álvaro Justen [Turicas]