Re: [PSF-Community] Python library to extract data tables from PDF files

1 Oct 2018


...
Thanks Vasudev!
NP.
...
[1] xtopdf looks great! Will check it out.
Cool! Thanks.
...
[2] I've faced similar issues with junk characters, which may happen when the PDF contains an incorrect ToUnicode map, though I still have to dig deeper and I'm not 100% sure. I've also faced an issue where duplicate strings are assigned to the same cell. You can check it out on GitHub. I suspect that since PDF is a canvas-based model and not a text-based one, like you said, the same text is just drawn again at a slight offset to make it look bold. I'll probably write a detailed blog post about the issues I faced during development :)
Good idea :)
...
Thanks for checking it out!
NP.

-- 
vi quickstart: https://gumroad.com/l/vi_quick
Web site:      https://vasudevram.github.io
Blog:          https://jugad2.blogspot.com
Products:      https://gumroad.com/vasudevram

Vasudev Ram