xtopdf looks great! Will check it out.
I've faced similar issues with junk characters, which may happen when the PDF contains an incorrect ToUnicode map, though I still have to dig deeper and I'm not 100% sure. I've also faced an issue where duplicate strings are assigned to the same cell. You can check it out on GitHub. I suspect that since PDF is a canvas-based model and not a text-based one, like you said, the text is drawn a second time at a slight offset to make it look bold, so the same string gets extracted twice. I'll probably write a detailed blog post about the issues I faced during development :)
Good idea :)
Thanks for checking it out!