searching pdf files for certain info

Tom Willis tom.willis at gmail.com
Tue Feb 22 20:30:47 EST 2005


Ah, that makes sense. I only see the behavior in pdftotext. ps2ascii
doesn't give me the layout, which, for my purposes, I certainly need.

Thanks for the info. Looks like I'll keep searching for that silver bullet. :(
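One possible workaround for the stray spaces pdftotext inserts is to search with a pattern that tolerates whitespace between characters instead of matching the literal phrase. The sketch below is a hypothetical helper (`contains_phrase` is not from any library) that assumes the extractor only ever adds whitespace and never drops or reorders letters:

```python
import re


def spaced_pattern(phrase):
    """Build a regex matching `phrase` even if stray whitespace was
    inserted between its characters by the text extractor."""
    chars = [c for c in phrase if not c.isspace()]
    # Allow any run of whitespace (including none) between consecutive characters.
    return re.compile(r"\s*".join(re.escape(c) for c in chars))


def contains_phrase(text, phrase):
    """Return True if `phrase` occurs in `text`, ignoring whitespace differences."""
    return spaced_pattern(phrase).search(text) is not None


print(contains_phrase("P a tie n t Face Sheet", "Patient Face Sheet"))  # True
```

The tradeoff is that the pattern also matches the phrase with its real spaces removed (e.g. "PatientFaceSheet"), which is usually acceptable for this kind of lookup.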


On Tue, 22 Feb 2005 20:07:50 -0500, rbt <rbt at athop1.ath.vt.edu> wrote:
> Tom Willis wrote:
> > Well, sporadic spaces in strings would cause problems, would they not?
> >
> > An example:
> >
> >
> > The String: "Patient Face Sheet"--->pdftotext--->"P a tie n t Face Sheet"
> >
> > I'm just curious whether you see anything like that, since I really have no
> > clue about PS or PDF etc., but I have a strong desire to replace a
> > really flaky commercial tool. And if I can do it with free stuff, all
> > the better; my boss will love me.
> 
> No, I do not see that type of behavior. I'm looking for strings that
> resemble Social Security numbers, so my strings look like this: nnn-nn-nnnn.
> 
> The ps2ascii util in ghostscript reproduces strings in the format that I
> expect. BTW, I'm not using pdftotext. I'm using *ps2ascii*.
> --
> http://mail.python.org/mailman/listinfo/python-list
> 
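For the nnn-nn-nnnn case described above, a regular expression over the extracted text is enough. The sketch below assumes Ghostscript's `ps2ascii` is on the PATH and, when given only an input file, writes its text to standard output; the function names are illustrative, not part of any tool:

```python
import re
import subprocess

# Three digits, two digits, four digits, separated by hyphens, with word
# boundaries so longer digit runs like "1234-56-7890" are not matched.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def extract_text(pdf_path):
    """Run ps2ascii on a PDF and return its text (assumes ps2ascii is installed).

    check=True raises subprocess.CalledProcessError if ps2ascii fails.
    """
    result = subprocess.run(
        ["ps2ascii", pdf_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def find_ssn_like(text):
    """Return every substring of `text` shaped like nnn-nn-nnnn."""
    return SSN_LIKE.findall(text)


# Usage (hypothetical file name):
# print(find_ssn_like(extract_text("report.pdf")))
```

Keeping extraction and matching separate makes it easy to swap in a different extractor without touching the search logic.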


-- 
Thomas G. Willis
http://paperbackmusic.net


