US taxes: parsing postscript and/or pdf?
Harry George
hgg9140 at seanet.com
Wed Jan 2 18:58:43 EST 2002
Alan Miller <ajm at enteract.com> writes:
> Harry George (hgg9140 at seanet.com) wrote:
> >If not, my next approach is to download IRS forms in PDF or
> >Postscript, and then manipulate those templates. That requires
> >parsing the pdf or postscript, detecting named fields, putting in new
> >data, and regenerating the printable format.
>
> If the IRS-provided PDF forms are fillable PDFs, look into FDF (Forms Data
> Format) files. Information on the FDF format should be available.
>
> If you're looking at parsing the raw PDF or PS and trying to determine
> what's supposed to be fillable and what's not, good luck.
>
> Your best bet would probably be to check on what the PDFs actually
> support right now, and consider getting Acrobat to add fillable fields
> to them if needed.
>
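Alan's suggestion (check whether the PDFs are fillable, and if so feed them an FDF) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not a robust parser: the /AcroForm and /T scans only work on uncompressed, simply encoded files, and the field name in the usage line is hypothetical.

```python
import re
from typing import Dict, List, Optional


def has_form_fields(pdf_bytes: bytes) -> bool:
    """Crude heuristic: a fillable PDF normally has an /AcroForm entry in
    its document catalog. A catalog hidden in a compressed object stream
    (PDF 1.5+) is missed, so treat False as 'probably not fillable'."""
    return b"/AcroForm" in pdf_bytes


def field_names(pdf_bytes: bytes) -> List[str]:
    """List field names written as simple literal strings, /T (name).
    Escaped, hex-encoded or compressed names are not found."""
    return [m.decode("latin-1")
            for m in re.findall(rb"/T\s*\(([^()\\]*)\)", pdf_bytes)]


def pdf_string(text: str) -> str:
    """Escape text as a PDF literal string (backslash and parentheses)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def make_fdf(fields: Dict[str, str], pdf_file: Optional[str] = None) -> str:
    """Build a minimal FDF document assigning a value (/V) to each named
    field (/T); /F optionally points at the PDF form to be filled."""
    entries = "\n".join(
        f"<< /T {pdf_string(name)} /V {pdf_string(value)} >>"
        for name, value in fields.items()
    )
    target = f"/F {pdf_string(pdf_file)} " if pdf_file else ""
    return ("%FDF-1.2\n"
            "1 0 obj\n"
            f"<< /FDF << {target}/Fields [\n{entries}\n] >> >>\n"
            "endobj\n"
            "trailer\n"
            "<< /Root 1 0 R >>\n"
            "%%EOF\n")
```

Usage, with a hypothetical field name: `make_fdf({"line7": "52000.00"}, "f1040.pdf")` returns FDF text that a form-aware viewer could merge into `f1040.pdf`, provided that file really defines a field called `line7`.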
They appear to be dumb PDFs, intended only for printing. I've
discovered the PostScript files are not actually valid PostScript
(generated by MS tools -- who would have guessed), so a generic
PostScript parser wouldn't work after all.
Given the limited number of forms (and thus the limited number of
fields), it still seems like a possible avenue. Worth a try anyway.
> ajm
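Because the forms are dumb and the number of fields is small, one alternative to parsing the files is to measure each field's position by hand and generate a PostScript overlay containing only the values. This is a swapped-in technique, not something settled in this thread. The sketch below assumes hypothetical field names and coordinates, in points from the lower-left corner of the page.

```python
from typing import Dict, Tuple

# Hypothetical field positions (x, y) in PostScript points (1/72 inch),
# origin at the lower-left page corner, measured by hand from a printout.
FIELD_POSITIONS: Dict[str, Tuple[float, float]] = {
    "name": (72.0, 700.0),
    "wages": (468.0, 412.0),
}


def ps_string(text: str) -> str:
    """Escape text as a PostScript string literal."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def overlay_page(values: Dict[str, str],
                 positions: Dict[str, Tuple[float, float]] = FIELD_POSITIONS,
                 font_size: float = 10.0) -> str:
    """Return a one-page PostScript program that draws each value at its
    field's position. A field with no measured position raises KeyError."""
    lines = ["%!PS",
             f"/Helvetica findfont {font_size} scalefont setfont"]
    for name, value in values.items():
        x, y = positions[name]
        lines.append(f"{x} {y} moveto {ps_string(value)} show")
    lines.append("showpage")
    return "\n".join(lines) + "\n"
```

Merging the overlay with the IRS page itself is not shown. One option, assuming the printer feeds accurately, is to print the overlay onto a sheet that already carries the blank form, which avoids parsing the MS-generated PostScript at all.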
--
Harry George
hgg9140 at seanet.com