US taxes: parsing postscript and/or pdf?

Harry George hgg9140 at seanet.com
Wed Jan 2 18:58:43 EST 2002


Alan Miller <ajm at enteract.com> writes:

> Harry George (hgg9140 at seanet.com) wrote:
> >If not, my next approach is to download IRS forms in PDF or
> >Postscript, and then manipulate those templates.  That requires
> >parsing the pdf or postscript, detecting named fields, putting in new
> >data, and regenerating the printable format.
> 
> If the IRS-provided PDF forms are fillable PDFs, look into FDFs.  
> Information should be available on FDF formats.
> 
> If you're looking at parsing the raw PDF or PS and trying to determine 
> what's supposed to be fillable and what's not, good luck.  
> 
> Your best bet would probably be to check on what the PDFs actually 
> support right now, and consider getting Acrobat to add fillable fields 
> to them if needed.
>

They appear to be dumb PDFs, intended only for printing.  I've
discovered that the PostScript versions are not actually valid
PostScript (generated by MS tools -- who would have guessed) -- so a
generic PostScript parser wouldn't work after all.

Given the limited number of forms (and thus the limited number of
fields), it still seems like a possible avenue.  Worth a try anyway.
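
One way this could look, as a rough sketch only: since the forms
can't be parsed for named fields, keep a hand-made table of field
positions per form and generate a small PostScript overlay that prints
the values at those positions.  This swaps parsing for a manually
maintained lookup table.  The field names, the coordinates, and the
helpers ps_escape and make_overlay are all hypothetical; real
positions would have to be measured from each form.

```python
# Hypothetical field table: names and coordinates are illustrative only,
# not measured from any real IRS form.  Units are PostScript points
# (1/72 inch), measured from the lower-left corner of the page.
FIELDS = {
    "name": (72, 700),
    "ssn": (400, 700),
    "wages": (450, 500),
}


def ps_escape(text):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def make_overlay(values, fields=FIELDS, font="Courier", size=10):
    """Return a one-page PostScript program that prints each value
    at the position recorded for its field."""
    lines = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    for name, text in values.items():
        x, y = fields[name]  # raises KeyError for a field not in the table
        lines.append(f"{x} {y} moveto ({ps_escape(text)}) show")
    lines.append("showpage")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values are made up.
    print(make_overlay({"name": "J. Q. Taxpayer", "wages": "12345.67"}))
```

The output is ordinary PostScript text, so it can be inspected by eye.
How the overlay gets combined with the form page itself (printing
onto a pre-printed form, or merging with some other tool) is left
open here.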

> ajm

-- 
Harry George
hgg9140 at seanet.com


