I have been playing with parsing PDF files in Python. The PDF format is documented on Adobe's web site. If it weren't for the encryption and compression options, you could simply work on the files directly. Cheers, Nick.
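
To illustrate the compression point, here is a minimal sketch (not a full parser) that pulls the stream objects out of a file's raw bytes and tries to inflate them with the standard zlib module. It assumes the streams use the FlateDecode filter (zlib/deflate) and that the file is not encrypted. The function name inflate_streams and the sample bytes are made up for illustration, and a real parser would locate streams via the /Length entry rather than a regular expression.

```python
import re
import zlib

# Capture everything between the "stream" keyword (plus its end-of-line)
# and the next "endstream" keyword. This is a simplification: a real parser
# should use the /Length entry of the stream dictionary instead.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed payload of every stream zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj() leaves any bytes after the zlib data
            # (such as the end-of-line before "endstream") in unused_data.
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (another filter, or an encrypted stream): skip it.
            continue
    return results


# Hypothetical hand-built fragment standing in for part of a real file.
payload = b"BT /F1 12 Tf (Hello) Tj ET"
compressed = zlib.compress(payload)
sample = (
    b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)

for data in inflate_streams(sample):
    print(data)
```

For a real file you would read it in binary mode, e.g. open(path, "rb").read(), and pass the bytes to inflate_streams. Streams that use other filters or that are encrypted are simply skipped by this sketch.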