[Image-SIG] PIL produces corrupt PDFs

Nicholas Riley njriley at uiuc.edu
Thu Nov 20 23:02:55 EST 2003


Hi,

(Apologies to anyone on the reportlab-users list who gets two copies
of this; I'm approaching this problem from both the 'producer' and
'consumer' sides in the hope of finding a solution quickly.)

I'm writing a document management system which uses PIL to take
scanned documents and make PDFs out of them.  Using PIL is fast and
easy, but the PDFs don't seem to be readable by anything other than
Acrobat Reader.

This is what I get from ReportLab's pageCatcher:

Traceback (most recent call last):
  File "copy.py", line 7, in ?
    copyPages('d:\\pil.pdf', cvs)
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 1242, in copyPages
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 678, in parse
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 784, in getindirectObject
  File "_p:rlextra\pageCatcher\pageCatcher.py", line 822, in gettrue
ValueError: `endobj` keyword not found 1290 '5 0 obj\n<<\n/Length 3'

And from Ghostscript:

GS>(//brando/d$/pil.pdf) run
Processing pages 1 through 1.
Page 1
   **** Warning: stream missing 'endstream'.
   **** Unknown operator: 'xref'
   **** Unknown operator: 'f'
   **** Unknown operator: 'n'
   **** Unknown operator: 'n'
   **** Unknown operator: 'n'
   **** Unknown operator: 'n'
   **** Unknown operator: 'n'
   **** Unknown operator: 'trailer'
   **** Unknown operator: 'startxref'
   **** Unknown operator: '%%EOF'
Error: /stackunderflow in --pop--
Operand stack:

Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval-
-   2   %stopped_push   --nostringval--   --nostringval--   %loop_continue   2
 3   %oparray_pop   --nostringval--   --nostringval--   false   1   %stopped_pus
h   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopp
ed_push   --nostringval--   1   3   %oparray_pop   1   3   %oparray_pop   1   3
  %oparray_pop   --nostringval--   2   1   1   --nostringval--   %for_pos_int_co
ntinue   --nostringval--   --nostringval--   --nostringval--   --nostringval--
 --nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue
  0   --nostringval--   %repeat_continue
Dictionary stack:
   --dict:1105/1123(ro)(G)--   --dict:0/20(G)--   --dict:74/200(L)--   --dict:74
/200(L)--   --dict:104/127(ro)(G)--   --dict:232/347(ro)(G)--   --dict:20/24(L)-
-   --dict:4/6(L)--   --dict:20/20(L)--
Current allocation mode is local
Current file position is 26

when parsing a simple PDF generated as follows:

import Image

Image.new('RGB', (100,100)).save('d:\\pil.pdf')

Since PIL can't read back its own PDFs, I've got several hundred
pseudo-PDFs stuck in this format and I can't get them out.
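
For sorting through those files, here is a rough sketch of a check for the
two things the parsers above complain about: objects without `endobj` and
streams without `endstream`. It is only a heuristic, since binary stream data
can contain these keywords by chance. The function names are my own (not part
of PIL or ReportLab), and it assumes a current Python 3 rather than the setup
that produced the files.

```python
import re

# Hypothetical helpers for illustration; not part of PIL or ReportLab.
# Heuristic only: binary stream data may contain these keywords by chance.
_PATTERNS = {
    'obj': rb'\b\d+\s+\d+\s+obj\b',
    'endobj': rb'\bendobj\b',
    'stream': rb'\bstream\b',  # \b keeps this from matching inside 'endstream'
    'endstream': rb'\bendstream\b',
}

def count_pdf_keywords(data):
    """Count object and stream delimiters in raw PDF bytes."""
    return {name: len(re.findall(pat, data)) for name, pat in _PATTERNS.items()}

def find_imbalances(data):
    """Return a list of readable problems; empty if the counts pair up."""
    counts = count_pdf_keywords(data)
    problems = []
    if counts['obj'] != counts['endobj']:
        problems.append('%d obj vs %d endobj'
                        % (counts['obj'], counts['endobj']))
    if counts['stream'] != counts['endstream']:
        problems.append('%d stream vs %d endstream'
                        % (counts['stream'], counts['endstream']))
    return problems

def check_pdf(path):
    """Read a file in binary mode and report unbalanced delimiters."""
    with open(path, 'rb') as f:
        return find_imbalances(f.read())
```

Calling check_pdf() on a file path returns a list of mismatches, which is
empty when every `obj` has an `endobj` and every `stream` has an `endstream`.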

Any ideas would be greatly appreciated.

Thanks,

-- 
=Nicholas Riley <njriley at uiuc.edu> | <http://www.uiuc.edu/ph/www/njriley>


