plex

Thu Sep 5 13:10:24 EDT 2002

I am writing a Plex lexer/scanner to parse a PDF file.  Here is the
first part, which extracts the streams.

I would like to do this more efficiently, namely by reading each
stream in multi-character chunks rather than one character at a time.
As it stands, add_stream is called once for every character in the
stream.

Any advice on how to do this?

Here is the code:

from Plex import *

def add_stream(scanner, text):
    scanner.thisStream += text

def end_stream(scanner, text):
    print 'BeginStream: ', scanner.thisStream, 'EndStream:'  # do something with the stream here
    scanner.thisStream = ''
    scanner.begin('')

lexicon = Lexicon([
  (AnyChar, IGNORE),
  (Bol + Str("stream\n") + Eol, Begin('stream')),
  State('stream' , [
    ( Bol + Str("endstream") + Eol, end_stream ), 
    ( AnyChar, add_stream),
    ]),
  ])

filename = "test.pdf"
f = open(filename, "r")
scanner = Scanner(lexicon, f, filename)
scanner.thisStream = ''

while 1:
  token = scanner.read()
  if token[0] is None:
    break
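
For comparison, a minimal sketch of the chunked behaviour described
above, using the standard re module rather than Plex, so it does not
answer the Plex question itself.  The names STREAM_RE and
extract_streams are hypothetical.  It assumes the whole file fits in
memory and that the stream and endstream keywords each sit on a line of
their own, as in the lexicon above.

```python
import re

# Hypothetical pattern, not part of Plex: captures a whole stream body
# in one match instead of one character per callback.
# MULTILINE lets ^ and $ anchor at line boundaries; DOTALL lets . cross
# newlines so the body can span several lines.
STREAM_RE = re.compile(
    r"^stream\r?\n(.*?)^endstream\r?$",
    re.DOTALL | re.MULTILINE,
)

def extract_streams(data):
    """Return the body of every stream found in data, in file order."""
    return [match.group(1) for match in STREAM_RE.finditer(data)]
```

Each stream body comes back as a single string, including the newline
that precedes the endstream line, which matches what add_stream
accumulates in the Plex version.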

Thanks,
John Hunter