plex
John Hunter
jdhunter at ace.bsd.uchicago.edu
Thu Sep 5 13:10:24 EDT 2002
I am writing a Plex lexer/scanner to parse a PDF file. Here is the
first part, which extracts the streams.
I would like to do this a bit more efficiently, namely, to read the
streams in multicharacter chunks rather than one character at a time.
As it is, the action add_stream is called once for every character
in the stream.
Any advice on how to do this?
Here is the code:
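from Plex import *

def add_stream(scanner, text):
    # Called once per character matched inside a stream; accumulate it.
    scanner.thisStream += text

def end_stream(scanner, text):
    # Reached "endstream": report the collected data, reset the buffer,
    # and return to the default state.
    print 'BeginStream: ', scanner.thisStream, 'EndStream:' # do something with the stream here
    scanner.thisStream = ''
    scanner.begin('')

lexicon = Lexicon([
    (AnyChar, IGNORE),                             # skip everything outside streams
    (Bol + Str("stream") + Eol, Begin('stream')),  # "stream" on its own line opens one
    State('stream', [
        (Bol + Str("endstream") + Eol, end_stream),
        (AnyChar, add_stream),                     # one call per character
    ]),
])

filename = "test.pdf"
f = open(filename, "rb")   # binary mode: stream data may contain arbitrary bytes
scanner = Scanner(lexicon, f, filename)
scanner.thisStream = ''
while 1:
    token = scanner.read()
    if token[0] is None:   # a value of None signals end of file
        break
f.close()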
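For comparison, here is a minimal sketch that sidesteps the per-character callback entirely. It does not use Plex: it swaps in Python 3's standard `re` module and matches each whole stream body with a single regular expression. It assumes, as the lexicon above does, that `stream` and `endstream` each start their own line; it also assumes the file fits in memory and ignores the stream's /Length entry. `STREAM_RE`, `extract_streams` and `extract_streams_from_file` are hypothetical names.

```python
import re

# Hypothetical helper, not part of Plex: a stdlib-only alternative that
# captures each whole stream body in one regex match instead of calling
# an action once per character.
# Assumptions: "stream" and "endstream" each begin their own line (as in
# the Plex lexicon), and the stream's /Length entry is ignored.
STREAM_RE = re.compile(
    rb"^stream\r?\n(.*?)^endstream\r?$",
    re.DOTALL | re.MULTILINE,  # '.' spans newlines; ^ and $ anchor at line breaks
)


def extract_streams(data):
    """Return the raw body of every stream in `data` (a bytes object).

    Each body keeps the end-of-line marker that precedes 'endstream'.
    """
    return [m.group(1) for m in STREAM_RE.finditer(data)]


def extract_streams_from_file(filename):
    # Binary mode: PDF streams often hold compressed, non-text bytes.
    with open(filename, "rb") as f:
        return extract_streams(f.read())  # assumes the file fits in memory


if __name__ == "__main__":
    sample = b"1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
    print(extract_streams(sample))  # [b'hello\n']
```

Because `finditer` hands back each body as one slice, the cost per stream is a single regex match rather than one Python call per byte, and the lazy `.*?` stops each match at the first `endstream` line.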
Thanks,
John Hunter
More information about the Python-list mailing list