My first Python program -- a lexer
thomas at mlynarczyk-webdesign.de
Mon Nov 10 14:26:19 CET 2008
John Machin wrote:
>> On the other hand: If all my tokens are "mutually exclusive" then,
> But they won't *always* be mutually exclusive (another example is
> relational operators (< vs <=, > vs >=)) and AFAICT there is nothing
> useful that the lexer can do with an assumption/guess/input that they
> are mutually exclusive or not.
"<" vs. "<=" can be handled with lookaheads (?=...) / (?!...) in regular
expressions. True, the lexer cannot do anything useful with the
assumption that all tokens are mutually exclusive. But if they are,
there will be no ambiguity and I am always guaranteed to get the same
sequence of tokens from the same input string.
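A minimal sketch of the lookahead idea, not taken from the actual lexer: the pattern names LT and LE and the helper classify() are made up for illustration. The negative lookahead (?!=) keeps the "<" pattern from matching where "<=" should match, so the two patterns become mutually exclusive.

```python
import re

# Hypothetical token patterns; the names LT and LE are illustrative only.
LT = re.compile(r"<(?!=)")  # "<" that is NOT followed by "="
LE = re.compile(r"<=")      # the two-character operator

def classify(text):
    """Return the names of all patterns that match at the start of text."""
    return [name for name, rx in (("LT", LT), ("LE", LE)) if rx.match(text)]

only_lt = classify("< 3")   # only the LT pattern matches here
only_le = classify("<= 3")  # only the LE pattern matches here
print(only_lt, only_le)
```

With the lookahead in place, at most one of the two patterns matches at any position, so the order in which they are tried no longer matters for this pair.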
> Your Lexer class should promise to check the regexes in the order
> given. Then the users of your lexer can arrange the order to suit
Yes. So there's no way around a list of tuples instead of dict().
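A rough sketch of what a list of tuples buys, assuming a simplified tokenizer; TOKENS, tokenize() and the token names are hypothetical and not the code under discussion. The regexes are tried strictly in the order given, so the caller resolves overlaps by listing "<=" before "<".

```python
import re

# Hypothetical token list; order matters: "<=" must be tried before "<".
TOKENS = [
    ("LE", r"<="),
    ("LT", r"<"),
    ("NUMBER", r"\d+"),
    ("SPACE", r"\s+"),
]

def tokenize(text, tokens=TOKENS):
    """Return (name, matched text) pairs, trying the regexes in list order."""
    compiled = [(name, re.compile(rx)) for name, rx in tokens]
    result = []
    pos = 0
    while pos < len(text):
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m:
                # The name is simply passed through to the result.
                result.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError("no token matches at position %d" % pos)
    return result

ordered = tokenize("1 <= 2")
print(ordered)
```

If LT were listed before LE in this sketch, "<" would be consumed first and the leftover "=" would match nothing, so tokenize() would raise ValueError. A plain dict gave no ordering guarantee at the time, which is why the list is needed.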
> Your code uses dict methods; this forces your callers to *create* a
> mapping. However (as I said) your code doesn't *use* that mapping --
> there is no RHS usage of dict[key] or dict.get(key) etc. In fact I'm
> having difficulty imagining what possible practical use there could be
> for a mapping from token-name to regex.
Sorry, but I still don't quite get it.
    for name, regex in self.tokens.iteritems():
        self.result.append( ( name, match, self.line ) )
What I do here is take a name and its associated regex and then store a
tuple (name, match, line). In a simpler version of the lexer, I might
store only `name` instead of the tuple. Is your point that the lexer
doesn't care what `name` actually is, but simply passes it through from
the tokenlist to the result?
> To *best* see whitespace (e.g. Is that a TAB or multiple spaces?), use
(Just having modified my code accordingly:) Ah, yes, indeed, that is
> General advice: What you think you see is often not what you've
> actually got. repr() is your friend; use it.
Lesson learned :-)
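A small illustration of the repr() advice, using made-up sample strings: a TAB and a run of spaces can look the same when printed, but repr() shows the escape sequence.

```python
# Two hypothetical lines that can look alike in a terminal.
tab_line = "x\t= 1"          # contains a TAB character
space_line = "x       = 1"   # contains only spaces

# repr() makes the TAB visible as the escape sequence \t.
tab_repr = repr(tab_line)
space_repr = repr(space_line)
print(tab_repr)
print(space_repr)
```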
Just because many of them are wrong does not make them right!