My first Python program -- a lexer

Thomas Mlynarczyk thomas at
Mon Nov 10 14:26:19 CET 2008

John Machin wrote:

>> On the other hand: If all my tokens are "mutually exclusive" then,

> But they won't *always* be mutually exclusive (another example is
> relational operators (< vs <=, > vs >=)) and AFAICT there is nothing
> useful that the lexer can do with an assumption/guess/input that they
> are mutually exclusive or not.

"<" vs. "<=" can be handled with lookaheads (?=...) / (?!...) in regular 
expressions. True, the lexer cannot do anything useful with the 
assumption that all tokens are mutually exclusive. But if they are, 
there will be no ambiguity, and I am guaranteed to always get the same
sequence of tokens from the same input string.
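
To illustrate the lookahead idea, here is a minimal sketch. The names LT,
LE and classify are made up for this example and are not part of my lexer;
the point is only that "<(?!=)" refuses to match when an "=" follows, so
the two patterns never both match at the same position.

```python
import re

# Hypothetical token patterns; the names are illustrative only.
LT = re.compile(r"<(?!=)")  # "<" NOT followed by "=" (negative lookahead)
LE = re.compile(r"<=")      # the two-character operator

def classify(text):
    """Return the names of all patterns that match at the start of text."""
    candidates = (("LT", LT), ("LE", LE))
    return [name for name, rx in candidates if rx.match(text)]

print(classify("< 3"))   # ['LT']
print(classify("<= 3"))  # ['LE']
```

With the lookahead in place, at most one of the two patterns matches, so
the order in which they are tried no longer matters for this pair.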

> Your Lexer class should promise to check the regexes in the order
> given. Then the users of your lexer can arrange the order to suit
> themselves.

Yes. So there's no way around a list of tuples instead of dict().
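
A rough sketch of what that could look like, assuming a first-match-wins
strategy. The token names, the TOKENS list and the tokenize function are
invented for this example and are not my actual Lexer class:

```python
import re

# Hypothetical token list. Order matters: "<=" is listed before "<".
TOKENS = [
    ("LE", r"<="),
    ("LT", r"<"),
    ("EQ", r"="),
    ("NUMBER", r"\d+"),
    ("SPACE", r"\s+"),
]

def tokenize(text, tokens=TOKENS):
    """Try the regexes in the order given; the first one that matches wins."""
    compiled = [(name, re.compile(rx)) for name, rx in tokens]
    pos, result = 0, []
    while pos < len(text):
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m:
                result.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No pattern matched at this position.
            raise ValueError("no token matches at position %d" % pos)
    return result

print(tokenize("1<=2"))  # [('NUMBER', '1'), ('LE', '<='), ('NUMBER', '2')]
```

If the caller lists "LT" before "LE" instead, the same input comes out as
NUMBER, LT, EQ, NUMBER, which is exactly why the order has to be something
the caller controls.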

> Your code uses dict methods; this forces your callers to *create* a
> mapping. However (as I said) your code doesn't *use* that mapping --
> there is no RHS usage of dict[key] or dict.get(key) etc. In fact I'm
> having difficulty imagining what possible practical use there could be
> for a mapping from token-name to regex.

Sorry, but I still don't quite get it.

for name, regex in self.tokens.iteritems():
     # ...
     self.result.append( ( name, match, self.line ) )

What I do here is take a name and its associated regex and then store a 
tuple (name, match, line). In a simpler version of the lexer, I might 
store only `name` instead of the tuple. Is your point that the lexer 
doesn't care what `name` actually is, but simply passes it through from 
the tokenlist to the result?

> To *best* see whitespace (e.g. Is that a TAB or multiple spaces?), use
> %r.

(Having just modified my code accordingly:) Ah, yes, indeed, that is
much better!
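
A small illustration of the difference, with made-up sample strings:

```python
# Two strings that look alike when printed with %s.
tab_text = "a\tb"       # contains a TAB character
space_text = "a    b"   # contains four spaces

shown_tab = "%r" % tab_text
shown_space = "%r" % space_text

print(shown_tab)    # prints 'a\tb'   (the TAB shows up as an escape)
print(shown_space)  # prints 'a    b'
```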

> General advice: What you think you see is often not what you've
> actually got. repr() is your friend; use it.

Lesson learned :-)


Just because many of them are wrong doesn't mean they're right!
