My first Python program -- a lexer

Thomas Mlynarczyk thomas at mlynarczyk-webdesign.de
Sun Nov 9 23:33:30 CET 2008


John Machin wrote:

> [...] You have TWO problems: (1) Reporting the error location as
> (offset from the start of the file) instead of (line number, column
> position) would get you an express induction into the User Interface
> Hall of Shame. 

Of course. For the actual message I would use at least the line number. 
Still, the offset could be used to compute line/column in case of an 
error, so I wouldn't really need to store line/column with each token, 
only the offset, plus a method to "convert" offset values into 
line/column tuples.
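
A minimal sketch of such a conversion, assuming the whole source is held 
in one string and lines end in "\n". The names line_starts and 
offset_to_line_col are made up for illustration; they are not part of the 
lexer discussed here. The idea is to record the offset at which each line 
begins once, then binary-search that list whenever an error needs 
reporting.

```python
import bisect

def line_starts(text):
    # Offsets at which each line begins; line 1 always starts at offset 0.
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def offset_to_line_col(starts, offset):
    # bisect_right counts the line starts that are <= offset,
    # which is exactly the 1-based line number.
    line = bisect.bisect_right(starts, offset)
    col = offset - starts[line - 1] + 1  # 1-based column
    return line, col

source = "foo = 1\nbar = 22\nbaz"
starts = line_starts(source)
print(offset_to_line_col(starts, 0))   # (1, 1)
print(offset_to_line_col(starts, 14))  # (2, 7)
```

The table is built once per file, so tokens only need to carry the 
offset, and the lookup cost is paid only when a message is actually 
produced.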

> (2) In the case of a file with lines terminated by \r
> \n, the offset is ambiguous.

What if I explicitly state that the offset counts each newline as one 
character? But you're right: the offset would be for internal use only; 
what gets reported is line/column.
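
One way to make "a newline counts as one character" hold in practice is 
to normalize the line endings before lexing. This is only a sketch under 
that assumption; normalize_newlines is a hypothetical helper, not 
something from the original code.

```python
def normalize_newlines(text):
    # Replace "\r\n" first so it does not turn into two newlines,
    # then map any remaining lone "\r" to "\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")

raw = "a = 1\r\nb = 2\r\n"
clean = normalize_newlines(raw)
print(len(raw), len(clean))  # 14 12
```

After this step every offset refers to the normalized text, so it is 
unambiguous as long as it is never reported as a position in the original 
file.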

>>> dict.iter<anything>() will return its results in essentially random
>>> order.

> A list of somethings does seem indicated.

On the other hand: if all my tokens are "mutually exclusive" then, in 
theory, the order in which they are tried should not matter, as at most 
one token could match at any given offset. Still, trying the most 
frequent tokens first should improve performance.
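
To illustrate the "list of somethings" suggestion, here is a small sketch 
that keeps the token definitions as a list of (name, pattern) pairs and 
tries them in that order. The token names and regular expressions are 
invented for the example and are not the ones from the actual lexer.

```python
import re

# Hypothetical token table. A list keeps the order explicit, so the
# patterns expected to match most often can simply be listed first.
TOKENS = [
    ("WHITESPACE", re.compile(r"\s+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("NUMBER", re.compile(r"\d+")),
    ("OP", re.compile(r"[=+\-*/]")),
]

def tokenize(text):
    # Yield (offset, name, lexeme) tuples, trying patterns in list order.
    pos = 0
    while pos < len(text):
        for name, pattern in TOKENS:
            m = pattern.match(text, pos)
            if m:
                yield pos, name, m.group()
                pos = m.end()
                break
        else:
            # No pattern matched at this offset.
            raise ValueError("no token matches at offset %d" % pos)

for token in tokenize("x = 42"):
    print(token)
```

If the patterns really are mutually exclusive, reordering the list 
changes only the speed, not the result; if two of them can match at the 
same offset, the list order decides which one wins.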

> A dict is a hashtable, intended to provide a mapping from keys to
> values. It's not intended to have order. In any case your code doesn't
> use the dict as a mapping.

I map token names to regular expressions. Isn't that a mapping?

>>>> return "\n".join(
>>>>     [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %

> The first 3 are %s, the last one is '%s'

I only added the single quotes so I could "see" whitespace better in the 
output. Anyway, this method is just there to check that the lexer does 
what it's supposed to do -- in the final version I will probably get rid 
of it.
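
For a debugging dump like this, %r is a possible alternative to '%s': it 
inserts repr() of the value, which adds the quotes and also shows tabs 
and newlines as escapes. A sketch, with format_token as a made-up helper 
rather than the method quoted above:

```python
def format_token(line, offset, name, text):
    # %r uses repr(), so whitespace in the lexeme shows up as \t, \n, etc.
    return "[L:%s]\t[O:%s]\t[%s]\t%r" % (line, offset, name, text)

print(format_token(1, 0, "WHITESPACE", " \t"))
```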

Thanks & greetings,
Thomas

-- 
Ce n'est pas parce qu'ils sont nombreux à avoir tort qu'ils ont raison!
(Coluche)


