My first Python program -- a lexer
thomas at mlynarczyk-webdesign.de
Sun Nov 9 23:33:30 CET 2008
John Machin wrote:
> [...] You have TWO problems: (1) Reporting the error location as
> (offset from the start of the file) instead of (line number, column
> position) would get you an express induction into the User Interface
> Hall of Shame.
Of course. For the actual message I would use at least the line number.
Still, the offset is enough to compute line and column when an error
occurs, so I wouldn't need to store line/column with each token, only
the offset, plus a method to "convert" offset values into (line, column)
tuples.
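Here is a minimal sketch of such a conversion method. It assumes 0-based offsets into text whose newlines are single "\n" characters. The class name LineIndex and its position() method are hypothetical and not part of the original lexer. It records where each line starts and binary-searches that list with the standard bisect module:

```python
import bisect


class LineIndex:
    """Hypothetical helper: map 0-based offsets to 1-based (line, column)."""

    def __init__(self, text):
        # Line 1 starts at offset 0; each "\n" starts a new line just after it.
        self.line_starts = [0]
        for i, ch in enumerate(text):
            if ch == "\n":
                self.line_starts.append(i + 1)

    def position(self, offset):
        # bisect_right counts the line starts <= offset, which is
        # exactly the 1-based number of the line containing offset.
        line = bisect.bisect_right(self.line_starts, offset)
        column = offset - self.line_starts[line - 1] + 1
        return line, column


idx = LineIndex("a = 1\nbb = 22\n")
print(idx.position(11))  # (2, 6)
```

Building the index takes one pass over the text. After that, each lookup is O(log n), and that cost is only paid when an error is actually reported.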
> (2) In the case of a file with lines terminated by \r\n, the offset
> is ambiguous.
What if I explicitly state that the offset counts each newline as one
character? But you're right: the offset would be for internal use only;
what gets reported is line/column.
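One way to make that statement true is to normalize line endings before lexing, so that "\r\n" and a lone "\r" each become a single "\n". This is sketched as an assumption, not as what the lexer currently does, and the helper name normalize_newlines is hypothetical:

```python
def normalize_newlines(text):
    # Replace "\r\n" first so a Windows line ending becomes one "\n"
    # rather than two; then turn any remaining lone "\r" into "\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "a = 1\r\nb = 2\r\n"
clean = normalize_newlines(raw)
print(raw.index("b"), clean.index("b"))  # 7 6
```

In Python 3, open() in text mode with the default newline=None already performs this translation when reading a file.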
>>> dict.iter<anything>() will return its results in essentially random
>>> order.
> A list of somethings does seem indicated.
On the other hand, if all my tokens are "mutually exclusive", then in
theory the order in which they are tried should not matter, since at most
one token can match at any given offset. Still, trying the most frequent
tokens first should improve performance.
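Here is a sketch of the list-based alternative. It assumes a hypothetical token table, so the token names and patterns below are illustrative, not the original lexer's. Patterns are tried in list order, so frequent tokens can be placed first. The first pattern that matches at the current offset wins:

```python
import re

# Hypothetical token table: an ordered list of (name, compiled regex) pairs.
# These four patterns are mutually exclusive, so order only affects speed.
TOKENS = [
    ("WS", re.compile(r"[ \t\n]+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("NUMBER", re.compile(r"\d+")),
    ("OP", re.compile(r"[=+\-*/()]")),
]


def tokenize(text):
    """Yield (name, value, offset) triples, trying TOKENS in list order."""
    pos = 0
    while pos < len(text):
        for name, regex in TOKENS:
            # Pattern.match(text, pos) anchors the match at offset pos.
            m = regex.match(text, pos)
            if m:
                yield name, m.group(), pos
                pos = m.end()
                break
        else:
            # No pattern matched at this offset.
            raise SyntaxError("unexpected character %r at offset %d" % (text[pos], pos))


print(list(tokenize("x = 42")))
```

Since Python 3.7, dicts also preserve insertion order. Even so, a list states the intended priority explicitly.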
> A dict is a hashtable, intended to provide a mapping from keys to
> values. It's not intended to have order. In any case your code doesn't
> use the dict as a mapping.
I map token names to regular expressions. Isn't that a mapping?
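If the names are meant to do real work, one option is to join all patterns into a single regular expression of named groups. The token name can then be read back from the match object's lastgroup attribute. This is a swapped-in technique, not what the original code does, and it is similar to the tokenizer example in the Python re documentation. Token names and patterns are again hypothetical:

```python
import re

# Hypothetical (name, pattern) pairs, in priority order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/()]"),
    ("WS", r"[ \t\n]+"),
]

# One alternation of named groups: (?P<NUMBER>\d+)|(?P<NAME>...)|...
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize_combined(text):
    """Yield (name, value, offset) triples using the combined regex."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected character %r at offset %d" % (text[pos], pos))
        # lastgroup is the name of the group that matched, i.e. the token name.
        yield m.lastgroup, m.group(), pos
        pos = m.end()


print(list(tokenize_combined("a+1")))
```

Order still matters here. A regex alternation tries its branches left to right and accepts the first one that matches, not the longest.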
>>>> return "\n".join(
>>>> [ "[L:%s]\t[O:%s]\t[%s]\t'%s'" %
> The first 3 are %s, the last one is '%s'
I only added the single quotes so I could "see" whitespace in the
output. Anyway, this method only exists to check whether the lexer
does what it's supposed to do; in the final version I will probably
get rid of it.
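If the goal is just to make whitespace visible, %r (which formats repr(value)) does that without hand-placed quotes. It adds the quotes itself and shows tabs and newlines as escape sequences. Here is a sketch, assuming each token is a hypothetical (line, offset, name, value) tuple that matches the original format string:

```python
def dump(tokens):
    # Each token is assumed to be a (line, offset, name, value) tuple.
    # %r formats repr(value): it supplies its own quotes and renders
    # whitespace such as a tab as the escape sequence \t.
    return "\n".join("[L:%s]\t[O:%s]\t[%s]\t%r" % tok for tok in tokens)


print(dump([(1, 0, "NAME", "x"), (1, 1, "WS", "\t")]))
```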
Thanks & greetings,
Just because many are wrong does not make them right!