Its return value will replace the $x"..." combination in the token stream, as a literal token.
Why just one token, and why just a literal? Returning an arbitrary sequence of tokens seems more natural. This would allow e.g. Tim Berners-Lee to have basically what he wants (and asked for in his talk at IPC10) in terms of extended syntax for graphs, just with some $x in front.
1. I wasn't sure how easy it would be to return an arbitrary sequence of tokens.
2. I wasn't sure how appropriate it was to make users understand the internals of the parser in that way. Transforming a magic token into a literal Python object is easy to understand. Transforming it into an arbitrary sequence of tokens is more powerful but harder to understand. (And harder to claim as analogous with u"...", 123L, etc., though I'm not sure that matters.)
I had a similar idea right after Tim's talk, but could not articulate it clearly enough in a chat with Guido afterwards, and later I didn't follow through with it. It seems to me that your proposal is detailed and precise enough (while my idea was rather vague) and that, by returning an arbitrary sequence of tokens, it will let Tim embed whatever funky syntax he requires.
If we want to be able to generate arbitrary sequences of tokens, I think I'd prefer a more flexible input syntax.
This power is also the downside of the whole idea of course -- no guarantee that somebody can't use this mechanism to produce highly obfuscated programs. But I think that such a somebody could already obfuscate quite effectively in other ways, and the risk of abuse shouldn't stop this interesting proposal.
I am inclined to agree.
... return Rational(numerator, denominator)
Hmmm, how would this "return a literal token"? It returns an instance of Rational -- how does the parser treat this instance as a literal token?
I thought this use would have to return the sequence of tokens for identifier 'Rational', open parenthesis, literal (value of) numerator, comma, literal (value of) denominator, closed parenthesis -- which in turn is why I thought of an arbitrary sequence of tokens. If a single instance of any arbitrary class may be returned and get treated as a literal token by the parser, then that's much better (maybe I don't know Python's parser well enough, but I don't clearly see how that would be done).
I don't know Python's parser well enough either :-). However: it can accept NUMBER and STRING tokens. As far as the grammar is concerned, they are exactly the same (except that multiple STRING tokens are implicitly concatenated). As far as everything else is concerned, they are very nearly exactly the same. We could have a LITERAL token, treated in the same sort of way as NUMBER and STRING. That was what I was intending; certainly not returning the token-sequence <Rational>, <(>, <numerator>, <,>, <denominator>, <)> !
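To make the two readings concrete, here is a rough Python sketch, not a description of how the parser actually works. Everything named here is an assumption for illustration: the HANDLERS registry, the "r" prefix, the "n/d" input format, and the LITERAL token name as a plain string. The standard library's Fraction stands in for the Rational class, and the standard tokenize module is used only to show what the token-sequence alternative would look like.

```python
import io
import tokenize
from fractions import Fraction  # stand-in for the Rational class in the thread

# Hypothetical registry mapping a prefix letter to a handler; not part of Python.
HANDLERS = {}

def rational_handler(text):
    """Hypothetical handler for $r"n/d": returns one object, not tokens."""
    numerator, denominator = text.split("/")
    return Fraction(int(numerator), int(denominator))

HANDLERS["r"] = rational_handler

def as_single_literal(prefix, text):
    # Reading 1: the handler's return value is carried by one LITERAL token,
    # which the grammar would treat the way it treats NUMBER or STRING.
    return [("LITERAL", HANDLERS[prefix](text))]

def as_token_sequence(source):
    # Reading 2: the constructor call is spelled out as ordinary tokens.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    return [(tokenize.tok_name[t.type], t.string)
            for t in tokens if t.type not in skip]

single = as_single_literal("r", "1/3")
sequence = as_token_sequence("Rational(1, 3)")
```

Under the first reading the parser sees a single token whose value is already the finished object; under the second it sees a name, parentheses, two numbers and a comma, and has to parse a call expression as usual.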
- Is this insane?
Hope not, since I like it.
Hmm. The other proposal I know you and I both like is the adaptation protocol. This is not necessarily a good omen. :-)
- Is "$" the best character?
Among the few available ones, I think I slightly prefer "@" for this use, but there's little to choose IMHO.
Curiously, "@" was the first option I thought of for this. I didn't have any very concrete reason for switching to "$".