[Python-Dev] Re[3]: User extendable literal modifiers ?!

Fri, 27 Sep 2002 13:58:42 +0100 (BST)

> >     Its return value will replace the $x"..." combination in
> >     the token stream, as a literal token.
> 
> Why just one token, and why just a literal?  Returning an
> arbitrary sequence of tokens seems more natural.  This
> would allow e.g. Tim Berners-Lee to have basically what
> he wants (and asked for in his talk at IPC10) in terms of
> extended syntax for graphs, just with some $x in front.

1. I wasn't sure how easy it would be to return an
arbitrary sequence of tokens.

2. I wasn't sure how appropriate it was to make users
understand the internals of the parser in that way.
Transforming a magic token into a literal Python object
is easy to understand. Transforming it into an arbitrary
sequence of tokens is more powerful but harder to
understand. (And harder to claim as analogous with
u"...", 123L, etc., though I'm not sure that matters.)
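
To make the "magic token becomes a Python object" model
concrete, here is a minimal sketch. Everything in it is an
assumption for illustration: the registry, the names
register_modifier and expand_literal, and the "r" modifier
are hypothetical, and fractions.Fraction merely stands in
for the Rational class of the example. It shows only the
user-visible contract (text in, one object out), not how a
parser would call it.

```python
from fractions import Fraction

# Hypothetical registry: modifier character -> handler function.
LITERAL_HANDLERS = {}

def register_modifier(char, handler):
    """Associate a modifier character with a handler (illustrative only)."""
    LITERAL_HANDLERS[char] = handler

def expand_literal(char, text):
    """Return the single object that would stand in for $char"text"."""
    return LITERAL_HANDLERS[char](text)

def rational_handler(text):
    # Split "n/d" and build one value; Fraction is a stand-in for Rational.
    numerator, denominator = text.split("/")
    return Fraction(int(numerator), int(denominator))

register_modifier("r", rational_handler)

# What a hypothetical $r"3/4" would evaluate to under this model.
value = expand_literal("r", "3/4")
```

The handler never sees or produces tokens; it returns one
object, which is what keeps the model analogous to u"..."
and 123L.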

> I had a similar idea right after Tim's talk, but could not
> articulate it clearly enough in a chat with Guido right
> afterwards, and later I didn't follow through with it.  It
> seems to me that your proposal is detailed and precise
> enough (while my idea was rather vague) and that, by
> returning an arbitrary sequence of tokens, it will let
> Tim embed whatever funky syntax he requires.

If we want to be able to generate arbitrary sequences
of tokens, I think I'd prefer a more flexible input
syntax.

> This power is also the downside of the whole idea of
> course -- no guarantee that somebody can't use this
> mechanism to produce highly obfuscated programs.
> But I think that such a somebody could already
> obfuscate quite effectively in other ways, and the
> risk of abuse shouldn't stop this interesting proposal.

I am inclined to agree.

> >         ...     return Rational(numerator, denominator)
> 
> Hmmm, how would this "return a literal token"?  It returns
> an instance of Rational -- how does the parser treat this
> instance as a literal token?
> 
> I thought this use would have to return the sequence of
> tokens for identifier 'Rational', open parenthesis, literal
> (value of) numerator, comma, literal (value of) denominator,
> closed parenthesis -- which in turn is why I thought of an
> arbitrary sequence of tokens.  If a single instance of any
> arbitrary class may be returned and get treated as a
> literal token by the parser, then that's much better (maybe
> I don't know Python's parser well enough, but I don't
> clearly see how that would be done).

I don't know Python's parser well enough either :-).
However: it can accept NUMBER and STRING tokens.
As far as the grammar is concerned, they are exactly
the same (except that multiple STRING tokens are
implicitly concatenated). As far as everything else
is concerned, they are very nearly exactly the same.
We could have a LITERAL token, treated in the same
sort of way as NUMBER and STRING. That was what I was
intending; certainly not returning the token-sequence
<Rational>, <(>, <numerator>, <,>, <denominator>, <)> !
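
The NUMBER/STRING point can be checked from Python itself.
The sketch below uses the standard tokenize module of a
current Python 3, which is an assumption on my part: it is
not the C tokenizer under discussion, but it reports the
same two token types. The helper name token_kinds is
hypothetical.

```python
import ast
import io
import token
import tokenize

def token_kinds(source):
    """Return (type name, text) pairs for the NUMBER and STRING tokens."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (token.NUMBER, token.STRING)
    ]

# Each literal arrives as one token carrying its source text.
kinds = token_kinds('x = 123 + len("ab" "cd")\n')

# Adjacent STRING tokens are implicitly concatenated into one value.
joined = ast.literal_eval('"ab" "cd"')
```

Here kinds holds one NUMBER token and two separate STRING
tokens, while joined is the single string "abcd". A LITERAL
token, as suggested above, would sit alongside these two.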

> >   - Is this insane?
> 
> Hope not, since I like it.

Hmm. The other proposal I know you and I both like is
the adaptation protocol. This is not necessarily a good
omen. :-)

> >   - Is "$" the best character?
> 
> Among the few available ones, I think I slightly prefer "@"
> for this use, but there's little to choose IMHO.

Curiously, "@" was the first option I thought of for this.
I didn't have any very concrete reason for switching to "$".

-- 
g