[Python-ideas] Transportable indent level markers. >>>===<<<
ron3200 at gmail.com
Thu Dec 15 04:19:08 CET 2011
On Thu, 2011-12-15 at 11:29 +1000, Nick Coghlan wrote:
> On Thu, Dec 15, 2011 at 11:08 AM, Ron Adam <ron3200 at gmail.com> wrote:
> > It only changes the pre-tokenized representation of the language. To do
> > braces correctly, it would require much deeper changes.
> This comment makes me think you may be working off a misunderstanding
> of the way Python's tokenisation works.
In tokenizer.c there is the function tok_get() at line 1292. The first
part of it counts spaces and/or tabs and checks for other things and
then returns either DEDENT, INDENT, or possibly an ERRORTOKEN.
Since this would not add any new tokens, I referred to it as a change
to the pre-tokenized representation.
The way it would work is to detect these characters and use them to
determine whether to return DEDENT or INDENT tokens in place of any
surrounding white space immediately before or after them.
If one of these is found in the middle of a line, the tokenizer would
back up with tok_backup() to just before the token, and then return a
NEWLINE. On the next tok_get() call, it would repeat. It would not be
hard to do at all.
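To make the back-up-and-repeat behaviour concrete, here is a rough
Python mock of what I'm describing (my own toy names, not tokenizer.c's
actual API): when a marker appears mid-line, the tokenizer first closes
the current logical line with a NEWLINE, then on the next call emits the
matching INDENT/DEDENT hint.

```python
# Toy sketch only -- the real change would live in tok_get() in C.
# Marker spellings assumed from the example below: '\\\' indents,
# '<<<' dedents, ';;;' keeps the current level.
MARKERS = {'\\' * 3: 'INDENT', '<<<': 'DEDENT', ';;;': None}

def hint_tokens(words):
    """Yield a pseudo token stream for whitespace-split source words."""
    pending_newline = False
    for w in words:
        if w in MARKERS:
            if pending_newline:
                yield 'NEWLINE'      # "back up": close the line first
                pending_newline = False
            if MARKERS[w]:
                yield MARKERS[w]     # then emit the indent hint
        else:
            yield ('NAME', w)
            pending_newline = True
    if pending_newline:
        yield 'NEWLINE'
```

So `;;; x = 3 \\\ y` would come out as the NAME tokens for `x = 3`,
then NEWLINE, then INDENT, then the tokens for `y` -- exactly the order
the parser already expects from whitespace-based indentation.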
> Suites in Python's token stream are *already* explicitly delimited:
> "INDENT" and "DEDENT" tokens respectively mark the beginning and end
> of each suite. There's no such concept in the token stream as "don't
> change the indent level" - that's the assumed behaviour.
You are thinking of later in the chain, I think. We'd be making the
change before that is decided. Think of them as hints for the
tokenizer. The "don't change the indent level" hint, ';;;', is just a
hint to ignore the white space here and accept the previous level.
Because of that, the source text can be white space insensitive as far
as indent levels are concerned.
> So an explicitly delimited syntax that just offers an alternative way
> to get INDENT and DEDENT tokens into the stream would be fairly
> You'd probably also want an explicit ";;" token to force a
> token.NEWLINE into the token stream.
That isn't needed. Any of these in the middle of a line will add a
newline and back up, so the next call to tok_get() will find it, and
so on.
;;; x = 3 ;;; y = 2 ;;; def add(x,y): \\\ return x + y
So this would be converted as the tokenizer goes over it to..
x = 3
y = 2
def add(x,y):
    return x + y
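The whole translation can be sketched in a few lines of Python (again
my own toy code, not the proposed tokenizer.c patch), reading ';;;' as
"same level", '\\\' as "indent", and '<<<' as "dedent" -- marker
spellings assumed from the example above:

```python
import re

IND, DED, SAME = '\\' * 3, '<<<', ';;;'

def expand(flat):
    """Rebuild indented source from marker-delimited text."""
    out, level = [], 0
    # Split on the markers, keeping them so we can track level changes.
    for part in re.split(r'(;;;|\\\\\\|<<<)', flat):
        part = part.strip()
        if not part:
            continue
        if part == SAME:
            continue                     # keep the current indent level
        if part == IND:
            level += 1                   # next statement opens a suite
            continue
        if part == DED:
            level = max(level - 1, 0)    # close the innermost suite
            continue
        out.append('    ' * level + part)
    return '\n'.join(out)
```

Running it over the one-liner above reproduces the four indented lines,
which is all the real tokenizer change would have to accomplish on the
token level rather than the text level.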
Line continuations '\' are handled in pretty much the same way; they
just eat the next newline and continue. There is no token for a line
continuation. These markers would work at the same level.