VIM and [Python] block jumping

Sat Jul 10 17:09:32 EDT 1999

[C. Laurence Gonsalves]
> ...
> The blockMotion function doesn't account for lines that are
> continuations of other lines. That means that triple-quoted string
> literals, lines with more {[('s than }])'s, and lines joined with \ can
> screw it up. Are there any other situations?

No, those are the only ways a line can get continued in Python.  Note that
there are subtleties:  for example, a backslash preceding a newline doesn't
*always* mean continuation (it doesn't at the tail end of a comment!).
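
A small illustration of the three continuation forms and the comment
subtlety.  This is only a sketch:  the helper name count_logical_lines is
made up for the example, and it leans on the standard tokenize module, which
emits one NEWLINE token per logical line.

```python
import io
import tokenize

def count_logical_lines(source):
    """Count logical lines by counting tokenize's NEWLINE tokens."""
    readline = io.StringIO(source).readline
    return sum(1 for tok in tokenize.generate_tokens(readline)
               if tok.type == tokenize.NEWLINE)

# Three ways one logical line can span two physical lines.
bracket = "total = (1 +\n         2)\n"       # open bracket
backslash = "total = 1 + \\\n        2\n"     # backslash before newline
triple = 's = """first\nsecond"""\n'          # triple-quoted string

# A backslash at the tail end of a comment does NOT continue the line,
# so this is two logical lines and y really gets bound.
comment = "x = 1  # trailing backslash \\\ny = 2\n"

for name, src in [("bracket", bracket), ("backslash", backslash),
                  ("triple", triple), ("comment", comment)]:
    print(name, count_logical_lines(src))
```

The first three samples each count as one logical line; the comment sample
counts as two.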

> I'll try to post a corrected version later. Some of those problems are
> pretty nasty though (triple-quoted strings in particular).

Also single-quoted strings continued via backslash.  Strings are the real
headache, because the special characters you're looking for lose their magic
when they're inside a string.
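
To make the string headache concrete, here is a rough sketch comparing a
naive bracket count with one that skips over string contents.  Both function
names are invented for the example, and the string-aware version only
understands single-line ' and " literals plus # comments; it deliberately
ignores triple-quoted strings, which need state carried across lines.

```python
OPENERS = "([{"
CLOSERS = ")]}"

def naive_depth(line):
    """Openers minus closers, with no idea what a string is."""
    return (sum(ch in OPENERS for ch in line)
            - sum(ch in CLOSERS for ch in line))

def string_aware_depth(line):
    """Same count, but brackets inside '...' or "..." are ignored."""
    depth = 0
    quote = None              # the quote char we are inside, if any
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 1        # skip the escaped character
            elif ch == quote:
                quote = None  # string ends
        elif ch in "'\"":
            quote = ch        # string starts
        elif ch == "#":
            break             # rest of the line is a comment
        elif ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth -= 1
        i += 1
    return depth

line = 'print("unbalanced ( in a string")'
print(naive_depth(line), string_aware_depth(line))
```

The naive count reports one unclosed bracket for that line, while the
string-aware count reports zero.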

[Neel Krishnaswami]
> ...
> But since you can script vim with Python, it might be possible to use
> the tokenizer in the standard library (module tokenize) to handle this
> cleanly.  Likely you'd need to extend it a bit to handle the syntax
> errors that show up in half-written code gracefully ...

This can be made to work, but I found it too slow to bear in an editor --
"subjectively instantaneous" is the goal there.  Most of the tokens tokenize
delivers are simply irrelevant to this task, so most of the work that goes
into producing them is wasted.
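
For reference, a tokenize-based approach might look roughly like the sketch
below.  The function name continuation_lines is hypothetical, and swallowing
the tokenizer's errors is just one assumed way of coping with half-written
code.  Note how nearly every token is inspected only for its row numbers and
then thrown away.

```python
import io
import tokenize

# Tokens that can appear between logical lines and never start one.
SKIP = {tokenize.NL, tokenize.COMMENT, tokenize.INDENT,
        tokenize.DEDENT, tokenize.ENDMARKER}

def continuation_lines(source):
    """Return the 1-based numbers of lines that continue an earlier line."""
    found = set()
    start = None                      # row where the logical line began
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if start is None:
                if tok.type in SKIP:
                    continue
                start = tok.start[0]
            # Every row after the first row of a logical line is a
            # continuation row; multi-line strings end on a later row.
            found.update(range(start + 1, tok.end[0] + 1))
            if tok.type == tokenize.NEWLINE:
                start = None          # logical line is complete
    except (tokenize.TokenError, SyntaxError):
        pass                          # incomplete code: keep what we have
    return found

sample = "a = (1,\n     2)\nb = 'x' \\\n    'y'\nc = 3\n"
print(sorted(continuation_lines(sample)))
```

For the sample, lines 2 and 4 are reported as continuations.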

The current CVS version of IDLE has a new module PyParse.py, which does the
minimum necessary to determine whether a line is part of a continuation, and
if so why.  It can do this 100x faster than tokenize, even in
non-pathological programs.  The next version of IDLE uses this to do Emacs
pymode-like "very smart" indentation, and the next version of Mark Hammond's
PythonWin also uses this module.

It's GUI-independent (straight self-contained Python), but, as called from
AutoIndent.py (also shared between IDLE and PythonWin), it takes major
advantage of the host system's colorization, to get a fast answer to the
question "is this random character in a string?".  The colorizer generally
knows this already, for every character in the program, and
AutoIndent/PyParse use that to find a good (close) place to begin reparsing
the file.

If the host can't answer that question reliably, then PyParse needs to
reparse the entire file from its start in order to get a guaranteed-correct
answer, so that's what it does.  PyParse is fast enough that this is *still*
"subjectively instantaneous" for most files, but the delay becomes noticeable
near the end of long files (e.g. Tkinter.py).
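
The general idea of picking a reparse point can be sketched as follows.  This
is not PyParse's actual code:  find_parse_start and the is_in_string callback
are names invented here, and "a def or class at column 0" is just one assumed
notion of a safe place to start.

```python
def find_parse_start(lines, lineno, is_in_string=None):
    """Return the 0-based index of a line where parsing can safely begin.

    lines        -- list of source lines
    lineno       -- 0-based index of the line we care about
    is_in_string -- optional host callback (hypothetical) saying whether
                    a given line index lies inside a string literal
    """
    if is_in_string is None:
        return 0                      # no help from the host: start of file
    for i in range(lineno, -1, -1):
        starts_block = lines[i].startswith(("def ", "class "))
        if starts_block and not is_in_string(i):
            return i                  # close, and known not to be in a string
    return 0

lines = [
    'def f():',
    '    s = """',
    'def fake():',                    # looks like code, but is string text
    '    """',
    '    return s',
    'def g():',
    '    pass',
]
in_string = lambda i: i == 2          # stand-in for the host's colorizer

print(find_parse_start(lines, 4, in_string))
print(find_parse_start(lines, 6, in_string))
print(find_parse_start(lines, 6))
```

With the colorizer's help the scan skips the fake def inside the string and
stops at a real one nearby; without it, the only safe answer is line 0.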

pyparse-is-incomprehensible-but-that's-why-it's-fast<wink>-ly y'rs  - tim