[Python-Dev] Py2.3 Todo List

Gustavo Niemeyer niemeyer@conectiva.com
Fri, 20 Jun 2003 12:16:54 -0300


> on midsummer's eve?  I'm supposed to be eating pickled herring and
> drinking schapps, not trying to decipher C code...

I'm glad I was wrong.. :-)

> > Some version of Gustavo's patch is here:
> > 
> >     http://www.python.org/sf/757624
> 
> looking at the patch, I'm 95% confident that it's the right thing (or
> close enough to the right thing ;-)

Cool! :-)

> but reading the unified patch is not exactly trivial; a brief prose

Indeed. It looks awful. :-)

> description of the new mechanism would be nice.

Basically, MAX_REPEAT and MIN_REPEAT were changed from

<REPEAT> <skip> <min> <max> ... <(MAX|MIN)_UNTIL> ...

to

<(MAX|MIN)_REPEAT> <skip> <min> <max> ... <SUCCESS> ...

and all logic was moved from (MAX|MIN)_UNTIL to (MAX|MIN)_REPEAT.

In the implementation, the main change was turning mark_stack into a
generic data_stack, and using it to push the state of each iteration
from MAX_REPEAT, so that it can pop them out while backtracking.
Another way to implement this was to simply test tail-matching
forwards, but while this would save memory and be easier to implement,
it'd certainly affect performance.

MIN_REPEAT was quite straightforward, as it tests tail-matching
forwards, thus no state saving is necessary.

> have you benchmarked this on "real-world" examples, and on more than one
> platform?  before-and-after figures for xmllib/tokenize on large source files
> would be a good indication on the performance impact (if any).

No, I haven't done with real programs. OTOH, I've done tests with large
streams which explored the worse case of both algorithms, and it has
shown no negative impacts, and indeed it's faster in some situations.
For example, I could remove the MIN_REPEAT_ONE opcode, since the new
generic MIN_REPEAT implementation is as fast as the old specific
implementation. I'm also checking if it is possible to move some of the
intelligence in MAX_REPEAT_ONE to MAX_REPEAT.

Of course, I'll be thankful for any further benchmarks done on this
code.

> (and to be slightly nitpicking, I think it's good style to keep the
> alphabetical order when adding stuff to lists that are already in
> alphabetical order, unless you have really good reasons no to...)

I'm sorry. I thought it was ordered based on logic proximity, but I
should have looked more carefuly.

Thanks for reviewing it!

-- 
Gustavo Niemeyer
http://niemeyer.net