[Python-Dev] Why Foo is better than Baz
Guido van Rossum
guido at CNRI.Reston.VA.US
Mon May 3 17:32:09 CEST 1999
> I looked at it a bit when Tcl 8.1 was in beta; it derives from
> Henry Spencer's 1998-vintage code, which seems to try to do a lot of
> optimization and analysis. It may even compile DFAs instead of NFAs
> when possible, though it's hard for me to be sure. This might give it
> a substantial speed advantage over engines that do less analysis, but
> I haven't benchmarked it. The code is easy to read, but difficult to
> understand because the theory underlying the analysis isn't explained
> in the comments; one feels there should be an accompanying paper to
> explain how everything works, and it's why I'm not sure if it really
> is producing DFAs for some expressions.
>
> Tcl seems to represent everything as UTF-8 internally, so
> there's only one regex engine; there's .
Hmm... I looked when Tcl 8.1 was in alpha, and I *think* that at that
point the regex engine was compiled twice, once for 8-bit chars and
once for 16-bit chars. But this may have changed.

I've noticed that Perl is taking the same position (everything is
UTF-8 internally). On the other hand, Java distinguishes 16-bit chars
from 8-bit bytes. Python is currently in the Java camp. This might
be a good time to make sure that we're still convinced that this is
the right thing to do!
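
To make the trade-off concrete, here is a rough sketch of how the same
text sizes up under the two representations. It assumes a modern
Python 3 interpreter (not the Python of this thread), and the helper
name encoded_sizes is made up for illustration.

```python
def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 code units) for text.

    Hypothetical helper, for illustration only.
    """
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return len(text), len(utf8), len(utf16) // 2


if __name__ == "__main__":
    # ASCII text, a Latin-1 letter, and the euro sign.
    for sample in ("regex", "na\u00efve", "\u20ac"):
        print(repr(sample), encoded_sizes(sample))
```

With UTF-8 the number of bytes per character varies, so an engine that
walks the encoded buffer has to decode as it goes; with fixed 16-bit
units, characters in this range can be indexed directly, at the cost
of a second character type alongside 8-bit bytes.
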
> The code is scattered over
> more files:
>
> amarok generic>ls re*.[ch]
> regc_color.c   regc_locale.c  regcustom.h  regerrs.h  regfree.c
> regc_cvec.c    regc_nfa.c     rege_dfa.c   regex.h    regfronts.c
> regc_lex.c     regcomp.c      regerror.c   regexec.c  regguts.h
> amarok generic>wc -l re*.[ch]
>      742 regc_color.c
>      170 regc_cvec.c
>     1010 regc_lex.c
>      781 regc_locale.c
>     1528 regc_nfa.c
>     2124 regcomp.c
>       85 regcustom.h
>      627 rege_dfa.c
>       82 regerror.c
>       18 regerrs.h
>      308 regex.h
>      952 regexec.c
>       25 regfree.c
>       56 regfronts.c
>      388 regguts.h
>     8896 total
> amarok generic>
>
> This would be an issue for using it with Python, since all
> these files would wind up scattered around the Modules directory. For
> comparison, pypcre.c is around 4700 lines of code.
I'm sure that if it's good code, we'll find a way. Perhaps a more
interesting question is whether it is Perl5 compatible. I contacted
Henry Spencer at the time and he was willing to let us use his code.
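
As a rough idea of what "Perl5 compatible" would have to cover, the
sketch below exercises three constructs that Perl 5 introduced:
non-greedy quantifiers, lookahead assertions, and non-capturing
groups. It runs against the re module of a modern Python 3 (an
assumption) and says nothing about how Spencer's engine treats the
same patterns; the function name perl5_feature_check is made up.

```python
import re


def perl5_feature_check():
    """Run three Perl 5-style patterns through Python's re module.

    Hypothetical helper, for illustration only.
    """
    # Non-greedy quantifier: stop at the first closing bracket.
    lazy = re.match(r"<.+?>", "<a><b>").group(0)
    # Lookahead assertion: names directly followed by "(".
    calls = re.findall(r"\w+(?=\()", "foo(1) bar baz(2)")
    # Non-capturing group: only the trailing "c" is captured.
    groups = re.match(r"(?:ab)+(c)", "ababc").groups()
    return lazy, calls, groups


if __name__ == "__main__":
    print(perl5_feature_check())
```

A candidate engine could be put through the same patterns to see
where its syntax or results differ.
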
--Guido van Rossum (home page: http://www.python.org/~guido/)