Self optimizing iterable

Carl Banks pavlovevidence at gmail.com
Fri Jul 17 21:35:33 EDT 2009


On Jul 17, 5:40 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Zac Burns <zac... at gmail.com> writes:
> > An example use case for this would be for something like a large table
> > of regular expressions that would be iterated over trying to match in
> > some string. If some regular expressions are more statistically more
> > successful then the iteration will generally be short.
>
> Generally if you are matching against a bunch of regexps, they will
> tend to match overlapping sets, so you want to control precisely what
> order they are tested in.  Having stuff bubble around based on hit
> counts doesn't sound good.

As a corrollary, if you happen to be matching non-overlapping sets,
then it is a red flag that there could be some other (better) way to
dispatch than a linear regexp search.  For instance, if all your
regexps start with a different keyword, then you should dispatch on
that keyword using a dictionary.  A regexp can be used after
dispatching to extract parameters if necessary.

It's possible to have disjoint regexps without a simple dispatch
criterion, but I'd guess that's less common.


Carl Banks



More information about the Python-list mailing list