filtering out "bad" regular expressions from user input

Skip Montanaro skip at mojam.com
Fri Sep 29 14:37:56 EDT 2000


    Andrew> Skip Montanaro wrote:
    >> I'm looking for a better approach to filtering out bad regular expressions.

    Andrew> Here's one approach:

    Andrew> Parse the regular expression (get /F's sre_parse from 2.0 - it's
    Andrew> very nice).

Thanks, I'll check into that.

    Andrew> Simplify things by not allowing lookahead/behind assertions.
    Andrew> Since you're using MySQL's regexp engine, you would also check
    Andrew> for any other construct it doesn't support.

    Andrew> Convert the tree into "real" characters, so that "category_word"
    Andrew> is "_ABCDEF...Zabc...z", etc.

    Andrew> The time slowdown occurs for backtracking, especially if there are
    Andrew> multiple levels of backtracking.

As usual, I specified the problem incompletely.  The problem isn't so much
regular expressions that perform poorly when matched against particular
strings.  It is that some very simple regular expressions (like ".*") can
match all (or almost all) records in a database of 20,000 or so rows.
Marshalling and returning that amount of information from MySQL to my
front-end code takes a substantial amount of time, as you might imagine.
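
One way to catch such overly broad patterns before they reach the database
is to try them against a small sample of rows first and reject any pattern
that matches too large a share of it.  The sketch below is only an
illustration under stated assumptions: the function name matches_too_much,
the max_fraction threshold, and the sample strings are all hypothetical, and
it uses Python's re module, whose syntax differs in places from MySQL's
regexp engine, so the result is an estimate rather than a guarantee.

```python
import re

def matches_too_much(pattern, sample, max_fraction=0.5):
    """Return True if `pattern` is invalid or matches more than
    `max_fraction` of the strings in `sample` (hypothetical helper)."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Treat patterns that do not even compile as "bad".
        return True
    if not sample:
        # Nothing to measure against, so do not reject.
        return False
    hits = sum(1 for s in sample if compiled.search(s))
    return hits / len(sample) > max_fraction

# Made-up sample rows standing in for a slice of the real table.
sample = ["Blue Note", "Village Vanguard", "Birdland", "Smalls"]
print(matches_too_much(".*", sample))    # matches every sample row
print(matches_too_much("Bird", sample))  # matches only one sample row
```

A pattern that passes this check can still match many rows in the full
table, so capping the number of rows the query returns would be a sensible
second safeguard.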

Skip





