filtering out "bad" regular expressions from user input
Andrew Dalke
dalke at acm.org
Fri Sep 29 12:47:31 EDT 2000
Skip Montanaro wrote:
>I'm looking for a better approach to filtering out bad regular expressions.
Here's one approach:
Parse the regular expression (get /F's sre_parse from 2.0 - it's very
nice).
Simplify things by not allowing lookahead/behind assertions. Since
you're using MySQL's regexp engine, you would also check for any
other construct it doesn't support.
Convert the tree into "real" characters, so that "category_word"
is "_ABCDEF...Zabc...z", etc.
The time slowdown occurs for backtracking, especially if there are
multiple levels of backtracking. Backtracking can only occur if
there are is a branch where either route accepts the same character.
The only ones to be concerned about are +, * and {}s, so look for
those nodes and make sure the next node in the expression doesn't
contain the same character.
The other option is to write a regex engine which uses a DFA instead
of an NFA, but that's not a good solution for your problem.
Andrew
dalke at acm.org
More information about the Python-list
mailing list