filtering out "bad" regular expressions from user input
Skip Montanaro
skip at mojam.com
Fri Sep 29 14:37:56 EDT 2000
Andrew> Skip Montanaro wrote:
>> I'm looking for a better approach to filtering out bad regular expressions.
Andrew> Here's one approach:
Andrew> Parse the regular expression (get /F's sre_parse from 2.0 - it's
Andrew> very nice).
Thanks, I'll check into that.
Andrew> Simplify things by not allowing lookahead/behind assertions.
Andrew> Since you're using MySQL's regexp engine, you would also check
Andrew> for any other construct it doesn't support.
Andrew> Convert the tree into "real" characters, so that "category_word"
Andrew> is "_ABCDEF...Zabc...z", etc.
Andrew> The time slowdown occurs for backtracking, especially if there are
Andrew> multiple levels of backtracking.
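For anyone following along, Andrew's first two steps (parse the pattern, then
refuse lookahead/lookbehind assertions) might look roughly like the sketch
below. It is only an illustration: uses_lookaround is a made-up helper name,
the parser module is an undocumented internal of Python's re package (named
sre_parse in older releases, re._parser in newer ones), and it parses Python's
regex dialect, not MySQL's, so a real filter would still need its own checks
for constructs MySQL does not support.

```python
# Assumption: the regex parser is an internal, undocumented module whose
# location changed across Python versions; try the newer name first.
try:
    import re._parser as sre_parse          # newer Pythons
    import re._constants as sre_constants
except ImportError:
    import sre_parse                         # older Pythons
    import sre_constants


def uses_lookaround(pattern):
    """Return True if the parsed pattern contains a lookahead or
    lookbehind assertion anywhere in its tree (hypothetical helper)."""

    def walk(node):
        # A SubPattern wraps a list of (opcode, argument) tuples.
        if isinstance(node, sre_parse.SubPattern):
            return any(walk(item) for item in node.data)
        if isinstance(node, (list, tuple)):
            # Compare by identity so plain integers in the tree
            # (character codes, repeat counts) are never mistaken
            # for the ASSERT / ASSERT_NOT opcodes.
            if len(node) == 2 and (node[0] is sre_constants.ASSERT
                                   or node[0] is sre_constants.ASSERT_NOT):
                return True
            return any(walk(item) for item in node)
        return False

    return walk(sre_parse.parse(pattern))


if __name__ == "__main__":
    for candidate in ["ab+c|d[ef]", "foo(?=bar)", "(?<!x)y"]:
        print(candidate, uses_lookaround(candidate))
```

The walker deliberately knows nothing about the shape of each opcode's
argument; it just descends into every nested list, tuple, or SubPattern, which
keeps it short at the cost of visiting some nodes that cannot hold assertions.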
As usual, I specified the problem incompletely. The problem isn't so much
regular expressions that perform poorly when matched against particular
strings. It is that some very simple regular expressions (like ".*") can
match all (or almost all) records in a database of 20,000 or so rows.
Marshalling and returning that amount of information from MySQL to my
front-end code takes a substantial amount of time, as you might imagine.
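One possible way to catch such overly broad patterns before they reach the
database is to try them against a small sample of records first. The sketch
below is only that, a sketch: matches_too_much, the sample rows, and the 50%
threshold are all invented for illustration, and it uses Python's re module,
whose syntax and semantics differ from MySQL's regexp engine.

```python
import re


def matches_too_much(pattern, sample_rows, max_fraction=0.5):
    """Return True if `pattern` matches more than `max_fraction` of the
    sampled rows (hypothetical helper; threshold is an arbitrary choice)."""
    if not sample_rows:
        return False          # nothing to judge against
    compiled = re.compile(pattern)
    hits = sum(1 for row in sample_rows if compiled.search(row))
    return hits / len(sample_rows) > max_fraction


if __name__ == "__main__":
    # Made-up stand-ins for a handful of database rows.
    sample = ["alpha one", "beta two", "gamma three", "delta four"]
    for candidate in [".*", "gamma", "a"]:
        print(repr(candidate), matches_too_much(candidate, sample))
```

A sample-based check cannot prove a pattern is narrow across all 20,000 rows,
but it does flag the obvious offenders such as ".*" cheaply.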
Skip