Parser or regex?
Fuzzyman
fuzzyman at gmail.com
Fri Dec 16 09:38:54 EST 2005
Hello all,
I'm writing a module that takes user input as strings and (effectively)
translates them to function calls with arguments and keyword
arguments. To pass a list I use a sort of 'list constructor', so the
syntax looks a bit like:
checkname(arg1, "arg 2", 'arg 3', keywarg="value",
keywarg2='value2', default=list("val1", 'val2'))
Worst case anyway :-)
I can handle this with regular expressions but they are becoming truly
horrible. I wonder if anyone has any suggestions on optimising them. I
could hand write a parser - which would be more code, probably slower -
but less error prone. (Regular expressions are subject to obscure
errors - especially the ones I create).
The trouble is that I have to pull out the separate arguments, then
pull apart the keyword arguments and the list keyword arguments. This
makes it a 'multi-pass' task - and I wondered if there was a better way
to do it.
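To make the hand-written alternative concrete, here is a rough sketch of a single-pass splitter. It is only an illustration under simple assumptions (no escaped quotes inside strings); ``split_args`` is a name made up for this example and is not part of my module.

```python
def split_args(argstring):
    """Split on commas that sit outside quotes and parentheses."""
    parts = []
    current = []
    quote = None   # the quote character we are currently inside, if any
    depth = 0      # nesting depth of parentheses, e.g. inside list(...)
    for ch in argstring:
        if quote:
            # Inside a quoted string: copy everything until the closing quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in '"\'':
            quote = ch
            current.append(ch)
        elif ch == '(':
            depth += 1
            current.append(ch)
        elif ch == ')':
            if depth == 0:
                raise ValueError('unbalanced ")"')
            depth -= 1
            current.append(ch)
        elif ch == ',' and depth == 0:
            # A top-level comma ends the current argument.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    if quote:
        raise ValueError('unterminated quote')
    if depth:
        raise ValueError('unbalanced "("')
    last = ''.join(current).strip()
    if last or parts:
        parts.append(last)
    return parts
```

Unlike ``findall``, a scanner like this raises an error on an unterminated quote or unbalanced parenthesis, so no separate validation pass is needed for those cases.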
As I use ``findall`` to pull out all the arguments, I also have to
use a *very similar* regex to first check that there are no errors (as
findall will just skip over badly formed parts of the input).
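A cut-down illustration of the ``findall`` problem follows. The pattern ``simple_arg`` is a simplified stand-in invented for this example, not the real regex from my module.

```python
import re

# Simplified stand-in: one quoted or bare argument, followed by a comma
# or the end of the input.
simple_arg = r'''("[^"]*"|'[^']*'|\w+)\s*(?:,\s*|$)'''

good = 'arg1, "arg 2", \'arg 3\''
bad = 'arg1, "arg 2, \'arg 3\''    # the double quote is never closed

found_good = re.findall(simple_arg, good)
found_bad = re.findall(simple_arg, bad)

# findall raises no error for the bad input. It quietly skips the part it
# cannot match and returns whatever fragments happen to fit the pattern.
print(found_good)
print(found_bad)
```

The second result contains no trace of the broken ``"arg 2`` argument, which is why a separate well-formedness check is needed before trusting the output.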
My current approach is :
pull out the checkname and *all* the arguments using :
'(.+?)\((.*)\)'
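As a small sketch of what that first step yields (the sample input line here is made up for illustration):

```python
import re

outer = re.compile(r'(.+?)\((.*)\)')

line = 'checkname(arg1, "arg 2", default=list("val1", \'val2\'))'
m = outer.match(line)

# Group 1 is the check name, group 2 is everything between the first "("
# and the last ")".
name, argstring = m.groups()
print(name)
print(argstring)
```

Because ``(.*)`` is greedy it runs to the last closing parenthesis, so the ``)`` that ends a list constructor stays inside the argument string.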
I then have :
_paramstring = r'''
    (?:
        (
            (?:
                [a-zA-Z_][a-zA-Z0-9_]*\s*=\s*list\(
                    (?:
                        \s*
                        (?:
                            (?:".*?")|              # double quotes
                            (?:'.*?')|              # single quotes
                            (?:[^'",\s\)][^,\)]*?)  # unquoted
                        )
                        \s*,\s*
                    )*
                    (?:
                        (?:".*?")|              # double quotes
                        (?:'.*?')|              # single quotes
                        (?:[^'",\s\)][^,\)]*?)  # unquoted
                    )?                          # last one
                \)
            )|
            (?:
                (?:".*?")|              # double quotes
                (?:'.*?')|              # single quotes
                (?:[^'",\s=][^,=]*?)|   # unquoted
                (?:                     # keyword argument
                    [a-zA-Z_][a-zA-Z0-9_]*\s*=\s*
                    (?:
                        (?:".*?")|              # double quotes
                        (?:'.*?')|              # single quotes
                        (?:[^'",\s=][^,=]*?)    # unquoted
                    )
                )
            )
        )
        (?:
            (?:\s*,\s*)|(?:\s*$)        # comma
        )
    )
    '''
I can use ``_paramstring`` with findall to pull out all the arguments.
However, as I said, I first need to check that the entire input is
well formed. So I do a match against:
_matchstring = '^%s*' % _paramstring
Having done a match I can use findall and ``_paramstring`` to pull out
*all* the parameters as a list - and go through each one checking if it
is a single argument, keyword argument or list constructor.
For keyword arguments and list constructors I use another regular
expression (the appropriate part of _paramstring basically) to pull out
the values from that.
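Roughly, that last classification step could look like the sketch below. The patterns are simplified from the relevant parts of ``_paramstring``, and the names ``classify``, ``unquote``, ``_list_re``, ``_kw_re`` and ``_member_re`` are hypothetical, chosen just for this illustration.

```python
import re

# Simplified, hypothetical patterns for one already-extracted parameter.
_list_re = re.compile(r'^([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*list\((.*)\)$')
_kw_re = re.compile(r'^([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*(.*)$')
_member_re = re.compile(r'''\s*("[^"]*"|'[^']*'|[^'",\s][^,]*?)\s*(?:,|$)''')

def unquote(val):
    """Strip one pair of matching quotes, if present."""
    if len(val) >= 2 and val[0] == val[-1] and val[0] in '"\'':
        return val[1:-1]
    return val

def classify(param):
    """Return (kind, name, value) for a single parameter string."""
    param = param.strip()
    # The list constructor must be tried first, since it also looks like
    # a keyword argument.
    m = _list_re.match(param)
    if m:
        members = [unquote(v) for v in _member_re.findall(m.group(2))]
        return ('list', m.group(1), members)
    m = _kw_re.match(param)
    if m:
        return ('keyword', m.group(1), unquote(m.group(2).strip()))
    return ('positional', None, unquote(param))
```

For example, ``classify('keywarg="value"')`` gives a keyword entry, while ``classify('"arg 2"')`` falls through to a positional one because a quoted string cannot start with an identifier.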
Now this approach works - but it's hardly "optimal" (for some value of
optimal). I wondered if anyone could suggest a better approach.
All the best,
Fuzzyman
http://www.voidspace.org.uk/python/index.shtml