Issue with regular expressions

Paul McGuire ptmcg at austin.rr.com
Tue Apr 29 16:20:44 CEST 2008


On Apr 29, 8:46 am, Julien <jpha... at gmail.com> wrote:
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>

Julien -

I dabbled with re's for a few minutes trying to get your solution,
then punted and used pyparsing instead.  Pyparsing will run slower
than re, but many people find it much easier to work with readable
class names and instances than with re's typoglyphics:

from pyparsing import (OneOrMore, Word, printables, dblQuotedString,
                       removeQuotes)

# when a quoted string is found, remove the quotes,
# then strip whitespace from the contents
# (the lambda receives the matched tokens, not the input string)
dblQuotedString.setParseAction(removeQuotes,
                               lambda toks: toks[0].strip())

# define terms to be found in query string
term = dblQuotedString | Word(printables)
query_terms = OneOrMore(term)

# parse query string to extract terms
query = '   "  some words"  with and "without    quotes   "  '
print(tuple(query_terms.parseString(query)))

Gives:
('some words', 'with', 'and', 'without    quotes')
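As for the single-regex question: re.match() cannot hand back a variable-length tuple from groups(), because a pattern has a fixed number of capture groups and a repeated group keeps only its last match. Scanning with re.finditer() gets close, though. The following is a minimal sketch using only the standard library; TERM_RE and split_terms are hypothetical names, and it assumes plain, unescaped double quotes. Unlike the pyparsing version above, it also collapses runs of internal whitespace, which gives 'without quotes' rather than 'without    quotes'.

```python
import re

# Hypothetical pattern: each match is either a double-quoted phrase
# (group 1 holds its contents) or a run of non-whitespace (group 2).
# Assumes plain, unescaped double quotes.
TERM_RE = re.compile(r'"([^"]*)"|(\S+)')


def split_terms(query):
    """Hypothetical helper: return the query's terms as a tuple, with quoted
    phrases unquoted and their internal whitespace collapsed."""
    terms = []
    for m in TERM_RE.finditer(query):
        if m.group(1) is not None:
            # Quoted phrase: split() drops leading/trailing whitespace and
            # join() rebuilds it with single spaces.
            text = " ".join(m.group(1).split())
        else:
            # Bare word, taken as-is.
            text = m.group(2)
        if text:  # skip empty phrases such as ""
            terms.append(text)
    return tuple(terms)


if __name__ == "__main__":
    query = '   "  some words"  with and "without    quotes   "  '
    print(split_terms(query))
    # ('some words', 'with', 'and', 'without quotes')
```

Alternatives are tried left to right, so an opening quote is consumed as a phrase before \S+ gets a chance at it; an unmatched quote falls through to \S+ and stays attached to a bare term.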

The pyparsing wiki is at http://pyparsing.wikispaces.com.  You'll find
an examples page that includes a search query parser, and pointers to
a number of online documentation and presentation sources.

-- Paul


