Issue with regular expressions

harvey.thomas at informa.com
Tue Apr 29 10:50:46 EDT 2008


On Apr 29, 2:46 pm, Julien <jpha... at gmail.com> wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   # <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.
>
> Thanks!!
>
> Julien

You can't do it simply and completely with regular expressions alone
because of the requirement to strip the quotes and normalize
whitespace, but it's not too hard to write a function to do it. Viz:

import re

wordre = re.compile('"[^"]+"|[a-zA-Z]+').findall
def findwords(src):
    ret = []
    for x in wordre(src):
        if x[0] == '"':
            #strip off the quotes and normalise spaces
            ret.append(' '.join(x[1:-1].split()))
        else:
            ret.append(x)
    return ret

query = '   "  Some words"  with    and "without    quotes   "  '
print(findwords(query))

Running this gives
['Some words', 'with', 'and', 'without quotes']
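
Note that [a-zA-Z]+ only picks up runs of ASCII letters, so an unquoted
term such as py3 would come back as just 'py'. If that matters, here is
a rough sketch of a variant that uses two capturing groups so the
quotes never need slicing off. The names term_re and find_terms are
made up for this example, and treating any run of non-space, non-quote
characters as a word is an assumption about what you want:

```python
import re

# Hypothetical variant: group 1 captures the inside of a quoted phrase,
# group 2 captures any other run of non-space, non-quote characters.
term_re = re.compile(r'"([^"]*)"|([^\s"]+)')

def find_terms(src):
    terms = []
    # With two groups, findall yields (quoted, bare) tuples;
    # exactly one of the two is non-empty for a useful match.
    for quoted, bare in term_re.findall(src):
        if bare:
            terms.append(bare)
        else:
            # collapse runs of whitespace inside the quoted phrase
            phrase = ' '.join(quoted.split())
            if phrase:  # skip empty quotes such as ""
                terms.append(phrase)
    return terms

query = '   "  Some words"  with    and "without    quotes   "  '
print(find_terms(query))
```

The whitespace clean-up still happens in Python rather than in the
pattern, for the same reason as above. A stray unbalanced quote is
simply skipped.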

HTH

Harvey


