splitting words with brackets
Tim Chase
python.list at tim.thechases.com
Wed Jul 26 16:37:25 EDT 2006
> "a (b c) d [e f g] h i"
> should be split into
> ["a", "(b c)", "d", "[e f g]", "h", "i"]
>
> As speed is a factor to consider, it's best if a single-line
> regular expression can handle this. I tried this but failed:
> re.split(r"(?![\(\[].*?)\s+(?!.*?[\)\]])", s). It works
> for "(a b) c" but not for "a (b c)" :(
>
> Any hint?
[and later added]
> Sorry, I forgot to mention a constraint: if a letter is next
> to a bracket, they should be treated as one word, i.e.
> "a(b c) d" becomes ["a(b c)", "d"] because there is no
> blank between "a" and "(".
>>> import re
>>> s = 'a (b c) d [e f g] h ia abcd(b c)xyz d [e f g] h i'
>>> r = re.compile(r'(?:\S*(?:\([^\)]*\)|\[[^\]]*\])\S*)|\S+')
>>> r.findall(s)
['a', '(b c)', 'd', '[e f g]', 'h', 'ia', 'abcd(b c)xyz', 'd',
'[e f g]', 'h', 'i']
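For anyone trying to read the pattern above, here is the same regex spelled out with re.VERBOSE and a comment per piece. The name WORD is just a placeholder for this sketch; the pattern itself is unchanged apart from whitespace and comments.

```python
import re

# Same pattern as above, written in verbose mode so each part can be
# commented. WORD is a hypothetical name chosen for this sketch.
WORD = re.compile(r'''
    (?:
        \S*                 # optional non-space prefix glued to the bracket
        (?:
            \( [^\)]* \)    # a parenthesized group; spaces are allowed inside
          |
            \[ [^\]]* \]    # a square-bracketed group; spaces are allowed inside
        )
        \S*                 # optional non-space suffix glued to the bracket
    )
  |
    \S+                     # otherwise, a plain run of non-space characters
''', re.VERBOSE)

print(WORD.findall('a(b c) d'))   # ['a(b c)', 'd']
```

The bracketed alternative is tried first, so a word touching a bracket keeps the whole group; the plain \S+ only gets a chance when no bracket group can be matched starting at that position.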
I'm sure there's a *much* more elegant pyparsing solution to
this, but I don't have the pyparsing module on this machine.
A pyparsing version would be much clearer and far more
readable when you come back to it later.
However, the above monstrosity passes the tests I threw at
it.
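Two inputs the regex above does not handle are nested brackets and a word with more than one bracket group: 'a(b c)d(e f)g' comes back as ['a(b c)d(e', 'f)g']. If those matter, a small hand-written scanner is one option. This is only a sketch, not the regex approach from this post: split_words is a hypothetical name, it assumes brackets are balanced (an unclosed bracket swallows the rest of the string into one word), and it is probably slower than the compiled regex, though I haven't timed it.

```python
def split_words(text):
    """Split on whitespace that is not inside (...) or [...].

    Hypothetical helper for illustration; assumes balanced brackets.
    """
    closers = {'(': ')', '[': ']'}
    words = []
    current = []   # characters of the word being built
    stack = []     # closing brackets we are still waiting for
    for ch in text:
        if ch in closers:
            # Entering a bracket group: remember which closer ends it.
            stack.append(closers[ch])
            current.append(ch)
        elif stack and ch == stack[-1]:
            # Leaving the innermost open group.
            stack.pop()
            current.append(ch)
        elif ch.isspace() and not stack:
            # Whitespace outside any bracket ends the current word.
            if current:
                words.append(''.join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append(''.join(current))
    return words


print(split_words('a (b (c d) e) f'))   # ['a', '(b (c d) e)', 'f']
```

Because the stack tracks depth, whitespace only splits at depth zero, which is what lets nested groups and multiple groups in one word stay together.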
-tkc