trouble with regex?
MRAB
python at mrabarnett.plus.com
Thu Oct 8 12:42:21 EDT 2009
inhahe wrote:
> Can someone tell me why this doesn't work?
>
> colorre = re.compile ('('
> '^'
> '|'
> '(?:'
> '\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
> '(?:'
> ',(?:10|11|12|13|14|15|0\\d|\\d)'
> ')?'
> ')'
> ')(.*?)')
>
> I'm trying to extract mirc color codes.
>
> this works:
>
> colorre = re.compile ('\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
> '(?:'
> ',(?:10|11|12|13|14|15|0\\d|\\d)'
> ')?'
> )
>
> but I wanted to modify it so that it returns me groups of (color code,
> text after the code), except for the first text at the beginning of the
> string before any color code, for which it should return ('', text).
> that's what the first paste above is trying to do, but it doesn't work.
> here are some results:
>
> >>> colorre.findall('a\x0b1,1')
> [('', ''), ('\x0b1,1', '')]
> >>> colorre.findall('a\x0b1,1b')
> [('', ''), ('\x0b1,1', '')]
> >>> colorre.findall('ab')
> [('', '')]
> >>> colorre.findall('\x0b1,1')
> [('', '')]
> >>> colorre.findall('\x0b1,1a')
> [('', '')]
> >>>
>
> i can easily work with the string that does work and just use group
> starting and ending positions, but i'm curious as to why i can't get it
> working the way i want :/
>
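The problem with the regex is that .*? is a lazy repeat: it matches as
few characters as possible, and since nothing follows it in the pattern
it can always stop after zero characters, which is why the second group
is always ''. Try a greedy repeat instead, but one that matches only
characters other than \x0b, so that it stops at the next color code: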
colorre = re.compile('('
'^'
'|'
'(?:'
'\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
'(?:'
',(?:10|11|12|13|14|15|0\\d|\\d)'
')?'
')'
')([^\x0b]*)')
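
To see the difference between the two repeats, here is a small sketch that
builds both patterns from shared pieces. The names NUM, CODE, lazy, greedy
and sample are only for illustration and are not from the original post;
the regex text itself is the same as above.

```python
import re

# One color number as written in the thread: 10-15, 0 plus a digit, or one digit.
NUM = r'(?:10|11|12|13|14|15|0\d|\d)'
# A color code: \x0b, a number, and an optional ",number".
CODE = '\x0b' + NUM + '(?:,' + NUM + ')?'

# Original attempt: the lazy .*? has nothing after it, so it stops at zero characters.
lazy = re.compile('(^|' + CODE + ')(.*?)')
# Suggested fix: greedy, but limited to characters other than \x0b.
greedy = re.compile('(^|' + CODE + ')([^\x0b]*)')

sample = 'a\x0b1,1b'

# The lazy version matches at the start of the string and captures no text.
print(lazy.match(sample).groups())    # ('', '')

# The greedy version pairs each code (or '' at the start) with the text after it.
print(greedy.findall(sample))         # [('', 'a'), ('\x0b1,1', 'b')]
print(greedy.findall('ab'))           # [('', 'ab')]
```

One caveat: if the string begins with a color code, the '^' alternative
matches first and gives an empty match at position 0. How findall carries
on after an empty match has, as far as I know, changed between Python
versions, so that case is worth checking on the version you actually use.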