regex help: splitting string gets weird groups

gry georgeryoung at gmail.com
Thu Apr 8 20:49:01 CEST 2010


[ python3.1.1, re.__version__='2.2.1' ]
I'm trying to use re to split a string into (any number of) pieces of
these kinds:
1) contiguous runs of letters
2) contiguous runs of digits
3) single other characters

e.g. 555tHe-rain.in#=1234 should give:
[555, 'tHe', '-', 'rain', '.', 'in', '#', '=', 1234]
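The three kinds of piece listed above map naturally onto one alternation with a named group per kind. Below is a minimal sketch that scans the string with re.finditer and uses each match's lastgroup attribute to report which kind matched. The names TOKEN_RE and tokenize are hypothetical, and it assumes "single other characters" means any character that is not an ASCII letter or digit (re.DOTALL lets newlines count as well).

```python
import re

# Hypothetical token spec: one named alternative per kind of piece.
# Letter and digit runs are matched greedily; anything else is one char.
TOKEN_RE = re.compile(
    r"(?P<letters>[A-Za-z]+)|(?P<digits>[0-9]+)|(?P<other>.)", re.DOTALL
)

def tokenize(s):
    """Yield (kind, text) pairs; kind is the name of the group that matched."""
    for m in TOKEN_RE.finditer(s):
        # Only one alternative participates per match, so lastgroup names it.
        yield m.lastgroup, m.group()

print(list(tokenize("555tHe-rain.in#=1234")))
```

Since the last alternative matches any single character, every character of the input ends up in exactly one token.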
I tried:
>>> import re
>>> re.match('^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$', '555tHe-rain.in#=1234').groups()
('1234', 'in', '1234', '=')

Why is 1234 repeated in two groups? And why doesn't "tHe" appear as a
group? Is my regexp illegal somehow, and is it confusing the engine?
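A likely explanation, consistent with the output above: the pattern is legal, but in Python's re a capturing group that sits inside a repeated group (the trailing +) remembers only its most recent capture, not a list of all of them. The last iteration of the outer group 1 matched '1234', and so did the digits group 3, which is why '1234' shows up twice. The letters group 2 captured 'tHe' first, then was overwritten by 'rain' and finally 'in'. The sketch below prints each group together with its span to make the "last capture wins" behaviour visible.

```python
import re

# The pattern from the question; '+' repeats the outer group 1.
pat = re.compile(r"^(([A-Za-z]+)|([0-9]+)|([-.#=]))+$")
m = pat.match("555tHe-rain.in#=1234")

# Each group reports only its last capture and where in the string it was.
for n in range(1, 5):
    print(n, repr(m.group(n)), m.span(n))

# Minimal version of the same effect: a repeated group keeps its final capture.
print(re.match(r"([0-9])+", "123").group(1))  # 3
```

The spans show that group 2 points at 'in' (positions 12 to 14) while groups 1 and 3 both point at the trailing '1234'.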

I *would* like to understand what's wrong with this regex, though if
someone has a neater way to do the above task, I'd welcome suggestions
as well.
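One common alternative is to drop the anchors and the outer repetition and let re.findall return every non-overlapping match. The sketch below assumes that digit runs should become ints, as in the example output, and that "other" means any single character; split_pieces is a hypothetical name.

```python
import re
import string

def split_pieces(s):
    # The alternatives are disjoint: ASCII letter runs, ASCII digit runs,
    # or any other single character (DOTALL so newlines are included too).
    pieces = re.findall(r"[A-Za-z]+|[0-9]+|.", s, re.DOTALL)
    # Assumption: digit runs should be converted to int, as in the example.
    # Checking the first char against ASCII digits avoids int() failing on
    # characters like '²', for which str.isdigit() is also True.
    return [int(p) if p[0] in string.digits else p for p in pieces]

print(split_pieces("555tHe-rain.in#=1234"))
# [555, 'tHe', '-', 'rain', '.', 'in', '#', '=', 1234]
```

Because findall returns each match as its own list element, nothing gets overwritten the way the repeated groups were.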


