Strange re behavior: normal?
Michael Janssen
Janssen at rz.uni-frankfurt.de
Sun Aug 17 13:06:48 EDT 2003
Robin Munn wrote:
> How is re.split supposed to work? This wasn't at all what I expected:
> >>> import re
> >>> re.split(r'\b', 'a b c d')
> ['a b c d']
The code (INSTALL_DIR/Modul/_sre.c, function pattern_split) seems to
behave this way on purpose. At least, I can't see any other purpose for
this if-clause:
if (state.start == state.ptr) { /* empty string? mj */
    if (last == state.end)
        break;
    /* skip one character */
    state.start = (void*) ((char*) state.ptr + state.charsize);
    continue;
}
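To make the effect of that branch easier to see, here is a rough Python model of the loop as I read it. It is only a sketch, not a translation of _sre.c: the name split_skipping_empty and the variables pos and last are made up for illustration, and the model ignores maxsplit and capturing groups.

```python
import re

def split_skipping_empty(pattern, string):
    """Hypothetical model of a split loop that ignores empty matches."""
    regex = re.compile(pattern)
    pieces = []
    last = 0              # end of the last separator that was accepted
    pos = 0               # where the next search starts
    n = len(string)
    while pos <= n:
        m = regex.search(string, pos)
        if m is None:
            break
        if m.start() == m.end():
            # Empty match: never split here, just step one character on.
            if last == n:
                break
            pos = m.end() + 1
            continue
        pieces.append(string[last:m.start()])
        last = pos = m.end()
    pieces.append(string[last:])   # whatever follows the last separator
    return pieces

print(split_skipping_empty(r'\b', 'a b c d'))
print(split_skipping_empty(r',', 'a,b,c'))
```

Since r'\b' can only ever match the empty string, every match is skipped and the whole string comes back as a single piece, which is what Robin saw.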
Well, I believe it's a good choice not to split a string on an empty
match, but if you really want to (this version omits the empty results
at the start and end):
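import re

def boundary_split(s):
    back = []
    while True:
        try:
            # r'.\b' and the +1 prevent an endless loop: each match
            # consumes at least one character before the boundary
            pos = re.search(r'.\b', s, re.DOTALL).start() + 1
        except AttributeError:
            # re.search returned None: no boundary left in s
            if s:
                back.append(s)
            break
        back.append(s[:pos])
        s = s[pos:]
    return back
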
boundary_split('a b c d')
#['a', ' ', 'b', ' ', 'c', ' ', 'd']
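If all you want is the pieces, a loop may not be needed at all. Splitting at every word boundary should amount to collecting alternating runs of word and non-word characters, which re.findall can do directly. A minimal sketch, assuming that equivalence holds for your input (the name boundary_split_findall is made up here):

```python
import re

def boundary_split_findall(s):
    # Collect maximal runs of word characters (\w+) or of
    # non-word characters (\W+); nothing is dropped in between.
    return re.findall(r'\w+|\W+', s)

print(boundary_split_findall('a b c d'))
# ['a', ' ', 'b', ' ', 'c', ' ', 'd']
```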
What's the use of splitting on boundaries? Someone else wanted this a
few days ago on tutor, and I still can't figure out a reason.
Michael