[Python-Dev] re.split on empty patterns
Tim Peters
tim.peters at gmail.com
Sat Aug 7 19:58:16 CEST 2004
[A.M. Kuchling <amk at amk.ca> wrote:]
> The re.split() method ignores zero-length pattern matches. Patch
> #988761 adds an emptyok flag to split that causes zero-length matches
> to trigger a split.
...
> IMHO this feature is clearly useful,
Yes it is! Or, more accurately, it can be, when it's intended to
match an empty string. It's a bit fuzzy because regexps are so
error-prone, and writing a regexp that matches an empty string by
accident is easy.
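
To illustrate how easily this happens, here is a rough check, assuming a hypothetical helper named matches_empty (not part of the re module or the patch). It only tests whether a pattern matches the empty string itself, so it will miss patterns that match zero-width only in context, such as \b or lookaheads.

```python
import re

def matches_empty(pattern):
    """Return True if `pattern` can match the empty string ''.

    Hypothetical helper for illustration only; it does not detect
    context-dependent zero-width matches such as r'\b'.
    """
    return re.fullmatch(pattern, '') is not None

# Patterns that look like ordinary separators but also match nothing at all.
for pattern in ['x*', 'x+', r'\s*,?\s*', 'a|b|', r'\d+']:
    print(repr(pattern), matches_empty(pattern))
```

Of these, 'x*', r'\s*,?\s*' and 'a|b|' (note the trailing empty alternative) all match the empty string; 'x+' and r'\d+' do not.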
> and would be happy to commit the patch as-is.
Haven't looked at the patch, though.
> Question: do we want to make this option the new default? Existing
> patterns that can produce zero-length matches would change their
> meanings:
>
> >>> re.split('x*', 'abxxxcdefxxx')
> ['ab', 'cdef', '']
> >>> re.split('x*', 'abxxxcdefxxx', emptyok=True)
> ['', 'a', 'b', '', 'c', 'd', 'e', 'f', '', '']
>
> (I think the result of the second match points up a bug in the patch;
> the empty strings in the middle seem wrong to me. Assume that gets
> fixed.)
Agreed.
> Anyway, we therefore can't just make this the default in 2.4. We
> could trigger a warning when emptyok is not supplied and a split
> pattern results in a zero-length match; users could supply
> emptyok=False to avoid the warning. Patterns that never have a
> zero-length match would never get the warning. 2.5 could then set
> emptyok to True.
>
> Note: raising the warning might cause a serious performance hit for
> patterns that get zero-length matches a lot, which would make 2.4
> slower in certain cases.
If you don't intend to change the default, there's no problem. I like
"no problem". This feature isn't needed so often that changing the
default can't wait for Python 3. In the meantime, "emptyok" is an odd
name, since it's always "ok" to have an empty match. "split0=True"
reads better to me, since the effect is to split on a 0-length match.
"split_on_empty_match" would be too wordy.
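
For concreteness, a minimal pure-Python sketch of the two behaviours under discussion. This is not the code from patch #988761: split_sketch is a made-up name, split0 is just the keyword spelling suggested above, and the sketch ignores maxsplit and capturing groups. It scans with search() so that it reproduces both results quoted earlier, including the empty strings in the middle that look like a bug.

```python
import re

def split_sketch(pattern, string, split0=False):
    """Split `string` on matches of `pattern` (illustrative sketch only).

    With split0=False, zero-length matches are skipped.
    With split0=True (hypothetical flag name), they trigger a split.
    """
    regex = re.compile(pattern)
    pieces = []
    last = 0   # index just past the previous separator
    pos = 0    # index where the next search starts
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        if m.start() == m.end():
            # Zero-length match: split here only if asked to.
            if split0:
                pieces.append(string[last:m.start()])
                last = m.start()
            # Step forward one character so the loop always makes progress.
            pos = m.start() + 1
        else:
            # Ordinary non-empty separator.
            pieces.append(string[last:m.start()])
            last = pos = m.end()
    pieces.append(string[last:])
    return pieces

print(split_sketch('x*', 'abxxxcdefxxx'))
# ['ab', 'cdef', '']
print(split_sketch('x*', 'abxxxcdefxxx', split0=True))
# ['', 'a', 'b', '', 'c', 'd', 'e', 'f', '', '']
```

The extra '' after 'b' comes from the zero-length match found immediately after the 'xxx' separator; a fixed version would presumably have to decide whether such a match should count as a split point.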
> Thoughts? Does this need a PEP?
It will if an argument starts now <wink>.