A problem with re.split()

Fri Jun 4 02:24:15 EDT 1999

[Hrvoje Niksic]
> I understand that the idea with "re" module is for it to behave as
> closely to Perl's regular expressions as possible (which is why the
> order of arguments in string.split and re.split is different,
> string.split being the logical one).

The regexp *language* was meant to be compatible with Perl's (a snapshot of
the then-current one), but there wasn't much desire to mimic the rest.

WRT argument ordering, this was deliberate but for a different reason:
every function in the string module takes "the string" as its first argument
because that's the ordering that makes the most sense if you think of the
string as being an object and the function a method of that object.
Likewise every function in the re module takes "the regexp" as its first
argument.  Intra-module consistency was judged to be more important than
inter-module consistency in the case of "split".  Not a pure win, but still
seems the lesser evil.
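
To make the two orderings concrete, here's a small sketch.  It assumes a
later Python in which the string module's functions are also available as
methods on the string itself (not the case in 1.5.2), so "the string" really
is the object; the variable names are just for illustration:

```python
import re

text = "a,b,c"

# String first: the string is the object, the separator is the argument.
by_string = text.split(",")

# Regexp first: the same convention as the other functions in the re module.
by_regexp = re.split(",", text)

# With a compiled pattern, the regexp is literally the object.
by_compiled = re.compile(",").split(text)

print(by_string, by_regexp, by_compiled)
```

All three calls produce the same list here; only the position of "the thing
being operated on" differs.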

> The problem is with re.split() in this case:
>
> $ python
> Python 1.5.2 (#3, May 23 1999, 19:57:40)  [GCC 2.8.1] on sunos5
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> import re
> >>> re.split('', 'foo')
> ['foo']
>
> Perl splits it to ['f', 'o', 'o']:
>
> $ perl -e 'print join(":", split("", "foo"))'
> f:o:o

Python won't change here (IMO) -- re.split needed to be compatible with the
old regsub.split too, and (IMO) that had a more sensible rule than Perl's:
when splitting on a pattern, never take an empty match as a split point.
That's really all that's going on here; e.g., try

perl -e 'print join(":", split(/b?/, "foobar"))'

Did you expect f:o:o:a:r?  That's what Perl prints.  I like Python's
["foo", "ar"] much better.
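
The rule itself ("never take an empty match as a split point") can be
sketched in a few lines.  This is only an illustration, not the code inside
re or regsub: split_skip_empty is a made-up name, it ignores the maxsplit
argument and capturing groups, and it's written with re.finditer rather than
re.split so the rule is spelled out explicitly:

```python
import re

def split_skip_empty(pattern, text):
    """Split text on pattern, never treating an empty match as a split point.

    Hypothetical helper for illustration only.
    """
    pieces = []
    last = 0  # index just past the previous split point
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # zero-length match: skip it
        pieces.append(text[last:m.start()])
        last = m.end()
    pieces.append(text[last:])  # whatever follows the final split point
    return pieces

print(split_skip_empty("b?", "foobar"))  # ['foo', 'ar']
print(split_skip_empty("", "foo"))       # ['foo']
```

Under this rule the only place "b?" can split "foobar" is at the "b", and an
empty pattern never splits anything.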

BTW, Python's string.split also refuses to split on an empty separator.
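
A quick way to see the refusal, using the string-method spelling (an
assumption on my part; in 1.5.2 you'd spell it string.split(s, sep)).  The
variable name is just for illustration:

```python
try:
    "foo".split("")
except ValueError as exc:
    # An empty separator is rejected rather than splitting between characters.
    outcome = "refused: %s" % exc
else:
    outcome = "split"

print(outcome)
```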

Also note:

>>> list("foo")
['f', 'o', 'o']
>>>

adopting-perl's-language-didn't-imply-adopting-its-accent<wink>-ly y'rs  -
tim