Behavior of re.split on empty strings is unexpected

Mon Aug 2 15:52:09 EDT 2010

On 08/02/2010 09:41 PM, John Nagle wrote:
> On 8/2/2010 11:02 AM, MRAB wrote:
>> John Nagle wrote:
>>> The regular expression "split" behaves slightly differently than
>>> string split:
> occurrences of pattern", which is not too helpful.
>>>
>> It's the plain str.split() which is unusual in that:
>>
>> 1. it splits on sequences of whitespace instead of one per occurrence;
> 
>    That can be emulated with the obvious regular expression:
> 
>     re.compile(r'\s+')
> 
>> 2. it discards leading and trailing sequences of whitespace.
> 
>    But that can't, or at least I can't figure out how to do it.

[s for s in rexp.split(long_s) if s]
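
Spelled out as a minimal sketch, assuming rexp is a compiled whitespace pattern (r'\s+') and long_s is any input string; both names and the sample text are just placeholders for illustration:

```python
import re

# Hypothetical names: rexp is a compiled whitespace pattern,
# long_s is a sample input with leading and trailing whitespace.
rexp = re.compile(r'\s+')
long_s = "  the quick  brown fox \n"

# re.split yields an empty string before a leading match
# and after a trailing match.
raw_parts = rexp.split(long_s)

# Filtering out the empty strings discards those edge pieces,
# which matches what the no-argument str.split() returns.
parts = [s for s in raw_parts if s]

print(raw_parts)
print(parts)
print(parts == long_s.split())
```

The filter only drops empty strings, so it is safe here because a \s+ split can never produce an empty piece in the middle of the string.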

> 
>> It just happens that the unusual one is the most commonly used one, if
>> you see what I mean! :-)
> 
>    The no-argument form of "split" shouldn't be that much of a special
> case.
> 
>                     John Nagle
>