Behavior of re.split on empty strings is unexpected

John Nagle nagle at animats.com
Mon Aug 2 17:22:25 EDT 2010


On 8/2/2010 12:52 PM, Thomas Jollans wrote:
> On 08/02/2010 09:41 PM, John Nagle wrote:
>> On 8/2/2010 11:02 AM, MRAB wrote:
>>> John Nagle wrote:
>>>> The regular expression "split" behaves slightly differently than
>>>> string split:
>> occurrences of pattern", which is not too helpful.
>>>>
>>> It's the plain str.split() which is unusual in that:
>>>
>>> 1. it splits on sequences of whitespace instead of one per occurrence;
>>
>>     That can be emulated with the obvious regular expression:
>>
>>      re.compile(r'\s+')
>>
>>> 2. it discards leading and trailing sequences of whitespace.
>>
>>     But that can't, or at least I can't figure out how to do it.
>
> [s for s in rexp.split(long_s) if s]
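Here is a minimal sketch of the two behaviours discussed in this exchange, assuming Python 3. The contents of `long_s` are a made-up sample. A `\s+` pattern reproduces point 1, because a run of whitespace counts as one separator. It does not reproduce point 2: leading and trailing whitespace leave empty strings at the ends of the list, and the comprehension then removes them.

```python
import re

long_s = "  alpha beta\t\tgamma \n"  # made-up sample input

# str.split() with no argument splits on runs of whitespace and
# drops leading/trailing whitespace entirely.
plain = long_s.split()

# re.split() on runs of whitespace makes one split per run, but the
# leading and trailing runs produce empty strings at the list ends.
rexp = re.compile(r'\s+')
raw = rexp.split(long_s)

# Filtering afterwards discards the empty strings.
filtered = [s for s in raw if s]

print(plain)     # ['alpha', 'beta', 'gamma']
print(raw)       # ['', 'alpha', 'beta', 'gamma', '']
print(filtered)  # ['alpha', 'beta', 'gamma']
```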

    Of course I can discard the blank strings afterward, but
is there some way to do it in the "split" operation?  If
not, then the default case for "split()" is too non-standard.
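`re.split` takes `pattern, string, maxsplit, flags`, and its documented signature has no switch for dropping the empty strings. One workaround, sketched here assuming Python 3 and ordinary ASCII whitespace, is to match the words instead of the separators. `re.findall(r'\S+', text)` then gives the same list as a bare `str.split()`. The helper name `whitespace_split` and the sample strings are hypothetical.

```python
import re

def whitespace_split(text):
    """Hypothetical helper emulating str.split() with no arguments.

    Rather than splitting on runs of whitespace, it matches the runs of
    non-whitespace directly, so no empty strings appear at the ends or
    for an all-whitespace input.
    """
    return re.findall(r'\S+', text)

samples = ["  alpha beta\t\tgamma \n", "", "   ", "one"]
for text in samples:
    # The two lists printed on each line should be identical.
    print(repr(text), whitespace_split(text), text.split())

# Stripping first and then splitting is close, but not identical:
# an all-whitespace string strips to '', and re.split returns [''] for it.
print(re.split(r'\s+', "   ".strip()))  # ['']
print("   ".split())                    # []
```

The `findall` form sidesteps the edge cases because an input with no words simply produces no matches.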

    (Also, "if s" does work here, since the empty string is falsy; "if s != ''" is just the explicit form.)
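The two filters can be compared directly. This is a small sketch, assuming Python 3. For a list that holds only strings, the filters select the same elements because the empty string is falsy. They can differ when `re.split` is given a pattern with a capturing group that does not take part in a match, since `re.split` reports that group as `None`. The pattern `(a)|b` below is a made-up illustration.

```python
import re

# With a plain whitespace pattern there are no groups, so the result
# holds only strings and both filters agree.
plain_raw = re.split(r'\s+', ' a b ')
print([s for s in plain_raw if s])        # ['a', 'b']
print([s for s in plain_raw if s != ''])  # ['a', 'b']

# A separator pattern with a capturing group that can fail to participate:
# when 'b' is the separator, group 1 is unmatched and re.split yields None.
raw = re.split(r'(a)|b', 'xbx')

truthy = [s for s in raw if s]           # drops both '' and None
explicit = [s for s in raw if s != '']   # drops only ''

print(raw)       # ['x', None, 'x']
print(truthy)    # ['x', 'x']
print(explicit)  # ['x', None, 'x']
```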

				John Nagle


