Behavior of re.split on empty strings is unexpected
John Nagle
nagle at animats.com
Mon Aug 2 17:22:25 EDT 2010
On 8/2/2010 12:52 PM, Thomas Jollans wrote:
> On 08/02/2010 09:41 PM, John Nagle wrote:
>> On 8/2/2010 11:02 AM, MRAB wrote:
>>> John Nagle wrote:
>>>> The regular expression "split" behaves slightly differently than
>>>> string split:
>> occurrences of pattern", which is not too helpful.
>>>>
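A small illustration of the difference, assuming whitespace is the separator of interest (the sample strings are made up for this sketch). With an explicit separator, `str.split(' ')` and `re.split(' ', ...)` behave the same way. Only the no-argument form `str.split()` collapses runs of whitespace and drops the leading and trailing pieces.

```python
import re

text = "a  b c"  # two spaces between "a" and "b"

# With an explicit single-space separator, str.split and re.split agree:
# every occurrence is a split point, so adjacent spaces yield an empty string.
print(text.split(" "))      # ['a', '', 'b', 'c']
print(re.split(" ", text))  # ['a', '', 'b', 'c']

# The no-argument str.split() is the special case: it splits on runs of
# whitespace and ignores leading/trailing whitespace.
padded = "  a  b c  "
print(padded.split())             # ['a', 'b', 'c']
print(re.split(r"\s+", padded))   # ['', 'a', 'b', 'c', '']
```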
>>> It's the plain str.split() which is unusual in that:
>>>
>>> 1. it splits on sequences of whitespace instead of one per occurrence;
>>
>> That can be emulated with the obvious regular expression:
>>
>> re.compile(r'\s+')
>>
>>> 2. it discards leading and trailing sequences of whitespace.
>>
>> But that can't, or at least I can't figure out how to do it.
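One way to get the `str.split()` result from the `re` module is to stop describing the separators and instead collect the words with `re.findall(r'\S+', s)`. Because only the non-whitespace runs are returned, there is nothing left to produce leading or trailing empty strings. This is a minimal sketch: the helper name `re_whitespace_split` is hypothetical, it swaps `split` for `findall` rather than being a `split` option, and the check below only compares against `str.split()` on ASCII whitespace.

```python
import re

def re_whitespace_split(s):
    """Hypothetical helper: mimic str.split() (no arguments) using re.

    Collects maximal runs of non-whitespace instead of splitting on
    separators, so leading/trailing whitespace yields no empty strings.
    """
    return re.findall(r"\S+", s)

samples = ["  a  b c  ", "a b", "", "   "]
for s in samples:
    # Both columns should match for these ASCII-whitespace samples.
    print(repr(s), re_whitespace_split(s), s.split())

# Stripping first and then splitting on runs is close, but differs on
# blank input: re.split always returns at least one element.
print(re.split(r"\s+", "   ".strip()))  # ['']
print("   ".split())                    # []
```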
>
> [s for s in rexp.split(long_s) if s]
Of course I can discard the blank strings afterward, but
is there some way to do it in the "split" operation? If
not, then the default case for "split()" is too non-standard.
(Also, "if s" does work here, since empty strings are falsy;
for string results it is equivalent to "if s != ''".)
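A quick check of the filtering step, reusing the names `rexp` and `long_s` from the quoted reply and assuming `rexp` is `re.compile(r'\s+')` (the pattern and sample strings are assumptions). For string results, `if s` and `if s != ''` keep exactly the same items. They diverge only when the pattern contains a capturing group that does not take part in a match, because `re.split` then puts `None` in that position.

```python
import re

rexp = re.compile(r"\s+")   # assumed pattern for this sketch
long_s = "  a  b c  "

# Empty strings are falsy, so "if s" drops them just as "if s != ''" does.
print([s for s in rexp.split(long_s) if s])        # ['a', 'b', 'c']
print([s for s in rexp.split(long_s) if s != ""])  # ['a', 'b', 'c']

# With a capturing group that sometimes does not participate, re.split
# inserts None; "if s" removes it, "if s != ''" keeps it.
parts = re.split(r"(-)|\s+", "a b-c")
print(parts)                          # ['a', None, 'b', '-', 'c']
print([s for s in parts if s])        # ['a', 'b', '-', 'c']
print([s for s in parts if s != ""])  # ['a', None, 'b', '-', 'c']
```

`list(filter(None, parts))` is another spelling of the `if s` filter.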
John Nagle
More information about the Python-list mailing list