Behavior of re.split on empty strings is unexpected

John Nagle nagle at
Mon Aug 2 19:34:25 CEST 2010

The regular expression "split" behaves slightly differently than string split:

 >>> import re
 >>> kresplit2 = re.compile(r'[^\w\&]+', re.UNICODE)

 >>> kresplit2.split("   HELLO    THERE   ")
['', 'HELLO', 'THERE', '']

 >>> kresplit2.split("VERISIGN INC.")
['VERISIGN', 'INC', '']

I'd thought that "split" would never produce an empty string, but
it will.

The regular string split operation doesn't yield empty strings:

 >>> "   HELLO   THERE ".split()
['HELLO', 'THERE']

If I try to get the functionality of string split with re:

 >>> s2 = "   HELLO   THERE  "
 >>> kresplit4 = re.compile(r'\W+', re.UNICODE)
 >>> kresplit4.split(s2)
['', 'HELLO', 'THERE', '']

I still get empty strings.
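
One possible workaround, sketched here rather than taken from the documentation, is either to drop the empty fields after splitting or to match the words themselves with findall instead of splitting on the separators. The names split_nonempty and find_words are made up for illustration:

```python
import re

# Runs of non-word characters act as separators; runs of word characters are the fields.
SEPARATORS = re.compile(r'\W+', re.UNICODE)
WORDS = re.compile(r'\w+', re.UNICODE)

def split_nonempty(text):
    """Split on separator runs, then discard the empty fields at the edges."""
    return [field for field in SEPARATORS.split(text) if field]

def find_words(text):
    """Collect the word runs directly, so no empty fields are ever produced."""
    return WORDS.findall(text)

s2 = "   HELLO   THERE  "
print(split_nonempty(s2))
print(find_words(s2))
```

Both calls should give the same list that s2.split() gives for this input.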

The documentation just describes re.split as "Split string by the 
occurrences of pattern", which is not too helpful.
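
For comparison, the string method also produces empty strings as soon as it is given an explicit separator; only the no-argument form strips them. A small sketch of that comparison, using a single space as the separator purely for illustration:

```python
import re

text = " HELLO THERE "

# Explicit separator: a separator at either end of the string leaves an empty field.
explicit = text.split(" ")

# re.split with the equivalent one-character pattern gives the same fields.
regex = re.split(r" ", text)

# No-argument split: runs of whitespace are one separator and edge fields are dropped.
bare = text.split()

print(explicit)
print(regex)
print(bare)
```

Seen this way, re.split appears to follow the explicit-separator behavior of str.split rather than the whitespace-stripping behavior of the no-argument form.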

					John Nagle
