Behavior of re.split on empty strings is unexpected

John Nagle nagle at animats.com
Mon Aug 2 13:34:25 EDT 2010


The regular expression "split" behaves slightly differently than string 
split:

 >>> import re
 >>> kresplit2 = re.compile(r'[^\w\&]+', re.UNICODE)

 >>> kresplit2.split("   HELLO    THERE   ")
['', 'HELLO', 'THERE', '']

 >>> kresplit2.split("VERISIGN INC.")
['VERISIGN', 'INC', '']

I'd thought that "split" would never produce an empty string, but
it will.

The regular string split operation, called with no separator, doesn't yield empty strings:

 >>> "   HELLO   THERE ".split()
['HELLO', 'THERE']
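
For comparison, here is a small sketch with an example string of my own (not from the session above). As far as I can tell, str.split only suppresses empty strings when it is called with no separator; with an explicit separator it keeps them, and that is the behavior re.split seems to follow.

```python
import re

s = " HELLO THERE "   # illustrative input: one leading and one trailing space

# No separator: runs of whitespace are collapsed and the edges are trimmed.
no_sep = s.split()

# Explicit separator: the text before the first separator and after the
# last one is kept, even when it is empty.
with_sep = s.split(" ")

# re.split with the equivalent one-character pattern.
via_re = re.split(r" ", s)

print(no_sep)
print(with_sep)
print(via_re)
```

Here with_sep and via_re come out equal, each with an empty string at both ends, while no_sep contains only the two words.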

If I try to get the functionality of string split with re:

 >>> s2 = "   HELLO   THERE  "
 >>> kresplit4 = re.compile(r'\W+', re.UNICODE)
 >>> kresplit4.split(s2)
['', 'HELLO', 'THERE', '']

I still get empty strings.
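
Two possible workarounds, sketched under the assumption that the goal is simply the list of words: either filter the empty strings out of the re.split result, or match the words themselves with re.findall instead of matching the separators. The names words_filtered and words_found are just for illustration.

```python
import re

s2 = "   HELLO   THERE  "
kresplit4 = re.compile(r'\W+', re.UNICODE)

# Option 1: keep re.split, then drop the empty strings it leaves
# where the string starts or ends with a separator.
words_filtered = [w for w in kresplit4.split(s2) if w]

# Option 2: match runs of word characters directly, so there are
# no separators at the edges to produce empty strings.
words_found = re.findall(r'\w+', s2, re.UNICODE)

print(words_filtered)
print(words_found)
```

For this input both lists match what s2.split() returns. They can differ from str.split on other inputs, since \W+ also treats punctuation as a separator.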

The documentation just describes re.split as "Split string by the 
occurrences of pattern", which is not too helpful.

					John Nagle


