Behavior of re.split on empty strings is unexpected
John Nagle
nagle at animats.com
Mon Aug 2 13:34:25 EDT 2010
The regular expression "split" behaves slightly differently than string
split:
>>> import re
>>> kresplit2 = re.compile(r'[^\w\&]+', re.UNICODE)
>>> kresplit2.split(" HELLO THERE ")
['', 'HELLO', 'THERE', '']
>>> kresplit2.split("VERISIGN INC.")
['VERISIGN', 'INC', '']
I'd thought that "split" would never produce an empty string, but
it will.
The regular string split operation, called with no arguments, doesn't yield empty strings:
>>> " HELLO THERE ".split()
['HELLO', 'THERE']
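For comparison, a minimal sketch of how string split behaves once it is given an explicit separator. In that form it keeps empty strings at the ends, much as re.split does, and only the no-argument form drops them. The variable names are only illustrative.

```python
s = " HELLO THERE "

# No argument: runs of whitespace count as one separator, and leading or
# trailing whitespace is ignored, so no empty strings appear.
no_arg = s.split()

# Explicit separator: every occurrence of the separator delimits a field,
# so the leading and trailing spaces each produce an empty string.
with_sep = s.split(" ")

print(no_arg)
print(with_sep)
```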
If I try to get the functionality of string split with re:
>>> s2 = " HELLO THERE "
>>> kresplit4 = re.compile(r'\W+', re.UNICODE)
>>> kresplit4.split(s2)
['', 'HELLO', 'THERE', '']
I still get empty strings.
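One possible workaround, sketched under the assumption that the empty strings are simply unwanted: either filter them out of the re.split result, or match the words themselves with findall instead of splitting on the separators. The helper names split_nonempty and find_words, and the pattern name kreword, are hypothetical and not part of the re module.

```python
import re

# Same separator pattern as above: one or more non-word characters.
kresplit4 = re.compile(r'\W+', re.UNICODE)
# Complementary pattern: one or more word characters.
kreword = re.compile(r'\w+', re.UNICODE)

def split_nonempty(s):
    """Split on the separator pattern, then drop any empty strings."""
    return [tok for tok in kresplit4.split(s) if tok]

def find_words(s):
    """Collect the runs of word characters directly instead of splitting."""
    return kreword.findall(s)

print(split_nonempty(" HELLO THERE "))
print(find_words("VERISIGN INC."))
```

Both approaches return an empty list for an empty input string, which matches what "".split() returns.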
The documentation just describes re.split as "Split string by the
occurrences of pattern", which is not too helpful.
John Nagle