[Tutor] Splitting on punctuation
eryksun
eryksun at gmail.com
Mon Jun 10 12:12:18 CEST 2013
On Mon, Jun 10, 2013 at 4:27 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
>
>>>> string.punctuation
> '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>>> re.split("[" + string.punctuation + "]+", "yes, but no. But: yes, no")
> ['yes', ' but no', ' But', ' yes', ' no']
Even though you didn't use re.escape(), that almost works, except for
backslash. Since the string doesn't start with ^, the set isn't negated,
and its ] is preceded by a backslash, so it doesn't close the set early.
Also, because string.punctuation is in ASCII order, the range ,-. is
valid, and even correct:
>>> pat = re.compile('[,-.]', re.DEBUG)
in
  range (44, 46)
>>> map(ord, ',-.')
[44, 45, 46]
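The same check can be done in Python 3 without re.DEBUG by testing every ASCII character against the class. This is a sketch that assumes Python 3.4+ (for re.fullmatch); the session above is Python 2, where map() returns a list directly. It shows the unescaped range ,-. accepts exactly the characters 44 through 46:

```python
import re

# Unescaped, ',-.' inside a set is a range from ',' (44) to '.' (46).
range_class = re.compile('[,-.]')

# Collect every ASCII character the class accepts.
matched = [chr(i) for i in range(128) if range_class.fullmatch(chr(i))]

print(matched)                  # [',', '-', '.']
print(list(map(ord, matched)))  # [44, 45, 46]
```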
However, the otherwise harmless escape \] consumes the backslash, so a
literal backslash never makes it into the set.
So remember to use re.escape.
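To see exactly which character the unescaped set loses, you can test each character of string.punctuation against both versions. A minimal Python 3 sketch (assumes Python 3.4+ for re.fullmatch); only the backslash falls through without re.escape:

```python
import re
import string

# Set built without re.escape(), as in the original attempt.
unescaped = re.compile('[%s]' % string.punctuation)
# Same set with every special character escaped.
escaped = re.compile('[%s]' % re.escape(string.punctuation))

# Characters of string.punctuation that each set fails to match.
missing_unescaped = [c for c in string.punctuation if not unescaped.fullmatch(c)]
missing_escaped = [c for c in string.punctuation if not escaped.fullmatch(c)]

print(missing_unescaped)  # ['\\']
print(missing_escaped)    # []
```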
Without re.escape:
>>> pat1 = re.compile('[%s]+' % string.punctuation)
>>> pat1.split(r'yes, but no... But: yes\no')
['yes', ' but no', ' But', ' yes\\no']
With re.escape:
>>> pat2 = re.compile('[%s]+' % re.escape(string.punctuation))
>>> pat2.split(r'yes, but no... But: yes\no')
['yes', ' but no', ' But', ' yes', 'no']
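For reuse, the escaped pattern can be compiled once and wrapped in a small function. This is a sketch, not something from the thread: the name split_on_punctuation is hypothetical, it assumes Python 3, and it additionally strips surrounding whitespace and drops the empty pieces that re.split produces when the text starts or ends with punctuation.

```python
import re
import string

# Compile once; re.escape() makes every punctuation character literal,
# including the backslash that the unescaped set silently dropped.
PUNCT_RUN = re.compile('[%s]+' % re.escape(string.punctuation))


def split_on_punctuation(text):
    """Split text on runs of ASCII punctuation (hypothetical helper).

    Pieces are stripped of surrounding whitespace, and empty pieces
    (e.g. from leading or trailing punctuation) are dropped.
    """
    pieces = (piece.strip() for piece in PUNCT_RUN.split(text))
    return [piece for piece in pieces if piece]


print(split_on_punctuation(r'yes, but no... But: yes\no'))
# ['yes', 'but no', 'But', 'yes', 'no']
print(split_on_punctuation('Hello, world!'))
# ['Hello', 'world']
```

Note that string.punctuation covers ASCII punctuation only, so characters such as a Unicode ellipsis are not treated as separators.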