[CentralOH] 2014-01-17_道場_Scribbles_ 落書/惡文? Much more

Eric Floehr eric at intellovations.com
Tue Jan 21 21:29:16 CET 2014


>
> re.split("(\s|\.|\!|\?)+", "Blah. Blah!Blah! Blah Blah?Blah")
>
> Results in:
>
> ['Blah', ' ', 'Blah', '!', 'Blah', ' ', 'Blah', ' ', 'Blah', '?', 'Blah']
>
> Where as in Java:
>
> "Blah. Blah!Blah! Blah Blah?Blah".split("(\\s|\\.|\\!|\\?)+")
>
> Results in:
>
> [Blah,Blah,Blah,Blah,Blah,Blah]
>
>
> Don't blame me. I didn't break it.
>


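Parens are a "capturing group", whereas brackets are a "character class".
Python's re.split includes the text matched by a capturing group in the
list it returns, whereas Java's String.split doesn't. Because the group
here is repeated with +, only the last character it matched is kept, which
is why the ". " separator shows up as ' '. In other contexts, parens tell
the regex engine you'd like to keep the match so you can refer to it later.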
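
A small sketch of both uses of parens in Python. The sample strings and
variable names are made up for illustration:

```python
import re

# Parens capture: the matched text can be retrieved afterwards by group number.
m = re.search(r"(\d+)-(\d+)", "pages 10-25")
first, last = m.group(1), m.group(2)

# With a capturing group, re.split also returns the captured separators.
parts_captured = re.split(r"(-)", "a-b-c")

# With a character class there is no group, so only the pieces come back.
parts_plain = re.split(r"[-]", "a-b-c")

print(first, last)
print(parts_captured)
print(parts_plain)
```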

Perhaps Python's implementation is less broken in that regard.

To behave the same in both Java and Python, this regex should use brackets:
[\s.!?]+ (written "[\\s.!?]+" in a Java string literal).
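
For comparison, a sketch of the Python side with the original pattern, a
character class, and a non-capturing group (?:...), which is another way
to drop the captured separators. The variable names are only for this
example:

```python
import re

text = "Blah. Blah!Blah! Blah Blah?Blah"

# Original pattern: the capturing group's last match lands in the result.
with_group = re.split(r"(\s|\.|!|\?)+", text)

# Character class: nothing is captured, so only the words are returned.
with_class = re.split(r"[\s.!?]+", text)

# Non-capturing group: same alternation as the original, but not captured.
with_noncapture = re.split(r"(?:\s|\.|!|\?)+", text)

print(with_group)
print(with_class)
print(with_noncapture)
```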