[ python-Bugs-852532 ] ^$ won't split on empty line
SourceForge.net
noreply at sourceforge.net
Sun Jul 11 05:32:35 CEST 2004
Bugs item #852532, was opened at 2003-12-02 05:01
Message generated for change (Comment added) made by mkc
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=852532&group_id=5470
Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: Postponed
Priority: 5
Submitted By: Jan Burgy (jburgy)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: ^$ won't split on empty line
Initial Comment:
Python 2.3.2 (#49, Oct 2 2003, 20:02:00) [MSC v.1200 32 bit (Intel)] on win32
>>> import re
>>> re.compile('^$', re.MULTILINE).split('foo\n\nbar')
['foo\n\nbar']
I expect ['foo\n', '\nbar'], since, according to the
documentation $ "in MULTILINE mode also matches
before a newline".
Thanks, Jan
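For illustration, here is a small sketch of where the pattern actually matches in the string from the report. `re.finditer` reports zero-width matches independently of what `split()` does with them, so it shows that '^$' in MULTILINE mode matches exactly once, between the two newlines. The variable names are made up for this example.

```python
import re

text = 'foo\n\nbar'
pattern = re.compile('^$', re.MULTILINE)

# Collect the (start, end) span of every match.
# A zero-width match has start == end.
spans = [m.span() for m in pattern.finditer(text)]
print(spans)  # [(4, 4)]
```

The single span (4, 4) is the empty line, which is the point where the report expects the split to happen.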
----------------------------------------------------------------------
Comment By: Mike Coleman (mkc)
Date: 2004-07-10 22:32
Message:
Logged In: YES
user_id=555
I made a patch that addresses this (#988761).
----------------------------------------------------------------------
Comment By: Jan Burgy (jburgy)
Date: 2004-01-14 05:07
Message:
Logged In: YES
user_id=618572
Since I really needed the functionality described above, I
came up with a workaround. It's a sufficient replacement;
maybe it belongs in some FAQ:
>>> import re
>>> re.sub('(?im)^$', '\f', 'foo\n\nbar').split('\f')
['foo\n', '\nbar']
Another "magic" byte could replace '\f'...
Regards, Jan
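A hedged sketch of the same trick wrapped in a function. The name `split_on_empty_lines` and its `sentinel` parameter are hypothetical, not part of the `re` module. The only addition to the approach above is a guard for the case where the sentinel already occurs in the input, which would otherwise produce spurious splits.

```python
import re

def split_on_empty_lines(text, sentinel='\f'):
    """Split text at each empty line by marking it with a sentinel first.

    Hypothetical helper: the sentinel must not already occur in text.
    """
    if sentinel in text:
        raise ValueError('sentinel already occurs in text; pick another one')
    # Replace every zero-width '^$' match with the sentinel, then split on it.
    marked = re.sub('(?m)^$', sentinel, text)
    return marked.split(sentinel)

print(split_on_empty_lines('foo\n\nbar'))  # ['foo\n', '\nbar']
```

If '\f' can appear in the data, pass a different `sentinel`, as suggested above.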
----------------------------------------------------------------------
Comment By: Mike Coleman (mkc)
Date: 2003-12-31 23:28
Message:
Logged In: YES
user_id=555
Hi, I was going to file this bug just now myself, as this
seems like a really useful feature. For example, I've
several times wanted to split on '^' or '^(?=S)' (to split
up a data file into paragraphs that start with an initial
S). Instead I have to do something like '\n(?=S)', which is
rather more hideous.
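To make the substitute pattern concrete, a minimal sketch with made-up data. Because '\n(?=S)' consumes a real newline, it is never a zero-width match, so `split()` handles it; the price is that the newline before each "S" line is dropped from the preceding piece.

```python
import re

# Made-up sample: records begin on lines that start with 'S'.
data = 'Sam\n likes eggs\nSue\n likes ham\n'

# Split on a newline that is immediately followed by 'S'.
# The lookahead keeps the 'S' itself in the next piece.
records = re.split(r'\n(?=S)', data)
print(records)  # ['Sam\n likes eggs', 'Sue\n likes ham\n']
```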
To answer tim_one's challenge, yes, I *do* expect splitting
by 'x*' to break a string into letters, now that I've
thought about it. To not do so is a bizarre and surprising
behavior, IMO. (Patient: Doctor, when I split on this
nonsense pattern I get nonsense! Doctor: Then don't do that.)
The fix should be near this line in _sre.c, I think.
if (state.start == state.ptr) {
I could work on a patch if you'll take it...
Mike
----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot)
Date: 2003-12-11 07:42
Message:
Logged In: YES
user_id=38376
Split never splits on empty substrings; see Tim's answer for a
brief discussion.
Fred, can you perhaps add something to the documentation?
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-12-02 09:20
Message:
Logged In: YES
user_id=31435
Confirmed on Pythons 2.1.3, 2.2.3, 2.3.2, and current CVS.
More generally, split() doesn't appear to split on any empty
(0-length) match. For example,
>>> pat = re.compile(r'\b')
>>> pat.split('(a b)')
['(a b)']
>>> pat.findall('(a b)') # but the pattern matches 4 places
['', '', '', '']
>>>
That's probably a design constraint, but isn't documented.
For example, if you split "abc" by the pattern x*, what do you
expect? The pattern matches (with length 0) at 4 places,
but I bet most people would be surprised to get
['', 'a', 'b', 'c', '']
back instead of (as they do get)
['abc']
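To see concretely what splitting on every match, including the 0-length ones, would produce, here is a sketch built on `re.finditer`. The helper name `split_including_empty` is hypothetical and is not how `split()` is implemented; it simply cuts the string at each match that `finditer` reports.

```python
import re

def split_including_empty(pattern, text, flags=0):
    """Cut text at every match of pattern, zero-width matches included.

    Hypothetical helper for illustration only.
    """
    pieces = []
    last = 0
    for m in re.finditer(pattern, text, flags):
        pieces.append(text[last:m.start()])  # text before this match
        last = m.end()                       # resume after the match
    pieces.append(text[last:])               # trailing remainder
    return pieces

print(split_including_empty('x*', 'abc'))      # ['', 'a', 'b', 'c', '']
print(split_including_empty(r'\b', '(a b)'))   # ['(', 'a', ' ', 'b', ')']
print(split_including_empty('^$', 'foo\n\nbar', re.MULTILINE))  # ['foo\n', '\nbar']
```

The first result is the list discussed above for 'x*'; the last one is what the original report expected from '^$'.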
----------------------------------------------------------------------