split on blank lines

Jan Burgy jburgy at hotmail.com
Tue Dec 2 11:32:25 CET 2003

Duncan Booth <duncan at NOSPAMrcp.co.uk> wrote in message news:<Xns9444932BB7A17duncanrcpcouk at>...
> jburgy at hotmail.com (Jan Burgy) wrote in 
> news:807692de.0312010610.4461c0e3 at posting.google.com:
> > can somebody tell me why (using Python 2.3.2)
> > 
> > >>> import re
> > >>> re.compile(r"^$", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> > ['foo\n\nbar\n\nbaz']
> > 
> > ? Being used to Perl semantics, I expect
> > 
> > ['foo\n', 'bar\n', 'baz']
> > 
> > or something equivalent without the '\n' characters in the result
> > strings. I have found that
> > 
> > >>> re.compile(r"^\n", re.MULTILINE).split("foo\n\nbar\n\nbaz")
> > ['foo\n', 'bar\n', 'baz']
> > 
> > I prefer the first version however because my intent is stated more
> > clearly. Could this be a bug in sre.py (I looked at the code for a
> > good two minutes but then my head started hurting)
> > 
> Given that re.compile("^$", re.MULTILINE).findall("foo\n\nbar\n\nbaz") 
> returns ['', ''] I would agree this looks like a bug. You could submit a 
> bug report on Sourceforge.
> Of course, if you really want to state your intentions, you could just use:
>    >>> "foo\n\nbar\n\nbaz".split('\n\n')
>    ['foo', 'bar', 'baz']
> as you aren't doing anything here that obviously benefits from regex 
> obfuscation.

Thank you, Duncan, for your input. You're right, I will post a bug
report on SourceForge. Why, you ask, do I split on "^$" and not simply
"\n\n"? Because I'm dealing with an idiotic file format (not my
own, mind you) and I really want to split on "^\t*$" (I agree with
you that it's a rather arbitrary definition of a blank line; once
again, not mine). When the above didn't work, I spent a long time
questioning my understanding of regular expressions until I could
reduce my code to the minimal amount that still showed the problem.
Sometimes I wish that Python contained more elements from AWK (the
record separator "RS", for instance).
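
For what it's worth, here is a rough sketch of a workaround for the
"^\t*$" case, in the spirit of the "^\n" variant quoted above. The
pattern also consumes the blank line's newline, so every match is
non-empty and the result should not depend on how re.split treats
zero-width matches. The names BLANK_LINE and split_records are made
up for illustration, and dropping empty chunks is my own assumption
about what is wanted when several blank lines follow each other.

```python
import re

# Hypothetical helper, not part of any library: a "blank" line is one that
# holds nothing but tab characters, i.e. the "^\t*$" definition above.
# The trailing "\n" makes every match non-empty.
BLANK_LINE = re.compile(r"^\t*\n", re.MULTILINE)


def split_records(text):
    """Split text into records separated by one or more blank lines."""
    chunks = BLANK_LINE.split(text)
    # Consecutive blank lines yield empty chunks; drop them, and trim the
    # trailing newline that each remaining record keeps.
    return [chunk.rstrip("\n") for chunk in chunks if chunk.strip("\n")]


records = split_records("foo\n\t\nbar\n\n\nbaz\n")
print(records)
```

Note that a tab-only last line with no newline after it is not treated
as a separator by this pattern.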



Being an actuary is a lot harder than being a mathematician: it is
enough for a mathematician to prove that he or she is right.
