Regular Expression Question
Kragen Sitaker
kragen at dnaco.net
Tue Apr 3 20:03:25 EDT 2001
In article <3aca5b94_2 at news.nwlink.com>,
Wesley Witt <wesw at wittfamily.com> wrote:
>This is probably a simple question, but I can't seem to find the answer
>anywhere.
>
>I want a regular expression that will match ALL lines that do NOT contain
>the string "skip". I have some backup logs that I need to filter the noise
>out of.
Don't do this. Your successor maintainers will curse you, your boss
will fire you, and your dog will pee on you.
Just say:
    # copy every line that does not contain 'skip'
    for line in file.readlines():
        if 'skip' not in line:
            outfile.write(line)
But if you're curious:
You can match a line not containing 's' simply: re.compile("^[^s]*$").
You can match a line not containing 'sk' with more difficulty:
re.compile("^([^s]|s+[^sk])*$")
'ski' is a little harder; I think there's an easier way to do this, but
I don't know what it is:
re.compile("^([^s]|(s(ks)*)+([^sk]|k[^is]))*$")
(I think there's an easier way because the above RE is not strictly
deterministic: when it sees 'sk', it has to push two states, one for
a k followed by s and one for a k followed by [^is].)
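For comparison, here is a rough sketch of what a strictly deterministic
matcher for 'ski' could look like when written out by hand. It is not
part of the REs above; the function name lacks_ski and the state
numbering are made up for illustration. It reads each character once
and only remembers how much of 'ski' it has just seen.

```python
def lacks_ski(line):
    """Return True if 'ski' does not occur anywhere in line."""
    state = 0  # how many characters of 'ski' have just been matched: 0, 1 or 2
    for ch in line:
        if ch == 's':
            state = 1          # an 's' always starts (or restarts) a match
        elif state == 1 and ch == 'k':
            state = 2          # seen 'sk'
        elif state == 2 and ch == 'i':
            return False       # seen 's', 'k', 'i' in a row
        else:
            state = 0          # anything else throws the partial match away
    return True                # stopping in the middle of a partial match is fine


print(lacks_ski("ask it"))     # True
print(lacks_ski("asking"))     # False
```

In real code the substring test from the loop at the top does the same
job; this only shows that three states are enough.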
The 'sk' and 'ski' REs have a bug: if a proper prefix of the evil
sequence ('s', or 'sk' for the last one) occurs at the end of a line,
they fail to match even though the line is clean. I'm not sure how to
fix that, and I don't want to extend it to 'skip'.
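A small sketch that shows the end-of-line problem on concrete lines.
The three patterns are copied from above. The fourth one is a different
technique that is not used above: a negative lookahead, (?!...), which
Python's re module supports. The names NO_S, NO_SK, NO_SKI, NO_SKIP and
accepts are made up for this example.

```python
import re

# Patterns copied from the text above.
NO_S = re.compile("^[^s]*$")
NO_SK = re.compile("^([^s]|s+[^sk])*$")
NO_SKI = re.compile("^([^s]|(s(ks)*)+([^sk]|k[^is]))*$")

# Different technique (an assumption of this sketch, not one of the REs above):
# refuse to match if 'skip' can be found anywhere after the start of the line.
NO_SKIP = re.compile(r"^(?!.*skip).*$")


def accepts(pattern, line):
    """Return True if pattern matches line from its first character."""
    return pattern.match(line) is not None


# Lines ending in a prefix of the sequence are wrongly rejected.
print(accepts(NO_SK, "glass"))             # False, although there is no 'sk'
print(accepts(NO_SKI, "desk"))             # False, although there is no 'ski'
print(accepts(NO_SKI, "desks."))           # True

# The lookahead version does not care where the line ends.
print(accepts(NO_SKIP, "desk"))            # True
print(accepts(NO_SKIP, "will skip this"))  # False
```

Even with the lookahead, the plain substring test is still the version
the next maintainer will thank you for.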
--
<kragen at pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
Perilous to all of us are the devices of an art deeper than we possess
ourselves.
-- Gandalf the White [J.R.R. Tolkien, "The Two Towers", Bk 3, Ch. XI]