Weird problem matching with REs
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Sun May 29 09:09:47 EDT 2011
On Sun, 29 May 2011 06:45:30 -0500, Andrew Berg wrote:
> I have an RE that should work (it even works in Kodos [1], but not in my
> code), but it keeps failing to match characters after a newline.
Not all regexes are the same. Different regex engines accept different
symbols, and sometimes behave differently, or have different default
behavior. That your regex works in Kodos but not Python might mean you're
writing a Kodos regex instead of a Python regex.
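One default worth checking in Python: unless you pass the re.DOTALL flag, "." matches any character except a newline. Whether that is what is biting you here I can't say without seeing a minimal example, and the sample text below is made up, but as a small sketch of the behavior:

```python
import re

# Made-up sample text with a newline between the word and the digits.
text = "revision\n1234"

# By default, "." matches any character except a newline, so this fails.
default_match = re.search(r"revision.\d{4}", text)

# With re.DOTALL, "." also matches the newline, so this succeeds.
dotall_match = re.search(r"revision.\d{4}", text, re.DOTALL)

print(default_match is None)
print(dotall_match is not None)
```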
> I'm writing a little program that scans the webpage of an arbitrary
> application and gets the newest version advertised on the page.
Firstly, most of the code you show is irrelevant to the problem. Please
simplify it to the shortest, simplest example you can give. That would
be a simplified piece of text (not the entire web page!), the regex, and
the failed attempt to use it. The rest of your code is just noise for the
purposes of solving this problem.
Secondly, you probably should use a proper HTML parser, rather than a
regex. Resist the temptation to use regexes to rip out bits of text from
HTML; it almost always goes wrong eventually.
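As a rough sketch of the parser approach, here is one way to collect link targets with the standard library's html.parser module (Python 3). The class name LinkCollector, the sample HTML and the example.invalid URL are all made up for illustration; your real page will need its own handling.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag seen."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Made-up HTML; the newlines in it are no obstacle to the parser.
sample = (
    "<p>Latest build:\n"
    '<a href="http://example.invalid/app_1234.exe">download</a>\n'
    "</p>"
)

collector = LinkCollector()
collector.feed(sample)
collector.close()
print(collector.links)
```

Once you have the link targets as plain strings, a short and simple regex (or plain string methods) is usually enough to pull the version number out of each one.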
> I was able to make a regex that matches in my code, but it shouldn't:
> http://x264.nl/x264/64bit/8bit_depth/revision.\n{1,3}[0-9]{4}.\n{1,3}/
> x264.\n{1,3}.\n{1,3}.exe
What makes you think it shouldn't match?
By the way, you probably should escape the dots, otherwise it will match
strings containing any arbitrary character, rather than *just* dots:
http://x264Znl ...blah blah blah
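A minimal sketch of the difference, using only the start of the URL (the variable names are mine, not from your code):

```python
import re

# Unescaped: the "." matches any single character (except a newline).
unescaped = re.compile(r"http://x264.nl")

# Escaped: "\." matches only a literal dot.
escaped = re.compile(r"http://x264\.nl")

print(bool(unescaped.match("http://x264Znl")))
print(bool(escaped.match("http://x264Znl")))
print(bool(escaped.match("http://x264.nl")))

# re.escape() will do the escaping for a piece of literal text.
escaped_literal = re.escape("x264.nl")
print(escaped_literal)
```

Only use re.escape() on the literal parts of the pattern, since it would also escape any metacharacters you actually intend as regex syntax.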
--
Steven