[New-bugs-announce] [issue12325] regex matches incorrectly on literal dot (99.9% confirmed)
Cal Leeming
report at bugs.python.org
Mon Jun 13 14:24:51 CEST 2011
New submission from Cal Leeming <cal.leeming at simplicitymedialtd.co.uk>:
I believe I may have found a bug in the Python re library. Here is a complete debug trace of what is happening (my apologies for the nature of the actual text). I have run this regex through RegexBuddy (and a few other tools), and all of them behave correctly (that is, they make no replacement), except Python. I haven't yet tried this in another language.
------------ ORIGINAL TEXT ------------
>>313229176
me and a buddy and his girlfriend were watching tv once and this blabbering idiot starts talking about this scientific study she heard about where they built a fake city and only one guy didn't know that it was a fake. we all paused for a second and i said "the truman show?" and she says "yeah! that was the name of it!" me my buddy and his girlfriend all catch eyes and are baffled at how stupid she was
----------------------------------------
------------ TEXT AFTER REGEX SUB ------------
me and a buddy and his girlfriend were http://watching.tv once and this blabbering idiot starts talking about this scientific study she heard about where they built a fake city and only one guy didn't know that it was a fake.we all paused for a second and i said "the truman show?" and she says "yeah! that was the name of it!" me my buddy and his girlfriend all catch eyes and are baffled at how stupid she was
-----------------------------------------------
----------- REPLACED TEXT -----------
watching tv
http://watching.tv
-----------------------------------------------
---- REGEX ----
_t = re.compile(r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))", flags = re.IGNORECASE | re.MULTILINE | re.DEBUG)
---- COMMAND ----
_t.sub("\\1http://\\2", original_message_here)
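For anyone who wants to check this locally, here is a minimal reproduction sketch. It assumes a shortened excerpt of the message behaves the same way as the full text, and it drops re.DEBUG so that compiling prints nothing; the names `pattern`, `original`, `matches` and `result` are illustrative only. Going by the pattern as written, every alternative needs a literal dot before the suffix, so an input with no such dot would not be expected to match.

```python
import re

# Pattern copied from the report; re.DEBUG is left out so compiling stays quiet.
pattern = re.compile(
    r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))",
    flags=re.IGNORECASE | re.MULTILINE,
)

# Shortened excerpt of the original message (assumption: the full text behaves alike).
original = (
    ">>313229176\n"
    "me and a buddy and his girlfriend were watching tv once\n"
    "only one guy didn't know that it was a fake. we all paused for a second"
)

# List every match first, so any substring that triggers a rewrite is visible.
matches = [m.group(0) for m in pattern.finditer(original)]
result = pattern.sub("\\1http://\\2", original)

print("matches:", matches)
print("changed:", result != original)
```

If `matches` is non-empty, the listed substrings show exactly where the substitution is being applied, which narrows down whether the pattern or some other processing step inserts the dot.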
---- REGEX DEBUG ----
subpattern 1
  branch
    at at_beginning
  or
    literal 32
subpattern 2
  subpattern None
    branch
      min_repeat 2 65535
        in
          category category_word
          literal 45
      literal 46
    or
  subpattern None
    min_repeat 2 65535
      in
        category category_word
        literal 45
  subpattern None
    literal 46
    branch
      literal 99
      literal 111
      literal 109
    or
      literal 110
      literal 101
      literal 116
    or
      literal 111
      literal 114
      literal 103
    or
      literal 99
      literal 111
      literal 46
      literal 117
      literal 107
    or
      literal 116
      literal 118
    or
      literal 108
      literal 121
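One possible reading of the last subpattern in the dump: the single `literal 46` ahead of the `branch` looks like the parser hoisting the `\.` that every alternative starts with, rather than a dropped escape. The sketch below inspects this on a smaller pattern. The two-suffix pattern is a cut-down assumption, not the full regex from this report, and the layout of the dump varies between Python versions.

```python
import contextlib
import io
import re

# Cut-down, hypothetical pattern: two alternatives that both start with an escaped dot.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    # re.DEBUG prints the parsed pattern to stdout at compile time.
    re.compile(r"(?:\.tv|\.ly)", re.DEBUG)

dump = buf.getvalue()
print(dump)
```

If the dot shows up once before the branch here as well, the dump above is consistent with the dot still being required for every suffix.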
----------
components: Regular Expressions
messages: 138234
nosy: Cal.Leeming
priority: normal
severity: normal
status: open
title: regex matches incorrectly on literal dot (99.9% confirmed)
versions: Python 2.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12325>
_______________________________________