[Python-bugs-list] RE: Bug in regular expression matcher (PR#112)

rhoward@ontrack.com rhoward@ontrack.com
Mon, 18 Oct 1999 17:41:21 -0400 (EDT)


Thank you both for responding.

The problem is, indeed, the RE that I am using.  I discovered the problem
shortly after reporting the "bug."

Because parts of the RE are pasted together to form a full RE, I didn't
realize that the problem was in one of the "component" expressions, since
they are stored in a separate file.

I apologize for jumping the gun on this and posting it as a bug.

Rick Howard
Ontrack Data International

> -----Original Message-----
> From: Guido van Rossum [mailto:bugs-py@python.org]
> Sent: Monday, October 18, 1999 3:22 PM
> To: rhoward@ontrack.com
> Subject: Re: Bug in regular expression matcher (PR#112)
>
>
> According to Andrew Kuchling, this is a bug in the regular expression.
> He had to infer some more information -- in particular, you seem to
> be using the regex module, not the re module.
>
> Here is his email:
>
> Guido van Rossum writes:
> >When using the re:
> >\(\([a-zA-Z]:\)?\([\][a-zA-Z0-9 $%'\_@~`!()^#&+,;=[-]+\)+\)?
> >[a-zA-Z0-9$%'\_@~`!()^#&+,;=[-]+\.[dD][lL][lL]
> >The re matcher goes into an infinite loop on this input:
> >\XXX\X\X\\\\\X\\\XXXX\X\X\\\\\X\\\\X\\\XXX\\\\\\\kX
>
>  From the use of \( \), I'd assume this applies to the regex module,
> not the re module.  If you chop off the part of the pattern I've put
> on line 2, you find that the first part always matches the whole
> string.  This stems from an error in the pattern: [\] should be [\\].
> That bit of the pattern is intended to match \ followed by a chunk of
> characters, but because of the error, the ] is escaped and the whole
> thing is treated as the character set ][a-zA-Z0-9 $%'\_@~`!()^#&+,;=[-
> , followed by a +, followed by a +.  (The \_ is also odd, because you
> don't need to escape the _; I suspect that should be \\_.  Even better
> would be [^\\]+, which is *much* shorter.)  This produces a
> combinatoric explosion, as every possible combination of groupings is
> tried, but all fail.
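
A minimal sketch of the two effects described above, using Python's re module and a cut-down character set (the names broken, fixed and count_groupings are illustrative only, not part of the original report). It assumes re's rule that a backslash inside [...] escapes the next character, and it counts how many ways a nested (X+)+ can carve up a run of matching characters, which is what a backtracking matcher may have to try before giving up.

```python
import re

# In Python's re module a backslash inside [...] escapes the next character,
# so r"[\]ab]" is ONE set holding "]", "a" and "b".
broken = re.compile(r"[\]ab]+")
# Doubling the backslash gives the intended "backslash, then a run of a/b".
fixed = re.compile(r"[\\][ab]+")

print(broken.fullmatch("ab]ba") is not None)  # "]" is an ordinary set member
print(broken.fullmatch("\\ab") is not None)   # the backslash is not in the set
print(fixed.fullmatch("\\ab") is not None)    # backslash followed by "ab"


def count_groupings(n):
    """Hypothetical helper: ways (X+)+ can split a run of n matching chars."""
    # Each of the n - 1 gaps between characters is either a group boundary
    # or not, so the number of splits doubles with every extra character.
    return 2 ** (n - 1) if n > 0 else 0


print(count_groupings(20))
```

With only 20 matching characters there are already more than half a million groupings, which is why the matcher appears to hang rather than fail quickly.
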
>
> Translating to re's syntax and writing the pattern
> with re.VERBOSE, you get:
>
> pat = re.compile(r"""
> (
>   ([a-zA-Z]:)?  # Match a drive letter
>   ([\\][a-zA-Z0-9 $%'\_@~`!()^#&+,;=[-]+)+  # Match any number of directories
> )?
>  # Match a filename ending in .dll
>  [a-zA-Z0-9$%'_@~`!()^#&+,;=[-]+ \. [dD][lL][lL]""", re.VERBOSE)
>
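
As a rough check of the translated pattern, the sketch below compiles it with a current Python re module and tries it on a sample path and on the input from the report. The names PAT, bug_input and sample are illustrative, and the sample path is an assumption rather than something from the original report. Because each directory group must now begin with a backslash that the character set itself cannot match, there is only one way to split the input into groups, so the failing case returns promptly.

```python
import re

# The re.VERBOSE translation, with [\\] matching a literal backslash.
PAT = re.compile(r"""
(
  ([a-zA-Z]:)?                                 # optional drive letter
  ([\\][a-zA-Z0-9 $%'\_@~`!()^#&+,;=[-]+)+     # one or more \directory parts
)?
[a-zA-Z0-9$%'_@~`!()^#&+,;=[-]+ \. [dD][lL][lL]   # filename ending in .dll
""", re.VERBOSE)

# The input from the bug report; it contains no "." so it cannot match.
bug_input = r"\XXX\X\X\\\\\X\\\XXXX\X\X\\\\\X\\\\X\\\XXX\\\\\\\kX"

# Assumed sample path, for illustration only.
sample = r"C:\WINDOWS\system32\KERNEL32.DLL"

print(PAT.match(sample) is not None)
print(PAT.match(bug_input) is None)
```
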
> Note that this pattern seems to be trying to match filenames ending in
> .dll; it would be much easier to do "path,filename =
> os.path.split(filename) ; if string.lower(filename[-4:]) == '.dll':
> ...".
>
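
The non-regex approach suggested at the end of the quoted message might look like this in current Python. This is a sketch under assumptions: is_dll is a hypothetical helper name, str.lower() stands in for the older string.lower(), and ntpath is used in place of os.path so that backslash-separated paths are split the same way on any platform.

```python
import ntpath  # Windows path rules on any platform; os.path is ntpath on Windows


def is_dll(path):
    """Hypothetical helper: True if the last path component ends in .dll."""
    # Split off the final component, as os.path.split does in the suggestion.
    _directory, filename = ntpath.split(path)
    # Case-insensitive suffix test replaces string.lower(filename[-4:]).
    return filename.lower().endswith(".dll")


print(is_dll(r"C:\WINDOWS\system32\KERNEL32.DLL"))
print(is_dll(r"\XXX\X\X\\\\\X\\\XXXX\X\X\\\\\X\\\\X\\\XXX\\\\\\\kX"))
```

Unlike the regular expression, this check does not validate which characters appear in the directory or file names; it only tests the extension of the final component.
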