[ python-Bugs-738361 ] crash error in glob.glob;
directories with brackets
SourceForge.net
noreply at sourceforge.net
Mon Sep 13 04:55:56 CEST 2004
Bugs item #738361, was opened at 2003-05-15 12:06
Message generated for change (Settings changed) made by progoth
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=738361&group_id=5470
Category: Extension Modules
Group: Python 2.2.2
>Status: Deleted
Resolution: None
Priority: 5
Submitted By: Steven Scott (progoth)
Assigned to: Nobody/Anonymous (nobody)
Summary: crash error in glob.glob; directories with brackets
Initial Comment:
I'm attaching a zip file containing a python file and
directory structure to test this.
I ran into this bug in real life work, so, as contrived
as the bug test may look, it happens.
I was writing a function which recurses through
directories and does stuff with the files it finds.
glob.glob() doesn't return any files inside a directory
named [_]
glob.glob() crashes on a directory named [A--_B]. I
tried a few different combinations of characters inside
brackets, but this was the only one I could get it to
crash on.
The crash happens during regular expression
compilation, as can probably be surmised from the
characters that cause it ( [] ). It may also be a
combination of that and using \ as the directory
delimiter, since this is win32.
  File "C:\temp\globbug\bug.py", line 5, in test
    fs = glob.glob( path + '\*' )
  File "C:\Python22\lib\glob.py", line 24, in glob
    list = glob(dirname)
  File "C:\Python22\lib\glob.py", line 37, in glob
    sublist = glob1(dirname, basename)
  File "C:\Python22\lib\glob.py", line 50, in glob1
    return fnmatch.filter(names,pattern)
  File "C:\Python22\lib\fnmatch.py", line 47, in filter
    _cache[pat] = re.compile(res)
  File "C:\Python22\lib\sre.py", line 179, in compile
    return _compile(pattern, flags)
  File "C:\Python22\lib\sre.py", line 229, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
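Both symptoms seem to come from the bracket text being read as a character class rather than as a literal name. The sketch below is an illustration only, written for a current Python 3 rather than the 2.2.2 in the report; the helper name compiles_as_regex is made up for this example. It shows that the pattern [_] matches the name _ but not the name [_], and that [A--_B], taken as a regex character class, contains the reversed range A--.

```python
import fnmatch
import re
import warnings

# As a pattern, '[_]' means "one character out of the set {_}",
# so it matches the name '_' but not the literal name '[_]'.
literal_name_matches = fnmatch.fnmatchcase('[_]', '[_]')
underscore_matches = fnmatch.fnmatchcase('_', '[_]')


def compiles_as_regex(pattern):
    """Return True if `pattern` is accepted by re.compile (hypothetical helper)."""
    with warnings.catch_warnings():
        # Newer Pythons warn about '--' inside a set; irrelevant here.
        warnings.simplefilter("ignore", FutureWarning)
        try:
            re.compile(pattern)
        except re.error:
            return False
    return True


# 'A--' is a range from 'A' (65) down to '-' (45), which is reversed.
bad_range_compiles = compiles_as_regex('[A--_B]')
simple_class_compiles = compiles_as_regex('[_]')
```

Note that later versions of fnmatch may translate [A--_B] differently before compiling it, so calling glob.glob() on such a name is not guaranteed to raise there; the direct re.compile call above only shows where the "bad character range" message comes from.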
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-05-17 19:43
Message:
Logged In: YES
user_id=31435
The heart of the problem seems to be the comment in
fnmatch.py's translate() docstring:
"""Translate a shell PATTERN to a regular expression.
There is no way to quote meta-characters.
"""
So it looks like an undocumented design limitation.
----------------------------------------------------------------------
Comment By: Steven Scott (progoth)
Date: 2003-05-16 15:32
Message:
Logged In: YES
user_id=61663
So a co-worker pointed out that you could have directories
like mine, but, say, numbered:
[A--_B]1
[A--_B]2
etc
Say you wanted a pattern like '[A--_B]?' to get them
all. That's not a valid directory name, so it definitely needs
to do some wildcard expansion, but it doesn't need to mess
with what's inside the brackets.
fnmatch probably shouldn't throw an exception in any
case. Regardless, we're of the opinion that the only
logical way around this issue of wildcard characters in
filenames is to have the programmer escape stuff manually,
so r"\[A--_B]?" would be what is needed.
python/glob/fnmatch can't read the programmer's mind about
which characters in a pattern are meant as wildcards and
which are meant literally.
to take this route, fnmatch would have to be modified to
recognize characters that are \-escaped, because it doesn't
at the moment.
or maybe that's not the best solution.
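One way to get manual escaping without changing fnmatch is to wrap each magic character in its own bracket expression instead of using a backslash. This is a different technique from the \-escape proposed above, and the helper name escape_magic is hypothetical. A minimal sketch, assuming a current Python 3:

```python
import fnmatch
import re

# Hypothetical helper, not part of the library version discussed here:
# make every glob metacharacter literal by wrapping it in brackets,
# so '[' becomes '[[]', '*' becomes '[*]' and '?' becomes '[?]'.
_MAGIC = re.compile(r'([*?\[])')


def escape_magic(name):
    return _MAGIC.sub(r'[\1]', name)


# Literal directory name, followed by a real '?' wildcard.
pattern = escape_magic('[A--_B]') + '?'
names = ['[A--_B]1', '[A--_B]2', 'other']
matches = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
```

Here the programmer states explicitly which part is literal and which part is a wildcard. Python 3.4 and later ship glob.escape(), which applies the same bracket-wrapping idea.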
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2003-05-16 14:20
Message:
Logged In: YES
user_id=80475
Okay. See if you can come up with a more elegant patch
that only touches the glob module. If you can see a
straightforward way to test it, then some unit tests would
be nice also.
----------------------------------------------------------------------
Comment By: Steven Scott (progoth)
Date: 2003-05-16 00:41
Message:
Logged In: YES
user_id=61663
Brackets are valid in file/dir names on unix, too. In fact, if I'm not mistaken, the
only 2 characters not allowed in unix file names are / and \0. I don't see
how it's not a bug if glob tries to read the files in a directory that exists
and crashes (or doesn't read them).
as for how it should be fixed, I have no idea. my patch isn't very elegant.
btw, I just ran this on unix (after changing the \ to / in the test script) and
the exact same behavior was exhibited.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2003-05-16 00:24
Message:
Logged In: YES
user_id=80475
This doesn't seem like a bug to me. Those strange names
have the Unix style magic characters in them.
Unfortunately, brackets are valid file/dir names in
Windows.
If anything were changed, I would prefer strengthening the
magic character recognizer from:
magic_check = re.compile('[*?[]')
to something that can treat ill-formed bracket expressions
as being non-magic.
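A rough sketch of what such a stricter recognizer might look like, assuming a current Python 3. The function name has_magic_strict and the validity test (trying to compile the bracket body as a regex character class) are assumptions made for illustration, not the actual glob implementation:

```python
import re
import warnings

# Candidate bracket expressions: '[', an optional '!', an optional
# leading ']', then anything up to the next ']'.
_bracket_expr = re.compile(r'\[(!?\]?[^\]]*)\]')


def _is_valid_class(body):
    """True if the bracket body compiles as a regex character class."""
    body = body.replace('\\', '\\\\')
    if body.startswith('!'):
        body = '^' + body[1:]       # shell negation -> regex negation
    elif body.startswith('^'):
        body = '\\' + body          # a leading '^' is literal in shell style
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        try:
            re.compile('[' + body + ']')
        except re.error:
            return False
    return True


def has_magic_strict(s):
    """Hypothetical check: ill-formed bracket expressions are non-magic."""
    if '*' in s or '?' in s:
        return True
    return any(_is_valid_class(m.group(1)) for m in _bracket_expr.finditer(s))
```

With this check, [A--_B] would be treated as a plain name because its range is reversed, while a well-formed name such as [_] would still be treated as a pattern, so it would only address the crash and not the missing files.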
When posting a bug report, please avoid zip files and
multiple test scripts. It is enough to include in the text of
the report something like this:
glob.glob('[_]/*') # fails to recognize a win directory
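A self-contained version of that one-line report could look like the following sketch, which builds the directory in a temporary location so no attachment is needed. It assumes a current Python 3 (tempfile.TemporaryDirectory does not exist in 2.2), and the function name files_seen_by_glob is made up for the example:

```python
import glob
import os
import tempfile


def files_seen_by_glob(dirname):
    """Create <tmp>/<dirname>/data.txt and return what glob finds inside."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, dirname)
        os.mkdir(target)
        with open(os.path.join(target, 'data.txt'), 'w') as f:
            f.write('x')
        # The directory name is passed through unescaped, as in the report.
        found = glob.glob(os.path.join(tmp, dirname, '*'))
        return [os.path.basename(p) for p in found]


in_bracket_dir = files_seen_by_glob('[_]')
in_plain_dir = files_seen_by_glob('plain')
```

The file inside the directory named [_] is not found, because glob looks for a directory named _ instead, while the same file inside an ordinary directory is found.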
----------------------------------------------------------------------