[ python-Bugs-738361 ] crash error in glob.glob; directories with brackets

SourceForge.net noreply at sourceforge.net
Mon Sep 13 04:55:56 CEST 2004


Bugs item #738361, was opened at 2003-05-15 12:06
Message generated for change (Settings changed) made by progoth
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=738361&group_id=5470

Category: Extension Modules
Group: Python 2.2.2
>Status: Deleted
Resolution: None
Priority: 5
Submitted By: Steven Scott (progoth)
Assigned to: Nobody/Anonymous (nobody)
Summary: crash error in glob.glob; directories with brackets

Initial Comment:
I'm attaching a zip file containing a python file and
directory structure to test this.

I ran into this bug in real life work, so, as contrived
as the bug test may look, it happens.

I was writing a function which recurses through
directories and does stuff with the files it finds.

glob.glob() doesn't return any files inside a directory
named [_]

glob.glob() crashes on a directory named [A--_B].  I
tried a few different combinations of characters inside
brackets, but this was the only one I could get it to
crash on.

the crash happens during the regular expression
compilation, as probably can be surmised by seeing the
characters which cause it ( [] ).  it also may be a
combination of that and using \ as the directory
delimiter since this is win32.

  File "C:\temp\globbug\bug.py", line 5, in test
    fs = glob.glob( path + '\*' )
  File "C:\Python22\lib\glob.py", line 24, in glob
    list = glob(dirname)
  File "C:\Python22\lib\glob.py", line 37, in glob
    sublist = glob1(dirname, basename)
  File "C:\Python22\lib\glob.py", line 50, in glob1
    return fnmatch.filter(names,pattern)
  File "C:\Python22\lib\fnmatch.py", line 47, in filter
    _cache[pat] = re.compile(res)
  File "C:\Python22\lib\sre.py", line 179, in compile
    return _compile(pattern, flags)
  File "C:\Python22\lib\sre.py", line 229, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range
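
For illustration, a minimal sketch of why the files under the [_] directory
are not returned: once the name becomes part of a pattern, the brackets are
read as a character class rather than as literal characters. fnmatchcase is
used here only to keep the result independent of platform case rules, and
the variable names are made up for this example.

```python
import fnmatch

# As a pattern, '[_]' is a one-character class that matches only '_'.
matches_underscore = fnmatch.fnmatchcase('_', '[_]')

# The literal three-character name '[_]' does not match its own spelling,
# which is why a directory with that name appears to be empty.
matches_itself = fnmatch.fnmatchcase('[_]', '[_]')

print(matches_underscore, matches_itself)
```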

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-05-17 19:43

Message:
Logged In: YES 
user_id=31435

The heart of the problem seems to be the comment in 
fnmatch.py's translate() docstring:

    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """
So it looks like an undocumented design limitation.

----------------------------------------------------------------------

Comment By: Steven Scott (progoth)
Date: 2003-05-16 15:32

Message:
Logged In: YES 
user_id=61663

So a co-worker pointed out that you could have directories
like mine, but say, numbered:
[A--_B]1
[A--_B]2
etc
say you wanted a pattern like '[A--_B]?' to get them
all....that's not a valid directory, so it definitely needs
to do some wildcard expansion...but it doesn't need to mess
with what's inside the brackets.
fnmatch probably shouldn't throw an exception in any
case...regardless, we're of the opinion that the only
logical way around this issue of wildcard characters in
filenames is to have the programmer escape stuff manually. 
so r"\[A--_B]?" would be what is needed. 
python/glob/fnmatch can't read the programmer's mind about
which characters in a pattern are meant as wildcards and
which are meant literally.
to take this route, fnmatch would have to be modified to
recognize characters that are \-escaped, because it doesn't
at the moment.
or maybe that's not the best solution.
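
A rough sketch of the "programmer escapes it manually" idea, assuming the
escape is done by wrapping each metacharacter in its own bracket class
instead of the backslash form suggested above (fnmatch does not treat
backslash as an escape). The helper name escape_literal is hypothetical.
Later Python versions ship a glob.escape() function built on the same idea.

```python
import fnmatch
import re

# Characters that fnmatch treats as special: '*', '?' and '['.
_MAGIC = re.compile(r'([*?[])')

def escape_literal(name):
    """Hypothetical helper: wrap each metacharacter in a bracket class
    so that it matches only itself ('[' becomes '[[]')."""
    return _MAGIC.sub(r'[\1]', name)

# Literal directory prefix plus a real '?' wildcard for the trailing digit.
pattern = escape_literal('[A--_B]') + '?'

names = ['[A--_B]1', '[A--_B]2', 'A1', '_2']
matches = [n for n in names if fnmatch.fnmatchcase(n, pattern)]
print(pattern, matches)
```

Only the part of the path that is meant literally goes through the helper;
the wildcard is appended afterwards, so the programmer stays in control of
which characters are pattern and which are not.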

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-05-16 14:20

Message:
Logged In: YES 
user_id=80475

Okay.  See if you can come up with a more elegant patch 
that only touches the glob module.  If you can see a 
straightforward way to test it, then some unit tests would 
be nice also.

----------------------------------------------------------------------

Comment By: Steven Scott (progoth)
Date: 2003-05-16 00:41

Message:
Logged In: YES 
user_id=61663

brackets are valid file/dir names in unix, too.  in fact, if I'm not mistaken, the 
only 2 characters not allowed in unix file names are / and \0.  I don't see 
how it's not a bug if glob tries to read the files in a directory that exists 
and crashes (or doesn't read them).

as for how it should be fixed, I have no idea.  my patch isn't very elegant.

btw, I just ran this on unix (after changing the \ to / in the test script) and 
the exact same behavior was exhibited.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-05-16 00:24

Message:
Logged In: YES 
user_id=80475

This doesn't seem like a bug to me.  Those strange names 
have the Unix style magic characters in them.  
Unfortunately, brackets are valid file/dir names in 
Windows.

If anything were changed, I would prefer strengthening the 
magic character recognizer from:
   magic_check = re.compile('[*?[]')
to something that can treat ill-formed bracket expressions 
as being non-magic.
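
One possible shape for such a recognizer, as a sketch only: treat '[' as
magic only when a closing ']' follows at least one character. The names
strict_magic_check and has_magic are illustrative and this is not a patch
to glob.py. It covers just the unclosed-bracket case; a closed bracket with
a bad range such as [A--_B] would still count as magic here.

```python
import re

# '*' and '?' are always magic; '[' is magic only when it opens a class
# that is closed later by ']' with at least one character in between.
strict_magic_check = re.compile(r'[*?]|\[.[^\]]*\]')

def has_magic(s):
    """Return True if s contains a wildcard or a closed bracket class."""
    return strict_magic_check.search(s) is not None

for candidate in ['*.py', '[_]', '[abc', 'plain', '[A--_B]']:
    print(candidate, has_magic(candidate))
```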

When posting a bug report, please avoid zip files and 
multiple test scripts.  It is enough to include in the text of 
the report something like this:
    glob.glob('[_]/*')   # fails to recognize a win directory
    

----------------------------------------------------------------------


