[Python-ideas] Add OS-dependent automatic glob support

Steven D'Aprano steve at pearwood.info
Mon Jan 5 03:07:35 CET 2015


On Mon, Jan 05, 2015 at 12:08:06PM +1100, Chris Angelico wrote:
> On Mon, Jan 5, 2015 at 11:57 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> > On Sun, Jan 04, 2015 at 06:10:01PM -0500, random832 at fastmail.us wrote:
> > [...]
> >> Right now, none of this is done. If you pass *.txt on the command line
> >> to a python script, it will attempt to open a file called "*.txt".
> >
> > On Windows, you will get Python's definition of globbing. On POSIX
> > systems, the globbing module will do nothing, because the shell will
> > most likely have already expanded the wild-cards
> 
> You're assuming that there are no wildcard characters actually
> included in the file names, which on Unix systems is perfectly
> possible. How can you reliably manipulate a file called *.txt when
> there are foo.txt and bar.txt in the same directory? With most Unix
> programs, you escape the asterisk to get it past the shell, the
> program does no globbing of its own, and you're safe. If your Python
> script unconditionally globs its file names, you're stuck.

You seem to be right. With double wildcard expansion (once by the shell
and again by the Python script) there doesn't seem to be a
straightforward way to get just the file name with the wildcard in it:

[steve@ando junk]$ cat testglob.py
import sys
import glob
for arg in sys.argv[1:]:
    print(arg + ":-")
    for name in glob.glob(arg):
        print("  " + name)

[steve@ando junk]$ ls
eggs.jpg  *.jpg  testglob.py
[steve@ando junk]$ python3.3 testglob.py *.jpg
eggs.jpg:-
  eggs.jpg
*.jpg:-
  eggs.jpg
  *.jpg
[steve@ando junk]$ python3.3 testglob.py \*.jpg
*.jpg:-
  eggs.jpg
  *.jpg
[steve@ando junk]$ python3.3 testglob.py \\*.jpg
\*.jpg:-


I guess the simple solutions are either to do a platform check first, or
to provide a --no-glob command-line switch to turn globbing off. But I
suspect people won't think of that until it's reported as a bug :-)
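
For what it's worth, both ideas fit in a few lines. This is only a rough
sketch: the function names are made up for illustration, the --no-glob
switch is the hypothetical one suggested above, and the platform check
assumes that only the Windows shell leaves wildcards unexpanded.

```python
import argparse
import glob
import os


def expand_args(patterns, do_glob):
    """Return file names, expanding wildcards only when do_glob is true."""
    names = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern)) if do_glob else []
        # Keep the argument exactly as given when globbing is off or
        # when the pattern matches nothing.
        names.extend(matches or [pattern])
    return names


def parse_command_line(argv):
    """Parse argv (without the program name) into a list of file names."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-glob", action="store_true",
                        help="treat file arguments literally")
    parser.add_argument("files", nargs="*")
    opts = parser.parse_args(argv)
    # Assumption: POSIX shells have already expanded the wildcards, so
    # the script only globs for itself on Windows.
    do_glob = os.name == "nt" and not opts.no_glob
    return expand_args(opts.files, do_glob)
```

With this, an escaped \*.jpg on a POSIX shell reaches the script as the
literal name, and a Windows user can pass --no-glob to get the same
effect.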


> >> Another separate but related issue is the fact that windows wildcards do
> >> not behave in the same way as python glob patterns.
> >
> > I don't understand why you say this. As I understand it, there is no
> > such thing as "Windows wildcards" as every application which wants to
> > support wildcards has to implement their own. If you want to know what
> > kinds of globbing wildcards the application supports, you have to read
> > the application documentation. (Or guess.)
> >
> > I am not a Windows expert, so I may have been misinformed. Anyone care
> > to comment?
> 
> There's a very standard interpretation, which I think is codified in
> the FindFirstFile API, although I'm not sure of the details.

Do all Windows apps call FindFirstFile?


> >> and left bracket is a valid character in
> >> filenames unlike ? and *.
> >
> > And on POSIX systems, *all* wildcards are valid characters in file
> > names. If you want to specify a file called literally "*.txt", or
> > "spam[and eggs?].jpg", you have to escape the wildcards. Windows is no
> > different.
> 
> No, Windows *is* different, because you simply aren't allowed to have
> those characters in file names:

I know that. The point that Random raised is that some wildcards (the
left bracket, for one) are legal in Windows file names. Yes, and *all*
wildcards are legal in POSIX file names, so whatever problems we have
with escaping are going to occur on both platforms, as you have already
pointed out above.
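
To make the escaping point concrete, here is a small sketch using a
bracket, which is legal in file names on both platforms. It assumes
Python 3.4 or later, where glob.escape() exists; the helper name
literal_glob is made up for illustration.

```python
import glob
import os


def literal_glob(directory, filename):
    """Match filename literally inside directory, wildcards and all."""
    # glob.escape() wraps each of *, ? and [ in brackets, so
    # "spam[1].jpg" becomes "spam[[]1].jpg" and "*.jpg" becomes
    # "[*].jpg". The escaped pattern only matches the literal name.
    pattern = os.path.join(glob.escape(directory), glob.escape(filename))
    return glob.glob(pattern)
```

Unescaped, the pattern "spam[1].jpg" matches a file called spam1.jpg
and not the file actually named spam[1].jpg, so a script that globs
unconditionally has the same problem on Windows as on POSIX.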

In any case, having a platform-specific globbing module is looking more
and more useful. Does this need a PEP?


-- 
Steven

