[Python-ideas] Add OS-dependent automatic glob support

Chris Angelico rosuav at gmail.com
Mon Jan 5 02:08:06 CET 2015


On Mon, Jan 5, 2015 at 11:57 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, Jan 04, 2015 at 06:10:01PM -0500, random832 at fastmail.us wrote:
> [...]
>> Right now, none of this is done. If you pass *.txt on the command line
>> to a python script, it will attempt to open a file called "*.txt".
>
> On Windows, you will get Python's definition of globbing. On POSIX
> systems, the globbing module will do nothing, because the shell will
> most likely have already expanded the wild-cards

You're assuming that no file names actually contain wildcard
characters, but on Unix systems that is perfectly possible. How can
you reliably manipulate a file called *.txt when
there are foo.txt and bar.txt in the same directory? With most Unix
programs, you escape the asterisk to get it past the shell, the
program does no globbing of its own, and you're safe. If your Python
script unconditionally globs its file names, you're stuck.
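
To make the failure concrete, here is a minimal sketch. It assumes a POSIX file system, because Windows rejects "*" in file names. It creates foo.txt, bar.txt and a file literally named *.txt in a temporary directory, then shows what a script that globs its arguments does with the literal string "*.txt". The helper name matches_for_literal_star is hypothetical.

```python
import glob
import os
import tempfile


def matches_for_literal_star(directory):
    """Hypothetical helper: what glob makes of the argument '*.txt'.

    Assumes a POSIX file system, where '*' is a legal file-name character.
    """
    for name in ("foo.txt", "bar.txt", "*.txt"):
        # Create empty files, including one literally named "*.txt".
        open(os.path.join(directory, name), "w").close()
    # The user escaped the asterisk for the shell, so the script received
    # the literal string "*.txt"; glob nevertheless treats it as a pattern.
    # Only the directory part is escaped here, to mimic a script that
    # globs the argument itself.
    pattern = os.path.join(glob.escape(directory), "*.txt")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


with tempfile.TemporaryDirectory() as d:
    print(matches_for_literal_star(d))  # ['*.txt', 'bar.txt', 'foo.txt']
```

The user asked for one file and the script acts on three.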

>> Another separate but related issue is the fact that windows wildcards do
>> not behave in the same way as python glob patterns.
>
> I don't understand why you say this. As I understand it, there is no
> such thing as "Windows wildcards" as every application which wants to
> support wildcards has to implement their own. If you want to know what
> kinds of globbing wildcards the application supports, you have to read
> the application documentation. (Or guess.)
>
> I am not a Windows expert, so I may have been misinformed. Anyone care
> to comment?

There's a very standard interpretation, which I think is codified in
the FindFirstFile API, although I'm not sure of the details.

>> and left bracket is a valid character in
>> filenames unlike ? and *.
>
> And on POSIX systems, *all* wildcards are valid characters in file
> names. If you want to specify a file called literally "*.txt", or
> "spam[and eggs?].jpg", you have to escape the wildcards. Windows is no
> different.

No, Windows *is* different, because you simply aren't allowed to have
those characters in file names:

>>> open("*.txt","w")
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    open("*.txt","w")
OSError: [Errno 22] Invalid argument: '*.txt'

Unix allows anything other than a slash or NUL, so all wildcards have
to be escaped to get them past the shell; or, alternatively, you can
just use a non-shell way of starting a program, like Python's own
subprocess module.
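
The subprocess route is easy to check: when the arguments are passed as a list and no shell is involved (the default shell=False), the child process receives wildcard characters untouched. A minimal sketch; the helper name argv_seen_by_child is hypothetical.

```python
import subprocess
import sys


def argv_seen_by_child(*args):
    """Hypothetical helper: run a child Python and return its sys.argv[1:].

    With a list of arguments and shell=False, no shell runs, so nothing
    expands wildcards on the way to the child process.
    """
    child_code = "import sys; print('\\n'.join(sys.argv[1:]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


print(argv_seen_by_child("*.txt", "spam[and eggs?].jpg"))
# ['*.txt', 'spam[and eggs?].jpg']
```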

> The glob module supports escaping of wildcards, doesn't it? If so, we
> have no problem. If not, that's a bug, or at least an obvious and
> important piece of missing functionality.

I'm not sure that's the solution, because then Unix users would have
to double-escape everything.
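
The double-escaping cost shows up clearly with glob.escape (available since Python 3.4) and shlex.quote. A program that does no globbing needs only the shell's escaping. A script that globs every argument also needs glob's own escaping, so the user must escape twice: first for glob, then for the shell. A minimal sketch; "script.py" is a hypothetical script name.

```python
import glob
import shlex

literal_name = "*.txt"

# One layer: only the shell needs to be told to leave the asterisk alone.
plain_arg = shlex.quote(literal_name)

# Two layers: glob syntax first ("*" becomes "[*]"), then shell quoting.
double_escaped_arg = shlex.quote(glob.escape(literal_name))

print("python script.py", plain_arg)           # python script.py '*.txt'
print("python script.py", double_escaped_arg)  # python script.py '[*].txt'
```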

It can be done, of course, but this is the exact sort of complication
that single-platform developers often don't even think of. Suppose you
develop on Windows, and just unconditionally glob all your
arguments... your program will work fine on Unix, until it's asked to
deal with a file with an asterisk in it, and your user complains very
loudly about how it misbehaved... and maybe destroyed a bunch of
files. Oops. Not good.
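
A minimal sketch of the OS-dependent behaviour under discussion follows. It assumes that globbing should happen only on Windows, where the shell leaves wildcards alone and "*" cannot appear in a real file name, and that arguments pass through untouched elsewhere. The name expand_args is hypothetical. The platform parameter exists only so that either branch can be tried on any system.

```python
import glob
import os


def expand_args(args, platform=os.name):
    """Hypothetical helper: expand wildcards only on Windows.

    On POSIX, the shell has already expanded any wildcards the user left
    unescaped, so arguments are returned untouched. On Windows ('nt'),
    each argument is globbed; an argument that matches nothing is kept
    as-is, as a POSIX shell does by default.
    """
    if platform != "nt":
        return list(args)
    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        expanded.extend(matches if matches else [arg])
    return expanded
```

A script would call expand_args(sys.argv[1:]). Note that on Windows this still treats "[" as a wildcard, even though it is a legal file-name character there. That is the mismatch raised earlier in the thread.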

ChrisA

