[Python-ideas] Add OS-dependent automatic glob support

random832 at fastmail.us random832 at fastmail.us
Mon Jan 5 00:10:01 CET 2015


On Sun, Jan 4, 2015, at 16:02, Ethan Furman wrote:
> On 01/04/2015 11:24 AM, random832 at fastmail.us wrote:
> 
> > If being cross-platform isn't easy, it won't happen. You see it now with
> > the lack of any support for "call glob on arguments on windows and not
> > on unix" [because the shell handles it on unix] whether directly, in
> > argparse, or in fileinput
> 
> Could you elaborate on this point?

On Unix, as I assume you know, the shell is responsible for interpreting
the kind of wildcard patterns that glob uses (plus shell-specific
extensions), and passing a list of proper filenames as the child
process's argv.

On Windows, this does not happen - the child process is simply passed a
single string with the whole command line. Python (or the C Runtime
Library that the Python interpreter is linked against) converts this to
a list of strings for individual arguments based on spaces and quotes,
but does not interpret wildcard patterns in any of the arguments.

The C Runtime Library _can_ do this automatically. With MSVC this is
done by linking in "setargv.obj", which replaces the standard internal
routine that does not expand wildcards. But it does a poor job, because
it does not know which arguments are intended to be filenames and which
are other strings, and there is traditionally no way to escape wildcards
[since * and ? aren't allowed in filenames, there's no reason not to
allow "some directory\*.txt", all in quotes, as an argument that will be
handled as a wildcard].

The appropriate place to expand them would be after you know you intend
to treat a list of arguments as a list of filenames, rather than at
program start - after options are parsed, for example (so an option with
an argument with an asterisk in it doesn't get turned into multiple
arguments), or if a list is being passed in to the fileinput module.
This should also only be done on Windows, and not on other platforms
(since on other platforms this is supposed to be handled by the shell
rather than the child process).
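
To make that concrete, here is a rough sketch of what such a helper could
look like. Nothing like this exists in the standard library today; the
name expand_filename_args and its "platform" parameter are made up for
illustration, and it uses Python's own glob semantics rather than the
native Windows matching rules. A pattern that matches nothing is passed
through unchanged, which is one possible policy among several.

```python
import glob
import sys


def expand_filename_args(args, platform=None):
    """Expand wildcards in filename arguments, on Windows only.

    Hypothetical helper: call it after option parsing, on the
    arguments already known to be filenames.
    """
    if platform is None:
        platform = sys.platform
    if not platform.startswith("win"):
        # On Unix the shell has already expanded any patterns,
        # so expanding again here would be wrong.
        return list(args)

    expanded = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        if matches:
            expanded.extend(matches)
        else:
            # No match: keep the argument as typed, so the caller
            # can report the missing file itself.
            expanded.append(arg)
    return expanded
```

A script would call this on the positional arguments left over once
argparse (or whatever is parsing options) is finished, so an option value
containing an asterisk is never touched.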


Right now, none of this is done. If you pass *.txt on the command line
to a Python script on Windows, it will attempt to open a file called
"*.txt".

----

Another separate but related issue is that Windows wildcards do not
behave in the same way as Python glob patterns. Bracketed character
classes are not supported, and the left bracket, unlike ? and *, is a
valid character in filenames. There are some subtleties around dots
("*.*" matches all filenames, even those with no dot; "*." matches
filenames without any dot). Windows wildcards are case-insensitive (I
think glob does handle this part, but not in the same way as the
platform in some cases). They can also match the short-form alternate
filenames [8 characters, dot, 3 characters], so "*.htm" will typically
match most files ending in ".html" as well as those ending in ".htm".
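
For comparison, this is how Python's own pattern matching treats the same
cases. It is only a sketch of the Python side: it uses
fnmatch.fnmatchcase so the results do not depend on the platform, and the
variable names are just for illustration. The Windows behaviour mentioned
in the comments is what is described above, not something this code
checks.

```python
import fnmatch

# "*.*" requires a literal dot in Python; Windows matches "readme" too.
star_dot_star = fnmatch.fnmatchcase("readme", "*.*")

# "*." requires a trailing dot in Python; Windows matches names
# that have no dot at all.
star_dot = fnmatch.fnmatchcase("readme", "*.")

# Brackets form a character class in Python, so the pattern does not
# match the file literally named "a[1].txt", only "a1.txt".
bracket_literal = fnmatch.fnmatchcase("a[1].txt", "a[1].txt")
bracket_class = fnmatch.fnmatchcase("a1.txt", "a[1].txt")

# Python only sees the long name, so "*.htm" does not match ".html".
htm_vs_html = fnmatch.fnmatchcase("index.html", "*.htm")

print(star_dot_star, star_dot, bracket_literal, bracket_class, htm_vs_html)
```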

It might be useful to provide a way to make glob behave in the
Windows-specific way (using the platform-specific functions
FindFirstFileEx and RtlIsNameInExpression on Windows).

