Draft PEP: Automatic Globbing of Filenames in argparse on Windows

PEP: XXX
Title: Automatic Globbing of Filenames in argparse on Windows
Version: $Revision$
Last-Modified: $Date$
Author: Kef Schecter <furrykef@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 14-Aug-2015
Python-Version: 3.6
Post-History:


Abstract
========

This PEP proposes to add functionality to argparse to allow glob
(wildcard) expressions to be handled automagically on Windows.


Motivation
==========

For many command-line tools, it is handy to be able to specify
wildcards in order to operate on more than one file at a time. On
Unix-like systems, this is handled automatically by the shell. On
Windows, however, the default shell does not have this behavior, nor
does Microsoft's PowerShell. Yet Windows users generally expect
wildcards to work. For example, most built-in commands such as
``dir`` and ``type`` accept wildcard arguments, and have since the
early days of MS-DOS.

It is already possible for programmers to work around this issue, but
it is a bit cumbersome and it is easy to make the behavior almost, but
not quite, correct. Moreover, since Python has a "batteries included"
philosophy, and this is a very common feature, it is the author's
opinion that the correct functionality should be available out of the
box.


How It Must Be Done Currently
=============================

::

    if platform.system() == 'Windows':
        filenames = []
        for filename in args.files:
            if '*' in filename or '?' in filename or '[' in filename:
                filenames += glob.glob(filename)
            else:
                filenames.append(filename)
        args.files = filenames


Why This Is a Problem
=====================

- Authors, especially those who use Unix-like systems, will usually
  not bother to add this code unless users specifically request it,
  and perhaps not even then. How often have you seen this code in a
  program?

- It is easy to forget the platform check or not understand why it is
  necessary. Automatically globbing filenames on a Unix-like system is
  wrong because the shell is supposed to handle it already; on such a
  system, if the program sees a name like ``*.txt``, then it means the
  user explicitly specified the name of a file that, improbable as it
  may seem, has an asterisk in its filename. On Windows, filenames
  with the characters ``*`` and ``?`` in their name are not possible,
  so this is partially irrelevant on Windows even when using a
  Unix-like shell such as bash (but see `Square Brackets`_ below).

- It is easy to forget to check the string for wildcard characters
  before passing it to glob.glob. If the user specifies a filename
  with no wildcards such as ``foo.txt``, and foo.txt does not exist,
  then glob.glob will silently ignore the file, giving the program no
  opportunity to print a message such as "No file named foo.txt".

- glob.glob may not be quite the right function to use. See
  `Square Brackets`_ below.

- It is boilerplate code that is applicable to a large number of
  programs without change, which suggests it belongs in a library.


Solution
========

Add a keyword argument to argparse.ArgumentParser.add_argument called
``glob``. If it is true, it will automatically glob filenames using
code much like the boilerplate code given earlier in
`How It Must Be Done Currently`_. This argument is only meaningful
when nargs is set to an appropriate value such as '+' or '*'.

The default value of this argument should be False. This ensures
backward compatibility with existing programs that assume wildcards
are not expanded, such as a program that accepts a regex as an
argument.

A possibly better behavior might be to make this argument default to
True (enabling the functionality automagically without the programmer
needing to be aware of it) and only expand wildcard arguments that are
not provided in quotes, similar to how Unix-like shells behave.
However, there appears to be no simple way to tell whether an argument
was supplied in quotes or not; the strings in sys.argv already have
had the quotes removed.


Square Brackets
===============

It has been noted above that the characters ``*`` and ``?`` will never
appear in filenames on Windows. However, the characters ``[`` and
``]``, which glob.glob uses for wildcards, **can** be used in
filenames, and may not be especially uncommon. There are three
possible ways of handling this:

1. Use a version of glob.glob without the wildcard functionality that
   ``[`` and ``]`` provide. This type of wildcard has never been
   standard for wildcard arguments to MS-DOS or Windows command-line
   programs.

2. Specify some kind of escaping mechanism; for example,
   ``\[foo\].txt`` would refer to a file that has ``[`` and ``]`` in
   its filename. This may not be intuitive behavior for Windows users.

3. Keep glob.glob's standard functionality. Programs using this
   feature will not be able to operate on files that have square
   brackets in their names.

Of these, the first should adhere best to the principle of least
surprise. Windows users do not expect square brackets to form wildcard
expressions. If they want such functionality, they will probably
already be using a shell such as bash that handles it for them.


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
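For illustration only, here is a minimal sketch of how the behaviour described in the draft might be approximated today with a custom argparse action, since the proposed ``glob`` keyword does not exist. The names ``simple_glob`` and ``GlobAction`` are hypothetical and not part of argparse. The sketch follows option 1 from the Square Brackets section (only ``*`` and ``?`` are wildcards), and it assumes wildcards appear only in the last path component and that matching should ignore case.

```python
import argparse
import os
import platform
import re


def _translate(pattern):
    """Build a regex in which only ``*`` and ``?`` act as wildcards."""
    parts = []
    for ch in pattern:
        if ch == '*':
            parts.append('.*')
        elif ch == '?':
            parts.append('.')
        else:
            # Everything else, including '[' and ']', is literal.
            parts.append(re.escape(ch))
    # Case-insensitive matching is an assumption made for Windows.
    return re.compile(''.join(parts) + r'\Z', re.IGNORECASE | re.DOTALL)


def simple_glob(pattern):
    """Expand ``*`` and ``?`` in the last path component only."""
    dirname, basename = os.path.split(pattern)
    matcher = _translate(basename)
    try:
        names = sorted(os.listdir(dirname or os.curdir))
    except OSError:
        # Missing or unreadable directory: nothing matches.
        return []
    return [os.path.join(dirname, name) for name in names
            if matcher.match(name)]


class GlobAction(argparse.Action):
    """Hypothetical stand-in for the proposed ``glob=True`` keyword."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Expand only on Windows; elsewhere the shell has already done it.
        if platform.system() == 'Windows':
            expanded = []
            for value in values:
                if '*' in value or '?' in value:
                    expanded += simple_glob(value)
                else:
                    # No wildcard: keep the name so the program can
                    # report a missing file itself.
                    expanded.append(value)
            values = expanded
        setattr(namespace, self.dest, values)


parser = argparse.ArgumentParser()
parser.add_argument('files', nargs='+', action=GlobAction)
```

With this in place, ``parser.parse_args()`` would leave ``args.files`` untouched on Unix-like systems and expand ``*`` and ``?`` on Windows. Whether a real implementation should live in an action, in ``add_argument`` itself, or elsewhere is left open by the draft.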

On Fri, Aug 14, 2015 at 7:32 PM, Kef Schecter <furrykef@gmail.com> wrote:
How does this interact with the 'fileinput' module? Can you tie in with that?
Disgusting. :) Definitely needs to be buried away.
+1
-1. Since Windows users aren't generally used to escaping arguments (e.g. compare Windows's "dir /s *.py" to Unix's "find . -name \*.py", where the latter will be backslash-protected), I would advise against glob expansion any time the program doesn't explicitly ask for it.
+1. It also shouldn't do any other form of expansion (e.g. braces) that people wouldn't expect of a standard Windows command like dir or copy. One other difference from glob.glob() that I'd recommend: If the spec doesn't match any files, return it unchanged, as bash does. (glob.glob will return an empty list.) Otherwise, sounds good to me - make it easy to DTRT on all platforms. ChrisA
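The "return it unchanged" suggestion can be sketched roughly as follows. The helper name ``expand_or_keep`` is hypothetical, and the sketch still relies on glob.glob, so square brackets remain wildcards here.

```python
import glob


def expand_or_keep(spec):
    """Return the files matching ``spec``, or ``[spec]`` if none match."""
    matches = glob.glob(spec)
    # glob.glob returns an empty list when nothing matches; fall back to
    # the original string so the caller can report the missing file.
    return matches if matches else [spec]
```

A side effect of this fallback is that a plain name such as foo.txt no longer needs a separate wildcard check before expansion: if the file does not exist, the name is passed through and the program can print its own error.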

On Fri, Aug 14, 2015 at 6:47 AM, Chris Angelico <rosuav@gmail.com> wrote:
The fileinput module likewise does not expand wildcards and will choke on an argument such as "*.txt", but I think for that module it would be safe to make wildcard expansion automatic on Windows, since it's known that all the arguments are filenames.

On Fri, Aug 14, 2015, at 05:32, Kef Schecter wrote:
-1, "C:\Some Directory With Spaces\*.txt" is a valid wildcard for most Windows programs.
Especially if you're using backslash for quoting - how do you differentiate it from backslash as a directory separator? Vim manages, but it's a bit headache-inducing to look at.
I actually think #1 is the correct way, but it's worth noting that using [[] for left bracket already works. This solution is also used in Vim (even on Windows) and MS-SQL. Right bracket needs no escaping since it's outside brackets to begin with (but you can if you want). You can even quote * and ? that way (filenames won't have them on Windows, but can on Unix).
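As a small illustration of the [[] form: the standard library's glob.escape (available since Python 3.4) produces exactly this kind of pattern. The file name below is just an example.

```python
import fnmatch
import glob

name = '[foo].txt'

# glob.escape wraps each of '*', '?' and '[' in brackets.
pattern = glob.escape(name)              # '[[]foo].txt'

# The escaped pattern matches the literal name ...
literal_ok = fnmatch.fnmatchcase(name, pattern)          # True
# ... whereas the unescaped name, used as a pattern, is a character
# class that matches 'f.txt' or 'o.txt' but not the name itself.
unescaped_ok = fnmatch.fnmatchcase(name, name)           # False
class_match = fnmatch.fnmatchcase('f.txt', name)         # True
```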
Speaking of least surprise... while _most_ of the quirks of Windows wildcards are extremely obscure and can probably be safely ignored for the purpose of this feature (my objection in the scandir discussion was to relying on the _real_ Windows wildcard implementation in a cross-platform function without explicitly documenting it, rather than saying it needs to be emulated), the fact that you can use *.* to match all files (including those that don't include a dot) and *. to match only files that do not include a dot is well-known. Filenames on Windows cannot actually end with a dot.
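A rough sketch of just the two well-known cases mentioned above, layered on fnmatch. The function name ``windows_style_match`` is hypothetical, and this is not an emulation of the real Windows matching rules: everything other than ``*.*`` and a trailing dot is handed to fnmatch unchanged, so square brackets still act as wildcards.

```python
import fnmatch


def windows_style_match(name, pattern):
    """Approximate two familiar Windows wildcard behaviours."""
    if pattern == '*.*':
        # Matches every name, including names without a dot.
        return True
    if pattern.endswith('.'):
        # A trailing dot selects only names that contain no dot.
        return '.' not in name and fnmatch.fnmatch(name, pattern[:-1])
    return fnmatch.fnmatch(name, pattern)
```

For example, ``windows_style_match('README', '*.*')`` and ``windows_style_match('README', '*.')`` are both true, while ``windows_style_match('notes.txt', '*.')`` is false.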

participants (3): Chris Angelico, Kef Schecter, random832@fastmail.us