Curly braces expansion in shell-like matcher modules
Hi,

A few days ago I submitted a feature request for the fnmatch module to allow shell-like curly brace expansion: http://bugs.python.org/issue9584

The patch I submitted was found to be incorrect, and I'm working on it (I already have a patch that seems to work, but I still need to test it extensively).

However, some concerns were raised about the fnmatch module not being the correct place to do that, if the goal is to mimic shell behavior. Expansion would have to be done beforehand, generating the list of patterns that would then be given to fnmatch.

It was suggested that I take this issue to Python-ideas, so here we go.

What do people think? Should it be done in fnmatch, translating the curly braces into the regex? Should it be done before that? If the latter, what would be an appropriate module to expand the braces? glob? Another module?

FWIW (i.e. not much), my opinion is leaning towards implementing it in glob.py:
- add '{' to the magic_check regex
- in glob1 (which is called when the pattern 'has_magic'), expand the braces and then call fnmatch.filter() on each resulting pattern

That would be closer to what shells like Bash do, and it is also more straightforward to implement.

Cheers,

-- Mathieu
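The "expand first, then call fnmatch.filter() on each resulting pattern" idea can be sketched roughly as follows. This is only an illustration, not the patch attached to the issue: `expand_braces`, `_split_group` and `filter_with_braces` are hypothetical names, backslash escaping is not handled, and the rule that a group without a top-level comma (or without a closing brace) stays literal is an assumption modelled on Bash.

```python
import fnmatch


def _split_group(pattern, open_idx):
    """Split the brace group that opens at open_idx.

    Returns (alternatives, close_idx), or None when the group has no
    matching '}' or no top-level comma (such groups are kept literal).
    """
    depth = 0
    alternatives = []
    piece_start = open_idx + 1
    for i in range(open_idx, len(pattern)):
        ch = pattern[i]
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                if not alternatives:
                    return None  # no top-level comma, e.g. "{bar}" or "{}"
                alternatives.append(pattern[piece_start:i])
                return alternatives, i
        elif ch == ',' and depth == 1:
            alternatives.append(pattern[piece_start:i])
            piece_start = i + 1
    return None  # no matching closing brace


def expand_braces(pattern):
    """Return the list of patterns produced by expanding brace groups."""
    search_from = 0
    while True:
        open_idx = pattern.find('{', search_from)
        if open_idx == -1:
            return [pattern]  # nothing (left) to expand
        group = _split_group(pattern, open_idx)
        if group is not None:
            break
        search_from = open_idx + 1  # literal '{', look for the next one
    alternatives, close_idx = group
    prefix, suffix = pattern[:open_idx], pattern[close_idx + 1:]
    results = []
    for alternative in alternatives:
        # Recurse so that nested and later groups are expanded too.
        results.extend(expand_braces(prefix + alternative + suffix))
    return results


def filter_with_braces(names, pattern):
    """Filter names against every expansion of pattern, without duplicates."""
    matched = []
    for expanded in expand_braces(pattern):
        for name in fnmatch.filter(names, expanded):
            if name not in matched:
                matched.append(name)
    return matched
```

With this sketch, `filter_with_braces(names, 'foo-{bar,baz}.*')` runs `fnmatch.filter()` once for `foo-bar.*` and once for `foo-baz.*`, so fnmatch itself never sees an expandable brace group.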
On Tue, Aug 17, 2010 at 8:13 AM, Mathieu Bridon <bochecha@fedoraproject.org> wrote:
If the latter, what would be an appropriate module to expand the braces? glob? Another module?
Since normal ("sh-like") Unix shells apply this generally, I'd be inclined to have a function to do this in shlex.

-Fred

--
Fred L. Drake, Jr. <fdrake at gmail.com>
"A storm broke loose in my mind." --Albert Einstein
On Tue, 17 Aug 2010 14:13:26 +0200 Mathieu Bridon <bochecha@fedoraproject.org> wrote:
However, some concerns were raised about the fnmatch module not being the correct place to do that, if the goal is to mimic shell behavior. Expansion would have to be done before, generating the list of patterns that would then be given to fnmatch.
I don't think mimicking shell behaviour should be a design goal of fnmatch or any other stdlib module. There are many different shells and, besides, users are generally not interested in reproducing shell behaviour when they use Python; they are simply looking for useful functionality.

IMO, fnmatch is the right place for such an enhancement (and, as the doc states, “glob uses fnmatch() to match pathname segments”).
FWIW (i.e. not much), my opinion is leaning towards implementing it in glob.py: - add '{' to the magic_check regex - in glob1 (which is called when the pattern 'has_magic'), expand the braces and then call fnmatch.filter() on each resulting pattern
That would respect more what is done in shells like Bash, and it makes it also more straight-forward to implement.
It also introduces bizarrely inconsistent behaviour in the dubious name of compatibility. Let's not reproduce the quirks of Unix shells, which are hardly a reference for beautiful UI and API design.

Regards

Antoine.
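For comparison, the fnmatch-level alternative (turning the braces into part of the regex rather than expanding them beforehand) might look roughly like the sketch below. It does not reuse fnmatch's own translator; `brace_fnmatch` and `_to_regex` are hypothetical names, `[...]` character classes are not supported, and two simplifications are assumed: a single-alternative group such as `{bar}` is treated as plain grouping (unlike Bash, which keeps it literal), and an unclosed `{` raises an error.

```python
import re


def _to_regex(pattern):
    """Translate *, ? and {a,b,...} groups into a regular expression string."""
    out = []
    depth = 0
    for ch in pattern:
        if ch == '*':
            out.append('.*')
        elif ch == '?':
            out.append('.')
        elif ch == '{':
            out.append('(?:')  # open a non-capturing alternation group
            depth += 1
        elif ch == '}' and depth > 0:
            out.append(')')
            depth -= 1
        elif ch == ',' and depth > 0:
            out.append('|')  # a comma inside braces separates alternatives
        else:
            out.append(re.escape(ch))  # everything else is literal
    if depth:
        raise ValueError('unbalanced "{" in pattern: %r' % pattern)
    return ''.join(out)


def brace_fnmatch(name, pattern):
    """Return True if the whole name matches the brace-aware pattern."""
    return re.fullmatch(_to_regex(pattern), name, re.DOTALL) is not None
```

Here `brace_fnmatch('foo-bar.xml', 'foo?bar.{txt,xml}')` is true and a single regex covers all alternatives, at the cost of the behavioural differences from shell expansion listed above.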
On Tue, Aug 17, 2010 at 9:03 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
IMO, fnmatch is the right place for such an enhancement. (and, as the doc states, “glob uses fnmatch() to match pathname segments”).
This is a good reason not to push the implementation down into glob, actually: the expansion may cross segment boundaries:

    foo{bar/turtle,car/monkey}_test.*

should expand to the two patterns:

    foobar/turtle_test.*
    foocar/monkey_test.*

-Fred

--
Fred L. Drake, Jr. <fdrake at gmail.com>
"A storm broke loose in my mind." --Albert Einstein
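A small illustration of the segment-boundary point, using a pattern of the same shape with the prefix `foo`: splitting on `/` before looking at the braces leaves unbalanced fragments, while handling the braces on the whole pattern gives the two expected results. The manual split below only works for a single, non-nested group and is meant as a demonstration, not as a general expander.

```python
pattern = 'foo{bar/turtle,car/monkey}_test.*'

# What a matcher working one path segment at a time would see:
# the brace group is cut in three pieces, none of them balanced.
segments = pattern.split('/')
print(segments)  # ['foo{bar', 'turtle,car', 'monkey}_test.*']

# Handling the single brace group on the whole pattern instead.
head, rest = pattern.split('{', 1)
body, tail = rest.split('}', 1)
expanded = [head + alternative + tail for alternative in body.split(',')]
print(expanded)  # ['foobar/turtle_test.*', 'foocar/monkey_test.*']
```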
Hi, On Tue, 2010-08-17 at 09:14 -0400, Fred Drake wrote:
On Tue, Aug 17, 2010 at 9:03 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
IMO, fnmatch is the right place for such an enhancement. (and, as the doc states, “glob uses fnmatch() to match pathname segments”).
This is a good reason not to push the implementation down into glob, actually: the expansion may cross segment boundaries:
foo{bar/turtle,car/monkey}_test.*
should expand to the two patterns:
foobar/turtle_test.* foocar/monkey_test.*
Then I have the correct behavior with the attached patch against the glob module. :) (I still have to write some proper unit tests for it, this is only a working proof of concept)

Note that I wrote this patch against the Python trunk, and tested it on Python 2.5 (Windows XP) and Python 2.6 (Fedora 13). (I didn't have time to actually build the Python trunk and run the unit tests yet)

To test it, I use the following dictionary, where the keys are the patterns I want to try and the values are the corresponding expected output.

d = {
    'foo.txt': 'foo.txt',
    'foo-{bar,baz}.txt': 'foo-bar.txt foo-baz.txt',
    'foo-{bar,baz-{toto,plouf}}.txt': 'foo-bar.txt foo-baz-plouf.txt foo-baz-toto.txt',
    'foo-{bar,baz}-{toto,plouf}.txt': 'foo-bar-plouf.txt foo-bar-toto.txt foo-baz-plouf.txt foo-baz-toto.txt',
    'foo-{}.txt': 'foo-{}.txt',
    'foo-{bar}.txt': 'foo-{bar}.txt',
    'foo-{bar.txt': 'foo-{bar.txt',
    'foo-bar}.txt': 'foo-bar}.txt',
    'foo-{bar{baz,plouf}.txt': 'foo-{barbaz.txt foo-{barplouf.txt',
    'foo-{bar,baz}-{toto}.txt': 'foo-bar-{toto}.txt foo-baz-{toto}.txt',
    'foo-{bar,baz}-{toto.txt': 'foo-bar-{toto.txt foo-baz-{toto.txt',
    'foo-{bar,baz}-toto}.txt': 'foo-bar-toto}.txt foo-baz-toto}.txt',
    'tmp/foo.txt': 'tmp/foo.txt',
    'tmp/foo-{bar,baz}.txt': 'tmp/foo-bar.txt tmp/foo-baz.txt',
    'tmp/foo-{bar,baz-{toto,plouf}}.txt': 'tmp/foo-bar.txt tmp/foo-baz-plouf.txt tmp/foo-baz-toto.txt',
    'tmp/foo-{bar,baz}-{toto,plouf}.txt': 'tmp/foo-bar-plouf.txt tmp/foo-bar-toto.txt tmp/foo-baz-plouf.txt tmp/foo-baz-toto.txt',
    'tmp/foo-{}.txt': 'tmp/foo-{}.txt',
    'tmp/foo-{bar}.txt': 'tmp/foo-{bar}.txt',
    'tmp/foo-{bar.txt': 'tmp/foo-{bar.txt',
    'tmp/foo-bar}.txt': 'tmp/foo-bar}.txt',
    'tmp/foo-{bar{baz,plouf}.txt': 'tmp/foo-{barbaz.txt tmp/foo-{barplouf.txt',
    'tmp/foo-{bar,baz}-{toto}.txt': 'tmp/foo-bar-{toto}.txt tmp/foo-baz-{toto}.txt',
    'tmp/foo-{bar,baz}-{toto.txt': 'tmp/foo-bar-{toto.txt tmp/foo-baz-{toto.txt',
    'tmp/foo-{bar,baz}-toto}.txt': 'tmp/foo-bar-toto}.txt tmp/foo-baz-toto}.txt',
    '{tmp,tmp2}/foo.txt': 'tmp2/foo.txt tmp/foo.txt',
    '{tmp,tmp2}/foo-{bar,baz}.txt': 'tmp2/foo-bar.txt tmp2/foo-baz.txt tmp/foo-bar.txt tmp/foo-baz.txt',
    '{tmp,tmp2}/foo-{bar,baz-{toto,plouf}}.txt': 'tmp2/foo-bar.txt tmp2/foo-baz-plouf.txt tmp2/foo-baz-toto.txt tmp/foo-bar.txt tmp/foo-baz-plouf.txt tmp/foo-baz-toto.txt',
    '{tmp,tmp2}/foo-{bar,baz}-{toto,plouf}.txt': 'tmp2/foo-bar-plouf.txt tmp2/foo-bar-toto.txt tmp2/foo-baz-plouf.txt tmp2/foo-baz-toto.txt tmp/foo-bar-plouf.txt tmp/foo-bar-toto.txt tmp/foo-baz-plouf.txt tmp/foo-baz-toto.txt',
    '{tmp,tmp2}/foo-{}.txt': 'tmp2/foo-{}.txt tmp/foo-{}.txt',
    '{tmp,tmp2}/foo-{bar}.txt': 'tmp2/foo-{bar}.txt tmp/foo-{bar}.txt',
    '{tmp,tmp2}/foo-{bar.txt': 'tmp2/foo-{bar.txt tmp/foo-{bar.txt',
    '{tmp,tmp2}/foo-bar}.txt': 'tmp2/foo-bar}.txt tmp/foo-bar}.txt',
    '{tmp,tmp2}/foo-{bar{baz,plouf}.txt': 'tmp2/foo-{barbaz.txt tmp2/foo-{barplouf.txt tmp/foo-{barbaz.txt tmp/foo-{barplouf.txt',
    '{tmp,tmp2}/foo-{bar,baz}-{toto}.txt': 'tmp2/foo-bar-{toto}.txt tmp2/foo-baz-{toto}.txt tmp/foo-bar-{toto}.txt tmp/foo-baz-{toto}.txt',
    '{tmp,tmp2}/foo-{bar,baz}-{toto.txt': 'tmp2/foo-bar-{toto.txt tmp2/foo-baz-{toto.txt tmp/foo-bar-{toto.txt tmp/foo-baz-{toto.txt',
    '{tmp,tmp2}/foo-{bar,baz}-toto}.txt': 'tmp2/foo-bar-toto}.txt tmp2/foo-baz-toto}.txt tmp/foo-bar-toto}.txt tmp/foo-baz-toto}.txt',
    'tm{p/foo,p2/foo}.txt': 'tmp2/foo.txt tmp/foo.txt',
    'foo-bar*{txt,xml}': 'foo-bar-plouf.txt foo-bar-toto.txt foo-bar-toto}.txt foo-bar-{toto.txt foo-bar-{toto}.txt foo-bar.txt foo-bar}.txt',
    'foo?bar.{txt,xml}': 'foo-bar.txt',
}

(note that those actually correspond to files I have in the current folder so that they match)

Can anyone think of other interesting patterns involving braces?

Also, if the consensus is that glob is not the proper place, it would be pretty straightforward to do it in a module that would expand the braces before calling glob on the resulting patterns.

-- Mathieu
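A dictionary like the one above could be driven by a small harness along these lines. This is a sketch under a few assumptions: `check_patterns` and `demo` are hypothetical names, results are compared as sets because the listing order of the expected values is not specified, and the demo uses the unpatched stdlib `glob.glob`, which does no brace expansion, so the brace case is reported as a failure.

```python
import glob
import os
import tempfile


def check_patterns(cases, glob_func=glob.glob):
    """Return {pattern: (expected, actual)} for every case that fails.

    cases maps a pattern to a space-separated string of expected paths.
    glob_func is the function under test (a patched glob, for example).
    """
    failures = {}
    for pattern, expected in cases.items():
        actual = set(glob_func(pattern))
        if actual != set(expected.split()):
            failures[pattern] = (expected, ' '.join(sorted(actual)))
    return failures


def demo():
    """Run three cases against the stdlib glob in a scratch directory."""
    cases = {
        'foo.txt': 'foo.txt',
        'foo?bar.*': 'foo-bar.txt',
        'foo-{bar,baz}.txt': 'foo-bar.txt foo-baz.txt',
    }
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            # Create the files the patterns are expected to match.
            for name in ('foo.txt', 'foo-bar.txt', 'foo-baz.txt'):
                open(name, 'w').close()
            return check_patterns(cases)
        finally:
            os.chdir(old_cwd)  # leave the directory before it is removed
```

Passing a brace-aware function as `glob_func` would be the way to check a candidate implementation against the full dictionary.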
On 8/17/2010 3:39 PM, Mathieu Bridon wrote:
Note that I wrote this patch against the Python trunk, and tested it on Python 2.5 (Windows XP) and Python 2.6 (Fedora 13). (I didn't have time to actually build the Python trunk and run the unit tests yet)
Just to make sure you are aware, the current trunk is the py3k branch. That is what 3.2 is being released from and what the patch needs to work with. The one labelled 'trunk' was 2.x and is now frozen (though it is, in a sense, continued as the maint27 branch).

At some point, summarize this thread on the tracker and give a link. http://bugs.python.org/issue9584

-- Terry Jan Reedy
participants (5)
- Antoine Pitrou
- Fred Drake
- Greg Ewing
- Mathieu Bridon
- Terry Reedy