Make fnmatch.filter accept a tuple of patterns
Frequently, while globbing, one needs to work with multiple extensions. I’d like to propose that fnmatch.filter handle a tuple of patterns (while preserving the single str argument functionality, à la str.endswith), as a first step towards glob.i?glob accepting multiple patterns as well.
Here is the implementation I came up with: https://github.com/python/cpython/compare/master...andresdelfino:fnmatch-mul...
If this is deemed reasonable, I’ll write tests and documentation updates. Any opinions?
Andre Delfino writes:
Frequently, while globbing, one needs to work with multiple extensions. I’d like to propose that fnmatch.filter handle a tuple of patterns (while preserving the single str argument functionality, à la str.endswith),
This is one of those famous 3-line functions, though:

    import fnmatch

    def multifilter(names, *patterns):
        result = []
        for p in patterns:
            result.extend(fnmatch.filter(names, p))
        return result

It's a 3-line function in 5 lines, OK, but still.
as a first step for glob.i?glob to accept multiple patterns as well.
If you're going to improve the glob module, why not use bash or zsh extended globbing ('**', '{a,b}') as the model? This is more powerful, and already familiar to many users.
On Sat, Nov 3, 2018 at 4:49 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andre Delfino writes:
Frequently, while globbing, one needs to work with multiple extensions. I’d like to propose that fnmatch.filter handle a tuple of patterns (while preserving the single str argument functionality, à la str.endswith),
This is one of those famous 3-line functions, though:
    import fnmatch

    def multifilter(names, *patterns):
        result = []
        for p in patterns:
            result.extend(fnmatch.filter(names, p))
        return result
It's a 3-line function in 5 lines, OK, but still.
And like many "hey it's this easy" demonstrations, that isn't quite identical, as a single file can match multiple patterns (but shouldn't be in the result multiple times). Whether that's an important distinction or not remains to be seen, but I do know of situations where this would have bitten me. ChrisA
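To make the duplicate-match point concrete, here is a minimal sketch (not the proposed patch). The helper names `multifilter_naive` and `multifilter_unique` and the sample file names are made up for illustration; only `fnmatch.filter` comes from the standard library.

```python
import fnmatch


def multifilter_naive(names, *patterns):
    """Concatenate per-pattern results; a name matching two patterns repeats."""
    result = []
    for p in patterns:
        result.extend(fnmatch.filter(names, p))
    return result


def multifilter_unique(names, *patterns):
    """Return matching entries of names in their original order,
    without repeating an entry once per matching pattern (hypothetical helper)."""
    matched = set()
    for p in patterns:
        matched.update(fnmatch.filter(names, p))
    return [n for n in names if n in matched]


names = ["report.txt", "data.csv", "readme.txt"]
patterns = ("*.txt", "r*")

naive = multifilter_naive(names, *patterns)
unique = multifilter_unique(names, *patterns)
print(naive)   # both .txt names appear twice: once per matching pattern
print(unique)  # each matching name appears once
```

Whether order preservation matters is a design choice; a set-based version is shorter but loses the input order.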
Chris Angelico writes:
On Sat, Nov 3, 2018 at 4:49 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andre Delfino writes:
Frequently, while globbing, one needs to work with multiple extensions. I’d like to propose that fnmatch.filter handle a tuple of patterns (while preserving the single str argument functionality, à la str.endswith),
This is one of those famous 3-line functions, though:
    import fnmatch

    def multifilter(names, *patterns):
        result = []
        for p in patterns:
            result.extend(fnmatch.filter(names, p))
        return result
It's a 3-line function in 5 lines, OK, but still.
And like many "hey it's this easy" demonstrations, that isn't quite identical, as a single file can match multiple patterns
Sure. I would have written it with set.union() on general principles except I forgot how to say "union", didn't feel like looking it up, and wanted to keep the def as close to 3 lines as I could without being obfuscated (see below). I wonder how many people would fall into the trap I did. (I don't consider myself a great programmer, but maybe that's all the more reason for this? Not-so-great minds think alike? :-)

I was really more interested in the second question, though. Why invent yet another interface when we already have one that is well-known and more powerful?

P.S. I can't resist. This is horrible, but:

    def multifilter(names, *patterns):
        return list(set().union(*[fnmatch.filter(names, p) for p in patterns]))

Who even needs a function? ;-)
On Sat, Nov 3, 2018, 1:30 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
P.S. I can't resist. This is horrible, but:
def multifilter(names, *patterns): return list(set().union(*[fnmatch.filter(names, p) for p in patterns]))
Yes, that is a horrible spelling for: {fnmatch.filter(names, p) for p in patterns} ;-)
On 2018-11-03 17:45, David Mertz wrote:
On Sat, Nov 3, 2018, 1:30 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
P.S. I can't resist. This is horrible, but:
def multifilter(names, *patterns): return list(set().union(*[fnmatch.filter(names, p) for p in patterns]))
Yes, that is a horrible spelling for:
{fnmatch.filter(names, p) for p in patterns}
;-)
But it has the advantage that it works. :-)
On Sat, Nov 3, 2018 at 3:03 PM MRAB <python@mrabarnett.plus.com> wrote:
Yes, that is a horrible spelling for:
{fnmatch.filter(names, p) for p in patterns}
But it has the advantage that it works. :-)
Indeed! Excellent point :-). I definitely should not post untested code from my tablet.

This is still slightly less horrible, but I recognize it's starting to border on horrible:

    {n for p in patterns for n in fnmatch.filter(names, p)}

This seems worse:

    set(chain(*(fnmatch.filter(names, p) for p in patterns)))
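For readers following along, a small sketch of why the first set comprehension does not work and how the flattened spellings behave. The sample names and patterns are invented; `chain.from_iterable` is swapped in here as the usual spelling of `chain(*iterables)`, not something proposed in the thread.

```python
import fnmatch
from itertools import chain

names = ["a.py", "b.txt", "c.py"]
patterns = ("*.py", "a.*")

# fnmatch.filter returns a list, and lists are unhashable, so building
# a set whose elements are those lists raises TypeError.
try:
    {fnmatch.filter(names, p) for p in patterns}
    failed = False
except TypeError as exc:
    failed = True
    print("set of lists fails:", exc)

# Flattening inside the comprehension yields a set of names instead.
flat = {n for p in patterns for n in fnmatch.filter(names, p)}

# Same result, flattening with itertools instead of a nested comprehension.
chained = set(chain.from_iterable(fnmatch.filter(names, p) for p in patterns))

print(sorted(flat), sorted(chained))
```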
On Sun, Nov 4, 2018 at 4:29 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Chris Angelico writes:
On Sat, Nov 3, 2018 at 4:49 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Andre Delfino writes:
Frequently, while globbing, one needs to work with multiple extensions. I’d like to propose that fnmatch.filter handle a tuple of patterns (while preserving the single str argument functionality, à la str.endswith),
This is one of those famous 3-line functions, though:
    import fnmatch

    def multifilter(names, *patterns):
        result = []
        for p in patterns:
            result.extend(fnmatch.filter(names, p))
        return result
It's a 3-line function in 5 lines, OK, but still.
And like many "hey it's this easy" demonstrations, that isn't quite identical, as a single file can match multiple patterns
Sure. I would have written it with set.union() on general principles except I forgot how to say "union", didn't feel like looking it up, and wanted to keep the def as close to 3 lines as I could without being obfuscated (see below). I wonder how many people would fall into the trap I did. (I don't consider myself a great programmer, but maybe that's all the more reason for this? Not-so-great minds think alike? :-)
A very fair point; and still supporting the notion that "it's a 3-line function" doesn't instantly silence the need. TBH, it's the moments when we AREN'T great programmers that we need the language to help us out. Why is it that we love strong rules and tight exceptions? Because they tell us when we've done something stupid, and help us to fix that bug with a minimum of fuss :)
I was really more interested in the second question, though. Why invent yet another interface when we already have one that is well-known and more powerful?
That kind of globbing might also solve the use-cases, but I'm worried about backward compatibility. Creating more glob-special characters could potentially change the meaning of globs that are already in use. I don't personally glob files with braces in their names, but someone somewhere is doing it (and I do have a bunch of files with UUIDs in their names, mainly in Wine directories); adding a feature like that might break code, or alternatively, would have to be fnmatch_with_braces(). In contrast, accepting a tuple of strings can't possibly break any working code that uses individual strings.
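The compatibility concern can be illustrated with a short sketch of how fnmatch treats braces today: they are ordinary characters that match only themselves. The file names below, including the braced UUID, are invented for the example.

```python
import fnmatch

# Braces are not special to fnmatch: the pattern matches the literal name.
literal_match = fnmatch.fnmatchcase("{a,b}.txt", "{a,b}.txt")

# No brace expansion happens, so "a.txt" is not matched by "{a,b}.txt".
expanded_match = fnmatch.fnmatchcase("a.txt", "{a,b}.txt")

# Hypothetical file name containing a braced UUID; code like this
# would change meaning if "{" and "}" became glob-special.
name = "{3f2504e0-4f89-11d3-9a0c-0305e82c3301}.dat"
uuid_match = fnmatch.fnmatchcase(name, "{*}.dat")

print(literal_match, expanded_match, uuid_match)
```

A tuple of patterns avoids this problem because no existing single-string pattern changes meaning.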
P.S. I can't resist. This is horrible, but:
def multifilter(names, *patterns): return list(set().union(*[fnmatch.filter(names, p) for p in patterns]))
Who even needs a function? ;-)
.... wow. I do want to make one small change to it, though: instead of list() at the end of the chain, I'd use sorted(). You're throwing away the original order of file names, so it'd look tidier to return them in order, rather than in whichever order iterating over the set gives them.

Also, I am a very very bad person for suggesting an 'improvement' to a function of that nature. That is... a piece of art. Modern art, the sort where you go "This is incomprehensible therefore it is beautiful". :)

ChrisA
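A sketch of the sorted() variant suggested above, assuming plain lexicographic order is acceptable. The name `multifilter_sorted` and the sample data are illustrative only.

```python
import fnmatch


def multifilter_sorted(names, *patterns):
    """Union of all per-pattern matches, returned in sorted order."""
    # set().union() accepts any number of iterables, including none.
    return sorted(set().union(*(fnmatch.filter(names, p) for p in patterns)))


result = multifilter_sorted(["b.txt", "a.py", "a.txt", "c.md"], "*.txt", "a.*")
print(result)
```

Note that sorting gives a deterministic order but not the original input order.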
On Sat, Nov 03, 2018 at 02:49:00AM +0900, Stephen J. Turnbull wrote:
If you're going to improve the glob module, why not use bash or zsh extended globbing ('**', '{a,b}') as the model? This is more powerful, and already familiar to many users.
I thought it did support extended globbing?
https://docs.python.org/3/library/glob.html#glob.glob
But brace expansion should be a thing. For backwards compatibility reasons, we probably need a switch to turn it on, or a separate function call, or maybe a deprecation period.
--
Steve
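As a reference point for the '**' support mentioned above, here is a small sketch of glob.glob with and without recursive=True. The directory layout is created in a temporary directory purely for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: top.py, pkg/mod.py, pkg/sub/deep.py
    os.makedirs(os.path.join(root, "pkg", "sub"))
    for rel in ("top.py",
                os.path.join("pkg", "mod.py"),
                os.path.join("pkg", "sub", "deep.py")):
        open(os.path.join(root, rel), "w").close()

    pattern = os.path.join(root, "**", "*.py")

    # Without recursive=True, "**" acts like "*" and matches one level.
    shallow = glob.glob(pattern)

    # With recursive=True, "**" matches zero or more directory levels.
    deep = glob.glob(pattern, recursive=True)

    found = sorted(os.path.relpath(p, root) for p in deep)
    print(len(shallow), found)
```

Brace expansion such as '{a,b}' is not part of this; the sketch covers only the recursive wildcard.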
Several attempts in this thread to write "simpler" code that does this job proved to be buggy. I feel there's some ground to say that having this in the library could prevent other less talented writers from making those or even bigger mistakes.
I feel a tuple to specify multiple patterns fits nicely, and it's easier to explain than an extended syntax.
I have created an issue and opened a PR with docs and tests, in case this is deemed worth having.
Issue: https://bugs.python.org/issue41429
PR: https://github.com/python/cpython/pull/21666
On Wed, Mar 24, 2021 at 10:35:59PM -0000, adelfino@gmail.com wrote:
Several attempts in this thread to write "simpler" code that does this job proved to be buggy. I feel there's some ground to say that having this in the library could prevent other less talented writers from making those or even bigger mistakes.
I feel a tuple to specify multiple patterns fits nicely, and it's easier to explain than an extended syntax.
I have created an issue and opened a PR with docs and tests, in case this is deemed worth having.
Issue: https://bugs.python.org/issue41429 PR: https://github.com/python/cpython/pull/21666
Why must it be a tuple and not an iterable? Not even a list?
Oleg.
--
Oleg Broytman https://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.
This follows the example of str.startswith/str.endswith, but yes, it could be any iterable.
(If the answer is for me - pity it lacks any context.) On Thu, Mar 25, 2021 at 10:51:06PM -0000, adelfino@gmail.com wrote:
This follows the example of str.startswith/str.endswith, but yes, it could be any iterable.
It could but currently cannot. The code checks for ``tuple`` literally:
    pats = pat if isinstance(pat, tuple) else (pat,)
Ref: https://github.com/python/cpython/pull/21666/files#diff-5078809c85d4fa8475e7...
The test should be inverted - check for string, not for iterator/iterable:
    pats = (pat,) if isinstance(pat, string) else pat  # Assume ``pat`` is an iterable
Oleg.
On Fri, Mar 26, 2021 at 12:30:25AM +0100, Oleg Broytman <phd@phdru.name> wrote:
(If the answer is for me - pity it lacks any context.)
On Thu, Mar 25, 2021 at 10:51:06PM -0000, adelfino@gmail.com wrote:
This follows the example of str.startswith/str.endswith, but yes, it could be any iterable.
It could but currently cannot. The code checks for ``tuple`` literally:
pats = pat if isinstance(pat, tuple) else (pat,)
Ref: https://github.com/python/cpython/pull/21666/files#diff-5078809c85d4fa8475e7...
The test should be inverted - check for string, not for iterator/iterable:
pats = (pat,) if isinstance(pat, string) else pat # Assume ``pat`` is an iterable
Sorry, ``str``, not string.
Oleg.
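A minimal sketch of the inverted check with the ``str`` correction applied. The helper name `normalize_patterns` is hypothetical, and also treating ``bytes`` as a single pattern is an assumption added here (fnmatch accepts bytes patterns); neither is taken from the PR.

```python
def normalize_patterns(pat):
    """Return a tuple of patterns from one pattern or an iterable of them."""
    # A lone str (or bytes) is a single pattern, not an iterable of
    # one-character patterns, so it must be checked for first.
    if isinstance(pat, (str, bytes)):
        return (pat,)
    # Anything else is assumed to be an iterable of patterns.
    return tuple(pat)


print(normalize_patterns("*.py"))
print(normalize_patterns(["*.py", "*.txt"]))
print(normalize_patterns(p for p in ("*.py", "*.txt")))
```

Materializing the iterable as a tuple lets the patterns be iterated more than once, which matters if a generator is passed in.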
Oleg Broytman wrote:
(If the answer is for me - pity it lacks any context.)
On Thu, Mar 25, 2021 at 10:51:06PM -0000, adelfino@gmail.com wrote:
This follows the example of str.startswith/str.endswith, but yes, it could be any iterable.
It could but currently cannot. The code checks for ``tuple`` literally:
    pats = pat if isinstance(pat, tuple) else (pat,)
Ref: https://github.com/python/cpython/pull/21666/files#diff-5078809c85d4fa8475e7...
The test should be inverted - check for string, not for iterator/iterable:
    pats = (pat,) if isinstance(pat, string) else pat  # Assume ``pat`` is an iterable
Oleg.
Yeah, I mean it could by changing the branch, but first it should be decided whether the idea is worthwhile. Otherwise, we would be changing something that may never be committed.
On 24/03/2021 23:35, adelfino@gmail.com wrote:
Several attempts in this thread to write "simpler" code that does this job proved to be buggy. I feel there's some ground to say that having this in the library could prevent other less talented writers from making those or even bigger mistakes.
I feel a tuple to specify multiple patterns fits nicely, and it's easier to explain than an extended syntax.
I have created an issue and opened a PR with docs and tests, in case this is deemed worth having.
Here's a sketch for yet another implementation: Change _compile_pattern() and maybe translate() to return a regex that matches multiple patterns. Then you could leave the already over-complicated implementation of filter() alone and would support tuples or iterators of patterns in fnmatch() and fnmatchcase(), too.
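One way the regex idea above might look, as a rough sketch built only on the public fnmatch.translate() rather than on the private _compile_pattern(). The function name `compile_patterns` is hypothetical, and the sketch is case-sensitive like fnmatchcase(); it does not apply os.path.normcase.

```python
import fnmatch
import re


def compile_patterns(patterns):
    """Return a match function that accepts a name matching any pattern."""
    patterns = tuple(patterns)
    if not patterns:
        # An empty alternation would match everything, so handle it explicitly.
        return lambda name: None
    # fnmatch.translate() returns a regex that is anchored at the end;
    # each one is wrapped in a non-capturing group before joining with "|".
    alternatives = (f"(?:{fnmatch.translate(p)})" for p in patterns)
    return re.compile("|".join(alternatives)).match


match = compile_patterns(("*.py", "*.txt"))
names = ["a.py", "b.txt", "c.md", "a.py.bak"]
selected = [n for n in names if match(n)]
print(selected)
```

Because each name is tested once against a single compiled regex, every name appears at most once in the result and the input order is preserved.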
participants (9)
- adelfino@gmail.com
- Andre Delfino
- Chris Angelico
- David Mertz
- MRAB
- Oleg Broytman
- Peter Otten
- Stephen J. Turnbull
- Steven D'Aprano