itertools.compress default selectors
Hi!

I propose enhancing "itertools.compress" so that, if you don't provide selectors, "data" itself is used as the selectors. So "compress(a)" would be equivalent to "compress(a, a)". For example:

    >>> from itertools import compress
    >>> [*compress([0, 1, 2, 3])]
    [1, 2, 3]
    >>> [*compress(["", "CFLAGS=-O3"])]
    ['CFLAGS=-O3']
    >>> opts = compress([None, "", "-filove-python", "CFLAGS=-O3"])
    >>> " ".join(opts)
    '-filove-python CFLAGS=-O3'

What do you guys think about this? Perhaps it was already proposed by someone else?

Thanks!
Stepan Dyatkovskiy
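For reference, here is a minimal sketch of how this behaves in current Python, where the second argument is still mandatory. The variable names are illustrative only and are not part of the proposal.

```python
from itertools import compress

a = [None, "", "-filove-python", "CFLAGS=-O3"]

# Today both arguments are required, so the one-argument form fails.
try:
    compress(a)
except TypeError as exc:
    one_arg_error = exc
else:
    one_arg_error = None

# Passing the list twice gives the result the proposal asks for.
opts = " ".join(compress(a, a))
print(opts)  # -filove-python CFLAGS=-O3
```

This only holds when the argument can be iterated more than once, such as a list.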
You can already do this using filter(None, a):

    >>> list(filter(None, [None, "", "-filove-python", "CFLAGS=-O3"]))
    ['-filove-python', 'CFLAGS=-O3']

There's arguably a minor readability improvement (compress(a) suggests "remove the unneeded elements") but I'm not sure that's enough to justify the change. On the other hand, it's not like there's any obviously better default value for the second argument... I guess overall I'm fairly indifferent to the change.

Paul

On Mon, 13 Sept 2021 at 13:07, <ml@dyatkovskiy.com> wrote:
> [...]
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-leave@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/LVD63O...
> Code of Conduct: http://python.org/psf/codeofconduct/
My 2c: I don't remember ever seeing itertools.compress (although I've occasionally browsed through itertools), and I've certainly never used it. However, having worked out what it does, my first reaction was that this would be a harmless and mildly convenient change. But might it not lead to bugs if the second argument were accidentally omitted?

Best wishes
Rob Cliffe
I've never used compress() either. But I probably HAVE selected only the "truthy" elements of an iterable. The obvious way to do that is:

    it = (x for x in it if x)

It feels like changing compress() would just be more obscure, but not add to clarity.
FYI, itertools.compress() is very useful in conjunction with itertools.cycle() to pick out elements following a periodic pattern of indices. For example,

    >>> from itertools import compress, cycle
    >>> # Elements at even indices.
    >>> list(compress(range(20), cycle([1, 0])))
    [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
    >>> # Or at odd ones.
    >>> list(compress(range(20), cycle([0, 1])))
    [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
    >>> # Pick every third.
    >>> list(compress(range(20), cycle([0, 0, 1])))
    [2, 5, 8, 11, 14, 17]
    >>> # Omit every third.
    >>> list(compress(range(20), cycle([1, 1, 0])))
    [0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16, 18, 19]

For arguments that are re-iterable, there are several ways to get the proposed semantics, including just passing the argument twice to compress():

    >>> a = [None, "", "-filove-python", "CFLAGS=-O3"]
    >>> " ".join(compress(a, a))
    '-filove-python CFLAGS=-O3'
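As a small sketch of the "several ways" mentioned here, assuming the input is a list (so it can be iterated more than once), the three spellings below select the same elements. The variable names are made up for illustration.

```python
from itertools import compress

a = [None, "", "-filove-python", "CFLAGS=-O3"]

via_compress = list(compress(a, a))      # same list as data and selectors
via_filter = list(filter(None, a))       # None means "keep truthy elements"
via_comprehension = [x for x in a if x]  # explicit truthiness test

print(via_compress == via_filter == via_comprehension)  # True
```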
OK, if I may tell a little story... :)

The main motivation was a use case where we gather command line options, and some of them are… optional. Below is a realistic example, namely options for LLVM Clang:

    flags = ["-O3", "-fomit-frame-pointer", maybe_pgo(context)]

Here “maybe_pgo” returns “None” if we don’t use profile-guided optimization, and “-fprofile-generate” otherwise. Thus I have a collection of options, some of which are empty or None. To get the rendered command line options string I use “compress” in conjunction with “join”:

    opts = " ".join(compress(flags, flags))

I usually introduce an alias for this:

    def jin(a: str, sep=" "):
        return sep.join(compress(a, a))

And I found that I use it quite frequently. (“jin” is an analog of “join” but without the “emptiness”, so we omitted the “o” :) )

Initially I also used this:
    it = (x for x in it if x)

But when you are about to build a sophisticated options string, it leads to quite scattered code, whereas with an alias I can write pretty readable one-liners:

    opts = " ".join(compress(["-O3", "-fomit-stackpointer", maybe_linker_pgo(context)]))

or, in the case of “str.jin”:

    opts = " ".jin(["-O3", "-fomit-stackpointer", maybe_linker_pgo(context)])

I considered proposing a “str.jin” static method, and wondered about it quite a bit. But then I decided that it might be too much for the stdlib, and settled on “compress”. It also still lets you build option collections in a nice way if you wrap every options list in a “compress” call:

    flags = compress(["-O3", "-fomit-stackpointer", *maybe_even_more(context)])
    ldflags = compress(["-fglobal-merge", maybe_opt_level(context)])
    opts = " ".join(flags + ldflags)

What confuses me about “compress” is the odd dispersion of similar functionality: “filter” (builtin), “itertools.compress” and “itertools.filterfalse”. All of them pursue similar goals and could in fact be redesigned as a single method, or a family of methods with the same prefix (“filter”?). Perhaps this was discussed already. So wouldn’t it be wasted effort to work on “compress” right now? Perhaps “jin” would indeed be a better solution?

And yet I’m convinced that we need some compact and nice way of rendering strings of command options. That would be a thing.

Thanks!
Stepan Dyatkovskiy
On Tue, 14 Sept 2021 at 08:38, ml@dyatkovskiy.com <ml@dyatkovskiy.com> wrote:
> Main motivation was a use case where we gather command line options, and some of them are… optional. [...] And yet I’m solid that we need some compact and nice way for rendering strings with command options. That would be a thing.

Frankly, I'd just use something like your "jin" function:

    def make_option_string(options):
        return " ".join(opt for opt in options if opt)

Note that I gave it a more readable name that reflects the actual use case. This is deliberate, as I think the main advantage here is readability, and using a name that reflects the use case, rather than a "generic" name, helps readability. That's also why I don't see this as being a useful candidate for the stdlib - it would have to have a "generic" name in that case, which defeats the (for me) main benefit.

I find the "sep.join(x for x in it if x)" construction short and readable enough that I'd be unlikely to use a dedicated function for it, even if one existed. And while "x for x in it if x" is annoyingly repetitive, it's not so bad that I'd go hunting for a function to replace it.

So for me, I *don't* think we need a dedicated function for this.

Paul
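A short usage sketch of the function above, applied to the Clang flags use case from earlier in the thread. Here "maybe_pgo" is a hypothetical stand-in that takes a boolean instead of the "context" object, and the type hints are an addition for illustration.

```python
from typing import Iterable, Optional


def make_option_string(options: Iterable[Optional[str]]) -> str:
    # Keep only truthy options; this drops both None and "".
    return " ".join(opt for opt in options if opt)


def maybe_pgo(use_pgo: bool) -> Optional[str]:
    # Hypothetical helper: returns a flag, or None when PGO is not in use.
    return "-fprofile-generate" if use_pgo else None


with_pgo = make_option_string(["-O3", "-fomit-frame-pointer", maybe_pgo(True)])
without_pgo = make_option_string(["-O3", "-fomit-frame-pointer", maybe_pgo(False)])

# Each option is inspected once, so a one-shot iterator works as well.
from_iterator = make_option_string(iter(["-O3", "", None, "-g"]))

print(with_pgo)       # -O3 -fomit-frame-pointer -fprofile-generate
print(without_pgo)    # -O3 -fomit-frame-pointer
print(from_iterator)  # -O3 -g
```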
On Tue, Sep 14, 2021 at 11:31:43AM +0400, ml@dyatkovskiy.com wrote:
> Thus I have collection of options and some of them are empty or None. In order to get rendered command line options string I use “compress” in conjunction with “join":
>
>     opts = " ".join(compress(flags, flags))

Why do you (ab)use compress for that?

I understand that `compress(flags, flags)` has the effect of filtering for non-empty flags. But that's an obfuscated way to write it. Either of these would be more understandable:

* `filter(None, flags)`  # could also use bool instead of None
* `(flag for flag in flags if flag)`

especially the last, although heavy users of functional languages may prefer filter. But using compress with the same argument twice is just weird.

And also fragile. You can't use an iterator for the flags.

    >>> jin(iter(['a', 'b', '', 'c', 'd', 'e', 'f', '', 'g']))
    'a  d'
> I usually introduce alias for this:
>
>     def jin(a: str, sep=" "):
>         return sep.join(compress(a, a))
>
> And I found that I use it quite frequently.

The signature is wrong. `flags` is a list of string options. If you pass an actual string, you expand it with spaces:

    >>> jin('abc def')
    'a b c   d e f'

--
Steve
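To make both pitfalls concrete, here is a small sketch that reproduces them. The "jin" below is the alias from the thread with its annotation dropped. The "pairs" list is only an illustration of the (data, selector) pairing that results when both roles draw from one iterator, not a view into compress() internals.

```python
from itertools import compress


def jin(a, sep=" "):
    # The alias from the thread: the same object serves as data and selectors.
    return sep.join(compress(a, a))


items = ['a', 'b', '', 'c', 'd', 'e', 'f', '', 'g']

# With a list, data and selectors are iterated independently.
from_list = jin(items)

# With an iterator, both roles pull from the same stream, so items are
# consumed alternately: one as data, the next one as its selector.
it = iter(items)
pairs = list(zip(it, it))
from_iter = jin(iter(items))

# A plain string is an iterable of characters, and ' ' is truthy.
from_str = jin('abc def')

print(from_list)        # a b c d e f g
print(pairs)            # [('a', 'b'), ('', 'c'), ('d', 'e'), ('f', '')]
print(repr(from_iter))  # 'a  d'
print(repr(from_str))   # 'a b c   d e f'
```

Note that the empty string survives in the iterator case because its selector happens to be 'c', which is why the result contains two spaces.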
> The signature is wrong.

Thanks for the remark. Of course, the proper signature would be:

    def jin(a: Iterable[Optional[str]], sep=" "):
        # …

> Why do you (ab)use compress for that?

Well, it seems that this is subjective. To me “[None, x, y, z]” -> “[x, y, z]” looks like “compression”. But if the community agrees with your side, then, well, it’s OK.

Another argument for using “compress” was a similar case I found on Stack Overflow for numpy:
https://stackoverflow.com/questions/5927180/how-do-i-remove-all-zero-element...
So extending “compress” would be useful for math cases as well.

Alternatively, we can indeed use:

2. "filter(None, a)"
3. (x for x in a if x)

Why not use #3?

Given only #2 and #3, I would vote for “filter”. It is a builtin, and used to be implemented as an intrinsic. In CPython it has a separate “if” branch for the case where the first argument is “None” (see the “filter_next” function in “bltinmodule.c”). There is also a good chance of getting an optimized “compress” one day.

#3 is semantically more complicated, and it seems that there are no optimizations for it, at least in CPython (but perhaps I’m wrong?). So it looks like #3 is slower both to parse and to execute.

#3 is also a bad choice for code maintenance. It is always better to pass a variable once. “(x for x in a if x)” contains micro code duplication: you write “x” three times, and if you later decide to use something else you have to edit it in three places. So #3 loses out if you want to reuse or just maintain such code.

Paul confirmed my worries about “jin”, so it seems that it is not an option either.

And yet, I still have a little hope for the original proposal, which was to add a default value for the second argument of “compress”.

So thanks for your attention anyway, and let me know if it still has a chance of being accepted.

Thanks!
Stepan Dyatkovskiy.
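The speed question can be measured rather than guessed. The following is only a rough sketch using the standard "timeit" module; the sample data and repeat count are arbitrary assumptions, and no particular outcome is claimed, since timings depend on the machine and the Python version.

```python
import timeit

# Arbitrary sample data: a mix of empty and non-empty options.
a = [None, "", "-O3", "-fomit-frame-pointer"] * 1000


def with_filter():
    return list(filter(None, a))


def with_genexp():
    return list(x for x in a if x)


# Both spellings select the same elements.
same = with_filter() == with_genexp()

t_filter = timeit.timeit(with_filter, number=200)
t_genexp = timeit.timeit(with_genexp, number=200)
print(f"same result: {same}")
print(f"filter: {t_filter:.4f}s  genexp: {t_genexp:.4f}s")
```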
On Wed, Sep 15, 2021 at 4:03 AM ml@dyatkovskiy.com <ml@dyatkovskiy.com> wrote:
>> The signature is wrong.
>
> Thanks for remark. Of course proper signature would be:
>
>     def jin(a: Iterable[Optional[str]], sep=" "):
>         # …
>
>> Why do you (ab)use compress for that?
>
> Well, it seems that it is subjective. To me “[None, x, y, z]” -> “[x, y, z]” looks like “compression”. But if community agrees with your side, then, well, it’s OK.

To me, that looks like filtering. The most common sort of filtering is what the filter() function does - ask a predicate function whether something's good or bad, and keep the good ones. The filtering done by itertools.compress() is slightly different - look it up in a corresponding list of "good" or "bad", and keep the good ones.

What you're looking at is asking a question about each element. Specifically, you're asking "is this element truthy or falsy?". That perfectly matches the filter() function.
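A minimal sketch of the distinction drawn here, with made-up sample data: filter() asks a question about each element itself, while compress() looks each element up in a parallel sequence of selectors.

```python
from itertools import compress

data = ["-O3", "", "-g", None, "-Wall"]

# filter(): the decision comes from the element itself.
truthy = list(filter(None, data))
long_flags = list(filter(lambda s: s and len(s) > 2, data))

# compress(): the decision comes from a separate sequence of selectors.
keep = [1, 0, 0, 0, 1]
picked = list(compress(data, keep))

print(truthy)      # ['-O3', '-g', '-Wall']
print(long_flags)  # ['-O3', '-Wall']
print(picked)      # ['-O3', '-Wall']
```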
> Alternatively indeed we can use:
>
> 2. Use "filter(None, a)”
> 3. (x for x in a if x)
>
> Why not to use #3?
>
> Only having #2 or #3, I would vote for “filter”. It is a builtin, and used to be implemented as intrinsic. In cpython it has a separate “if” branch for case when first argument is “None” (see “filter_next” function in “bltinmodule.c”)

There's no real difference between #2 and #3. If you feel more comfortable writing comprehensions, write comprehensions. If you feel more comfortable using builtins, use builtins. Either way, you're expressing the concept "keep the ones that are true".
> #3 semantically is more complicated and it seems that there are no optimizations at least in cpython (but perhaps I’m wrong?). So, it looks like #3 is slower while parsing and while executing.
>
> #3 is bad choice for code maintenance. It is always better to pass variable once. “(x for x in a if x)” contains micro code dups. Here, you put “x” three times, and then if you decide to use something else you have to edit it in three places. So #3 defeats if you want to reuse or just maintain such code.

Yeah, if that bothers you, use filter. Nothing wrong with either IMO.
> And yet, I still have a little hope about original proposal. I proposed to add default value for second argument of “compress”.
>
> So thanks for you attention anyways, and let me know if it is still has a chance to be accepted.

For it to be accepted, you have to convince people - particularly, core devs - that it's of value. At the moment, I'm unconvinced, but on the other hand, all you're proposing is a default value for a currently-mandatory argument, so the bar isn't TOO high (it's not like you're proposing to create a new language keyword or anything!).

ChrisA
[Chris Angelico <rosuav@gmail.com>]
> ...
> For it to be accepted, you have to convince people - particularly, core devs - that it's of value. At the moment, I'm unconvinced, but on the other hand, all you're proposing is a default value for a currently-mandatory argument, so the bar isn't TOO high (it's not like you're proposing to create a new language keyword or anything!).

Except it's not that simple:

    def gen(hi):
        i = 0
        while i < hi:
            yield i
            i += 1

    from itertools import compress
    g = gen(12)
    print(list(filter(None, g)))
    g = gen(12)
    print(list(compress(g, g)))

Which displays:

    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
    [0, 2, 4, 6, 8, 10]

The first is obviously intended, but the latter is what you get merely by giving the same argument twice to `compress()`.

`compress()` can't materialize its argument(s) into a list (or tuple) first, because it's intended to work fine with infinite sequences. It could worm around that like so under the covers:

    from itertools import tee
    g = gen(12)
    print(list(compress(*tee(g))))

but that's just bizarre ;-) And inefficient.

Or perhaps the `compress()` implementation could grow internal conditionals to use a different algorithm if the second argument is omitted. But that would be a major change to support something that's already easily done in more than one more-than-less obvious way.
On Tue, 14 Sept 2021 at 19:58, Tim Peters <tim.peters@gmail.com> wrote:
> Except it's not that simple:

Apologies, Tim, it took me a couple of reads to work out what you were saying here. I hope you won't mind if I restate the point for the benefit of anyone else who might have got confused by it like I did...
>     def gen(hi):
>         i = 0
>         while i < hi:
>             yield i
>             i += 1
>
>     from itertools import compress
>     g = gen(12)
>     print(list(filter(None, g)))
>     g = gen(12)
>     print(list(compress(g, g)))
>
> Which displays:
>
>     [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>     [0, 2, 4, 6, 8, 10]
>
> The first is obviously intended, but the latter is what you get merely by giving the same argument twice to `compress()`.

The key point here is that the proposal that compress(g) should mean the same as compress(g, g) doesn't actually do what you were suggesting it would (and what you want), if g is an iterator - and after all, that's what itertools is supposed to be about, operations on iterators in general.

To actually get the same behaviour as filter(None, g) in general needs a much more complicated change than simply saying "the default is to use the first argument twice".
> `compress()` can't materialize its argument(s) into a list (or tuple) first, because it's intended to work fine with infinite sequences. It could worm around that like so under the covers:
>
>     from itertools import tee
>     g = gen(12)
>     print(list(compress(*tee(g))))
>
> but that's just bizarre ;-) And inefficient.
>
> Or perhaps the `compress()` implementation could grow internal conditionals to use a different algorithm if the second argument is omitted. But that would be a major change to support something that's already easily done in more than one more-than-less obvious way.

At this point, it's no longer a simple change adding a fairly obvious default value for an argument that's currently mandatory, it's actually an important and significant (but subtle) change in behaviour.

Honestly, I think this pretty much kills the proposal. Thanks for pointing this flaw out Tim, and sorry if I laboured the point you were making :-)

Paul
On Wed, Sep 15, 2021 at 4:58 AM Tim Peters <tim.peters@gmail.com> wrote:
> Or perhaps the `compress()` implementation could grow internal conditionals to use a different algorithm if the second argument is omitted. But that would be a major change to support something that's already easily done in more than one more-than-less obvious way.

At which point it'd be basically just turning compress(iter, None) into filter(None, iter) and we gain nothing. Agreed.

So this proposal has minimal value and enough wrinkles to make it highly unappealing.

ChrisA
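As a rough pure-Python sketch of that observation: a wrapper that switches algorithm when the selectors are omitted ends up delegating to filter(). The name "compress_default" is hypothetical and not part of itertools; this is not how any real implementation is written.

```python
from itertools import compress


def compress_default(data, selectors=None):
    # Hypothetical helper, not part of itertools.
    if selectors is None:
        # "Different algorithm": test each element's own truthiness,
        # which is exactly what filter(None, data) already does.
        return filter(None, data)
    return compress(data, selectors)


# Naive default: the same iterator used twice skips every other item.
g = iter(range(12))
naive = list(compress(g, g))

# Delegating to filter() behaves as intended, even for iterators.
via_helper = list(compress_default(iter(range(12))))

# The explicit two-argument form is unchanged.
explicit = list(compress_default("abc", [1, 0, 1]))

print(naive)       # [0, 2, 4, 6, 8, 10]
print(via_helper)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(explicit)    # ['a', 'c']
```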
participants (7)
- Chris Angelico
- David Mertz, Ph.D.
- ml@dyatkovskiy.com
- Paul Moore
- Rob Cliffe
- Steven D'Aprano
- Tim Peters