For-expression/throwaway comprehension
(This is my first time posting on any Python list; I've tried to search for this idea and didn't find it, but if I looked in the wrong places or this has already been discussed, I apologize and feel free to tell me!)

Say you have a list and you want to perform some operation on each item in the list, but you don't need to store the result in a list. There are three simple ways of doing this, at least as far as I know (`print(item)` could be any expression, just using it as an example):

```
lst = [1, 2, 3, 4]

# 1
for item in lst:
    print(item)

# 2
[print(item) for item in lst]

# 3
for item in lst: print(item)
```

#1 - In my opinion, this should be a one-line operation, so #1 is not ideal.

#2 - It also shouldn't require storing results in an array, to save time/memory, so #2 is out.

#3 - I think #3 is just not good syntax; it seems very unpythonic to me, as it breaks the norm that blocks go on their own lines. It does seem the best of the three, though, and I know my assessment is kind of subjective.

I'm wondering if a possible alternative syntax could be a for-expression, like there's if-expressions, which always evaluates to None:

```
print(item) for item in lst
```

A more practical example of when this would be useful is extending list-2 with a modified version of list-1; this syntax would avoid creating an intermediate list (not sure if the way lists are implemented in Python removes this advantage by the way it resizes lists, though).

```
lst1 = [1, 2, 3]
lst2 = [4, 5, 6]

lst2.append(item * 2) for item in lst1
```
On Fri, 26 Jul 2019 at 13:55, Eli Berkowitz <eliberkowitz@gmail.com> wrote:
Say you have a list and you want to perform some operation on each item in the list - but you don't need to store the result in a list.
There are three simple ways of doing this, at least as far as I know: ([print(item)] could be any expression, just using it as an example)
```
lst = [1, 2, 3, 4]

# 1
for item in lst:
    print(item)

# 2
[print(item) for item in lst]

# 3
for item in lst: print(item)
```
#1 - In my opinion, this should be a one-line operation, so #1 is not ideal.
#2 - It also shouldn't require storing results in an array, to save time/memory, so #2 is out.
#3 - I think #3 is just not good syntax; it seems very unpythonic to me, as it breaks the norm that blocks go on their own lines. It does seem the best of the three, though, and I know my assessment is kind of subjective.
#1 and #3 are the same (in terms of statement structure) and I see no reason to treat them differently. (#2 is significantly different, in that it retains all the intermediate values, and can only be used for expressions.)

In #1 you say "this should be a one line operation" - but there's no particular reason why it "should". If it should, then #3 *is* the appropriate one-line version. In #3 you say it "is just not good syntax", and yet you said above that the statement "should" be a one-liner. The only thing that is "not good" about #3 over #1 is that it's a one-liner...
I'm wondering if a possible alternative syntax could be a for-expression, like there's if-expressions, which always evaluates to None:

```
print(item) for item in lst
```
This seems to me to be no better than #3, and worse in the sense that it isn't currently valid, whereas #3 is. I think you should simply accept that #3 is entirely valid and acceptable syntax. I use it fairly regularly, where appropriate (which isn't often - #1 *is* typically better - but it's certainly all of the cases where your proposed new syntax would be useful).

Paul
By #1 "should" be a 1-liner, I mean that I think a reasonable goal is to have a good syntax for this operation to be one line. And for #3 I'm basing it also off Pep 8: "Compound statements (multiple statements on the same line) are generally discouraged." Given that the proposed alternative isn't currently valid and #1 isn't 'bad' in any way other than being an extra line, I can understand not wanting to move this forward. One last thing to consider is how this would work with filtering: ``` f(item) for item in lst if g(item) ``` saves even more space, as the #1 alternative would be ``` for item in lst: if g(item): f(item) ``` However this advantage doesn't apply to if/else as the syntax becomes ambiguous: ``` f(item) for item in lst if g else h # could mean: [f(item) for item in lst] if g else h # or [f(item) for item in lst if g else h] ```
On Fri, Jul 26, 2019 at 6:57 AM Eli Berkowitz <eliberkowitz@gmail.com> wrote:
By #1 "should" be a 1-liner, I mean that I think a reasonable goal is to have a good syntax for this operation to be one line.
And for #3 I'm basing it also off PEP 8: "Compound statements (multiple statements on the same line) are generally discouraged."
Given that the proposed alternative isn't currently valid and #1 isn't 'bad' in any way other than being an extra line, I can understand not wanting to move this forward.
One last thing to consider is how this would work with filtering:

```
f(item) for item in lst if g(item)
```

saves even more space, as the #1 alternative would be
Do understand that Python also strives for explicitness on top of clarity, which suggests that trying to compress logic for the sake of saving a line or two isn't a goal if it doesn't help readability. I do get the desire for having an easy way to exhaust an iterator, but even with your desire for a generator expression syntax, you can still make it short with:

```
for _ in (f(item) for item in lst if g(item)):
    pass
```

That doesn't require any new syntax (which is a very "expensive" thing to do in Python, as you have to update books for that sort of thing).

-Brett
```
for item in lst:
    if g(item):
        f(item)
```

However this advantage doesn't apply to if/else as the syntax becomes ambiguous:

```
f(item) for item in lst if g else h
# could mean:
[f(item) for item in lst] if g else h
# or
[f(item) for item in lst if g else h]
```
On Jul 26, 2019, at 05:51, Eli Berkowitz <eliberkowitz@gmail.com> wrote:
```
# 1
for item in lst:
    print(item)

# 2
[print(item) for item in lst]

# 3
for item in lst: print(item)
```
Normally, expressions are about producing a value, and if you only care about side effects, you want a statement. That's why #3 is better than #2, beyond the wasted memory. Sometimes you want to violate guidelines like that, but in that case it's usually worth explicitly marking that you're doing so. Which you can do easily and concisely in today's Python:

```
# 4
consume(print(item) for item in lst)
```

This signals that you're iterating the iterator for side effects, rather than to build a value, and it's nicely concise. You need to write that consume function (or just borrow it from the itertools docs), but it's a trivial one-liner you can write once, and you can choose whether you want clarity:

```
def consume(it):
    for _ in it:
        pass
```

… or performance:

```
def consume(it):
    collections.deque(it, maxlen=0)
```
Andrew Barnert wrote:
consume(print(item) for item in lst)
From my understanding, consume() effectively provides the functionality the author was looking for. Also, between the options of `for _ in iter:` vs `collections.deque(it, maxlen=0)`, how significant is the performance difference?

I had assumed that the performance of `for _ in iter` would be significantly better, due to the overhead cost of creating and filling a double-ended queue, which is optimized for insertion at the beginning and end. Wouldn't a one-directional iterator provide better performance and have a lower memory cost if there is no modification required?
On 7/26/2019 6:05 PM, Kyle Stanley wrote:
Andrew Barnert wrote:
consume(print(item) for item in lst)
From my understanding, consume() effectively provides the functionality the author was looking for. Also, between the options of `for _ in iter:` vs `collections.deque(it, maxlen=0)`, how significant is the performance difference?

I had assumed that the performance of `for _ in iter` would be significantly better, due to the overhead cost of creating and filling a double-ended queue, which is optimized for insertion at the beginning and end. Wouldn't a one-directional iterator provide better performance and have a lower memory cost if there is no modification required?
I haven't run any numbers. But moving a loop from Python code to C code is almost always a win. That's what's happening here. And I think the idea is that a deque with 0 members doesn't have much overhead.

Eric
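P.S. A minimal timeit sketch for anyone who wants to run the numbers themselves (the harness is illustrative only; exact figures will vary by machine):

```
import collections
import timeit

data = range(1_000_000)

def consume_for(it):
    # the "clarity" version: a plain Python-level loop
    for _ in it:
        pass

def consume_deque(it):
    # the "performance" version: drain into a zero-length deque at C speed
    collections.deque(it, maxlen=0)

print("for loop:", timeit.timeit(lambda: consume_for(iter(data)), number=10))
print("deque:   ", timeit.timeit(lambda: consume_deque(iter(data)), number=10))
```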
On Fri, Jul 26, 2019 at 12:51:46PM -0000, Eli Berkowitz wrote:
(This is my first time posting on any Python list; I've tried to search for this idea and didn't find it but if I looked in the wrong places/this has already been discussed I apologize and feel free to tell me!)
Say you have a list and you want to perform some operation on each item in the list - but you don't need to store the result in a list.
Does that come up very often? That means you're performing the operation only for the side-effects. I can't think of many situations where that would be useful. And the few times that it does come up, there are usually better ways to get the result you want, e.g.:

```
# Instead of this:
for x in lst:
    L.append(x)
# Use this:
L.extend(lst)

# Instead of this:
for x in lst:
    print(x)
# Use this:
print(*lst, sep='\n')
```
There are three simple ways of doing this, at least as far as I know: ([print(item)] could be any expression, just using it as an example)
```
lst = [1, 2, 3, 4]

# 1
for item in lst:
    print(item)
```
```
# 2
[print(item) for item in lst]
```
No no no, this is wrong and bad, because you are building a list full of Nones that has to be thrown away afterwards. But you know that.
# 3 for item in lst: print(item)
Aside from being compressed to one line instead of two, that's identical to #1. They generate the same code, and both are perfectly fine if you have some reason for wanting to save a line.
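A quick sketch to check that claim with the dis module (the instruction sequences come out identical; only the line numbers differ):

```
import dis

# Compare the bytecode of the one-line and two-line spellings.
one_line = compile("for item in lst: print(item)", "<example>", "exec")
two_line = compile("for item in lst:\n    print(item)", "<example>", "exec")

dis.dis(one_line)
dis.dis(two_line)
```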
#1 - In my opinion, this should be a one-line operation, so #1 is not ideal.
Why should it be a one-liner? Do you have a shortage of vertical space to work with? Your comment here contradicts your comment about #3, where you say that making it a one-liner is "very unpythonic".
#2 - It also shouldn't require storing results in an array, to save time/memory, so #2 is out.
#3 - I think #3 is just not good syntax; it seems very unpythonic to me, as it breaks the norm that blocks go on their own lines. It does seem the best of the three, though, and I know my assessment is kind of subjective.
You can't have #1 and #3 at the same time. If it is unpythonic to have a loop on one line, then you need to split it over two lines. Inventing new syntax just so you can have a loop on one line when the language already permits loops on one line is unnecessary.
I'm wondering if a possible alternative syntax could be a for-expression, like there's if-expressions, which always evaluates to None:

```
print(item) for item in lst
```
That clashes with the syntax for a generator comprehension. Generator comprehensions are funny beasts, because they require parentheses when they stand alone:

```
it = expression for item in lst    # Syntax error.
it = (expression for item in lst)  # Permitted.
```

But when the parens are already there, as in a function call, you can (and should) leave the gen comprehension brackets out:

```
all((expression for item in lst))  # Unnecessary extra parens.
all(expression for item in lst)    # Preferred.
```

Because of that, your suggested syntax would be ambiguous:

```
function(expression for item in lst)
```

could mean you are calling function() with a single generator comprehension as argument, or you are evaluating a "for-expression" for its side-effects and then calling function() with None as argument.
A more practical example of when this would be useful is extending list-2 with a modified version of list-1; this syntax would avoid creating an intermediate list (not sure if the way lists are implemented in Python removes this advantage by the way it resizes lists, though).
```
lst1 = [1, 2, 3]
lst2 = [4, 5, 6]

lst2.append(item * 2) for item in lst1
```
```
lst2.extend(item * 2 for item in lst1)
lst2.extend(map(lambda x: 2 * x, lst1))
```

In my personal toolbox, I have this function:

```
def do(func, iterable, **kwargs):
    for x in iterable:
        func(x, **kwargs)
```

Which I can use like this:

```
do(print, lst)
```

But I hardly ever do, since most of the time there are simpler alternatives. But feel free to use it in your own code.

-- 
Steven
On Fri, Jul 26, 2019 at 10:06 PM Kyle Stanley <aeros167@gmail.com> wrote:
From my understanding, consume() effectively provides the functionality the author was looking for. Also, between the options of `for _ in iter:` vs `collections.deque(it, maxlen=0)`, how significant is the performance difference?

I had assumed that the performance of `for _ in iter` would be significantly better, due to the overhead cost of creating and filling a double-ended queue, which is optimized for insertion at the beginning and end. Wouldn't a one-directional iterator provide better performance and have a lower memory cost if there is no modification required?
collections.deque with an explicit maxlen of 0 doesn't actually populate the queue at all; it has a special case for maxlen 0 that just pulls items and immediately throws away the reference to what it pulled without storing it in the deque at all. They split off that special case into its own function at the C layer, consume_iterator: https://github.com/python/cpython/blob/master/Modules/_collectionsmodule.c#L...

It's basically impossible to beat that in CPython in the general case. By contrast, `for _ in iterable` would need to execute at least three bytecodes per item (advance iterator, store, jump), which is *way* more expensive per item.

collections.deque(maxlen=0) can lose for small inputs (because it does have to call a constructor, create a deque, then throw it away; precreating a singleton for consume with `consumer = collections.deque(maxlen=0).extend` can save on some of that, though), but for any meaningful length input, the reduced cost per item makes up for it.
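A minimal sketch of that precreated-singleton trick (the name `consume` here is just illustrative):

```
import collections

# Bind the extend method of a zero-length deque once; each later call then
# drains an iterable through the C-level fast path without constructing a
# new deque per call.
consume = collections.deque(maxlen=0).extend

consume(print(item) for item in [1, 2, 3])
```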
On Jul 26, 2019, at 15:05, Kyle Stanley <aeros167@gmail.com> wrote:
Andrew Barnert wrote:
consume(print(item) for item in lst)
From my understanding, consume() effectively provides the functionality the author was looking for.
Exactly. And it’s readable and concise, and there’s even an implementation in the docs if you don’t think of it yourself.
Also, between the options of `for _ in iter:` vs `collections.deque(it, maxlen=0)`, how significant is the performance difference?

I had assumed that the performance of `for _ in iter` would be significantly better, due to the overhead cost of creating and filling a double-ended queue, which is optimized for insertion at the beginning and end. Wouldn't a one-directional iterator provide better performance and have a lower memory cost if there is no modification required?
The maxlen of 0 means after determining that 0+1 > 0, the (C) function returns without even getting to the array manipulation stuff. Consuming the iterator in a for loop does even less work inside the loop, but it means the loop is in Python rather than C, which is a lot more expensive than the INC and JNZ that you save.

You're right that it probably rarely makes a difference, but for something that's going to be recommended in the official docs and potentially used in who-knows-what code, apparently someone thought it was worth the effort to benchmark. I don't know if anyone has retested this recently, but there was a StackOverflow question maybe 5 years back asking why itertools did this instead of something faster, and IIRC, after testing every idea everyone had on a variety of platforms and versions, the conclusion was that deque was (still) by far the fastest way to do it in CPython (short of a custom C function for consume, and even that isn't much faster), and not quite the fastest but close enough in PyPy.
These are all fair and good points :)

I really like the idea of writing a function that exhausts an iterator and using that. It seems like this should be a part of the itertools package at least, maybe called `run`, `do`, or `exhaust`.

```
from itertools import run

run(list2.append(item * 6) for item in list1)
```

I think that this code reads very nicely - "run this expression for each item in this list". Thoughts?
On Aug 1, 2019, at 00:06, Eli Berkowitz <eliberkowitz@gmail.com> wrote:
These are all fair and good points :)
I really like the idea of writing a function that exhausts an iterator and using that. It seems like this should be a part of the itertools package at least, maybe called `run`, `do`, or `exhaust`.
There's already the consume(iterator, n=None) function in the recipes section of the docs: "Advance the iterator n-steps ahead. If n is None, consume entirely." It's only 4 lines of code, and you can copy and paste it from the docs. Or, if you don't need the optional n, the function is just a one-liner.

There's been discussion in the past about moving some of the recipes into the module itself. Besides consume, functions like grouper and unique_everseen have pretty common and broad uses, and they're less trivial, and you can't quite just copy-paste them (unless you want to `from itertools import *`, they need minor edits). I think the original reason not to do so was that the module was pure C code, and the recipes don't need to be in C, and are as useful as sample code as they are for direct use, so they should remain in Python. Given that in 3.x every stdlib module is supposed to be in Python with an optional C accelerator, that might not be as compelling anymore. But you'd still need to overcome the conservative presumption of the status quo with a good argument.

Meanwhile, the third-party library more_itertools includes all of the recipes (and it's kept up-to-date, so the occasional new recipes are available to anyone who installs it, even with older versions of Python). So, just add that to your requirements file, and you can import consume from more_itertools instead of itertools.
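For reference, the recipe as it appears in the docs (with the imports the recipes section assumes written out):

```
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
```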
On Thu, Aug 1, 2019 at 9:19 AM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
Given that in 3.x every stdlib module is supposed to be in Python with an optional C accelerator,
Is this written down somewhere? And when was that policy decided?

When we added math.isclose() (https://www.python.org/dev/peps/pep-0485/), someone (Victor Stinner?) re-wrote the math module as a Python wrapper around the C one, so we could add pure Python functions to it. But that was rejected, and we stuck with pure C for that one.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
In terms of an argument for why it should be included: for most Python users, the itertools recipes remain unseen. I consider myself relatively fluent in Python (not compared to y'all of course) and hadn't seen them until I started this thread. If I had to hazard a guess, they're probably unused by 95%+ of the Python userbase. Having a function like consume as a builtin or part of a library widely increases visibility, and as it's a clean and efficient way of running an expression on a collection of data, I think it's worth it, especially as it has very little overhead.

Also, given that `consume_iterator` is already in CPython in the collections module as a special case for `collections.deque(..., maxlen=0)` (and maybe some other things), I'm guessing it would not be hard at all to add consume to the itertools module, though I'll be honest, I have no idea how Python's underlying C code fits together at the module level.

https://github.com/python/cpython/blob/master/Modules/_collectionsmodule.c#L...
On Thu, Aug 1, 2019 at 10:49 AM Eli Berkowitz <eliberkowitz@gmail.com> wrote:
In terms of an argument for why it should be included, for most Python users the itertools recipes remain unseen. I consider myself relatively fluent in Python (not compared to y'all of course) and hadn't seen them until I started this thread. If I had to hazard a guess, they're probably unused by 95%+ of the Python userbase. Having a function like consume as a builtin or part of a library widely increases visibility, and as it's a clean and efficient way of running an expression on a collection of data I think it's worth it especially as it has very little overhead.
Also, given that `consume_iterator` is already in CPython in the collections module as a special case for `collections.deque(..., maxlen=0)` (and maybe some other things), I'm guessing it would not be hard at all to add consume to the itertools module, though I'll be honest I have no idea how Python's underlying C code fits together at the module level.
https://github.com/python/cpython/blob/master/Modules/_collectionsmodule.c#L...
You can open an issue at bugs.python.org to make consume() a top-level itertools function, but check it hasn't been asked for before.
On Thu, Aug 1, 2019 at 10:49 AM Eli Berkowitz <eliberkowitz@gmail.com> wrote:
In terms of an argument for why it should be included, for most Python users the itertools recipes remain unseen. I consider myself relatively fluent in Python (not compared to y'all of course) and hadn't seen them until I started this thread. If I had to hazard a guess, they're probably unused by 95%+ of the Python userbase. Having a function like consume as a builtin or part of a library widely increases visibility, and as it's a clean and efficient way of running an expression on a collection of data I think it's worth it especially as it has very little overhead.
Also, given that `consume_iterator` is already in CPython in the collections module as a special case for `collections.deque(..., maxlen=0)` (and maybe some other things), I'm guessing it would not be hard at all to add consume to the itertools module, though I'll be honest I have no idea how Python's underlying C code fits together at the module level.
https://github.com/python/cpython/blob/master/Modules/_collectionsmodule.c#L...
This is an interesting phenomenon. I'm not saying it's good or bad, I'm just observing it (because it surprised me). Here is someone declaring that the docs are less accessible than the code. I personally am disappointed, given the amount of effort that we put in those docs. But maybe this is true. If we can't get people to peruse the docs, should we bother?

OTOH suppose you have this problem, of wanting to call a function over an iterator without producing a list of dummy results. Presumably you're aware of the obvious solution (`for x in xs: f(x)`). What drives you to look for a purely functional approach? How do you know to search for "consume iterator"? (I Googled this, and the top two hits are a StackOverflow question about the specific meaning of "consuming", and the itertools docs -- apparently in those docs, everything "consumes" iterators.) How do you know this isn't premature optimization? (I bet we've collectively spent more energy on this thread than Facebook could save in a year by using whatever solution we could create. :-)

I learned something in this thread -- I had no idea that the deque datatype even has an option to limit its size (and silently drop older values as new ones are added), let alone that the case of setting the size to zero is optimized in the C code. But more importantly, I don't think I've ever needed either of those features, so maybe I was better off not knowing about them?

Have we collectively been nerd-sniped by an interesting but unimportant problem?

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Aug 1, 2019, at 09:27, Christopher Barker <pythonchb@gmail.com> wrote:
On Thu, Aug 1, 2019 at 9:19 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Given that in 3.x every stdlib module is supposed to be in Python with an optional C accelerator,
Is this written down somewhere? And when was that policy decided?
PEP 399. The PEP explicitly says that it only applies to new modules; the entire existing stdlib is grandfathered in:
This PEP does not mandate that pre-existing modules in the stdlib that lack a pure Python equivalent gain such a module. But if people do volunteer to provide and maintain a pure Python equivalent (e.g., the PyPy team volunteering their pure Python implementation of the csv module and maintaining it) then such code will be accepted. In those instances the C version is considered the reference implementation in terms of expected semantics.
So it’s certainly not mandatory to rewrite itertools in pure Python with a C accelerator, or even as a Python wrapper around C-only functions, if there’s no reason to do so. But if there _is_ a reason to do so, the default seems to be that it would be accepted. The benefit of copying some or all of the recipes into the module would still need to be sold before anyone agreed there actually is a good reason for the reimplementation. And of course the OP or someone else would have to volunteer to do the work. (Last I checked, PyPy doesn’t have a no-objspace version that could be borrowed.) But I don’t think it’s a non-starter the way it was in 2.6.
When we added math.isclose() (https://www.python.org/dev/peps/pep-0485/), someone (Victor Stinner?) re-wrote the math module as a Python wrapper around the C one, so we could add pure Python functions to it. But that was rejected, and we stuck with pure C for that one.
I suspect math is a special case, in that it would presumably have received a special dispensation even if it were a brand-new module, since it mostly thinly wraps a C library (libm), and since a pure-Python implementation wouldn’t have been of any use to Jython or Iron or even PyPy. But, without reading the whole discussion in detail, it seems like the initial presumption was still that it was worth trying a Python wrapper, and it was only after that didn’t work out as well as hoped that it was rejected?
[Guido]
... I learned something in this thread -- I had no idea that the deque datatype even has an option to limit its size (and silently drop older values as new ones are added), let alone that the case of setting the size to zero is optimized in the C code. But more importantly, I don't think I've ever needed either of those features, so maybe I was better off not knowing about them?
I was aware of both, but never used maxlen=0. It appeals, I guess, for much the same reason it's sometimes convenient to throw away output in a Unixy shell just by redirecting to /dev/null. I write the `for` loop instead because it's clear at once.

Non-zero sizes do have real uses, obviously so when working with linear recurrences. For example, a Fibonacci generator:

```
def fib():
    from collections import deque
    d = deque([0, 1], 2)
    yield from d
    while True:
        c = sum(d)
        yield c
        d.append(c)
```

Of course the deeper the recurrence, the more pleasant this is than hoping not to make a subtle typo when shifting N variables "by hand".
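For instance, a sketch of the same pattern for a depth-3 recurrence (the seed values here are just for illustration):

```
def tribonacci():
    from collections import deque
    # same shape as fib(), but the deque keeps the last 3 values; the deque
    # does the "shift the last N values" bookkeeping automatically
    d = deque([0, 1, 1], 3)
    yield from d
    while True:
        c = sum(d)
        yield c
        d.append(c)
```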
Have we collectively been nerd-sniped by an interesting but unimportant problem?
Pretty much ;-) Here's another: what's the fastest way to get a Python loop to go around N times? Bingo:

```
for _ in itertools.repeat(None, N):
```

No dynamic memory churn: no new objects created per iteration, not even under the covers (the implementation uses a native C ssize_t to hold the remaining iteration count).

itertools is the answer to every question ;-)
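A minimal sketch to see the difference on your own machine (N and the harness are just for illustration):

```
import itertools
import timeit

N = 10_000_000

def loop_range():
    for _ in range(N):  # most iterations create a fresh int object
        pass

def loop_repeat():
    for _ in itertools.repeat(None, N):  # yields the same None every time
        pass

print("range: ", timeit.timeit(loop_range, number=1))
print("repeat:", timeit.timeit(loop_repeat, number=1))
```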
On Aug 1, 2019, at 10:48, Eli Berkowitz <eliberkowitz@gmail.com> wrote:
In terms of an argument for why it should be included, for most Python users the itertools recipes remain unseen.
Sure. But do you also know all of the functions in the module? Would you notice if a new one got added? In practical terms, how much more discoverable is groupby than grouper?

I do occasionally search for things I think might be there by importing a module in the interactive interpreter and trying help or auto-complete. But I think I go to the docs, or a search engine, or StackOverflow, a lot more often. That being said, it isn't zero benefit.

Plus, there is another potential argument for adding at least some of the recipes to the module. When I show a novice how to import and use cycle, they get excited and go off and use it. When I show them how to copy and paste grouper, they sometimes seem reluctant: I'm encouraging them to put code they don't understand into their source files. (And, while explaining how and why grouper works is a great teaching opportunity for some really important concepts, that's not always the top priority at the time…) So I usually tell them to pip install more-itertools and use that, and mention the recipes so they can maybe read them later.
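For reference, the grouper recipe they're being asked to paste, roughly as it appears in the docs:

```
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks."
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
```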
On 01/08/2019 19:11:18, Guido van Rossum wrote:
Here is someone declaring that the docs are less accessible than the code. I personally am disappointed, given the amount of effort that we put in those docs. But maybe this is true. If we can't get people to peruse the docs, should we bother?
One data point: I constantly use the docs and help(). I regard digging into the code as more effort and a second resort. I'm well aware from personal experience that documenting code can be (much) more effort than writing it, but I hope you continue to "bother".
When I show a novice how to import and use cycle, they get excited and go off and use it. When I show them how to copy and paste grouper, they sometimes seem reluctant--
TLDR: a beginner can read, understand, and use the itertools module incredibly easily. But putting a recipe aside for your own use later? Forget it. That took me almost 3 years to get comfortable with.

Longer version:

I had this exact reaction for a very long time to doing this kind of thing (copying code and pasting it somewhere).

The main reason is, as a beginner, the entire concept of having a "bag of my own tools" in my own code somewhere (whether copied and pasted or not, doesn't matter) was very intimidating. And it was a near certainty I would forget that I put a thing somewhere to use later, anyway, and I knew it.

And if by some miracle I DID remember, figuring out how to package it up and import it later and fix it if it was broken (assuming that I wrote the code well enough to read and understand it 1 year later!) and then update the stored version and have it perpetuate into other code and tracking what I did and OH MY GOD BEGINNER BRAIN AHSPLODE. I think many people quickly lose track of how hard it is to be new to all of this stuff.
On Aug 1, 2019, at 12:52, Ricky Teachey <ricky@teachey.org> wrote:
TLDR: a beginner can read, understand, and use the itertools module incredibly easily. But putting a recipe aside for your own use later? Forget it. That took me almost 3 years to get comfortable with.
I can understand that. I could tell you to just bookmark the recipes and get used to coming back there, so you don’t have to figure out how to build and maintain a personal toolbox. But that probably still isn’t an easy sell for novices. But what about `pip install more-itertools`? Hopefully you become comfortable with that a lot faster than 3 years in. If not, the packaging team will probably be disappointed to hear it… (I have occasionally had other people insist that I shouldn’t tell users to pip install things… but those are the same people insisting I should always start by explaining how to do it in Python 2.5 rather than 3.7. The actual people being helped have never seemed reticent.)
On Thu, Aug 1, 2019 at 12:54 PM Ricky Teachey <ricky@teachey.org> wrote:
When I show a novice how to import and use cycle, they get excited
and go off and use it. When I show them how to copy and paste grouper, they sometimes seem reluctant—
TLDR: a beginner can read, understand, and use the itertools module incredibly easily. But putting a recipe aside for your own use later? Forget it. That took me almost 3 years to get comfortable with.
Longer version:
I had this exact reaction for a very long time to doing this kind of thing (copying code and pasting it somewhere).
The main reason is, as a beginner, the entire concept of having a "bag of my own tools" in my own code somewhere (whether copied and pasted or not, doesn't matter) was very intimidating. And it was a near certainty I would forget that I put a thing somewhere to use later, anyway, and I knew it.
And if by some miracle I DID remember, figuring out how to package it up and import it later and fix it if it was broken (assuming that I wrote the code well enough to read and understand it 1 year later!) and then update the stored version and have it perpetuate into other code and tracking what I did and OH MY GOD BEGINNER BRAIN AHSPLODE. I think many people quickly lose track of how hard it is to be new to all of this stuff.
Ricky, thanks for sharing that. I agree that for core devs it is often hard to walk in beginners' shoes (it's one reason I sometimes mentor beginners).

At the same time I think mentors and teachers and book authors should be encouraging their mentees/students/readers to do exactly this: create their own bag of tools to carry around. I worry that the alternative would be that we'd end up with a stdlib that contains way too many functions to do small tasks that could be expressed in 2-3 lines of code -- this would defeat the purpose of Python as a language that's easy to learn.

Since I'm in a philosophical mood, I'll end with an analogy. When I was a kid, Lego had bricks of various sizes and colors that all fit together. You could build anything you wanted from the limited set of shapes, and they could be recombined endlessly. We were excited when the first bricks appeared with a slanted side so you could make more realistic roofs (only in red!). And wheels were great -- they came in two sizes, small and large (so you could make a tractor). But when my son went through his Lego phase, every kit came with pieces in all sorts of custom shapes -- airplane wings, ship's hulls, pre-made archways, plus endless accessories for the "guys". And nowadays (a decade later) most Lego kits are strongly branded with things like Harry Potter or Star Wars, and good luck doing something else with the pieces -- it seems you can never find two matching wheels even, or enough red bricks to build a red house.

I don't want Python to become the modern-day Lego. The craft of programming includes being able to combine pieces in all sorts of ways. Now, it's fine for 3rd party packages to have a different philosophy. (Pandas seems to cater to the crowd that wants a function for every conceivable operation, and clearly there's a market for that. :-) But the stdlib ought to remain parsimonious (I think that's the word).

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
But what about `pip install more-itertools`? Hopefully you become comfortable with that a lot faster than 3 years in. If not, the packaging team will probably be disappointed to hear it…
(I have occasionally had other people insist that I shouldn’t tell users to pip install things… )
pip install more-itertools is a GREAT solution for the problem I described, something I think the average 100% self-taught, never-used-the-command-line-before, learning-in-their-spare-time beginner (which is what I was) can get comfortable with within the first month, I'd say.

The only problem with it is what you went on to say in your second comment: I've seen lots of "beginner advice" out there saying that using other people's code you don't understand is a bad habit and you shouldn't be doing it. Some of that comes from other communities (I've heard it from JavaScript pros I'm friendly with).

There is SOME truth to this of course, so it helps a lot when the elder statesfolks say, "oh, when you need to do X, a great 3rd party package for that is Y." I had never heard of more-itertools until today. I'll definitely be suggesting others use it, and use it myself.
Ricky, thanks for sharing that. I agree that for core devs it is often hard to walk in beginners' shoes (it's one reason I sometimes mentor beginners).
At the same time I think mentors and teachers and book authors should be encouraging their mentees/students/readers to do exactly this: create their own bag of tools to carry around.
I don't want Python to become the modern-day Lego.
Thanks. I think it's a good analogy. pip install is a low enough bar, imo, that an experienced person shouldn't feel like they are expecting too much to tell a beginner to do that. But be prepared to explain HOW, of course, haha. I had to learn it so many times, and I'm generally not considered a slow person.
On Fri, Aug 2, 2019 at 7:36 AM Ricky Teachey <ricky@teachey.org> wrote:
But what about `pip install more-itertools`? Hopefully you become comfortable with that a lot faster than 3 years in. If not, the packaging team will probably be disappointed to hear it…
(I have occasionally had other people insist that I shouldn’t tell users to pip install things… )
pip install more-itertools is a GREAT solution for the problem I described, something I think the average 100% self-taught, never-used-the-command-line-before, learning-in-their-spare-time beginner (which is what I was) can get comfortable with within the first month, I'd say.
The only problem with it is what you went on to say in your second comment: I've seen lots of "beginner advice" out there saying that using other people's code you don't understand is a bad habit and you shouldn't be doing it. Some of that comes from other communities (I've heard it from javascript pros I'm friendly with).
There is SOME truth to this of course, so it helps a lot when the elder statesfolks say, "oh, when you need to do X, a great 3rd party package for that is Y." I had never heard of more-itertools until today. I'll definitely be suggesting others use it, and use it myself.
There is a LOT of truth to it. If you have a problem, you search the web for it, and the first hit is a pip-installable (or npm-installable or gem-installable etc) package, how do you know whether it's a good solution to your problem or a brand new time sink (or worse)? Sometimes, an unfortunate choice of keyword can mean a completely unknown package ranks higher than a far better one, just because its description happens to match the way you chose to word your problem.

The stdlib does reference a small number of third-party packages (requests etc). We don't want to go overboard with that, but I think more-itertools is worth referencing. Maybe there needs to be a second-tier recommendation, where a list of packages can be given that aren't category killers, but have been given the blessing of the Python devs as "this is a good-quality, well-maintained package, and can be depended on"?

ChrisA
On Aug 1, 2019, at 14:52, Chris Angelico <rosuav@gmail.com> wrote:
The stdlib does reference a small number of third-party packages (requests etc). We don't want to go overboard with that, but I think more-itertools is worth referencing. Maybe there needs to be a second-tier recommendation, where a list of packages can be given that aren't category killers, but have been given the blessing of the Python devs as "this is a good-quality, well-maintained package, and can be depended on"?
I agree. I don't think more-itertools meets the category-killer standard, because of toolz. (Briefly, more-itertools has the exact recipes from the latest Python version's docs plus a good set of its own extras, while toolz has a very large and well-integrated collection of tools that grew out of the recipes in their own direction.) But they're both solidly-maintained and widely-used packages, so I don't think there would be much danger in blessing more-itertools, or even both of them, as "second-tier recommendations". If that is something the Python devs want to get into doing, this is probably one of the best recommendations to consider making.
Rob Cliffe via Python-ideas writes:
On 01/08/2019 19:11:18, Guido van Rossum wrote:
Here is someone declaring that the docs are less accessible than the code. I personally am disappointed, given the amount of effort that we put in those docs. But maybe this is true. If we can't get people to peruse the docs, should we bother?
One data point: I constantly use the docs and help(). I regard digging into the code as more effort and a second resort. I'm well aware from personal experience that documenting code can be (much) more effort than writing it, but I hope you continue to "bother".
Another data point: I constantly use the docs and help(). I regard digging into the code as a reward for getting $DAYJOB done, but that rarely happens. Let's keep the docs excellent! Sometimes it even helps me get $DAYJOB done.

More seriously, without the docs, all bugs are merely surface tension. "That's what the code does, so it must be right!" That's a somewhat facetious way of expressing that there needs to be a spec independent of the code to settle the "bug vs. feature" arguments, at least during release beta periods.

Steve
Andrew Barnert via Python-ideas writes:
Maybe there needs to be a second-tier recommendation, where a list of packages can be given that aren't category killers, but have been given the blessing of the Python devs as "this is a good-quality, well-maintained package, and can be depended on"?
I agree.
Wiki-iki-iki-iki! Probably with some curation, I think this would be quite easy to set up on github.

Steve
On 2019-08-01 12:31, Andrew Barnert via Python-ideas wrote:
On Aug 1, 2019, at 10:48, Eli Berkowitz<eliberkowitz@gmail.com> wrote:
In terms of an argument for why it should be included, for most Python users the itertools recipes remain unseen.
Sure. But do you also know all of the functions in the module? Would you notice if a new one got added? In practical terms, how much more discoverable is groupby than grouper?
It is massively more discoverable, for one simple reason: autocomplete.

In teaching people to program, I often use Jupyter notebook, which has great autocomplete functionality that can also bring up the documentation on any function. You can type itertools.[TAB] and get a list, and then you can scroll down the list looking for a likely function, and when you get to it you can hit Shift-Tab and see the documentation. Certainly other IDEs have similar functionality.

This is a colossal win over having to go to the documentation and look through the text for a recipe that is not "addressable" in any way. You can't even link to it, for heaven's sake! The function docs in all the modules have permalinks but the recipes are just unstructured text.

I don't fully understand the resistance to adding these functions to itertools itself. As what I mentioned above indicates, they already ARE in itertools from a documentation perspective. They're just there as irritatingly unstructured text instead of integrated into the clean, useful format that the documentation generally follows. They already have to be maintained because they're part of the official docs; if some change were made to itertools that required one of the recipes to change, the recipe would have to be changed, because it's already there in the official docs.

From some other posts in the thread I get the impression some people think there is (or should be) some sort of multi-tiered system along the lines of:

1) in the stdlib
2) in a "recipe" in the stdlib but you have to copy and paste it
3) in an "officially sanctioned" third-party lib which is a "category killer" (like requests)
4) in a "somewhat less officially sanctioned" third-party lib which isn't a "category killer" (like toolz?)
5) in an ordinary third-party lib of which the documentation makes no mention

Some of those may be good ideas, but I don't see the use of the distinction between tiers 1 and 2. If there's going to be a full implementation of a simple function in textual form in the docs, and that textual form is going to be maintained precisely because we already know the function is of general utility, what is gained by insisting that it remain only in textual form and not in code form that can actually be used?

This is clearly different from other cases in the docs where functions or code snippets are provided solely as EXAMPLES that no one would actually use as-is in real life. The "grouper" example isn't an example or sketch of how to write a function that groups into chunks: it IS a function that groups into chunks. Yeah, sure, you might sometimes want to write a slight tweak on it, but I don't get why that means that an already-working version has to be dangled in front of people in text form without actually being usable (or discoverable via autocomplete). The way the "recipes" are presented in the existing docs clearly indicates that they are widely useful as-is, and they already have to be maintained as text, so why not just put them in for real?

-- 
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On 02/08/2019 06:26, Brendan Barnwell wrote:
It is massively more discoverable, for one simple reason: autocomplete.
In teaching people to program, I often use Jupyter notebook, which has great autocomplete functionality that can also bring up the documentation on any function. You can type itertools.[TAB] and get a list, and then you can scroll down the list looking for a likely function, and when you get to it you can hit Shift-Tab and see the documentation. Certainly other IDEs have similar functionality.
This is a colossal win over having to go to the documentation and look through the text for a recipe that is not "addressable" in any way. You can't even link to it, for heaven's sake! The function docs in all the modules have permalinks but the recipes are just unstructured text.
I'd have to challenge that "colossal win". I am very uncomfortable with IDEs that try to do my thinking for me, and I start turning things off on those occasions when I am forced to use them. It would never even occur to me to try autocompletion. Reading the documentation is so much easier, and far more likely to point me at what the right answer actually is, rather than just what I think it might be.

-- 
Rhodri James *-* Kynesim Ltd
On Fri, Aug 2, 2019 at 6:38 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
On 02/08/2019 06:26, Brendan Barnwell wrote:
It is massively more discoverable, for one simple reason: autocomplete.
In teaching people to program, I often use Jupyter notebook, which has great autocomplete functionality that can also bring up the documentation on any function. You can type itertools.[TAB] and get a list, and then you can scroll down the list looking for a likely function, and when you get to it you can hit Shift-Tab and see the documentation. Certainly other IDEs have similar functionality.
This is a colossal win over having to go to the documentation and look through the text for a recipe that is not "addressable" in any way. You can't even link to it, for heaven's sake! The function docs in all the modules have permalinks but the recipes are just unstructured text.
I'd have to challenge that "colossal win". I am very uncomfortable with IDEs that try to do my thinking for me, and I start turning things off on those occasions when I am forced to use them. It would never even occur to me to try autocompletion. Reading the documentation is so much easier, and far more likely to point me at what the right answer actually is, rather than just what I think it might be.
There seems to be a clash of generations here, or perhaps a clash of different educational paths. I'm in the same boat as you, but Python's recent success is definitely driven by things like IPython and Jupyter Notebooks, which are optimized for exactly the approach to learning that Brendan describes. (And make no mistake about it, it is a form of learning!) I wonder what Raymond thinks (he's the maintainer of itertools and also an educator).

PS. Raymond, if you're not reading python-ideas, Brendan's message is here, and it's well-written: https://mail.python.org/archives/list/python-ideas@python.org/message/URV4E3...

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
It is massively more discoverable, for one simple reason:
autocomplete.
I am very uncomfortable with IDEs that try to do my thinking for me, and I start turning things off on those occasions when I am forced to use them. It would never even occur to me to try autocompletion.
There seems to be a clash of generations here, or perhaps a clash of different educational paths.
I agree it is more a clash of learning paths/training. I started out -- with ZERO knowledge -- in 2014, just before Jupyter started to be popular, and the more I have used it (as well as VSCode and PyCharm), the more I have found myself thinking "OH MAN -- if I had had THIS when I was learning, I would have learned so much faster with these code/feature discovery tools".

Notebooks in particular -- more than IDEs, I think -- are changing the way people learn. If I were teaching someone new today, I'd have them use Jupyter right away, probably not the repl.

But I still use the docs a lot too.
On Fri, Aug 02, 2019 at 10:48:26AM -0400, Ricky Teachey wrote:
I agree it is more a clash of learning paths/training. I started out-- with ZERO knowledge-- in 2014 just before Jupyter started to be popular, and the more I have used it (as well as VSCode and Pycharm), the more I have found myself thinking "OH MAN-- if I had had THIS when I was learning, I would have learned so much faster with these code/feature discovery tools".
Notebooks in particular -- more than IDEs, I think -- are changing the way people learn. If I were teaching someone new today, I'd have them use Jupyter right away, probably not the repl.
I think Smalltalk and Lisp had similar environments. Also the tradeoffs of "living in the environment" vs. "knowing the code base by reading and maintaining a mental map of files and documentation" are well known. The latter is probably required for larger code bases.

Stefan Krah
On 02/08/2019 15:22, Guido van Rossum wrote:
There seems to be a clash of generations here, or perhaps a clash of different educational paths.
Very likely. I did my computer science learning on an IBM mainframe, with a local front-end over MVS. A full-screen editor was introduced during the year :-)

I think as a result I don't trust tools that I can't take apart and put back together myself; if I don't know roughly how something works, I tend not to use it at all. This isn't just prejudice; I've had far too many occasions when I've had to undo something clever a "helpful" tool has done for me. Cue the Yorkshiremen sketch...

-- 
Rhodri James *-* Kynesim Ltd
On Aug 1, 2019, at 22:26, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2019-08-01 12:31, Andrew Barnert via Python-ideas wrote:
On Aug 1, 2019, at 10:48, Eli Berkowitz<eliberkowitz@gmail.com> wrote:
In terms of an argument for why it should be included, for most Python users the itertools recipes remain unseen.
Sure. But do you also know all of the functions in the module? Would you notice if a new one got added? In practical terms, how much more discoverable is groupby than grouper?
It is massively more discoverable, for one simple reason: autocomplete.
I already mentioned, in that same post you’re replying to, that I sometimes find functions by autocomplete, especially when I can guess at the name. But I rarely use autocomplete to view the entire contents of a module. Maybe using Jupyter in a notebook vs. iPython in a terminal window is different enough that I would autocomplete `itertools.` more often. Or maybe it’s just because I’m old, as others have pointed out about themselves. At any rate, if a lot of other people do autocomplete `itertools.` rather than just using autocomplete to confirm a guess or a vague memory, then it doesn’t matter if I’m not one of those people, so, point taken.
This is a colossal win over having to go to the documentation and look through the text for a recipe that is not "addressable" in any way. You can't even link to it, for heaven's sake! The function docs in all the modules have permalinks but the recipes are just unstructured text.
That’s a separate issue. There’s nothing inherent in the docs that disallows a link to each recipe, it’s just not the way they’re written. I suspect if someone filed a bug to fix that and submitted a patch, Raymond Hettinger wouldn’t object (or others might override him if he did).
I don't fully understand the resistance to adding these functions to itertools itself.
Honestly, I think a lot of the resistance is the implementation issue, or at least it's the reason the resistance is hard to overcome. If someone isn't sure whether the benefit of having itertools.consume is worth the cost of implementing and maintaining it, the fact that the implementation would have to be in C pushes them hard in the no direction. Especially since consume is at least as important as sample code as usable code, so you'd still need to maintain the pure Python "equivalent" in the docs, as with most of the core itertools functions. (And notice that, unlike a Python module with optional C accelerator, a C module with Python equivalent in the docs can't easily be unit tested to verify that equivalence -- and many itertools functions would fail, which is why the docs say "roughly equivalent".)

But if someone did the work to rewrite itertools as a Python-plus-optional-accelerator module (as it would have been if it had been submitted after PEP 399), or at least as a hybrid Python-re-exporting-from-C, then adding consume or grouper or a set of recipes or even all of them only faces the same hurdle as adding a new function to any already-in-Python module, which is a lower hurdle. But nobody who wants the recipes ever volunteers to tackle that rewrite, prove that it doesn't break any tests or hurt any benchmarks, etc. And nobody else has any motivation to do the rewrite.
From some other posts in the thread I get the impression some people think there is (or should be) some sort of multi-tiered system along the lines of:
1) in the stdlib
2) in a "recipe" in the stdlib but you have to copy and paste it
3) in an "officially sanctioned" third-party lib which is a "category killer" (like requests)
4) in a "somewhat less officially sanctioned" third-party lib which isn't a "category killer" (like toolz?)
5) in an ordinary third-party lib of which the documentation makes no mention
Well, there already _is_ such a hierarchy, but it’s missing the (4) case. That’s why more-itertools is in the less discoverable category 5 today. Whoever proposed that (Chris?) was trying to make things more accessible, not less.
Some of those may be good ideas, but I don't see the use of the distinction between tiers 1 and 2.
It's partly historical, but not entirely. There's plenty of recipes in the docs that aren't like itertools.consume. Look at the HOWTOs full of recipes for the more complicated modules; they'd become a lot more complicated if you had to make them configurable after import instead of tweakable after copy-paste. And think about things like the threaded URL downloader -- that might be useful, but not in the concurrent.futures library. So eliminating category 2 seems like a bad idea.

And it also seems unnecessary for your point. You don't need to eliminate case 2, or even move the bar between cases 1 and 2, you just need to move the itertools recipes, which are already close to the bar, over it.
I'm definitely on the old side of the distribution of programmers, and I strongly appreciate tab expansion in tools like Jupyter and vim. I never used a full "IDE", whatever the boundary line is. But exactly that kind of reminder of e.g. "what's in itertools again?" is very helpful to me, both when I write programs and when I teach them.

... But Raymond will always remain a month and a half older than me :-).
On Fri, Aug 2, 2019 at 6:38 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
On 02/08/2019 06:26, Brendan Barnwell wrote:
It is massively more discoverable, for one simple reason: autocomplete.
In teaching people to program, I often use Jupyter notebook, which has great autocomplete functionality that can also bring up the documentation on any function. You can type itertools.[TAB] and get a list, and then you can scroll down the list looking for a likely function, and when you get to it you can hit Shift-Tab and see the documentation. Certainly other IDEs have similar functionality.
This is a colossal win over having to go to the documentation and look through the text for a recipe that is not "addressable" in any way. You can't even link to it, for heaven's sake! The function docs in all the modules have permalinks but the recipes are just unstructured text.
I'd have to challenge that "colossal win". I am very uncomfortable with IDEs that try to do my thinking for me, and I start turning things off on those occasions when I am forced to use them. It wouldn't even occur to me to try autocompletion. Reading the documentation is so much easier, and far more likely to point me at what the right answer actually is, rather than just what I think it might be.
There seems to be a clash of generations here, or perhaps a clash of different educational paths. I'm in the same boat as you, but Python's recent success is definitely driven by things like IPython and Jupyter Notebooks which are optimized for exactly the approach to learning that Brendan describes. (And make no mistake about it, it is a form of learning!) I wonder what Raymond thinks (he's the maintainer of itertools and also an educator).
PS. Raymond, if you're not reading python-ideas, Brendan's message is here, and it's well-written:
https://mail.python.org/archives/list/python-ideas@python.org/message/URV4E3...
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Sat, Aug 3, 2019 at 12:48 AM Ricky Teachey <ricky@teachey.org> wrote:
Notebooks in particular-- more than IDEs, I think-- are changing the way people learn. If I were teaching someone new today, I'd have them use Jupyter right away, probably not the REPL.
But I still use the docs a lot too.
Also a bit old-school (it took me many years to learn the value of syntax highlighting), and an educator, and I've seen students start out with Jupyter. As an alternative to the vanilla REPL, I think it's awesome if a little expensive on low-end machines; but as an alternative to a text editor, it's an attractive nuisance. Yes, you can try things out and see the results instantly, AND you can save it, edit, rerun, etc; the cost is that debugging becomes a nightmare. But for discoverability, incl tab completion? It's great, and I probably should start using it more. ChrisA
On 8/2/19 1:18 PM, David Mertz wrote:
I'm definitely on the old side of the distribution of programmers ...
Throw me into that bucket, too. I use autocomplete to save typing rather than to discover new functionality. Autocompletion is great, if you know where to start. In the case that started this thread, there wasn't a place to start. When I'm looking for an alternative to an if statement or a list comprehension, how would I even guess to start with itertools, let alone more-itertools?
... that kind of reminder of e.g. "what's in itertools again?" is very helpful to me, both when I write programs and when I teach them.
I agree, but that's assuming that you "know" what's there and just have to be reminded. IMO, it helps less with discovering new functions, and even less than that when it comes to discovering "new" python modules. Dan
as an alternative to a text editor, it's an attractive nuisance. Yes, you can try things out and see the results instantly, AND you can save it, edit, rerun, etc; the cost is that debugging becomes a nightmare.
Remember that Jupyter has a pretty nice text editor (with syntax highlighting! but no autocomplete or discovery... yet), and there's nothing stopping you from teaching a student how to write a .py file in the Jupyter interface and then import it in a cell. And then showing them how they can do the same thing with ANY old text editor.
On Sat, Aug 3, 2019 at 4:19 AM Ricky Teachey <ricky@teachey.org> wrote:
as an alternative to a text editor, it's an attractive nuisance. Yes, you can try things out and see the results instantly, AND you can save it, edit, rerun, etc; the cost is that debugging becomes a nightmare.
Remember that Jupyter has a pretty nice text editor (with syntax highlighting! but no autocomplete or discovery... yet), and there's nothing stopping you from teaching a student how to write a .py file in the Jupyter interface and then import it in a cell. And then showing them how they can do the same thing with ANY old text editor.
Of course. But I'm not the primary instructor for most of them - I'm just the TA that they come to when they have questions. So most of the time, they haven't been using any-old-text-editor. Although I didn't know you could import a .py file into a cell. Will have to look into that; maybe that would help them hybridize. ChrisA
On 2019-08-02 06:37, Rhodri James wrote:
On 02/08/2019 06:26, Brendan Barnwell wrote:
It is massively more discoverable, for one simple reason: autocomplete.
In teaching people to program, I often use Jupyter notebook, which has great autocomplete functionality that can also bring up the documentation on any function. You can type itertools.[TAB] and get a list, and then you can scroll down the list looking for a likely function, and when you get to it you can hit Shift-Tab and see the documentation. Certainly other IDEs have similar functionality.
This is a colossal win over having to go to the documentation and look through the text for a recipe that is not "addressable" in any way. You can't even link to it, for heaven's sake! The function docs in all the modules have permalinks but the recipes are just unstructured text.
I'd have to challenge that "colossal win". I am very uncomfortable with IDEs that try to do my thinking for me, and I start turning things off on those occasions when I am forced to use them. It wouldn't even occur to me to try autocompletion. Reading the documentation is so much easier, and far more likely to point me at what the right answer actually is, rather than just what I think it might be.
Don't get me wrong, I have my own issues with "smart" IDEs. My go-to tool for my own work isn't even Jupyter but this thing called DreamPie (http://www.dreampie.org/), which is old and not really maintained and requires me to keep Python 2.7 and specific libraries installed just to run it, but I do it because I really like the very simple level of interpreter-wrapping interface it provides (which does, however, include autocomplete :-). But the question was "does having the recipes as functions make them more discoverable than having them only as textual recipes in the docs". The answer is yes. It doesn't matter whether you or I or anyone else has a particular preference for or against autocomplete. The fact is that autocomplete exists in widely-used tools and it makes modules and their contents strictly more discoverable (because they remain discoverable in every way they currently are, plus autocomplete). If we're concerned with discoverability it surely makes sense to take that into account. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Fri, Aug 2, 2019 at 10:54 AM Chris Angelico <rosuav@gmail.com> wrote:
Also a bit old-school (it took me many years to learn the value of syntax highlighting), and an educator, and I've seen students start out with Jupyter. As an alternative to the vanilla REPL, I think it's awesome if a little expensive on low-end machines; but as an alternative to a text editor, it's an attractive nuisance.
I agree here, and don't recommend it to newbies -- a bit less for your reasons than that it encourages really horrible software development practices -- cutting and pasting utility functions from one notebook to a new one, and code that is essentially impossible to unit test. But I do use, and recommend, the IPython REPL, and completion is really handy there, too :-)

However, I'm not sure "you could find it with completion" is really the reason to put something in a module rather than leave it as a recipe. Implementation aside, I think the criteria should be something like: if it's generally useful and not totally trivial (i.e. one line of code), then put it in. If it's a generally useful *pattern*, but individual use cases are likely to need to specialize it a bit, then it should be a recipe.

As for "consume" -- I think it very much meets the requirement of non-trivial and generally useful. The other issue here, in the context of the OP's proposal, is that it's less than obvious that that's what a user would want when they want to operate on all the items in an iterable without creating a new iterable. To Guido's point, the way to do this now:

    for i in an_iterable: act_on(i)

is simple, obvious and compact, though not "functional" or "using the comprehension syntax". Indeed, I proposed a couple years ago that some sort of "make nothing comprehension" would be good, and the idea was rejected, and I was pointed to that simple for loop as "the one obvious way to do it". At that time no one pointed me to consume, or anything like it, but I still think that:

    consume(act_on(i) for i in an_iterable)

wouldn't be what I recommend, even if consume were built in to itertools. While yes, being performant and using comprehension syntax is nice, it's still uglier, less obvious, and less discoverable than the simple for loop.

Back to consume() as a built-in part of itertools -- looking at the recipe, it's good to have that recipe, as it takes advantage of a hidden performance trick in deque -- hardly anyone is going to come up with that on their own! But frankly, given that itertools is written in C anyway, it seems a high-performance written-in-C implementation would be a fine idea. It just seems pretty basic to me -- something that should be a building block, rather than a wrapper around what looks like a higher-level class. Not that I'm offering to write the code ... But if "someone would need to write the code" is the only stumbling block, maybe I would (or I'd bet someone would step up). -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
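(For readers following along, the recipe Christopher means -- roughly as the itertools docs give it, deque trick and all -- is:)

```
import collections
import itertools

def consume(iterator, n=None):
    "Advance the iterator n steps ahead; if n is None, consume it entirely."
    if n is None:
        # feed the entire iterator into a zero-length deque;
        # this is the hidden trick: the loop runs at C speed
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(itertools.islice(iterator, n, n), None)
```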
Getting really OT now, but quickly: On Fri, Aug 2, 2019 at 11:28 AM Chris Angelico <rosuav@gmail.com> wrote:
On Sat, Aug 3, 2019 at 4:19 AM Ricky Teachey <ricky@teachey.org> wrote:
Remember that Jupyter has a pretty nice text editor
Actually, I think it's pretty darn crappy -- at least compared to any "proper" coding editor...
Although I didn't know you could import a .py file into a cell.
I think he means putting:

    import my_python_file

in a cell, like any other python code. And yes, you can do that, and students should, but the fact is that Jupyter encourages people to write all their code in the notebook. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
This has gotten a bit OT, but I’ll roll with it.....

But what about `pip install more-itertools`? Hopefully you become comfortable with that a lot faster than 3 years in.
more-itertools is kind of a special case (or at least a different case), as it’s a collection of handy general-purpose utilities, rather than a package designed to address particular problems, like, say, requests or pytest (probably the first two packages I suggest newbies install). In fact, itertools itself is a bit odd that way — it’s not clear what types of problems it might address: you want to look in itertools if you need to iterate through something in a non-trivial way, but one probably common enough that others need to do it too. Honestly, I often forget to look there when I should. Which brings us back to the OP's question: what is a good way to loop through an iterable and apply an operation to it when you don't want to create a new iterable? I doubt anyone is going to find consume() in itertools (even if it were built in) and know to use it to solve this use case. -CHB
On 2019-08-02 10:11, Andrew Barnert wrote:
Honestly, I think a lot of the resistance is the implementation issue, or at least it’s the reason the resistance is hard to overcome. If someone isn’t sure whether the benefit of having itertools.consume is worth the cost of implementing and maintaining it, the fact that the implementation would have to be in C pushes them hard in the no direction. Especially since consume is at least as important as sample code as it is as usable code, so you’d still need to maintain the pure Python “equivalent” in the docs, as with most of the core itertools functions. (And notice that, unlike a Python module with an optional C accelerator, a C module with a Python equivalent in the docs can’t easily be unit tested to verify that equivalence—and many itertools functions would fail, which is why the docs say “roughly equivalent”.)
This is perhaps getting away from the main topic, but wait --- earlier in the thread someone said PEP 399 says all modules have to have a pure Python implementation. But now you say everything in itertools MUST be implemented in C? Why is that? Why can't we just put the Python implementations of the recipes as-is directly into itertools? -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Sat, Aug 3, 2019 at 4:39 AM Christopher Barker <pythonchb@gmail.com> wrote:
Getting really OT now, but quickly:
Off-list because it really is.
On Fri, Aug 2, 2019 at 11:28 AM Chris Angelico <rosuav@gmail.com> wrote:
On Sat, Aug 3, 2019 at 4:19 AM Ricky Teachey <ricky@teachey.org> wrote:
Remember that Jupyter has a pretty nice text editor
Actually, I think it's pretty darn crappy -- at least compared to any "proper" coding editor...
Although I didn't know you could import a .py file into a cell.
I think he means putting:
import my_python_file
in a cell, like any other python code.
Ahh. I thought that there was another cell type that was a reference to a .py file. You can have a Python cell, a Markdown cell, etc, etc, and I thought maybe it had a "file reference cell". Which would be a cool feature. ChrisA
On 1 Aug 2019, at 19:11, Guido van Rossum <guido@python.org <mailto:guido@python.org>> wrote:
This is an interesting phenomenon. I'm not saying it's good or bad, I'm just observing it (because it surprised me). Here is someone declaring that the docs are less accessible than the code. I personally am disappointed, given the amount of effort that we put in those docs. But maybe this is true. If we can't get people to peruse the docs, should we bother?
Personally I use the docs first, then the code. I do this because I expect to get from the docs the API I can depend on. If I find something useful from reading the code that is not documented, I use it at my own risk. I have worked with engineers that would not look at the source code if the docs are insufficient. For a product I worked on at a small company we had the "is it worth having docs? No one reads them, right?" discussion. The argument that won the day was that reasonable docs reduce support costs. Tech support can point the user at the docs and move on to the next problem. The QA team could say all documented features work at each release. Barry
I think he means putting:
import my_python_file
in a cell, like any other python code.
Yes, that is all I meant. My super powered text editor experience is limited to Notepad++. Perhaps I have a lot to learn about what makes a good text editor for code. And I'm always willing to learn.
There *is* also the '%load' magic. But 'from my_stuff import *' is almost always a better idea. On Fri, Aug 2, 2019, 2:49 PM Christopher Barker <pythonchb@gmail.com> wrote:
Getting really OT now, but quickly:
On Fri, Aug 2, 2019 at 11:28 AM Chris Angelico <rosuav@gmail.com> wrote:
On Sat, Aug 3, 2019 at 4:19 AM Ricky Teachey <ricky@teachey.org> wrote:
Remember that Jupyter has a pretty nice text editor
Actually, I think it's pretty darn crappy -- at least compared to any "proper" coding editor...
Although I didn't know you could import a .py file into a cell.
I think he means putting:
import my_python_file
in a cell, like any other python code.
And yes, you can do that, and students should, but the fact is that Jupyter encourages people to write all their code in the notebook.
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Thu, Aug 01, 2019 at 12:31:48PM -0700, Andrew Barnert via Python-ideas wrote:
Plus, there is another potential argument for adding at least some of the recipes to the module. When I show a novice how to import and use cycle, they get excited and go off and use it.
Are infinite for-loops that common that people get excited by cycle? That sounds like the "here's a hammer, now every problem is a nail" issue. "itertools.cycle is cool, I must use it as often as possible!" I think I've used it once or twice, but I don't remember why.
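(For what it's worth, the typical small win with cycle isn't an infinite loop at all but round-robin pairing -- a toy sketch:)

```
from itertools import cycle

workers = ["alice", "bob", "carol"]
tasks = ["t1", "t2", "t3", "t4", "t5"]

# zip() stops at the shorter input, so cycling the workers simply
# round-robins them over the tasks with no index bookkeeping
for task, worker in zip(tasks, cycle(workers)):
    print(task, "->", worker)
```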
When I show them how to copy and paste grouper, they sometimes seem reluctant—I’m encouraging them to put code they don’t understand into their source files.
I expect that the average novice has *vast* amounts of code they don't understand in their source files, some of it borrowed from Stackoverflow or copied and pasted from elsewhere. On the tutor mailing list, debugging by random perturbation until it stops raising an exception is a very common trap for novices, so often they don't understand their own code. I am skeptical that reluctance to use code they don't understand is the reason for their reluctance to copy and paste. In any case, why is "copy and paste three lines of code I don't understand" worse than "add a dependency of a third-party library you download from the Internet of hundreds, maybe thousands, of lines I don't understand"? -- Steven
On Fri, Aug 02, 2019 at 07:52:37AM +1000, Chris Angelico wrote:
Maybe there needs to be a second-tier recommendation, where a list of packages can be given that aren't category killers, but have been given the blessing of the Python devs as "this is a good-quality, well-maintained package, and can be depended on"?
Which of the core devs will have the responsibility for checking that something which is a good-quality, well-maintained, dependable package today remains so a year from now? There are third-party libraries like numpy, nltk etc which are too specialised, big and complex for anyone to duplicate in their own code. But I think that anyone who pip installs more-itertools *solely* to avoid copying and pasting the "grouper" recipe from the docs is doing themselves, and the users of their software, a disservice. It's three lines of code. Adding a third-party dependency of 2000+ sloc to avoid a three-liner is not as bad as the Node.js LeftPad debacle, but it's heading into the same ballpark. (Of course the calculus changes if you are a heavy consumer of iterators, and the extra tools in more-itertools are useful for you.) -- Steven
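(The three-liner in question, roughly as the itertools docs give it:)

```
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks: grouper('ABCDEFG', 3, 'x') -> ABC DEF Gxx"
    args = [iter(iterable)] * n   # n references to the *same* iterator
    return zip_longest(*args, fillvalue=fillvalue)
```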
On Sat, Aug 3, 2019 at 9:48 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Aug 02, 2019 at 07:52:37AM +1000, Chris Angelico wrote:
Maybe there needs to be a second-tier recommendation, where a list of packages can be given that aren't category killers, but have been given the blessing of the Python devs as "this is a good-quality, well-maintained package, and can be depended on"?
Which of the core devs will have the responsibility for checking that something which is a good-quality, well-maintained, dependable package today remains so a year from now?
Nobody. The recommendation would simply be "this is good NOW". Anything can change in the future, and if someone sees something saying "as of 2012, that was a good package", it's up to them to see if it's still good. Of course, any core dev can go tag something and say "still good".
There are third-party libraries like numpy, nltk etc which are too specialised, big and complex for anyone to duplicate in their own code.
And it's "common knowledge" that numpy, ntlk, etc are all well-respected pieces of software, but how is that common knowledge to be acquired?
But I think that anyone who pip installs more-itertools *solely* to avoid copying and pasting the "grouper" recipe from the docs is doing themselves, and the users of their software, a disservice. It's three lines of code. Adding a third-party dependency of 2000+ sloc to avoid a three-liner is not as bad as the Node.js LeftPad debacle, but it's heading into the same ballpark.
This is true, but only because we're talking about a single three-line function...
(Of course the calculus changes if you are a heavy consumer of iterators, and the extra tools in more-itertools are useful for you.)
... and the best way to BECOME a heavy consumer of iterators is to flip through itertools and more-itertools, find that this is way easier than what you were doing previously, and start using more of the functions. So even if you start out by getting more-itertools as an alternative to copying and pasting a three-line function, you may find that you end up using a lot more of it. But there's still a *huge* difference between "import itertools" and "import some-package-nobody-knows-of". I just want to narrow that gap a little by giving recognition to those packages which are dependable and high quality. I did a PyPI search for "itertools" and this was what came up first: https://pypi.org/project/itertools-s/ It must be good, because it's the very first search result, right? I'll just pip-install it and start using it. Or should I look somewhere else for an enhanced itertools? How would I know? With a published list of known-excellent packages, there could be three or four itertoolses, not ten pages of search results, and you could know for sure that each and every one of them is worthy of at least a bit of a look. ChrisA
On Aug 2, 2019, at 11:55, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2019-08-02 10:11, Andrew Barnert wrote: Honestly, I think a lot of the resistance is the implementation issue, or at least it’s the reason the resistance is hard to overcome. If someone isn’t sure whether the benefit of having itertools.consume is worth the cost of implementing and maintaining it, the fact that the implementation would have to be in C pushes them hard in the no direction.
This is perhaps getting away from the main topic, but wait --- earlier in the thread someone said PEP 399 says all modules have to have a pure Python implementation. But now you say everything in itertools MUST be implemented in C? Why is that? Why can't we just put the Python implementations of the recipes as-is directly into itertools?
Because itertools is written purely in C, there’s nowhere to put the Python implementations of the recipes. PEP 399 says all _new_ modules (that don’t have a good reason for a special dispensation) have to have a pure Python implementation. But it explicitly says that pre-existing modules in the stdlib didn’t need to be rewritten, and most of them haven’t been. (Raymond wrote itertools long before 3.2, even long before the discussion around 3.0 that eventually led to the PEP.) It also says that if someone like the PyPy developers wants to rewrite an existing C module to be a pure-Python module with a C accelerator it should/will probably be accepted, but that only applies if someone does the rewrite and submits it. So, that’s the way forward. You could port the recipes to C and change the docs recipes to be “roughly equivalent” Python code in the help for each function. Or you could port itertools to Python plus C accelerator and then just copy-paste the recipes into the Python part. I suspect the latter would be easier to get accepted, but I have no idea whether it’s more or less work.
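(A rough sketch of the second option -- the `_itertools` name is hypothetical, but the layout is the PEP 399 accelerator pattern that modules like heapq already use:)

```
# itertools.py -- hypothetical hybrid layout under PEP 399's pattern
import collections

def consume(iterator, n=None):
    "Advance the iterator n steps ahead; if n is None, consume it entirely."
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        for _ in range(n):
            next(iterator, None)

# ... pure-Python definitions of chain, islice, cycle, etc. would go here ...

try:
    from _itertools import *  # C accelerator shadows the Python definitions
except ImportError:
    pass                      # fall back to the pure-Python versions above
```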
On Aug 2, 2019, at 15:45, Steven D'Aprano <steve@pearwood.info> wrote:
Are infinite for-loops that common that people get excited by cycle? That sounds like the "here's a hammer, now every problem is a nail" issue. "itertools.cycle is cool, I must use it as often as possible!"
Well, they’re probably excited that the function they spent the last three hours debugging some fence post error on can be replaced by a single line, not that every program for the rest of their life will be similarly improved. (Once you get the itertools way of doing things, a whole lot of programs for the rest of your life _are_ similarly improved. But I don’t think cycle is enough to get that across. If someone wants to learn more, I usually point them to… I don’t have the links with me, but it’s Generator Tricks for Systems Programmers v2, and something that was part of the pitch for yield from that I can’t remember.)
On 2019-08-02 19:16, Andrew Barnert wrote:
PEP 399 says all _new_ modules (that don’t have a good reason for a special dispensation) have to have a pure Python implementation. But it explicitly says that pre-existing modules in the stdlib didn’t need to be rewritten, and most of them haven’t been. (Raymond wrote itertools long before 3.2, even long before the discussion around 3.0 that eventually led to the PEP.) It also says that if someone like the PyPy developers wants to rewrite an existing C module to be a pure-Python module with a C accelerator it should/will probably be accepted, but that only applies if someone does the rewrite and submits it.
So, that’s the way forward. You could port the recipes to C and change the docs recipes to be “roughly equivalent” Python code in the help for each function. Or you could port itertools to Python plus C accelerator and then just copy-paste the recipes into the Python part. I suspect the latter would be easier to get accepted, but I have no idea whether it’s more or less work.
Does "port itertools to Python plus C accelerator" include "write a thin Python wrapper around the existing C"? That is, would it be possible to write a pure-Python wrapper around the existing C itertools (renaming that existing C module to _itertools or whatever), and add the recipes as pure Python in the wrapper but without a C implementation? Or would that not be accepted? -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Aug 2, 2019, at 11:34, Christopher Barker <pythonchb@gmail.com> wrote:
The other issue here, in the context of the OP's proposal, is that it's less than obvious that that's what a user would want when they want to operate on all the items in a iterable without creating a new iterable. To Guido's point, the way to do this now is:
for i in an_iterable: act_on(i)
Is simple, obvious and compact, though not "functional" or "using the comprehension syntax".
Where I find consume useful is when I already have an iterator. I have a long chain of functional-style iterable transforms (mostly genexprs and itertools calls) leaving me with a final thing that represents the whole computation, but it’s lazy and I need it strictly evaluated now (generally because it has side effects, or because I don’t know from inspection whether it does). If I don’t already have an iterator, I wouldn’t create one just to pass to consume to run some side effects; I’d write your example the same way Guido suggested. The OP’s only problem is that he created an unnecessary genexpr, and the solution is to just not create the unnecessary genexpr. It’s one of those “Doctor, Doctor, it hurts when I punch myself in the neck” / “OK, so don’t punch yourself in the neck” cases. But when someone really does seem to be looking for consume, even in a place where I wouldn’t use it, I may tell them why I wouldn’t use it, but I’ll still tell them that the function they’re looking for is named consume, and where to find and/or how to write it. In fact, looking back at the message that started this whole side track, that’s exactly what I did. If it had already been in itertools, I’d still have explained why I think his #3 is better than his #2, but the rest would just be one line: “But if you really want that run function, it already exists: `from itertools import consume as run`”. I don’t think that’s a big enough win on its own to be worth someone rewriting itertools so consume can be added. But maybe consume plus grouper plus some of the less trivial recipes (or, more simply, just all of them, to avoid bikeshedding arguments on each one) is.
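(To make that use case concrete, a toy sketch -- the two genexpr stages stand in for the "long chain of itertools calls" described above, and consume is just the docs recipe's n=None case:)

```
from collections import deque

def consume(iterator):
    deque(iterator, maxlen=0)  # exhaust at C speed, keep nothing

records = [" spam ", " eggs ", " ham "]
cleaned = (r.strip() for r in records)               # lazy transform
emitted = (print("processing", r) for r in cleaned)  # lazy, side-effecting stage

consume(emitted)  # force strict evaluation of the whole chain, now
```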
On Aug 2, 2019, at 19:22, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2019-08-02 19:16, Andrew Barnert wrote:
So, that’s the way forward. You could port the recipes to C and change the docs recipes to be “roughly equivalent” Python code in the help for each function. Or you could port itertools to Python plus C accelerator and then just copy-paste the recipes into the Python part. I suspect the latter would be easier to get accepted, but I have no idea whether it’s more or less work.
Does "port itertools to Python plus C accelerator" include "write a thin Python wrapper around the existing C"?
No. PEP 399 doesn’t say that would be accepted. But all that means is that there’s no PEP saying there should be a presumption in favor of your patch being accepted. I suspect that makes it the same situation as 99% of the other patches on b.p.o. And it would certainly be a lot less work. So if you want to try it and see what Raymond says, I’d say go for it. Sure, it’s possible he’ll say no, or ask some follow up questions, or say it should go back to -ideas or -dev for more discussion, or whatever, but the easiest way to find out (unless he joins this thread) is to just file the bug and add the patch. (By the way, why not just read PEP 399 instead of asking and then relying on the interpretation of some random guy on the internet?)
On Sat, Aug 03, 2019 at 10:02:39AM +1000, Chris Angelico wrote:
On Sat, Aug 3, 2019 at 9:48 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Aug 02, 2019 at 07:52:37AM +1000, Chris Angelico wrote:
Maybe there needs to be a second-tier recommendation, where a list of packages can be given that aren't category killers, but have been given the blessing of the Python devs as "this is a good-quality, well-maintained package, and can be depended on"?
Which of the core devs will have the responsibility for checking that something which is a good-quality, well-maintained, dependable package today remains so a year from now?
Nobody. The recommendation would simply be "this is good NOW".
Sure. And "NOW" means *now* when I'm reading the docs, right? *wink* How is the reader supposed to know whether a recommendation made in 2012 is still valid, or if the state of the art has moved on to something else? Any such recommendation is subject to bit-rot. Doc-rot? If its not actively maintained by somebody, it won't be long before the docs are doing the equivalent of recommending that people use Psyco: https://wiki.python.org/moin/UsefulModules#Platform-Specific Psyco has been unmaintained for more than seven years. Now, something like Psyco is not too bad. At least when you go to install it, you can't help but notice the great big notice saying that its unmaintained. But what happens when a recommended package is still actively maintained, but has been overtaken as "best of breed" by another package? The point is, if the core devs are to take on the responsibility for listing "best of breed" third-party libraries, somebody has to accept the responsibility for actively maintaining that list, or it will rot. [...]
There are third-party libraries like numpy, nltk etc which are too specialised, big and complex for anyone to duplicate in their own code.
And it's "common knowledge" that numpy, ntlk, etc are all well-respected pieces of software, but how is that common knowledge to be acquired?
The same way any common knowledge is acquired: by interacting with people who already have that knowledge, whether they are mentors, trainers, more experienced work-mates, bloggers that you read, people who answer Stackoverflow questions, etc. There's a whole community of people capable of passing on this knowledge, the core devs don't have to take it on too. It isn't as if nobody can find out which third-party libraries to use if they aren't listed in the official Python docs.
It must be good, because it's the very first search result, right?
Learning to have a healthy skepticism of Google and the internet is a necessary part of the gaining of experience. -- Steven
On Sat, Aug 03, 2019 at 03:52:31AM +1000, Chris Angelico wrote:
Also a bit old-school (it took me many years to learn the value of syntax highlighting), and an educator, and I've seen students start out with Jupyter. As an alternative to the vanilla REPL, I think it's awesome [...] But for discoverability, incl tab completion? It's great
*scratches head* Do people forget that the vanilla Python REPL has come with tab completion for nearly 20 years, and on by default for something like seven years? https://hg.python.org/cpython/rev/d5ef330bac50 https://docs.python.org/release/2.0/lib/module-rlcompleter.html (At least on Linux/POSIX systems. I don't know if it works on Windows or Macs.) I've had Linux-using Python programmers tell me that they couldn't imagine not using Jupyter specifically because of tab completion. They weren't even aware that the standard interpreter has it as a feature. -- Steven
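(For anyone on an older Python where it isn't on by default, the long-documented incantation -- typically placed in a PYTHONSTARTUP file -- is:)

```
# Enable readline tab completion in the vanilla REPL (needed on pre-3.4
# Pythons; on by default since then, where readline is available).
import readline
import rlcompleter

readline.parse_and_bind("tab: complete")
```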
On Sat, Aug 3, 2019 at 3:44 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Aug 03, 2019 at 03:52:31AM +1000, Chris Angelico wrote:
Also a bit old-school (it took me many years to learn the value of syntax highlighting), and an educator, and I've seen students start out with Jupyter. As an alternative to the vanilla REPL, I think it's awesome [...] But for discoverability, incl tab completion? It's great
*scratches head*
Yeah, you kinda edited me to having a quite different meaning there. That wasn't what I said, thank you.
Do people forget that the vanilla Python REPL has come with tab completion for nearly 20 years, and on by default for something like seven years?
https://hg.python.org/cpython/rev/d5ef330bac50 https://docs.python.org/release/2.0/lib/module-rlcompleter.html
(At least on Linux/POSIX systems. I don't know if it works on Windows or Macs.)
1) I don't trust it on arbitrary (mostly Windows) systems, so when I'm recommending to other people, I can't be confident of it.

2) Until recently, tab completion conflicted with tab indentation, making the default REPL very annoying.

3) In many terminals, tab completion of an entire module's contents is impractical. It's fine when you already have the beginning of what you're looking for (eg 1.1.as <tab> to get as_integer_ratio), but this thread is about discoverability, and a lot of modules have enough in them that modulename.<tab><tab> is just going to spam your terminal. GUI tools tend to do better at this.

I'm not discounting the value of the vanilla REPL's tab completion, but it is not the ultimate in discoverability. ChrisA
On 2019-08-02 22:37, Steven D'Aprano wrote:
On Sat, Aug 03, 2019 at 03:52:31AM +1000, Chris Angelico wrote:
Also a bit old-school (it took me many years to learn the value of syntax highlighting), and an educator, and I've seen students start out with Jupyter. As an alternative to the vanilla REPL, I think it's awesome [...] But for discoverability, incl tab completion? It's great
*scratches head*
Do people forget that the vanilla Python REPL has come with tab completion for nearly 20 years, and on by default for something like seven years?
https://hg.python.org/cpython/rev/d5ef330bac50 https://docs.python.org/release/2.0/lib/module-rlcompleter.html
(At least on Linux/POSIX systems. I don't know if it works on Windows or Macs.)
It doesn't work on Windows. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Fri, Aug 02, 2019 at 11:34:31AM -0700, Christopher Barker wrote:
As for "consume" -- I think it very much meets the requirement of non-trivial and generally useful.
Seems pretty trivial to me:

    for x in iterable: pass

About a year ago, there was a thread about introducing a keyword, "do", for those times you want to run something for its side-effects without collecting a list of return results. So instead of generating a list of Nones:

    [print(x) for x in iterable]

you might call the do statement:

    do print(x) for x in iterable

https://mail.python.org/archives/list/python-ideas@python.org/thread/2GQW3Q3...

Or if you prefer:

    do print, iterable

If we had such a keyword, None could be special-cased for consuming an iterator:

    do None, iterable

At the time, I thought "That's a great idea!" and placed a do(func, iterable) function in my personal toolbox. (See the above thread for a possible implementation.) And never used it since. So I'm not sure that my experience agrees with you that consume would be generally useful. In my experience, most of the time I want to exhaust an iterator without doing anything with the values produced, I just throw it away:

    del iterator

Actually I don't even bother doing that. If I don't want the values produced by an iterator, I just don't use the iterator. For consume to be useful, the mere act of accessing the items needs to have some necessary side-effect that you rely on. Otherwise why bother to exhaust the iterator, if you can just not use it? This seems like a code smell to me. -- Steven
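(One plausible implementation of that toolbox do() -- a sketch only; the linked thread has variations:)

```
from collections import deque

def do(func, iterable):
    "Call func on every item purely for its side effects; discard all results."
    deque(map(func, iterable), maxlen=0)  # zero-length deque exhausts at C speed

do(print, [1, 2, 3])  # prints each item, builds no list
```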
On Sat, Aug 03, 2019 at 03:56:55PM +1000, Chris Angelico wrote:
On Sat, Aug 3, 2019 at 3:44 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Aug 03, 2019 at 03:52:31AM +1000, Chris Angelico wrote:
Also a bit old-school (it took me many years to learn the value of syntax highlighting), and an educator, and I've seen students start out with Jupyter. As an alternative to the vanilla REPL, I think it's awesome [...] But for discoverability, incl tab completion? It's great
*scratches head*
Yeah, you kinda edited me to having a quite different meaning there. That wasn't what I said, thank you.
Did I? If so, it was completely unintentional, sorry. I've re-read your original, and I don't see the "quite different meaning". I read your post as saying that Jupyter is "great" (better than the vanilla REPL) because it has tab completion. I don't know what other forms of "discoverability" you might be referring to. Nor do I know if Jupyter does tab completion differently (better?) than the built-in REPL.
1) I don't trust it on arbitrary (mostly Windows) systems, so when I'm recommending to other people, I can't be confident of it.
Trust it in what way? That it might eat your hard drive or expose your personal details to the internet?
2) Until recently, tab completion conflicted with tab indentation, making the default REPL very annoying.
True enough, but that's long fixed.
3) In many terminals, tab completion of an entire module's contents is impractical.
Being presented with a hundred options or more is rather intimidating, but rlcompleter.py prompts you first:

py> os.
Display all 343 possibilities? (y or n)

allowing you to back out, and then simulates paging the output. What does Jupyter do if there are 300+ options? -- Steven
On Sat, Aug 3, 2019 at 5:32 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Aug 03, 2019 at 03:56:55PM +1000, Chris Angelico wrote:
On Sat, Aug 3, 2019 at 3:44 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Aug 03, 2019 at 03:52:31AM +1000, Chris Angelico wrote:
Also a bit old-school (it took me many years to learn the value of syntax highlighting), and an educator, and I've seen students start out with Jupyter. As an alternative to the vanilla REPL, I think it's awesome [...] But for discoverability, incl tab completion? It's great
*scratches head*
Yeah, you kinda edited me to having a quite different meaning there. That wasn't what I said, thank you.
Did I? If so, it was completely unintentional, sorry.
Apology accepted, and I was a bit snippy there, sorry.
I've re-read your original, and I don't see the "quite different meaning". I read your code as saying that Jupyter is "great" (better than the vanilla REPL) because it has tab completion.
It's better, and it has tab completion, but that's only one of the many additional features it has. It also has FAR better facilities for recalling blocks of code than the vanilla REPL does; Idle has a different way of doing it, and both of them are streets ahead of "recall one line at a time".
I don't know what other forms of "discoverability" you might be referring to. Nor do I know if Jupyter does tab completion differently (better?) than the built-in REPL.
1) I don't trust it on arbitrary (mostly Windows) systems, so when I'm recommending to other people, I can't be confident of it.
Trust it in what way? That it might eat your hard drive or expose your personal details to the internet?
Trust it to exist - sorry, wasn't clear there. The vanilla REPL simply doesn't do tab completion on all systems.
2) Until recently, tab completion conflicted with tab indentation, making the default REPL very annoying.
True enough, but that's long fixed.
Hmm, maybe the fix didn't propagate out, but I remember it being a problem up until very recently (like last year).
3) In many terminals, tab completion of an entire module's contents is impractical.
Being presented with a hundred options or more is rather intimidating, but rlcompleter.py prompts you first:
py> os.
Display all 343 possibilities? (y or n)
allowing you to back out, and then simulates paging the output. What does Jupyter do if there are 300+ options?
Not sure, but ISTR it would let you scroll through them. Not something you can easily do in a plain terminal. Tab completion, as a means of shortening the typing of something you already know about, is great. But for discovering something new, not so much. I'm not sure what IS ideal, though. ChrisA
On Fri, Aug 02, 2019 at 02:16:52PM -0400, Dan Sommers wrote:
I agree, but that's assuming that you "know" what's there and just have to be reminded. IMO, it helps less with discovering new functions, and even less than that when it comes to discovering "new" python modules.
Pressing tab twice after the word import:

py> import
Display all 492 possibilities? (y or n)
CDROM    change_type_example    locale       site
CFRAC    chardet                logcheck1    smtpd
DLFCN    chunk                  logcheck2    smtplib
[... additional lines trimmed for brevity ...]

I had completely forgotten I installed chardet! -- Steven
On Sat, Aug 3, 2019 at 6:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Aug 02, 2019 at 02:16:52PM -0400, Dan Sommers wrote:
I agree, but that's assuming that you "know" what's there and just have to be reminded. IMO, it helps less with discovering new functions, and even less than that when it comes to discovering "new" python modules.
Pressing tab twice after the word import:
py> import
Display all 492 possibilities? (y or n)
CDROM    change_type_example    locale       site
CFRAC    chardet                logcheck1    smtpd
DLFCN    chunk                  logcheck2    smtplib
[... additional lines trimmed for brevity ...]
I had completely forgotten I installed chardet!
Not in my REPL. What did you do to enable that? ChrisA
On Aug 3, 2019, at 01:04, Chris Angelico <rosuav@gmail.com> wrote:
Not sure, but ISTR it would let you scroll through them. Not something you can easily do in a plain terminal.
IPython manages to get a lot of those same Jupyter Notebook features into a plain terminal—as long as it’s either termios-friendly or the Windows console, but that’s most terminals nowadays. The fact that it’s nearly identical on Windows is especially nice. Also, not going full-screen on POSIX, so it doesn’t fight with iTerm’s scrollback buffer or mouse commands and so on, even when I’m running it on a remote machine over ssh. Anyway, when tab completion is more than a single possibility, it pops up an inverse-colored overlay box that you can navigate through with arrows (or emacs keys), and if there are more than it can fit in that box, it scrolls. There are other terminal-based REPLs that also do scrolling tab completion, like bpython and ptp. One of them (I forget which) even does the IDE thing of automatically popping up autocomplete suggestions when you pause (and removing them if you resume typing normal characters). But they don’t have all those IPython/Jupyter features, which are hard to live without once you get used to them. Also, most of them are curses or otherwise full-screen.
On 8/3/19 4:15 AM, Steven D'Aprano wrote:
On Fri, Aug 02, 2019 at 02:16:52PM -0400, Dan Sommers wrote:
I agree, but that's assuming that you "know" what's there and just have to be reminded. IMO, it helps less with discovering new functions, and even less than that when it comes to discovering "new" python modules.
Pressing tab twice after the word import:
py> import
Display all 492 possibilities? (y or n)
CDROM    change_type_example    locale       site
CFRAC    chardet                logcheck1    smtpd
DLFCN    chunk                  logcheck2    smtplib
[... additional lines trimmed for brevity ...]
I had completely forgotten I installed chardet!
I think you're making my point. :-)

Yes, in your Python REPL, you can get a list of all 492 importable modules. (Mine doesn't do that; neither does ChrisA's.) I can probably guess what some of those modules do, like locale. If I don't know what SMTP is, then I still don't know which one of those modules to use to send email.

I was going to say how much better the online docs at docs.python.org/3/library were, but that didn't work out, either. On that page, the only appearance of the word email is the email module, which then refers to the smtplib module. OTOH, searching the internet for python send email turns up StackOverflow and everyone's and their brother's blogs and tutorials for sending email with Python. And *now* I've discovered the smtplib module. But I still don't really know how to use it.

Once I know what SMTP is, and what the smtplib module does, and I've used it a couple of times, then seeing the list of names inside the module might remind me how to use it. So maybe SMTP is an extreme case. I'm sure there are others. How does TAB completion help me to discover hashlib, if all I know is that I want the SHA-3 of some file?

Again, I'm not disputing the usefulness of TAB completion; my point is that it's a sub-optimal tool for discoverability.
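(For the record, once you do discover hashlib the task itself is short -- the filename here is purely illustrative:)

```
import hashlib

# SHA-3 digest of a file, read in chunks so large files aren't loaded whole
h = hashlib.sha3_256()
with open("some_file", "rb") as f:
    for chunk in iter(lambda: f.read(8192), b""):
        h.update(chunk)
print(h.hexdigest())
```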
On 3 Aug 2019, at 11:48, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Aug 3, 2019, at 01:04, Chris Angelico <rosuav@gmail.com> wrote:
Not sure, but ISTR it would let you scroll through them. Not something you can easily do in a plain terminal.
IPython manages to get a lot of those same Jupyter Notebook features into a plain terminal—as long as it’s either termios-friendly or the Windows console, but that’s most terminals nowadays. The fact that it’s nearly identical on Windows is especially nice. Also, not going full-screen on POSIX, so it doesn’t fight with iTerm’s scrollback buffer or mouse commands and so on, even when I’m running it on a remote machine over ssh.
Anyway, when tab completion is more than a single possibility, it pops up an inverse-colored overlay box that you can navigate through with arrows (or emacs keys), and if there are more than it can fit in that box, it scrolls.
There are other terminal-based REPLs that also do scrolling tab completion, like bpython and ptp. One of them (I forget which) even does the IDE thing of automatically popping up autocomplete suggestions when you pause (and removing them if you resume typing normal characters). But they don’t have all those IPython/Jupyter features, which are hard to live without once you get used to them. Also, most of them are curses or otherwise full-screen.
You might be thinking of ptpython which is basically what modern ipython is based on nowadays. /Anders
Steven D'Aprano writes:
Do people forget that the vanilla Python REPL has come with tab completion for nearly 20 years, and on by default for something like seven years?
(At least on Linux/POSIX systems. I don't know if it works on Windows or Macs.)
Tab completion WFM on Mac, but it won't complete modules the way you described for import. WFM on Linux, but not the way you described for import. Which Linux distro (mine's Debian)? Do you have some option in your site init? Are you using GNU readline or some other completion library? Windows, you got me, but Windows look-and-feel always looks and feels uncomfortable to me, so I bet it doesn't work as I'd expect. ;-)
Steven D'Aprano writes:
Which of the core devs will have the responsibility for checking that something which is a good-quality, well-maintained, dependable package today remains so a year from now?
All of them and none of them, as usual. Also, since it's not the CPython implementation, I suspect that non-core developers would be given triage privileges here, as they are on the tracker. The more interesting question is how those with privileges will prioritize that task. The answer, it seems to me, is that (a) it will be nowhere near as high as the ideal priority (everything is priority #1!), and (b) it will be surprisingly high to those of us who hate curating other people's work.

As far as making this useful and keeping it somewhat correlated with "up to date", I would say maybe apply to PSF (and/or Google SoC et al) to fund bots to check

- recent (past year) downloads vs. long term (past five years average)
- has any listed maintainer been active on github recently?
- who's been active in this package?
- how recently has there been activity in this package?
- what are the current (say last year) average rates of issue arrival vs average time to fix vs average time to package release containing the fix?

and similar statistics. With a little more work (from higher-cost personnel, I suspect) we could get expert opinions on "high-quality" and "well-maintained" and look for correlations with the statistics we can compute from VCS repo activity and PyPI activity.
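(A rough sketch of one such freshness probe, using PyPI's public JSON API -- the endpoint and fields are Warehouse's; the thresholds a real bot would apply are omitted:)

```
# When was the latest release of a package uploaded?
import json
from urllib.request import urlopen

def latest_upload_time(package):
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        data = json.load(resp)
    latest = data["info"]["version"]
    files = data["releases"].get(latest, [])  # one dict per uploaded file
    return max((f["upload_time"] for f in files), default=None)

print(latest_upload_time("more-itertools"))
```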
But I think that anyone who pip installs more-itertools *solely* to avoid copying and pasting the "grouper" recipe from the docs is doing themselves, and the users of their software, a disservice.
Of course, if you have it installed locally, you won't have any invisible garbage Unicode characters or grit-like dumbquotes in your copy-paste source, which is a plus, YMMV. Whether you'd be more likely to remember it's in more-itertools or in xyz.py in an archived workspace of your own the next time you want to use it is a question I don't know the answer to, even for myself.
(Of course the calculus changes if you are a heavy consumer of iterators,
Who isn't? Most of my real problems that are worth writing code for involve iterables, although many are strings or lists or tuples or sets or dicts or views or more specialized containers, not true iterators. But it's helpful if I don't need to care whether it's a sequence or an iterator or some more exotic iterable.
and the extra tools in more-itertools are useful for you.)
That depends on the algorithms you use to process those iterables, of course. In my case for loops, comprehensions, and the occasional generator expression account for the vast majority of iteration, which tends to support your case. On the other hand, sometimes I want something more elegant or performant or I'm not clear that the naive algorithm I think up myself actually handles all the cases I'm going to run into. I really don't think there's much evidence one way or the other about whether in general importing a module for one function is a bad idea, at least when you're programming mostly for personal consumption. And I don't see how "pip install more-itertools" hurts at all, except to make it look like you chose an insanely edgy edge case. It's true that we now know that "grouper" is a three line function, but in many cases the three-line function calls some other function, which might be in yet another module (hopefully in the same package!) In such cases, you may find yourself doing a fair amount of extra work tracking down all the dependencies. Granted, this refactoring may be well worthwhile in the case of a codebase distributed to third parties, but rarely so for personal use. Steve
On 3 Aug 2019, at 12:18, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
Once I know what SMTP is, and what the smtplib module does, and I've used it a couple of times, then seeing the list of names inside the module might remind me how to use it. So maybe SMTP is an extreme case. I'm sure there are others.
How does TAB completion help me to discover hashlib, if all I know is that I want the SHA-3 of some file?
Again, I'm not disputing the usefulness of TAB completion; my point is that it's a sub-optimal tool for discoverability
It seems like there is a nugget of an idea of how to make a truly awesome autocomplete in your example. What if, when you type email + tab and there is no matching symbol, it said "email is sent with the smtp module"? That would be pretty great! / Anders
This sounds like a perfect opportunity to prove a third party module could be useful. I'm not sure how to configure tab completion callbacks in every environment like ipython, Jupyter, Python shell, PyCharm, VS Code, vim, or whatever. But without getting to that step (which is definitely possible with the right incantations for each), you could definitely create this and publish it on PyPI:
from howto import q
q('email')
email can be sent with the `smtp` module
email messages can be created and manipulated with the `email` module
email can be retrieved with the `pop3` and `smtp` modules
...
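(For concreteness, a minimal sketch of what such a `howto` module might look like -- entirely hypothetical, with the topic index as a placeholder; the real stdlib names behind the shorthand above are `smtplib`, `poplib` and `imaplib`:)

```
# howto.py -- hypothetical; the hard part is the curated topic data, not the code
_TOPICS = {
    "email": [
        "email can be sent with the `smtplib` module",
        "email messages can be created and manipulated with the `email` module",
        "email can be retrieved with the `poplib` and `imaplib` modules",
    ],
}

def q(term):
    "Print every known hint for the given topic word."
    for hint in _TOPICS.get(term.lower(), [f"no entry for {term!r}"]):
        print(hint)
```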
The actual dictionary or search tool that finds relevant messages is 100x more work than wiring in the tab-completion hook to some particular environment. On Sun, Aug 4, 2019, 1:15 PM Anders Hovmöller <boxed@killingar.net> wrote:
On 3 Aug 2019, at 12:18, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
Once I know what SMTP is, and what the smtplib module does, and I've used it a couple of times, then seeing the list of names inside the module might remind me how to use it. So maybe SMTP is an extreme case. I'm sure there are others.
How does TAB completion help me to discover hashlib, if all I know is that I want the SHA-3 of some file?
Again, I'm not disputing the usefulness of TAB completion; my point is that it's a sub-optimal tool for discoverability
It seems like there is a nugget of an idea of how to make a truly awesome autocomplete in your example. What if, when you type email + tab and there is no matching symbol, it said "email is sent with the smtp module"? That would be pretty great!
/ Anders
On 8/4/19 1:13 PM, Anders Hovmöller wrote:
On 3 Aug 2019, at 12:18, Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
Once I know what SMTP is, and what the smtplib module does, and I've used it a couple of times, then seeing the list of names inside the module might remind me how to use it. So maybe SMTP is an extreme case. I'm sure there are others.
How does TAB completion help me to discover hashlib, if all I know is that I want the SHA-3 of some file?
Again, I'm not disputing the usefulness of TAB completion; my point is that it's a sub-optimal tool for discoverability
It seems like there is a nugget of an idea of how to make a truly awesome autocomplete in your example. What if if you type email + tab and there is no symbol it said "email is sent with the smtp module"? That would be pretty great!
It already works like that: I type "python send email" + Enter (which is a larger target than Tab, so it's easier to use), and one of those World Wide Web Search Engine thingies points me to the Python documentation, a Stack Overflow post, or some random blog or other, at least one of which is bound not to be too outdated. I guess it all depends on how you define "autocomplete" and "IDE." :-/
Guido van Rossum wrote:
This is an interesting phenomenon. I'm not saying it's good or bad, I'm just observing it (because it surprised me). Here is someone declaring that the docs are less accessible than the code. I personally am disappointed, given the amount of effort that we put in those docs. But maybe this is true. If we can't get people to peruse the docs, should we bother?
The docs are very well written and of good quality, no doubt, but that doesn't necessarily mean they are accessible. Searching for "python docs" on Google points me to https://docs.python.org/. So far so good, but the landing page doesn't seem very newcomer-friendly: it contains a few (technical) topics with links on the left side of the screen while the right side is completely empty, and in the top right corner there is a tiny search bar which is all too easy to overlook. It reminds me of the old PyPI website (https://web.archive.org/web/20170822160042/https://pypi.python.org/pypi), which also had a tiny search bar. The new PyPI design, on the other hand, is much better: a huge search bar right in the center of the page, impossible to miss. This also suggests to the visitor _"this is a place where you can search for (and, more importantly, find) answers to your questions"_.

I think this is what newcomers especially would like to do; being new to something, you probably don't feel comfortable clicking and scrolling through dozens of articles in the hope that you might find what you're looking for. Instead you want a search engine to do the job of crawling and just present you with the best results. So I could imagine that lots of users, even if they land on the docs page, just turn to their search engine instead and end up on Stack Overflow and the like, not the Python docs. Hence the docs landing page should probably have a much larger, centered search bar, since searching is a central interaction between users and documentation.

Also, the various listed categories probably don't sound very newcomer-friendly. For example "Library Reference (keep this under your pillow)": how would a newcomer who wants to work with email in Python know that this is the place to look for solutions? These topic headings are good if you're already familiar with Python and/or the docs, but to be attractive to new users they should be more explicit (and explicit is better than implicit). Other topics like "Extending and Embedding" are not relevant to newcomers at all and thus might have a discouraging effect. So a reordering of topics from general to specific, newcomer to expert, aligned top to bottom, could help as well (experienced users will know where to look for their answers anyway).

Another aspect is the way users are pointed to the Python docs from elsewhere (apart from search engines). For example, the search bar at https://www.python.org/ seems not to search the docs but other documents like PEPs. Then, in the interpreter, if people use `help` to look up a module, they're pointed to the online docs (e.g. `help('abc')` mentions https://docs.python.org/3.7/library/abc). Maybe it would be a good idea to also provide this online reference for builtin types and functions: `help(list)` could include a link to https://docs.python.org/3/library/stdtypes.html#lists or https://docs.python.org/3/tutorial/datastructures.html#more-on-lists, especially since if I search for "list" in the Python docs (https://docs.python.org/3/search.html?q=list&check_keywords=yes&area=default) I get all sorts of results, with the `list`-relevant ones at positions 10+ and under different names such as "Data Structures" and "Built-in Types". Similarly, `help(sorted)` could point to https://docs.python.org/3/library/functions.html#sorted. This would encourage users to consider the online docs, and once they've appreciated the docs as a helpful resource they're more likely to come back on their own.
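To make the `help(list)` idea concrete, here is a rough sketch of how a wrapper might bolt documentation links onto the interactive `help`. The `_DOC_LINKS` table and the wrapping approach are purely illustrative; CPython does not actually do anything like this.

```
import builtins

# Hand-picked mapping from objects to their online documentation;
# purely illustrative -- CPython has no such table built in.
_DOC_LINKS = {
    list: "https://docs.python.org/3/library/stdtypes.html#lists",
    sorted: "https://docs.python.org/3/library/functions.html#sorted",
}

# `help` is added to builtins by the site module at startup.
_original_help = builtins.help

def help(*args, **kwargs):
    if args:
        try:
            url = _DOC_LINKS.get(args[0])
        except TypeError:  # unhashable argument, e.g. an actual list
            url = None
        if url:
            print("Online docs:", url)
    return _original_help(*args, **kwargs)

builtins.help = help  # now help(list) also prints the link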
Then, on a side note, the Python Wiki (https://wiki.python.org/moin/) has a rather nostalgic look. I could imagine that the younger generations in particular feel more attracted by modern-looking websites (for example https://realpython.com/), and they might (falsely) assume that the wiki's content is not completely up-to-date either. So, long story short: I think the content of the Python docs is of excellent quality, but the way the docs present themselves (are presented) to the outside world, and especially to newcomers, could be improved in order to give them better impact and acceptance.
On Sat, Aug 03, 2019 at 06:56:16PM +1000, Chris Angelico wrote:
On Sat, Aug 3, 2019 at 6:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
```
py> import
Display all 492 possibilities? (y or n)
CDROM     change_type_example   locale      site
CFRAC     chardet               logcheck1   smtpd
DLFCN     chunk                 logcheck2   smtplib
[... additional lines trimmed for brevity ...]
```
I had completely forgotten I installed chardet!
Not in my REPL. What did you do to enable that?
Oops! Ha ha, chardet is not the only thing I had forgotten. I have my own custom tab completer installed which completes on import lines as well as file names inside strings. -- Steven
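For anyone curious what such a completer can look like, below is a bare-bones sketch (my own reconstruction, not Steven's actual code). It falls back to the standard `rlcompleter` behaviour except on lines that start with `import` or `from`, where it completes installed module names instead; the filename-inside-strings part is left out.

```
import pkgutil
import readline
import rlcompleter
import sys

# All importable top-level modules, computed once at startup;
# builtin modules (sys, ...) come from sys.builtin_module_names.
_MODULES = sorted(
    {info.name for info in pkgutil.iter_modules()}
    | set(sys.builtin_module_names)
)
_default_complete = rlcompleter.Completer().complete

def complete(text, state):
    line = readline.get_line_buffer().lstrip()
    if line.startswith(("import ", "from ")):
        matches = [name for name in _MODULES if name.startswith(text)]
        return matches[state] if state < len(matches) else None
    return _default_complete(text, state)

readline.set_completer(complete)
readline.parse_and_bind("tab: complete")
```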
On Aug 6, 2019, at 21:40, Steven D'Aprano <steve@pearwood.info> wrote:
Oops! Ha ha, chardet is not the only thing I had forgotten. I have my own custom tab completer installed which completes on import lines as well as file names inside strings.
I used to have one of those, but nowadays I just rely on IPython’s, which completes filenames. But it doesn’t have quite the same features. On the one hand, it has Jedi awesomeness, which is hard to live without once you get used to it. (And it’s especially fun when the local C++ or Go fanatic tells you that Python sucks because there’s no way it could autocomplete some example even in an IDE, and you can just show them Python autocompleting their example right there in the REPL.) On the other hand, it doesn’t have little rules that I need once every few months and forget I no longer have, like retrying a missing directory after removing all non-escape-sequence backslashes, so you can paste something with backslash-escaped spaces from bash.
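As a rough illustration of that last rule (my own simplified reconstruction, not Andrew's actual code): if a pasted path doesn't exist, drop the escaping backslashes and try again.

```
import os
import re

def unescape_bash_path(path):
    # Drop each backslash that merely escapes the next character,
    # turning r"My\ Documents" into "My Documents". Real shell escape
    # handling is more subtle; this is only a sketch.
    return re.sub(r"\\(.)", r"\1", path)

p = r"/tmp/My\ Documents/notes.txt"  # pasted from bash
if not os.path.exists(p):
    p = unescape_bash_path(p)  # retry without the backslashes
```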
participants (23)
- Anders Hovmöller
- Andrew Barnert
- Barry Scott
- Brendan Barnwell
- Brett Cannon
- Chris Angelico
- Christopher Barker
- Dan Sommers
- David Mertz
- Dominik Vilsmeier
- Eli Berkowitz
- Eric V. Smith
- Guido van Rossum
- Josh Rosenberg
- Kyle Stanley
- Paul Moore
- Rhodri James
- Ricky Teachey
- Rob Cliffe
- Stefan Krah
- Stephen J. Turnbull
- Steven D'Aprano
- Tim Peters