
Generators are a wonderful feature of the Python language, and one of its best ideas. They are initially very intuitive to understand and easy to use. Beyond that point, however, they become quite confusing, because their behavior is not natural. They have an easy initial learning and acceptance curve; then, as you go from initial use to more advanced use, there is a sudden "bump" in the learning curve, which is not as smooth as it could be. Specifically, the fact that calling a generator does not actually call your code (but instead returns a wrapper object) is quite confusing. Example:

import __main__


def read_and_prefix_each_line(path):
    with open(path) as f:
        data = f.read()

    for s in data.splitlines():
        yield '!' + s


def print_prefixed_file(path):
    reader = read_and_prefix_each_line(path)    # LINE 12

    print('Here is how %r looks prefixed' % path)

    for s in reader:                            # LINE 16
        print(s)


print_prefixed_file(__main__.__file__)
print_prefixed_file('nonexistent')              # LINE 20

Running this will produce the following:

Traceback (most recent call last):
  File "x.py", line 20, in <module>
    print_prefixed_file('nonexistent')
  File "x.py", line 16, in print_prefixed_file
    for s in reader:
  File "x.py", line 5, in read_and_prefix_each_line
    with open(path) as f:
IOError: [Errno 2] No such file or directory: 'nonexistent'

This is quite confusing to a person who has been using generators for a month and thinks they understand them. WHY is the traceback happening at line 16 instead of at line 12, where the function is called? It is much more intuitive and natural [to a beginner] to expect the failure to open the file 'nonexistent' to happen at line 12, not line 16.

So now the user, while trying to figure out a bug, has to learn that:

- NO, calling a generator (which looks like a function) does not actually call the body of the function (as the user defined it);
- instead it creates and returns a generator wrapper; and
- only the first time the __next__ method of that wrapper is called does the body of the function (as the user defined it) actually run.
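Once the user has finally learned this, the usual workaround is to split the eager part from the lazy part, so that the open() failure surfaces at the call site. The following rewrite of read_and_prefix_each_line is only a sketch of that idiom (the nested helper name is made up), not part of the original example:

def read_and_prefix_each_line(path):
    # This outer function is NOT a generator ('yield' appears only in
    # the nested function below), so this part runs immediately when
    # called, and a bad path fails at line 12, as a beginner expects.
    with open(path) as f:
        data = f.read()

    def prefix_each_line():
        for s in data.splitlines():
            yield '!' + s

    return prefix_each_line()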
This learning curve is quite steep. It is made even harder by the following:

>>> def f(): yield 1
...
>>> f
<function f at 0x7f748507c5f0>

So the user is now convinced that 'f' really is a function. Further investigation makes it even more confusing:

>>> f()
<generator object f at 0x7f74850716e0>

At this point, the user starts to suspect that something is unusual about the 'yield' keyword. Eventually, after a while, the user starts to figure this out:

>>> def f(): print('f started'); yield 1
...
>>> f()
<generator object f at 0x7f3a0baf5fc0>
>>> f().__next__()
f started
1
>>>

And eventually, after reading the following sentence in https://docs.python.org/3/reference/datamodel.html:

    "The following flag bits are defined for co_flags: bit 0x04 is set
    if the function uses the *arguments syntax to accept an arbitrary
    number of positional arguments; bit 0x08 is set if the function
    uses the **keywords syntax to accept arbitrary keyword arguments;
    bit 0x20 is set if the function is a generator."

the user finally figures it out:

>>> def f(): yield 1
...
>>> f
<function f at 0x7f73f38089d8>
>>> f.__code__.co_flags & 0x20
32
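(Incidentally, the standard library already wraps exactly this flag test, though a struggling beginner is unlikely to find it; this is existing Python, shown here only as an aside:)

>>> import inspect
>>> inspect.isgeneratorfunction(f)
True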
My point is that this learning process is overly confusing to a new user; the learning curve is not as smooth as it should be.

Here is just a quick initial proposal on how to fix it:

>>> def f(): yield 1
...
SyntaxError: the 'yield' keyword can only be used in a generator; please be sure to use @generator before the definition of the generator
>>> @generator
... def f(): yield 1
...
>>> f
<generator f at 0x7f73f38089d8>

Just the fact that it says '<generator f at ...>' instead of '<function f at ...>' would be a *BIG* help to starting users (see the sketch at the end of this message for roughly how such a decorator could behave). This would also really drive home, from the start, the idea that:

- a generator is special; it is not a function; and
- calling the generator does not call the generator; the generator's body is not called until its __next__ method is called.

Next:

1. Don't make this the default; instead, for the next few releases of Python, require

       from __future__ import generator

   to activate this feature.

2. Regarding contexts, *REQUIRE* an argument to @generator that tells it how the generator should interact with contexts, i.e. something like:

       @generator(capture_context_at_start=True)
       def f(): yield 1

   with three options: (a) capture the context at start; (b) capture the context on the first call to __next__; or (c) don't capture a context at all, but have it work the natural way you expressed two emails ago (i.e., each time, use the context of the caller, not a special context for the generator).

Finally, if a user attempts to use a generator with contexts, but without one of these three parameters, throw a syntax error and tell the user that the context usage must be specified.

The major point of all this is to make the learning curve easier for users, so that generators are:

- intuitive, easy to pick up, and quick to use (as they currently are);
- intuitive, easy to pick up, and quick to use as you go from a beginner to an intermediate user (making their specific ins & outs easier to learn); and
- intuitive, easy to pick up, and quick to use as you go from an intermediate to an advanced user (and need to make them interact with contexts in different ways, etc.).
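To make the @generator idea above concrete: none of this syntax exists today, and a decorator alone cannot turn a bare 'yield' into a SyntaxError (that part would need compiler support). But the marker-and-repr half can be sketched in current Python; the class name 'generator' here is the hypothetical name from the proposal, not an existing API:

import functools
import inspect


class generator:
    """Hypothetical explicit marker for generator functions (sketch only)."""

    def __init__(self, fn):
        # The closest a plain decorator can get to the proposed syntax
        # error: reject functions that don't actually use 'yield'.
        if not inspect.isgeneratorfunction(fn):
            raise TypeError("@generator: %r does not use 'yield'"
                            % fn.__name__)
        functools.update_wrapper(self, fn)
        self._fn = fn

    def __call__(self, *args, **kwargs):
        # Same behavior as today: this returns the generator object;
        # the body still does not run until __next__ is called.
        return self._fn(*args, **kwargs)

    def __repr__(self):
        return '<generator %s at 0x%x>' % (self.__name__, id(self))

With this, the REPL session above behaves roughly as proposed: 'f' prints as '<generator f at 0x...>' rather than '<function f at 0x...>'. What the sketch cannot do is reject a 'def' that uses 'yield' without the decorator; that would still need compiler support.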
On Sun, Oct 15, 2017 at 9:33 AM, Paul Moore <p.f.moore@gmail.com> wrote:

> On 15 October 2017 at 13:51, Amit Green <amit.mixie@gmail.com> wrote:
>> Once again, I think Paul Moore gets to the heart of the issue.
>>
>> Generators are simply confusing & async even more so.
>>
>> Per my earlier email, the fact that generators look like functions, but
>> are not functions, is at the root of the confusion.
>
> I don't agree. I don't find generators *at all* confusing. They are a
> very natural way of expressing things, as has been proven by how popular
> they are in the Python community.
>
> I don't *personally* understand async, but I'm currently willing to
> reserve judgement until I've been in a situation where it would be
> useful, and therefore needed to learn it.
>
> Paul