On Fri, Dec 19, 2014 at 5:36 PM, Nick Coghlan <ncoghlan(a)gmail.com> wrote:
> On 20 December 2014 at 10:55, Guido van Rossum <guido(a)python.org> wrote:
>> A few months ago we had a long discussion about type hinting. I've
>> thought a lot more about this. I've written up what I think is a decent
>> "theory" document -- writing it down like this certainly helped *me* get a
>> lot of clarity about some of the important issues.
> This looks like a great direction to me. While I know it's not the primary
> purpose, a multidispatch library built on top of it could potentially do
> wonders for cleaning up some of the ugliness in the URL parsing libraries
> (which have quite a few of those "all str, or all bytes, but not a mixture"
> style interfaces).
Right. Unfortunately we don't have a great proposal for a multidispatch
*syntax*. Here are the docs for mypy's @overload decorator
http://mypy.readthedocs.org/en/latest/function_overloading.html but the
implementation requires peeking into the call frame's locals via
sys._getframe(). (Don't be distracted by the fact that mypy currently
doesn't record the parameter value for parameterized types, e.g.
Sequence[int] -- I have a simple enough solution for that, but it doesn't
help the overload syntax.)
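For readers unfamiliar with it, a minimal sketch of the decorator style
those docs describe (the example function and its runtime behaviour are my
illustration, not mypy's actual implementation):

from typing import overload  # supplied by mypy's typing module at the time

@overload
def double(value: str) -> str:
    return value + value

@overload
def double(value: bytes) -> bytes:
    return value + value

double('abc')   # 'abcabc'
double(b'abc')  # b'abcabc'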
The requirements for an overloading syntax are complex: it must be possible
to implement it efficiently for simple cases, but it must also support
complex overloading schemes (not everybody needs to dispatch just on str
vs. bytes :-), and it must diagnose ambiguities clearly (e.g. showing near
misses if no exact match can be found, and showing what went wrong if
multiple matches apply). It must also be easy to parse for a static checker.
--Guido van Rossum (python.org/~guido)
In the context of building a PEP or similar, I can't count how many times
I've trawled looking for:
* Docs links
* Source links
* Patch links
* THREAD POST LINKS
A tool to crawl structured and natural-language data from the forums could
be very useful for preparing PEPs.
Currently, BoundArguments stores the results of signature.bind (and
bind_partial) in an OrderedDict. Given that there is no way to retrieve
the order in which keyword arguments were passed, it is not surprising that
the dict ordering is based on the order of parameters in the signature, not
the order in which arguments are passed (although this point could
certainly be made clearer in the docs). But this also means that that
ordering could easily be retrieved by iterating over the parameters
("[(name, ba.arguments[name]) for name in ba.signature.parameters if name
in ba.arguments]"), and signature.bind now has to pay for the cost of using
a Python-implemented OrderedDict.
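Wrapped up as a runnable snippet, this is just the one-liner from the
previous paragraph expanded for readability:

from inspect import signature

def ordered_args(ba):
    # Recover parameter-declaration order from a BoundArguments object.
    return [(name, ba.arguments[name])
            for name in ba.signature.parameters
            if name in ba.arguments]

s = signature(lambda x, y=1, *, z: None)
ba = s.bind(z=3, x=1)
print(ordered_args(ba))  # [('x', 1), ('z', 3)]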
Obviously, one solution would be to switch to a C-implemented OrderedDict,
but another, simpler one would be to make BoundArguments.arguments a
regular dict rather than an OrderedDict (which is a one-line patch,
changing the definition of "arguments" in Signature._bind). A simple test
on the example below shows an approximately two-fold faster "bind".
from inspect import signature

s = signature(lambda x, y=1, *, z: None)
for _ in range(30000):
    s.bind(1, 3, z=2)
    s.bind(x=1, z=2, y=3)
    try: s.bind(1, 2, 3)     # too many positional arguments
    except TypeError: pass
    try: s.bind(1, q=2)      # unexpected keyword argument
    except TypeError: pass
    try: s.bind(1, y=2)      # missing required keyword-only argument 'z'
    except TypeError: pass
Somehow during the discussion of PEP 479 it was concluded that
generators will become distinct from iterators. A few people have
pointed out that this is a harmful distinction (it complicates the
language, is hard to explain, etc.). I think it is simply an invalid
distinction and should be abolished.
PEP 479 proposes to change the way that generators handle
StopIteration fall-through by default. However the iterator protocol
has never formally specified that an iterator should allow
StopIteration to fall through to a grandparent consumer. I considered
the possibility of StopIteration fall-through a convenient design
feature, but it was never a required part of the definition of an
iterator.
Many iterators perform other kinds of exception handling without
allowing that to fall through and many iterators catch StopIteration
from child iterators without reraising it. This has never led anyone
to suggest that such iterators are not true iterators in any way.
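zip() is a familiar example: it catches StopIteration from its child
iterators and uses it to end its own iteration, and nobody argues that
zip objects are not true iterators:

pairs = zip(iter([1, 2, 3]), iter(['a']))
print(list(pairs))  # [(1, 'a')] -- the child's StopIteration was consumed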
Under PEP 479 generators will handle a StopIteration arising from the
executing frame differently. The generator will still be an object
with __iter__ and __next__ that exposes StopIteration at the
appropriate time to its parent iterator-consumer. In other words it
will still satisfy the definition of an iterator.
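That is easy to check (a minimal sketch; nothing here depends on PEP
479's change to how StopIteration inside the frame is handled):

from collections.abc import Iterator

def gen():
    yield 1

g = gen()
assert isinstance(g, Iterator)  # has both __iter__ and __next__
assert iter(g) is g             # __iter__ returns self
assert next(g) == 1
# ...and a further next(g) raises StopIteration, as the protocol requires.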
The PEP says:
Under this proposal, generators and iterators would be distinct, but
related, concepts. Like the mixing of text and bytes in Python 2, the
mixing of generators and iterators has resulted in certain perceived
conveniences, but proper separation will make bugs more visible.
This is just plain false. Under the proposal generators will still be
iterators. Mixing generators and iterators is nothing like mixing text
and bytes and never has been. Mixing iterators and iterables is a bit
like mixing text and bytes in the sense that it can seem to work
but then sometimes silently do the wrong thing. Mixing generators and
iterators is like mixing sets and containers: there is no mixing since
it is simply a false distinction.
AFAICT from reading through the discussions this idea has (implicitly)
followed from the following fallacious argument:
1) Bare next creates a problem for generators.
2) Therefore we fix it by changing generators.
3) Therefore generators are not iterators any more.
Anyone with experience in propositional logic can see multiple
fallacies in that argument but actually the biggest mistake is simply
in the opening premise: bare next is a problem for all iterators.
Generators are affected precisely because they are iterators.
Generators were introduced as "a kind of Python iterator, but of an
especially powerful kind" (PEP 255). The coroutine idea has since
turned generators into something of a Frankenstein concept but the
fundamental fact that they are iterators remains unchanged. And
regardless of how much discussion coroutines generate the fact remains
that 99% of generators are used purely for iteration.
I propose to abolish this notion that generators are not iterators and
to amend the text of the PEP to unambiguously state that generators
are iterators regardless of any changes to the way they propagate
StopIteration from the executing frame.
While namedtuples have a good reason to not support field names starting
with an underscore (namely, to prevent name collision), it would be nice to
still be able to define some sort of "private" attribute using (manual)
double-underscore name mangling, i.e. allow "namedtuple('Foo', 'bar baz
_Foo__qux')". It used to be possible to bypass the validation by passing a
carefully crafted subclass of str to the factory function, but this is no
longer an option after the fix for issue 21832 (which, by the way, seems
like complete overkill: there are a lot of other casts-to-str in the stdlib
which do not check that the result is exactly of type str (e.g.,
ConfigParser.read_dict), and if I want to execute untrusted code by passing
a subclass of str, I can always put that code in the __str__ method of that
str subclass).
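For the record, this is what the request looks like and the rejection the
post wants to work around (error message paraphrased from the stdlib's
validation):

from collections import namedtuple

try:
    Foo = namedtuple('Foo', 'bar baz _Foo__qux')
except ValueError as e:
    print(e)  # Field names cannot start with an underscore: '_Foo__qux'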
" exception StopIteration
Raised by built-in function next() and an iterator‘s __next__()
method to signal that there are no further items produced by the iterator."
I have always taken this to mean that these are the only functions that
should raise StopIteration, but others have not, and indeed,
StopIteration in generator functions, which are not __next__ methods,
has been accepted and passed on by generator.__next__. PEP 479 reverses
this acceptance by having generator.__next__ turn StopIteration raised in
a user-written generator function body into a RuntimeError. I propose
that other builtin iterator.__next__ methods that execute a passed-in
function do the same.
This proposal comes from Oscar Benjamin's comments on the PEP in the
'Generators are iterators' thread. In one post, he gave this example.
>>> def func(x):
...     if x < 0:
...         raise StopIteration
...     return x ** 2
...
>>> it = map(func, [1, 2, 3, -1, 2])
>>> list(it)
[1, 4, 9]
>>> list(it)  # map continues to yield values...
[4]
Function func violates what I think should be the guideline. When
passed to map, the result is a silently buggy iterator. The doc says
that map will "Return an iterator that applies function to every item of
iterable, yielding the results." If map cannot do that, because func
raises for some item in iterable, map should also raise, but the
exception should be something other than StopIteration, which signals
normal non-buggy completion of its task.
I propose that map.__next__ convert StopIteration raised by func to
RuntimeError, just as generator.__next__ now does for StopIteration
raised by executing a generator function frame. (And same for filter().)
"Simplified version allowing just one input iterable."
def __init__(self, func, iterable):
self.func = func
self.argit = iter(iterable)
func = self.func
args = next(self.argit) # pass on expected StopIteration
if func is None:
try: # new wrapper
raise RuntimeError('func raised StopIteration')
Terry Jan Reedy
I've been away from Python-related mailing lists for almost a year now
and only a few days ago started following them again just in time to
see the "hopefully final" text of PEP 479 posted on python-dev. I had
been intending to just lurk around for a while but felt compelled to
post something after seeing that. It's taken a while for me to find
the time to read through what's already been written but I see that
PEP 479 is apparently a done deal so I won't try to argue with it
(except to register a quick -1 here).
Somehow the discussion of PEP 479 became obsessed with the idea that
leaked StopIteration is a problem for generators when it is a problem
for all iterators. I haven't seen this pointed out yet so I'll
demonstrate that map() is susceptible to the same problem:
$ cat tmp.py
def first_name(person):
    # A bare next() call: any StopIteration it raises will leak out.
    return next(iter(person)).split()[0]

people = [
    ['John Cleese', 1, 0, 1],
    ['Michael Palin', 123, 123],
    [],  # Whoops!
    ['Terry Gilliam', 12, False, ''],
]

for name in map(first_name, people):
    print(name)
$ python3.4 tmp.py
John
Michael
(i.e. Terry was not printed and no error was raised.)
There's nothing hacky about the use of map above: the mistake is just
the bare next call. The same thing happens with filter, takewhile,
etc. Essentially any of the itertools style functions that takes a
user-defined function allows StopIteration to pass from the user
function to the parent iterator-consumer. I believe this is by design
since apparently the author, Raymond Hettinger, (like me) considered
StopIteration fall-through a deliberate design feature of the iterator
protocol.
Fixing this problem so that a leaked StopIteration turns into a loud
error message has been deemed important enough that a partial fix
(applying only to generators) warrants breaking the backward
compatibility of the core language in a minor release. So what should
happen with all the other places that are susceptible? Is
StopIteration fall-through to be considered an anti-pattern that
anyone implementing the iterator protocol should avoid?
With or without PEP 479 the root of the problem is simply in the
careless use of next(). The PEP proposes to make this easier to track
down but once located the problem will be an unguarded next call that
needs to be fixed. (It will also force people to "fix" other things
like "raise StopIteration", but these were not actually problematic.)
Clearly people want a function like next() that isn't susceptible to
this problem or they wouldn't use next in this way and the problem
wouldn't exist. So I propose a new function called take() with the
following behaviour:

def take(iterator, n=None):
    if n is None:
        try:
            return next(iterator)
        except StopIteration:
            # Anything but StopIteration will do here; the point is that
            # it cannot be mistaken for normal exhaustion by a consumer.
            raise RuntimeError('Iterator unexpectedly empty')
    else:
        return tuple(take(iterator) for _ in range(n))
The idea is that take(iterator) is the generic way to get the next
item from the iterator and assert that the item should exist. When you
use take(iterator), your intention that the item should exist is
self-documenting, whereas a bare next() is ambiguous without a comment:
x = next(iterator) # Never raises StopIteration
x = next(iterator) # Propagate StopIteration
x = next(iterator) # Haven't considered StopIteration
This gives users a clear and un-ugly fix for any code that uses next
inappropriately (s/next/take/), so that there is no excuse for not
changing that code to:
x = take(iterator) # Either I get an item or a proper Error is raised.
Similarly take(iterator, n) is like islice except that it immediately
advances the iterator and raises if the required number of items was
not found. Essentially this is a safer version of:
firstn = [next(iterator) for _ in range(n)] # Leaks StopIteration
firstn = tuple(next(iterator) for _ in range(n)) # Terminates silently
firstn = list(islice(iterator, n)) # Terminates silently
(Actually, under PEP 479 the second example would raise RuntimeError.)
With take it becomes:
firstn = take(iterator, n) # n items returned or an Error
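A quick usage sketch, assuming the take() defined above:

it = iter([1, 2, 3])
x = take(it)        # 1 (raises RuntimeError if the iterator is empty)
rest = take(it, 2)  # (2, 3) (raises if fewer than two items remain)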
On Thu Dec 11 2014 at 11:48:22 AM Giampaolo Rodola' <g.rodola(a)gmail.com> wrote:
> On Wed, Dec 10, 2014 at 5:59 PM, Bruno Cauet <brunocauet(a)gmail.com> wrote:
>> Hi all,
>> Last year a survey was conducted on python 2 and 3 usage.
>> Here is the 2014 edition, slightly updated (from 9 to 11 questions).
>> It should not take you more than 1 minute to fill. I would be pleased if
>> you took that time.
It would be really nice to complement this survey with information gathered
from PyPI. I think the Python version of the client is not currently being
recorded.
I think this could be very useful not only for the Python devs and the
community, but particularly for package maintainers. In this way we could
find out not only the number of times a given package was downloaded, but
also for which Python version.
I understand that this information might not be entirely accurate (e.g.
people downloading the file from the browser or from GitHub, or using other
repositories like the ones provided by Anaconda), but still I think it
could be useful. And with pip becoming the standard tool, it might be a
good moment to do it.
I've written a tail call optimization lib for Python 3.
It works fine.
I use a custom return that raises the next function's arguments as an
exception, and a decorator that handles that exception.
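A minimal sketch of that exception-based trick (the names TailCall and
tail_recursive are mine, not the library's):

class TailCall(Exception):
    def __init__(self, *args, **kwargs):
        self.call_args = args
        self.call_kwargs = kwargs

def tail_recursive(func):
    def wrapper(*args, **kwargs):
        while True:  # trampoline: each TailCall restarts the frame
            try:
                return func(*args, **kwargs)
            except TailCall as tc:
                args, kwargs = tc.call_args, tc.call_kwargs
    return wrapper

@tail_recursive
def fact(n, acc=1):
    if n <= 1:
        return acc
    raise TailCall(n - 1, acc * n)  # the "custom return"

print(fact(100000))  # completes without RecursionError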
One strength of functools.lru_cache lies in caching results of calls
initiated by the function itself (i.e. recursive call results).
However because of the exception, the intermediate results from the tail
recursion don't end up in the cache, if my tail call optimization is
used together with lru_cache.
So I would like to be able to manually add argument-result pairs in the
cache. A manual lookup is not needed for my purpose, but I propose that
there should be methods to
1. add argument-result pairs to the cache
2. check whether there is a result for given arguments
3. look up the result for given arguments if it exists (exception thrown
otherwise)
4. invalidate specific cache entries (at the moment you can only
invalidate the cache as a whole through func.cache_clear())
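To make this concrete, here is a hedged sketch of such an API (method
names are illustrative only; this toy version uses a plain dict, so it
drops lru_cache's LRU eviction and keyword-argument handling):

import functools

def manual_cache(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    wrapper.cache_put = lambda args, result: cache.__setitem__(args, result)
    wrapper.cache_contains = lambda args: args in cache
    wrapper.cache_get = lambda args: cache[args]  # KeyError if absent
    wrapper.cache_invalidate = lambda args: cache.pop(args, None)
    return wrapper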
What do you think?
http://bugs.python.org/issue14551 discusses the deprecation and removal of
imp.load_source. In the discussion, there seems to be a general consensus
that (1) a convenience function for loading modules from an arbitrary file
is a good thing and (2) that many people use it.
I can only concur: being able to load a module from an arbitrary file
without having to worry about sys.path or __init__.py is useful for
supporting plugin-like extensibility/configurability, and I personally use
imp.load_source in a couple of projects.
However, the discussions in http://bugs.python.org/issue14551 seem to
have died off without any clear decision for action.
Can we consider adding such a convenience function to importlib in Python 3?
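For comparison, the closest importlib-based spelling available today is
roughly the following (a sketch; the load_module API it relies on is
itself headed for deprecation, which is part of why a proper convenience
function would be welcome):

from importlib.machinery import SourceFileLoader

def load_source(name, path):
    # Load a module from an arbitrary file path, bypassing sys.path.
    return SourceFileLoader(name, path).load_module()

plugin = load_source('myplugin', '/path/to/myplugin.py')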