On Mon, 18 Nov 2019 at 08:42, Paul Moore <p.f.moore@gmail.com> wrote:
> On Sun, 17 Nov 2019 at 19:18, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
> >
> > Ultimately the problem is that the requirements on a context manager are not clearly spelled out. The with statement gives context manager authors a strong guarantee that if __enter__ returns successfully then __exit__ will be called at some point later. There needs to be a reverse requirement on context manager authors to guarantee that it is not necessary to call __exit__ whenever __enter__ has not been called. With the protocol requirements specified in both directions it would be easy to make utilities like nested for combining context managers in different ways.
>
> The context here has been lost - I've searched the thread and I can't find a proper explanation of how open() "misbehaves" in any way that seems to relate to this statement (I don't actually see any real explanation of any problem with open() to be honest). There's some stuff about what happens if open() itself fails, but I don't see how that results in a problem (as opposed to something like a subtle application error because the writer didn't realise this could happen).
>
> Can someone restate the problem please?
Sorry Paul! I think a small number of us were following a sub-thread here where we understood what we were talking about but it wasn't clearly spelt out anywhere. I introduced the word "misbehave" so I'll clarify what I meant. First I'll describe all of the background.

Python 2.5 introduced the with statement from PEP 343 and made file objects into context managers by adding __enter__ and __exit__ methods. This means that the object returned by open can be used in a with statement like

    with open(filename) as fin:
        ...

The contextlib module was also added in Python 2.5 and included a useful utility called nested:
https://docs.python.org/2.7/library/contextlib.html#contextlib.nested

The idea with nested is that you could flatten nested with statements, so

    with mgr1:
        with mgr2:
            ...

can be rewritten as

    with nested(mgr1, mgr2):
        ...

This means that you don't have so much indentation, and since nested takes *args you can use an arbitrary number of context managers. It was deprecated essentially because it leads to this construction:

    with nested(open(file1), open(file2)) as (f1, f2):
        ...

Here, before nested is called, its arguments are prepared from left to right: first file1 is opened, then file2 is opened, and then both file objects are passed to nested. If an exception is raised while attempting to open file2 then the file object returned for file1 never gets passed to nested and never gets used in any with statement, so its __enter__ and __exit__ methods are never called. In this simple example the file object will probably be closed by __del__, but a significant part of the point of context managers is that we don't want to rely on __del__ in general. Also, forms that are otherwise equivalent won't necessarily lead to __del__ being called, e.g.:

    f1 = open(file1)
    f2 = open(file2)
    with nested(f1, f2):
        ...

Since this "deficiency" of nested is about an exception that is raised before nested is even called, it clearly wasn't possible to solve the problem by improving nested itself. Instead Python 2.7 introduced the multiple with statement:

    with open(file1) as f1, open(file2) as f2:
        ...

Since this is built in to the with statement rather than being a function, it can evaluate things in a different order: f1.__enter__ here is called before open(file2), which wouldn't be possible with a utility function like nested. Most importantly, f1.__exit__ will be called if open(file2) raises, which solves the main problem with nested. The nested function was deprecated in the same release and later removed altogether.

The multiple with statement has problems as well though. One is the syntax limitation which is the subject of the OP in this thread. The other is the inability to take an arbitrary number of context managers as nested could with *args. Alternatives to nested cannot be used as cleanly if they are expected to do the right thing with exceptions raised while their arguments are being created (before the function is even called!). With that constraint in mind it isn't possible to have any utility for multiple with statements that receives more than one already-constructed context manager at a time.
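To make the ordering difference concrete, here is a small runnable sketch. The Demo class and the simulated failure are my own illustration, not anything from the thread or the stdlib:

    class Demo:
        """Toy context manager that reports when it is created, entered and exited."""
        def __init__(self, name, fail=False):
            print("creating", name)
            self.name = name
            if fail:
                raise OSError("simulated failure while creating " + name)
        def __enter__(self):
            print("entering", self.name)
            return self
        def __exit__(self, *exc_info):
            print("exiting", self.name)
            return False

    # The multiple with statement enters the first manager before the
    # second is even constructed, so the first one's __exit__ still runs
    # when constructing the second one fails:
    try:
        with Demo("a") as a, Demo("b", fail=True) as b:
            pass
    except OSError:
        pass
    # prints: creating a / entering a / creating b / exiting a

    # A nested()-style function never gets that chance: both Demo(...)
    # calls are evaluated before the function is called, so when the
    # second raises, the first object is abandoned without __enter__ or
    # __exit__ ever being called on it.
    try:
        managers = (Demo("c"), Demo("d", fail=True))
    except OSError:
        pass
    # prints: creating c / creating d   (and nothing ever exits "c")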
Hence ExitStack can be used, as it creates an object that receives context managers only one at a time:
https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack

The example given in the docs there explicitly uses open to show the kind of problem it is designed to solve:

    with ExitStack() as stack:
        files = [stack.enter_context(open(fname)) for fname in filenames]
        # All opened files will automatically be closed at the end of
        # the with statement, even if attempts to open files later
        # in the list raise an exception

To me that seems clumsy and awkward compared to nested though:

    with nested(*map(open, filenames)) as files:
        ...

Ideally I would design nested to take an iterable rather than *args, and then it would be fine to do e.g.

    with nested(open(filename) for filename in filenames) as files:
        ...

Here nested could take advantage of the delayed evaluation in the generator expression to invoke the __enter__ methods one at a time and call __exit__ on the already-opened files if any of the open calls fails (a rough sketch of such a nested follows the demonstration further down). This would also leave a "trap" though, since using a list comprehension would suffer the same problem as if nested took *args:

    with nested([open(filename) for filename in filenames]) as files:
        ...

That's the background, so what is it that we are discussing in this subthread? I am proposing that the root of the problem here is the fact that open acquires its resource (the opened file descriptor) before __enter__ is called. This is what I mean by a context manager that "misbehaves". If there was a requirement on context managers that __exit__ cleans up after __enter__, and that any resource needing cleanup should only be acquired in __enter__, then there would never have been a problem with nested.

In particular PEP 343 gives an alternative to the current behaviour of open:

    @contextmanager
    def opened(filename, mode="r"):
        f = open(filename, mode)
        try:
            yield f
        finally:
            f.close()

https://www.python.org/dev/peps/pep-0343/#examples

Because this uses the contextmanager decorator it may not immediately be obvious, but this function does not suffer from any of the problems described above. That is because what it returns is not a file object but rather an object that can only be used as a context manager. It is the __enter__ method of this context manager that opens the file and returns a usable file object. Here is a simple demonstration:
    >>> from contextlib import contextmanager
    >>> @contextmanager
    ... def f():
    ...     print(1)  # Executed on __enter__
    ...     try:
    ...         yield 3
    ...     finally:
    ...         pass
    ...
    >>> f()
    <contextlib._GeneratorContextManager object at 0x10786be10>
    >>> f().__enter__()
    1
    3
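As an aside, here is the rough sketch of the iterable-taking nested described above, built on ExitStack. This is my own illustration rather than anything that exists in the stdlib, and it glosses over details such as exactly what should be yielded:

    from contextlib import ExitStack, contextmanager

    @contextmanager
    def nested(cms):
        # cms is an iterable (ideally lazy) of context managers.  Each
        # one is created and entered in turn; if creating or entering a
        # later one raises, ExitStack unwinds the ones already entered.
        with ExitStack() as stack:
            yield tuple(stack.enter_context(cm) for cm in cms)

    # Usage sketch (filenames is assumed to be a list of paths):
    #
    #     with nested(open(name) for name in filenames) as files:
    #         ...
    #
    # The generator expression delays each open() call until nested's
    # __enter__ asks for it, so a failure part-way through closes only
    # and exactly the files that were already opened.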
Because it is opened's __enter__ that acquires the file, there is no problem with using

    with nested(opened(filename1), opened(filename2)) as (file1, file2):
        ...

or any of the variations on this above. For whatever reason this is not what was released in Python 2.5, which instead added the __enter__ and __exit__ methods to file objects themselves so that the existing open builtin could be used directly with the with statement.

What I am saying is that, conceived of as a context manager, the object returned by open misbehaves. I think that not just nested but a number of other convenient utilities and patterns would have been possible if opened had been used instead of open, and if context managers were expected to meet the constraint:

"""
There should be no need to call __exit__ if __enter__ has not been called.
"""

Of course a lot of time has passed since then and there are now probably many other misbehaving context managers, so it might be too late to do anything about that.

Oscar