[Python-ideas] Deterministic iterator cleanup

Nick Coghlan ncoghlan at gmail.com
Sat Oct 22 23:22:54 EDT 2016


On 23 October 2016 at 02:17, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 22 October 2016 at 06:59, Chris Barker <chris.barker at noaa.gov> wrote:
>> And then context managers were introduced. And it seems to be there is a
>> consensus in the Python community that we all should be using them when
>> working on files, and I myself have finally started routinely using them,
>> and teaching newbies to use them -- which is kind of a pain, 'cause I want
>> to have them do basic file reading stuff before I explain what a "context
>> manager" is.
>
> This is actually a case where style guidelines would ideally differ
> between scripting use cases (let the GC handle it whenever,
> since your process will be terminating soon anyway) and
> library(/framework/application) development use cases (promptly clean
> up after yourself, since you don't necessarily know your context of
> use).
>
> However, that script/library distinction isn't well-defined in
> computing instruction in general, and most published style guides are
> written by library/framework/application developers, so students and
> folks doing ad hoc scripting tend to be the recipients of a lot of
> well-meaning advice that isn't actually appropriate for them :(

Pondering this overnight, I realised there's a case where folks using
Python primarily as a scripting language can still run into many of
the resource management problems that arise in larger applications:
IPython notebooks, where the persistent kernel can keep resources
alive for a surprisingly long time in the absence of a reference
counting GC. Yes, they have the option of just restarting the kernel
(which many applications don't have), but it's still a nicer user
experience if we can help them avoid having those problems arise in
the first place.

This is likely mitigated in practice *today* by IPython users mostly
being on CPython for access to the Scientific Python stack, but we can
easily foresee a future where the PyPy community have worked out
enough of their NumPy compatibility and runtime redistribution
challenges that it becomes significantly more common to be using
notebooks against Python kernels that don't use automatic reference
counting.

I'm significantly more amenable to that as a rationale for pursuing
non-syntactic approaches to local resource management than I am to the
notion of pursuing it for the sake of high performance application
development code.

Chris, would you be open to trying a thought experiment with some of
your students looking at ways to introduce function-scoped
deterministic resource management *before* introducing with
statements? Specifically, I'm thinking of a progression along the
following lines:

    # Cleaned up whenever the interpreter gets around to cleaning up
    # the function locals
    def readlines_with_default_resource_management(fname):
        return open(fname).readlines()

    # Cleaned up on function exit, even if the locals are still
    # referenced from an exception traceback or the interpreter
    # implementation doesn't use a reference counting GC
    from local_resources import function_resource

    def readlines_with_declarative_cleanup(fname):
        return function_resource(open(fname)).readlines()

    # Cleaned up at the end of the with statement
    def readlines_with_imperative_cleanup(fname):
        with open(fname) as f:
            return f.readlines()

The idea here is to change the requirement for new developers from
"telling the interpreter what to *do*" (which is the situation we have
for context managers) to "telling the interpreter what we *want*"
(which is for it to link a managed resource with the lifecycle of the
currently running function call, regardless of interpreter
implementation details).
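
As a rough illustration only, the declarative variant can be
approximated in current Python with contextlib.ExitStack. This is a
sketch under stated assumptions, not the proposed design: the
local_resources module named above does not exist, the
manages_resources decorator is a hypothetical stand-in for the
interpreter support the idea would really need, and function_resource
here only works inside a decorated function.

```python
import contextlib
import functools
import threading

# Per-thread list of active cleanup scopes (innermost call last).
_state = threading.local()


def manages_resources(func):
    """Hypothetical decorator: give each call of func its own cleanup scope."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stacks = getattr(_state, "stacks", None)
        if stacks is None:
            stacks = _state.stacks = []
        with contextlib.ExitStack() as stack:
            stacks.append(stack)
            try:
                return func(*args, **kwargs)
            finally:
                # Leave the scope; the ExitStack then closes every
                # registered resource, whether we returned or raised.
                stacks.pop()
    return wrapper


def function_resource(resource):
    """Tie a context manager to the innermost decorated call (assumed API)."""
    stacks = getattr(_state, "stacks", None)
    if not stacks:
        raise RuntimeError("function_resource() needs a @manages_resources call")
    return stacks[-1].enter_context(resource)


@manages_resources
def readlines_with_declarative_cleanup(fname):
    return function_resource(open(fname)).readlines()
```

The decorator is the price of doing this in pure Python: without it
there is no hook for "the currently running function call", which is
the part the idea above would hand over to the interpreter.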

Under that model, Inada-san's recent buffer snapshotting proposal
would effectively be an optimised version of the one liner:

    def snapshot(data, limit, offset=0):
        return bytes(function_resource(memoryview(data))[offset:limit])

The big refactoring benefit that this feature would offer over with
statements is that it doesn't require a structural change to the code
- it's just wrapping an existing expression in a new function call
that says "clean this up promptly when the function terminates, even
if it's still part of a reference cycle, or we're not using a
reference counting GC".
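
For comparison, here is the same snapshot helper written with the
with statement that exists today (memoryview has supported the context
management protocol since Python 3.2). It is only meant to show the
structural change being discussed: the single expression becomes a
block, and the view needs a local name.

```python
def snapshot(data, limit, offset=0):
    # The memoryview is released at the end of the with block, so the
    # underlying buffer is no longer exported once the copy is made.
    with memoryview(data) as view:
        return bytes(view[offset:limit])
```

Because the view is released before the function returns, a resizable
object such as a bytearray can be extended immediately afterwards on
any interpreter, not just one with reference counting.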

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

