Mailman 3 February 2016 - Python-ideas

@classproperty, @abc.abstractclassproperty, etc.
by K. Richard Pixley 17 Dec '20

There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:

    class foo(object):
        @classmethod
        @property
        def bar(cls, ...):
            ...

Essentially the permutations are, I think:
{'unadorned'|abc.abstract}{'normal'|static|class}{method|property|non-callable attribute}.

concreteness | implicit first arg | type                   | name | comments
{unadorned}  | {unadorned}        | method                 | def foo(): | exists now
{unadorned}  | {unadorned}        | property               | @property | exists now
{unadorned}  | {unadorned}        | non-callable attribute | x = 2 | exists now
{unadorned}  | static             | method                 | @staticmethod | exists now
{unadorned}  | static             | property               | @staticproperty | proposing
{unadorned}  | static             | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
{unadorned}  | class              | method                 | @classmethod | exists now
{unadorned}  | class              | property               | @classproperty or @classmethod;@property | proposing
{unadorned}  | class              | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | {unadorned}        | method                 | @abc.abstractmethod | exists now
abc.abstract | {unadorned}        | property               | @abc.abstractproperty | exists now
abc.abstract | {unadorned}        | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static             | method                 | @abc.abstractstaticmethod | exists now
abc.abstract | static             | property               | @abc.abstractstaticproperty | proposing
abc.abstract | static             | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | class              | method                 | @abc.abstractclassmethod | exists now
abc.abstract | class              | property               | @abc.abstractclassproperty | proposing
abc.abstract | class              | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary

I think the meanings of the new ones are pretty straightforward, but in
case they are not...

@staticproperty - like @property only without an implicit first
argument. Allows the property to be called directly from the class
without requiring a throw-away instance.

@classproperty - like @property, only the implicit first argument to
the method is the class. Allows the property to be called directly from
the class without requiring a throw-away instance.

@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses

@abc.abstractstaticproperty - like @abc.abstractproperty only for
@staticproperty

@abc.abstractclassproperty - like @abc.abstractproperty only for
@classproperty

--rich
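For readers who want to see what the two proposed read-only decorators might look like, here is a minimal sketch written with the descriptor protocol. The names `classproperty` and `staticproperty` come from the message above; the implementation is an assumption for illustration only (there are no setters and no abstract variants), not an existing standard-library API.

```python
class classproperty:
    """Read-only property whose getter receives the class (illustrative sketch)."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # Works for both Foo.size and Foo().size.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class staticproperty:
    """Read-only property whose getter takes no implicit argument (illustrative sketch)."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        return self.fget()


class Foo:
    _registry = ("a", "b")

    @classproperty
    def size(cls):
        return len(cls._registry)

    @staticproperty
    def answer():
        return 42
```

With these definitions, `Foo.size` and `Foo.answer` can be read directly from the class, with no throw-away instance.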

Specify number of items to allocate for array.array() constructor
by Sven Rahmann 22 Feb '20

At the moment, the array module of the standard library allows you to
create arrays of different numeric types and to initialize them from an
iterable (eg, another array). What's missing is the possibility to
specify the final size of the array (number of items), especially for
large arrays.

I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT). The suffix array is a long
int array of the same size (8 bytes per number, so it occupies about 48
GB memory).

At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant. The function below also
initializes each item in the array to a given value (0 by default).

Is there a reason why the array.array constructor does not allow you to
simply specify the number of items that should be allocated? (I do not
really care about the contents.) Would this be a worthwhile addition to
/ modification of the array module?

My suggestion is to modify array generation in such a way that you
could pass an iterator (as now) as second argument, but if you pass a
single integer value, it should be treated as the number of items to
allocate.

Here is my current workaround (which is slow):

    import array

    def filled_array(typecode, n, value=0, bsize=(1<<22)):
        """returns a new array with given typecode
        (eg, "l" for long int, as in the array module)
        with n entries, initialized to the given value (default 0)
        """
        # one reusable block of bsize items
        a = array.array(typecode, [value]*bsize)
        x = array.array(typecode)
        r = n
        # append whole blocks while at least bsize items remain
        while r >= bsize:
            x.extend(a)
            r -= bsize
        # append the remaining r < bsize items
        x.extend([value]*r)
        return x
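As a point of comparison with the chunked workaround, arrays already support sequence repetition, so a one-element array can be multiplied up to the desired length in a single step. The sketch below shows that idea; the function name `repeated_array` is made up for illustration, and whether this is fast enough at the 48 GB scale described above has not been measured here.

```python
import array


def repeated_array(typecode, n, value=0):
    """Return an array of n items, all equal to value, using repetition."""
    # array objects support the * operator like other sequences,
    # so no intermediate Python list of length n is built.
    return array.array(typecode, [value]) * n


small = repeated_array("l", 1000, 7)
```

The call still initializes every item, so it does not provide the "allocate without caring about contents" behaviour the message asks for.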

Implicit string literal concatenation considered harmful?
by Guido van Rossum 15 Mar '18

I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of
foo('a', 'b') I had written foo('a' 'b').

This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).

Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's needed
there doesn't really apply to Python, as it is mostly useful inside
macros.)

Would it be reasonable to start deprecating this and eventually remove
it from the language?

--
--Guido van Rossum (python.org/~guido)
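The failure mode described above is easy to reproduce. In this small sketch, `foo` is a placeholder two-argument function, not anything taken from the original bug:

```python
def foo(first, second):
    return first + "-" + second


with_comma = ("a", "b")        # two separate strings
without_comma = ("a" "b",)     # adjacent literals are joined into one string

try:
    foo("a" "b")               # the missing comma leaves a single argument
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None
```

The explicit form `"a" + "b"` gives the same joined string, but it makes the intent visible at the call site.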

proposal: "python -m foo" should bind sys.modules['foo']
by Cameron Simpson 18 Jan '17

Hello all,

This is a writeup of a proposal I floated here:

    https://mail.python.org/pipermail/python-list/2015-August/694905.html

last Sunday. If the response is positive I wish to write a PEP.

Briefly, it is a natural expectation in users that the command:

    python -m module_name ...

used to invoke modules in "main program" mode on the command line
imports the module as "module_name". It does not; it imports it as
"__main__". An import within the program of "module_name" makes a new
instance of the module, which causes cognitive dissonance and has the
side effect that now the program has two instances of the module.

What I propose is that the above command line _should_ bind
sys.modules['module_name'] as well as binding '__main__' as it does
currently. I'm proposing that the python -m option have this effect
(python pseudocode):

    % python -m module.name ...

runs:

    # pseudocode, with values hardwired for clarity
    import sys
    M = new_empty_module(name='__main__', qualname='module.name')
    sys.modules['__main__'] = M
    sys.modules['module.name'] = M
    # load the module code from wherever (not necessarily a file - CPython
    # already must do this phase)
    M.execfile('/path/to/module/name.py')

Specifically, this would have the following two changes to current
practice:

1) the module is imported _once_, and bound to both its canonical name
   and also to __main__.

2) imported modules acquire a new attribute __qualname__ (analogous to
   the recent __qualname__ on functions). This is always the canonical
   name of the module as resolved by the importer. For most modules
   __name__ will be the same as __qualname__, but for the "main" module
   __name__ will be '__main__'.

This change has the following advantages:

The current standard boilerplate:

    if __name__ == '__main__':
        ... invoke "main program" here ...

continues to work unchanged.

Importantly, if the program then issues "import module_name", it is
already there and the existing instance is found and used.
The thread referenced above outlines my most recent encounter with this
and the trouble it caused me. Followup messages include some support
for this proposed change, and some criticism.

The critiquing article included some workarounds for this multiple
module situation, but they were (1) somewhat dependent on modules
coming from a file pathname and (2) cumbersome and require every end
user to adopt these changes if affected by the situation. I'd like to
avoid that.

Cheers,
Cameron Simpson <cs(a)zip.com.au>

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. - George Bernard Shaw
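The double binding proposed in this message can be imitated in ordinary Python to see its effect. The following is only a sketch of the idea, not how `python -m` is implemented: `run_module_once` is a hypothetical helper, the module-level `__qualname__` attribute is the one the proposal would add, and the demo binds the name `demo_main` rather than `__main__` so that it does not disturb the running program.

```python
import sys
import types


def run_module_once(qualname, source, main_name="__main__"):
    """Execute source once and bind the result under two names in sys.modules."""
    module = types.ModuleType(main_name)
    module.__qualname__ = qualname      # attribute suggested by the proposal
    sys.modules[main_name] = module
    sys.modules[qualname] = module
    exec(compile(source, "<" + qualname + ">", "exec"), module.__dict__)
    return module


SOURCE = """
import sys
import demo_tool            # found in sys.modules, so no second copy is made
same_object = demo_tool is sys.modules[__name__]
"""

demo = run_module_once("demo_tool", SOURCE, main_name="demo_main")
```

Because both names point at one module object, the `import demo_tool` inside the module body resolves to the instance that is already executing.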

Re: [Python-ideas] Exposing regular expression bytecode
by Jonathan Goble 05 Jun '16

OK, thanks for the comments, everyone. I'm glad to hear that people
generally think this is a useful idea.

Some specific replies:

On Tue, Feb 16, 2016 at 4:22 AM, Chris Angelico <rosuav(a)gmail.com> wrote:
> For what it's worth, I read your post with interest, but didn't have
> anything substantive to reply - mainly because I don't use regexes
> much. But it would be rather cool to be able to decompile a regex.
> Imagine a regex pretty-printer: compile an arbitrary string, and if
> it's a valid regex, decompile it to a valid source code form, using
> re.VERBOSE. That could help _hugely_ with debugging, if the trick can
> be pulled off.
>
> ChrisA

That's exactly the type of tools I envision being made available by
third parties. Depending on how much I get invested into this project,
I may even write such a tool myself (though that's not guaranteed).

On Tue, Feb 16, 2016 at 4:55 AM, Paul Moore <p.f.moore(a)gmail.com> wrote:
> Sorry. I don't personally have any issue with the proposal, and it
> sounds like a reasonable idea. I don't think it's likely to be
> *hugely* controversial - although it will likely need a little care in
> documenting the feature to ensure that we are clear that there's no
> guarantees of backward compatibility that we don't want to commit to
> on the newly-exposed data. And we should also ensure that by
> exposing this information, we don't preclude changes such as the
> incorporation of the regex module (I don't know if the regex module
> has a bytecode implementation like the re module does).

The regex implementation is indeed something I would need to
investigate here, and will do so before I go too far.

> The next step is probably simply to raise a tracker issue for this. I
> know you said you have little C experience, but the big problem is
> that it's unlikely that any of the core devs with C experience will
> have the time or motivation to code up your idea. So without a working
> patch, and someone willing and able to respond to comments on the
> patch, it's not likely to progress.

Tracker issue is already filed: http://bugs.python.org/issue26336

I actually filed the issue before I realized that the mailing lists
were a better place to discuss it.

> But if you are willing to dig into Python's C API yourself (and it
> sounds like you are) there are definitely people who will help you.
> You might want to join the core mentorship list (see
> http://pythonmentors.com/) where you should get plenty of assistance.
> This proposal sounds like a great "beginner" task, as well - so even
> if you don't want to implement it yourself, still put it on the
> tracker, and mark it as an "easy" change, and maybe some other
> newcomer who wants a task to help them learn the C API will pick it
> up.

I'll look into the mentorship list; thanks for the link. As for marking
it "easy", I don't seem to have the necessary permissions to change the
Keywords field; perhaps you or someone else can set that flag for me?
If so, I'd appreciate it. :-)

> Hope that helps - thanks for the suggestion and sorry if it seems like
> no-one was interested at first. It's an unfortunate fact of life
> around here that things *do* take time to get people's interest. You
> mention patience in one of your messages - that's definitely something
> you'll need to cultivate, I'm afraid... :-)

Patience is something I've been working on since I was a little kid.
I'm 29 years old now, and it still eludes me from time to time. But
yes, it's something I'll have to work on. :-P

Also, I received a small patch off-list from Petr Viktorin implementing
a getter for the code list (thanks, Petr). I'll need to test it, but
from the little I know of the C API it looks like it will get me
started in the right direction. Assuming that works, what's left is a
public constructor for the regex type (to enable optimizers), a
dis-like module, and docs and tests. I don't think this would be major
enough to require a PEP, but of course being new here, I'm open to
being told I'm wrong. :-)

solving multi-core Python
by Eric Snow 04 Jun '16

tl;dr Let's exploit multiple cores by fixing up subinterpreters,
exposing them in Python, and adding a mechanism to safely share objects
between them.

This proposal is meant to be a shot over the bow, so to speak. I plan
on putting together a more complete PEP some time in the future, with
content that is more refined along with references to the appropriate
online resources.

Feedback appreciated! Offers to help even more so! :)

-eric

--------

Python's multi-core story is murky at best. Not only can we be more
clear on the matter, we can improve Python's support. The result of any
effort must make multi-core (i.e. parallelism) support in Python
obvious, unmistakable, and undeniable (and keep it Pythonic).

Currently we have several concurrency models represented via threading,
multiprocessing, asyncio, concurrent.futures (plus others in the
cheeseshop). However, in CPython the GIL means that we don't have
parallelism, except through multiprocessing which requires trade-offs.
(See Dave Beazley's talk at PyCon US 2015.)

This is a situation I'd like us to solve once and for all for a couple
of reasons. Firstly, it is a technical roadblock for some Python
developers, though I don't see that as a huge factor. Regardless,
secondly, it is especially a turnoff to folks looking into Python and
ultimately a PR issue. The solution boils down to natively supporting
multiple cores in Python code.

This is not a new topic. For a long time many have clamored for death
to the GIL. Several attempts have been made over the years and failed
to do it without sacrificing single-threaded performance. Furthermore,
removing the GIL is perhaps an obvious solution but not the only one.
Others include Trent Nelson's PyParallels, STM, and other Python
implementations.

Proposal
========

In some personal correspondence with Nick Coghlan, he summarized my
preferred approach as "the data storage separation of multiprocessing,
with the low message passing overhead of threading".
For Python 3.6:

* expose subinterpreters to Python in a new stdlib module: "subinterpreters"
* add a new SubinterpreterExecutor to concurrent.futures
* add a queue.Queue-like type that will be used to explicitly share
  objects between subinterpreters

This is less simple than it might sound, but presents what I consider
the best option for getting a meaningful improvement into Python 3.6.

Also, I'm not convinced that the word "subinterpreter" properly conveys
the intent, for which subinterpreters is only part of the picture. So
I'm open to a better name.

Influences
==========

Note that I'm drawing quite a bit of inspiration from elsewhere. The
idea of using subinterpreters to get this (more) efficient isolated
execution is not my own (I heard it from Nick). I have also spent quite
a bit of time and effort researching for this proposal. As part of
that, a number of people have provided invaluable insight and
encouragement as I've prepared, including Guido, Nick, Brett Cannon,
Barry Warsaw, and Larry Hastings.

Additionally, Hoare's "Communicating Sequential Processes" (CSP) has
been a big influence on this proposal. FYI, CSP is also the inspiration
for Go's concurrency model (e.g. goroutines, channels, select). Dr.
Sarah Mount, who has expertise in this area, has been kind enough to
agree to collaborate and even co-author the PEP that I hope comes out
of this proposal.

My interest in this improvement has been building for several years.
Recent events, including this year's language summit, have driven me to
push for something concrete in Python 3.6.

The subinterpreter Module
=========================

The subinterpreters module would look something like this (a la
threading/multiprocessing):

  settrace()
  setprofile()
  stack_size()
  active_count()
  enumerate()
  get_ident()
  current_subinterpreter()
  Subinterpreter(...)
      id
      is_alive()
      running() -> Task or None
      run(...) -> Task  # wrapper around PyRun_*, auto-calls Task.start()
      destroy()
  Task(...)  # analogous to a CSP process
      id
      exception()
      # other stuff?

      # for compatibility with threading.Thread:
      name
      ident
      is_alive()
      start()
      run()
      join()
  Channel(...)  # shared by passing as an arg to the subinterpreter-running func
                # this API is a bit uncooked still...
      pop()
      push()
      poison()  # maybe
      select()  # maybe

Note that Channel objects will necessarily be shared in common between
subinterpreters (where bound). This sharing will happen when one or
more of the parameters to the function passed to Task() is a Channel.
Thus the channel would be open to the (sub)interpreter calling Task()
(or Subinterpreter.run()) and to the new subinterpreter. Also, other
channels could be fed into such a shared channel, whereby those
channels would then likewise be shared between the interpreters.

I don't know yet if this module should include *all* the essential
pieces to implement a complete CSP library. Given the inspiration that
CSP is providing, it may make sense to support it fully. It would be
interesting then if the implementation here allowed the (complete?)
formalisms provided by CSP (thus, e.g. rigorous proofs of concurrent
system models).

I expect there will also be a _subinterpreters module with low-level
implementation-specific details.

Related Ideas and Details Under Consideration
=============================================

Some of these are details that need to be sorted out. Some are
secondary ideas that may be appropriate to address in this proposal or
may need to be tabled. I have some others but these should be
sufficient to demonstrate the range of points to consider.
* further coalesce the (concurrency/parallelism) abstractions between
  threading, multiprocessing, asyncio, and this proposal
* only allow one running Task at a time per subinterpreter
* disallow threading within subinterpreters (with legacy support in C)
  + ignore/remove the GIL within subinterpreters (since they would be
    single-threaded)
* use the GIL only in the main interpreter and for interaction between
  subinterpreters (and a "Local Interpreter Lock" for within a
  subinterpreter)
* disallow forking within subinterpreters
* only allow passing plain functions to Task() and Subinterpreter.run()
  (exclude closures, other callables)
* object ownership model
  + read-only in all but 1 subinterpreter
  + RW in all subinterpreters
  + only allow 1 subinterpreter to have any refcounts to an object
    (except for channels)
* only allow immutable objects to be shared between subinterpreters
* for better immutability, move object ref counts into a separate table
* freeze (new machinery or memcopy or something) objects to make them
  (at least temporarily) immutable
* expose a more complete CSP implementation in the stdlib (or make the
  subinterpreters module more compliant)
* treat the main interpreter differently than subinterpreters (or treat
  it exactly the same)
* add subinterpreter support to asyncio (the interplay between them
  could be interesting)

Key Dependencies
================

There are a few related tasks/projects that will likely need to be
resolved before subinterpreters in CPython can be used in the proposed
manner. The proposal could be implemented either way, but it will help
the multi-core effort if these are addressed first.
* fixes to subinterpreter support (there are a couple individuals who
  should be able to provide the necessary insight)
* PEP 432 (will simplify several key implementation details)
* improvements to isolation between subinterpreters (file descriptors,
  env vars, others)

Beyond those, the scale and technical scope of this project means that
I am unlikely to be able to do all the work myself to land this in
Python 3.6 (though I'd still give it my best shot). That will require
the involvement of various experts. I expect that the project is
divisible into multiple mostly independent pieces, so that will help.

Python Implementations
======================

They can correct me if I'm wrong, but from what I understand both
Jython and IronPython already have subinterpreter support. I'll be
soliciting feedback from the different Python implementors about
subinterpreter support.

C Extension Modules
===================

Subinterpreters already isolate extension modules (and built-in
modules, including sys). PEP 384 provides some help too. However,
global state in C can easily leak data between subinterpreters,
breaking the desired data isolation. This is something that will need
to be addressed as part of the effort.
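To make the message-passing shape of the proposed Channel API a little more concrete, here is a rough sketch that borrows the `push()`, `pop()` and `poison()` names from the outline above. Everything else is an assumption: an ordinary thread stands in for a subinterpreter, `queue.Queue` stands in for the shared channel type, and `ChannelClosed` and `square_all` are invented names. It shows only the communication pattern and gains no parallelism under the GIL.

```python
import queue
import threading

_POISON = object()  # sentinel marking a poisoned channel


class ChannelClosed(Exception):
    """Raised by pop() once the channel has been poisoned (hypothetical)."""


class Channel:
    """Toy channel built on queue.Queue; a stand-in, not the proposed type."""

    def __init__(self):
        self._queue = queue.Queue()

    def push(self, item):
        self._queue.put(item)

    def pop(self):
        item = self._queue.get()
        if item is _POISON:
            self._queue.put(_POISON)  # keep it poisoned for any other reader
            raise ChannelClosed
        return item

    def poison(self):
        self._queue.put(_POISON)


def worker(inbox, outbox):
    # Plays the role of the function handed to Task(): it only touches
    # the channels it was given as arguments.
    while True:
        try:
            number = inbox.pop()
        except ChannelClosed:
            break
        outbox.push(number * number)
    outbox.poison()


def square_all(numbers):
    inbox, outbox = Channel(), Channel()
    task = threading.Thread(target=worker, args=(inbox, outbox))
    task.start()
    for number in numbers:
        inbox.push(number)
    inbox.poison()
    results = []
    while True:
        try:
            results.append(outbox.pop())
        except ChannelClosed:
            break
    task.join()
    return results
```

The worker shares no state with its caller apart from the two channels, which is the property the ownership and immutability options listed above are meant to enforce across real subinterpreters.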

Exposing flat bytecode representation to optimizers
by Andrew Barnert 15 May '16

The biggest pain with dealing with the peephole optimizer is that it
happens after all the complicated flattening and fixup[^1] the compiler
does, which means you have to hack up the jump targets as you go along.
The future bytecode optimizers that PEP 511 enables will have the same
headache.

But this isn't actually necessary. The optimizer could work on a flat
array of instructions[^2] instead of an array of bytes, with
relocatable jump targets instead of fixed byte offsets, and then the
compiler could do the fixup _after_ the optimization.[^3]

It would break the optimizer APIs, but `PyCode_Optimize` isn't public,
and the API proposed by PEP 511 is public, but that PEP isn't even
finalized, much less accepted yet.

I don't think we need to expose the intermediate representation any
farther along than the `PyCode_Optimize` step.[^4] Just moving the
optimize one step earlier in the chain solves more than enough to be
worth it.

[^1]: If you think numbering the offsets and emitting the jump targets
is easy: Every time you fix up a jump, that may require adding an
`EXTENDED_ARG`, which means you have to redo any fixups that cross the
current instruction. The compiler does this by just looping until it
hasn't added any more `EXTENDED_ARG`s.

[^2]: In case anyone's thinking that wordcode would solve this problem,
it doesn't. The `EXTENDED_ARG` jump targets are a much bigger hassle
than the 1-or-3-byte-ops, and wordcode not only can't eliminate those,
but makes `EXTENDED_ARG` more common.

[^3]: The compiler doesn't actually have exactly what the optimizers
would want, but it's pretty close: it has a linked list of block
objects, each of which has an array of instruction objects, with jump
targets being pointers to blocks. That's actually even better to work
with, but too complicated to expose to optimizers. Flattening it would
be trivial.
Or, if that's too expensive, we could do something almost as simple and
much cheaper: convert it in-line to a deque-like linked list of arrays,
with jump targets being indices or pointers into that. Or we could just
expose the list of blocks as-is, as an opaque thing with a
mutable-deque-of-instructions API around it.

[^4]: Later stages--import hooks, optimizing decorators, etc.--have the
same pain as the peephole optimizer, but they also have code objects
that contain other code objects, and they can be written in Python, and
so on, so the same solution isn't feasible there. Of course we could
add a function to convert bytecode back to a list of instructions, in
some format that can be exposed to Python (in fact, `dis` already does
90% of that), and then another one to convert that back to bytecode and
redo the fixup (which is basically the third-party `byteplay` module).
But that's almost certainly overkill. (If we wanted that, I'd rather
add `byteplay` to the stdlib, port it and `dis` to C, and expose a C
API for them. And then we could use that for everything, including the
peephole optimizer and PEP 511 optimizers. Which would all be really
cool, but I think it's more work than we want to do, and I don't know
if we'd actually want something like `byteplay` builtin even if it were
easy...)
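As a rough picture of what "a flat array of instructions with relocatable jump targets" could look like from Python, the sketch below uses `dis.get_instructions` and rewrites each jump argument from a byte offset into the index of the target instruction. The function name `flat_instructions` and the `("label", index)` encoding are made up for this example; this is not the representation the compiler uses internally, and it only reads bytecode, it does not reassemble it.

```python
import dis


def flat_instructions(func):
    """Return (opname, argument) pairs with jumps expressed as list indices."""
    instructions = list(dis.get_instructions(func))
    # Map each byte offset to the position of that instruction in the list.
    index_of_offset = {ins.offset: i for i, ins in enumerate(instructions)}
    flat = []
    for ins in instructions:
        if ins.opcode in dis.hasjrel or ins.opcode in dis.hasjabs:
            # For jump opcodes, dis reports the resolved target offset as argval.
            flat.append((ins.opname, ("label", index_of_offset[ins.argval])))
        else:
            flat.append((ins.opname, ins.argval))
    return flat


def sample(n):
    total = 0
    for i in range(n):
        if i % 2:
            total += i
    return total


flat = flat_instructions(sample)
```

An optimizer working on such a list could insert or delete entries and adjust indices, leaving the `EXTENDED_ARG` fixup described in footnote 1 to a later assembly step.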

Re: [Python-ideas] How the heck does async/await work in Python 3.5
by Sven R. Kunze 15 Mar '16

On 20.02.2016 07:53, Christian Gollwitzer wrote:
> If you have difficulties with the overall concept, and if you are open
> to discussions in another language, take a look at this video:
>
> https://channel9.msdn.com/Shows/C9-GoingNative/GoingNative-39-await-co-rout…
>
> MS has added coroutine support with very similar syntax to VC++
> recently, and the developer tries to explain it to the "stackful"
> programmers.

Because of this thread, I finally finished an older post collecting
valuable insights from last year's discussions regarding concurrency
modules available in Python:

    http://srkunze.blogspot.com/2016/02/concurrency-in-python.html

It appears to me that it would fit here well.

@python-ideas
Back then, the old thread ("Concurrency Modules") was basically meant
to result in something useful. I hope the post covers the essence of
the discussion. Some even suggested putting the table into the Python
docs. I am unaware of the formal procedure here, but I would be glad if
somebody could point me in the right direction if the survey table is
wanted in the docs.

Best,
Sven

Simpler Customization of Class Creation - PEP 487
by Martin Teichmann 03 Mar '16

Hi List,

about a year ago I started a discussion on how to simplify metaclasses,
which led to PEP 487. I got some good ideas from this list, but
couldn't follow up on this because I was tied up in other projects.

In short, metaclasses are often not used as they are considered very
complicated. Indeed they are, especially if you need to use two of them
at the same time in a multiple inheritance context.

Most metaclasses, however, serve only some of the following three
purposes: a) run some code after a class is created, b) initialize
descriptors of a class, or c) keep the order in which class attributes
have been defined.

PEP 487 now proposes to put a metaclass into the standard library which
can be used for all three purposes. If libraries now start to use this
metaclass, we won't need any metaclass mixing anymore.

What has changed since the last time I posted PEP 487? Firstly, I
re-wrote large parts of the PEP to make it easier to read. For those
who liked the old text, it still exists in PEP 422.

Secondly, I modified the proposal following suggestions from this list:
I added the descriptor initialization (purpose b)), as this was
considered particularly useful, even if it could in principle be done
using purpose a) from above. The order-keeping of the class attributes
is the leftover from a much more ambitious previous idea that would
have allowed for custom namespaces during class creation. But this
additional feature would have rendered the most common usecase -
getting the order of attributes - much more complicated, so I opted for
usability over flexibility.

I have put the new version of the PEP here:

    https://github.com/tecki/metaclasses/blob/pep487/pep-0487.txt

and also added it to this posting.
An implementation of this PEP can be found at:

    https://pypi.python.org/pypi/metaclass

Greetings

Martin

PEP: 487
Title: Simpler customisation of class creation
Version: $Revision$
Last-Modified: $Date$
Author: Martin Teichmann <lkb.teichmann(a)gmail.com>,
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Feb-2015
Python-Version: 3.6
Post-History: 27-Feb-2015, 5-Feb-2016
Replaces: 422


Abstract
========

Currently, customising class creation requires the use of a custom
metaclass. This custom metaclass then persists for the entire lifecycle
of the class, creating the potential for spurious metaclass conflicts.

This PEP proposes to instead support a wide range of customisation
scenarios through a new ``__init_subclass__`` hook in the class body, a
hook to initialize descriptors, and a way to keep the order in which
attributes are defined.

Those hooks should at first be defined in a metaclass in the standard
library, with the option that this metaclass eventually becomes the
default ``type`` metaclass.

The new mechanism should be easier to understand and use than
implementing a custom metaclass, and thus should provide a gentler
introduction to the full power of Python's metaclass machinery.


Background
==========

Metaclasses are a powerful tool to customize class creation. They have,
however, the problem that there is no automatic way to combine
metaclasses. If one wants to use two metaclasses for a class, a new
metaclass combining those two needs to be created, typically manually.

This need often occurs as a surprise to a user: inheriting from two
base classes coming from two different libraries suddenly raises the
necessity to manually create a combined metaclass, where typically one
is not interested in those details about the libraries at all. This
becomes even worse if one library starts to make use of a metaclass
which it has not done before.
While the library itself continues to work perfectly, suddenly every
code combining those classes with classes from another library fails.


Proposal
========

While there are many possible ways to use a metaclass, the vast
majority of use cases falls into just three categories: some
initialization code running after class creation, the initialization of
descriptors and keeping the order in which class attributes were
defined.

Those three use cases can easily be performed by just one metaclass. If
this metaclass is put into the standard library, and all libraries that
wish to customize class creation use this very metaclass, no
combination of metaclasses is necessary anymore.

The three use cases are achieved as follows:

1. The metaclass contains an ``__init_subclass__`` hook that
   initializes all subclasses of a given class,
2. the metaclass calls an ``__init_descriptor__`` hook for all
   descriptors defined in the class, and
3. an ``__attribute_order__`` tuple is left in the class in order to
   inspect the order in which attributes were defined.

For ease of use, a base class ``SubclassInit`` is defined, which uses
said metaclass and contains an empty stub for the hook described for
use case 1.

As an example, the first use case looks as follows::

    class SpamBase(SubclassInit):
        # this is implicitly a @classmethod
        def __init_subclass__(cls, **kwargs):
            # This is invoked after a subclass is created, but before
            # explicit decorators are called.
            # The usual super() mechanisms are used to correctly support
            # multiple inheritance.
            # **kwargs are the keyword arguments to the subclasses'
            # class creation statement
            # (cls is bound implicitly, so it is not passed again)
            super().__init_subclass__(**kwargs)

    class Spam(SpamBase):
        pass
    # the new hook is called on Spam

The base class ``SubclassInit`` contains an empty ``__init_subclass__``
method which serves as an endpoint for cooperative multiple
inheritance.
Note that this method has no keyword arguments, meaning that all
methods which are more specialized have to process all keyword
arguments.

This general proposal is not a new idea (it was first suggested for
inclusion in the language definition `more than 10 years ago`_, and a
similar mechanism has long been supported by `Zope's ExtensionClass`_),
but the situation has changed sufficiently in recent years that the
idea is worth reconsidering for inclusion.

The second part of the proposal adds an ``__init_descriptor__``
initializer for descriptors. Descriptors are defined in the body of a
class, but they do not know anything about that class, they do not even
know the name they are accessed with. They do get to know their owner
once ``__get__`` is called, but still they do not know their name. This
is unfortunate, for example they cannot put their associated value into
their object's ``__dict__`` under their name, since they do not know
that name. This problem has been solved many times, and is one of the
most important reasons to have a metaclass in a library. While it would
be easy to implement such a mechanism using the first part of the
proposal, it makes sense to have one solution for this problem for
everyone.

To give an example of its usage, imagine a descriptor representing weak
referenced values (this is an insanely simplified, yet working
example)::

    import weakref

    class WeakAttribute:
        def __get__(self, instance, owner):
            # call the stored weak reference to obtain the referent
            return instance.__dict__[self.name]()

        def __set__(self, instance, value):
            instance.__dict__[self.name] = weakref.ref(value)

        # this is the new initializer:
        def __init_descriptor__(self, owner, name):
            self.name = name

The third part of the proposal is to leave a tuple called
``__attribute_order__`` in the class that contains the order in which
the attributes were defined. This is a very common usecase, many
libraries use an ``OrderedDict`` to store this order. This is a very
simple way to achieve the same goal.
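To illustrate how the second and third parts of the proposal could be provided by a single metaclass, here is a small sketch. `SketchMeta`, `Target` and `Node` are invented names, the choice to leave double-underscore names out of ``__attribute_order__`` is an assumption, and the ``__init_subclass__`` part is deliberately omitted, so this should not be read as the PEP's reference implementation.

```python
import weakref
from collections import OrderedDict


class SketchMeta(type):
    """Hypothetical stand-in for the proposed standard-library metaclass."""

    @classmethod
    def __prepare__(mcls, name, bases):
        # An ordered namespace records the order of definition.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # Assumption: dunder names such as __module__ are not reported.
        cls.__attribute_order__ = tuple(
            key for key in namespace
            if not (key.startswith("__") and key.endswith("__"))
        )
        # Tell every descriptor that defines the hook its owner and name.
        for attr_name, value in namespace.items():
            hook = getattr(type(value), "__init_descriptor__", None)
            if hook is not None:
                hook(value, cls, attr_name)
        return cls


class WeakAttribute:
    def __get__(self, instance, owner):
        return instance.__dict__[self.name]()

    def __set__(self, instance, value):
        instance.__dict__[self.name] = weakref.ref(value)

    def __init_descriptor__(self, owner, name):
        self.name = name


class Target:
    """Plain class whose instances can be weakly referenced."""


class Node(metaclass=SketchMeta):
    parent = WeakAttribute()
    label = "node"

    def describe(self):
        return self.label


target = Target()
node = Node()
node.parent = target
```

Here `node.parent` returns `target` for as long as `target` is alive, and `Node.__attribute_order__` lists `parent`, `label` and `describe` in definition order.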

Key Benefits
============

Easier inheritance of definition time behaviour
-----------------------------------------------

Understanding Python's metaclasses requires a deep understanding of the type system and the class construction process. This is legitimately seen as challenging, due to the need to keep multiple moving parts (the code, the metaclass hint, the actual metaclass, the class object, instances of the class object) clearly distinct in your mind. Even when you know the rules, it's still easy to make a mistake if you're not being extremely careful.

Understanding the proposed implicit class initialization hook only requires ordinary method inheritance, which isn't quite as daunting a task. The new hook provides a more gradual path towards understanding all of the phases involved in the class definition process.

Reduced chance of metaclass conflicts
-------------------------------------

One of the big issues that makes library authors reluctant to use metaclasses (even when they would be appropriate) is the risk of metaclass conflicts. These occur whenever two unrelated metaclasses are used by the desired parents of a class definition. This risk also makes it very difficult to *add* a metaclass to a class that has previously been published without one.

By contrast, adding an ``__init_subclass__`` method to an existing type poses a similar level of risk to adding an ``__init__`` method: technically, there is a risk of breaking poorly implemented subclasses, but when that occurs, it is recognised as a bug in the subclass rather than the library author breaching backwards compatibility guarantees.

A path of introduction into Python
==================================

Most of the benefits of this PEP can already be implemented using a simple metaclass. For the ``__init_subclass__`` hook this works all the way down to Python 2.7, while the attribute order needs Python 3.0 to work. Such a class has been `uploaded to PyPI`_.

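The metaclass conflict described under "Reduced chance of metaclass conflicts" can be reproduced in a few lines. The following is a minimal sketch with hypothetical class names; it shows the error raised when two unrelated metaclasses meet, and the usual workaround of deriving a combined metaclass by hand.

```python
class MetaA(type):
    """Metaclass of a hypothetical library A."""


class MetaB(type):
    """Metaclass of a hypothetical, unrelated library B."""


class FromLibraryA(metaclass=MetaA):
    pass


class FromLibraryB(metaclass=MetaB):
    pass


# Combining the two classes fails, as neither metaclass derives
# from the other.
try:
    class Combined(FromLibraryA, FromLibraryB):
        pass
except TypeError as exc:
    conflict_message = str(exc)


# The manual workaround: a metaclass inheriting from both.
class MetaAB(MetaA, MetaB):
    pass


class Combined(FromLibraryA, FromLibraryB, metaclass=MetaAB):
    pass
```

If both libraries used one shared metaclass, as proposed here, the workaround would not be needed.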
The only drawback of such a metaclass is the mentioned problem with metaclasses and multiple inheritance. Two classes using such a metaclass can only be combined if they use exactly the same metaclass. This fact calls for the inclusion of such a class into the standard library, let's call it ``SubclassMeta``, with the base class using it called ``SubclassInit``. Once all users use this standard library metaclass, classes from different packages can easily be combined.

But still such classes cannot be easily combined with other classes using other metaclasses. Authors of metaclasses should bear that in mind and inherit from the standard metaclass if it seems useful for users of the metaclass to add more functionality. Ultimately, if the need for combining with other metaclasses is strong enough, the proposed functionality may be introduced into Python's ``type``.

Those arguments strongly point to the following procedure for including the proposed functionality in Python:

1. The metaclass implementing this proposal is put onto PyPI, so that it can be used and scrutinized.
2. Once the code is properly mature, it can be added to the Python standard library. There should be a new module called ``metaclass`` which collects tools for metaclass authors, as well as documentation of the best practices for writing metaclasses.
3. If the need for combining this metaclass with other metaclasses is strong enough, it may be included into Python itself.

While the metaclass is still in the standard library and not in the language, it may still clash with other metaclasses. The most prominent metaclass in use is probably ABCMeta. It is also a particularly good example of the need for combining metaclasses. For users who want to define an ABC with subclass initialization, we should support an ``ABCSubclassInit`` class, or let ABCMeta inherit from this PEP's metaclass.

Extensions written in C or C++ also often define their own metaclass.
It would be very useful if those could also inherit from the metaclass defined here, but this is probably not possible.

New Ways of Using Classes
=========================

This proposal has many use cases like the following. In the examples, we still inherit from the ``SubclassInit`` base class. This would become unnecessary once this PEP is included in Python directly.

Subclass registration
---------------------

Especially when writing a plugin system, one likes to register new subclasses of a plugin base class. This can be done as follows::

    class PluginBase(SubclassInit):
        subclasses = []

        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            cls.subclasses.append(cls)

One should note that this also works nicely as a mixin class.

Trait descriptors
-----------------

There are many designs of Python descriptors in the wild which, for example, check boundaries of values. Often those "traits" need some support of a metaclass to work. This is what this would look like with this PEP::

    class Trait:
        def __get__(self, instance, owner):
            return instance.__dict__[self.key]

        def __set__(self, instance, value):
            instance.__dict__[self.key] = value

        def __init_descriptor__(self, owner, name):
            self.key = name

    class Int(Trait):
        def __set__(self, instance, value):
            # some boundary check code here
            super().__set__(instance, value)

Rejected Design Options
=======================

Calling the hook on the class itself
------------------------------------

Adding an ``__autodecorate__`` hook that would be called on the class itself was the proposed idea of PEP 422. Most examples work the same way or even better if the hook is called on the subclass. In general, it is much easier to explicitly call the hook on the class in which it is defined (to opt in to such a behavior) than to opt out, meaning that one does not want the hook to be called on the class it is defined in.
This becomes most evident if the class in question is designed as a mixin: it is very unlikely that the code of the mixin is to be executed for the mixin class itself, as it is not supposed to be a complete class on its own.

The original proposal also made major changes in the class initialization process, rendering it impossible to back-port the proposal to older Python versions.

Other variants of calling the hook
----------------------------------

Other names for the hook were presented, namely ``__decorate__`` or ``__autodecorate__``. This proposal opts for ``__init_subclass__`` as it is very close to the ``__init__`` method, just for the subclass, while it is not very close to decorators, as it does not return the class.

Requiring an explicit decorator on ``__init_subclass__``
--------------------------------------------------------

One could require the explicit use of ``@classmethod`` on the ``__init_subclass__`` method. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.

This decision was reinforced after noticing that the user experience of defining ``__prepare__`` and forgetting the ``@classmethod`` method decorator is singularly incomprehensible (particularly since PEP 3115 documents it as an ordinary method, and the current documentation doesn't explicitly say anything one way or the other).

Defining arbitrary namespaces
-----------------------------

PEP 422 defined a generic way to add arbitrary namespaces for class definitions. This approach is much more flexible than just leaving the definition order in a tuple. The ``__prepare__`` method in a metaclass supports exactly this behavior. But given that effectively the only use case that could be found out in the wild was the ``OrderedDict`` way of determining the attribute order, it seemed reasonable to only support this special case.

The metaclass described in this PEP has been designed to be very simple such that it could be reasonably made the default metaclass. This was especially important when designing the attribute order functionality: This was a highly demanded feature and has been enabled through the ``__prepare__`` method of metaclasses. This method can be abused in very weird ways, making it hard to correctly maintain this feature in CPython. This is why it has been proposed to deprecate this feature, and instead use ``OrderedDict`` as the standard namespace, supporting the most important feature while dropping most of the complexity. But this would have meant that ``OrderedDict`` becomes a language builtin like dict and set, and not just a standard library class. The choice of the ``__attribute_order__`` tuple is a much simpler solution to the problem.

A more ``__new__``-like hook
----------------------------

In PEP 422 the hook worked more like the ``__new__`` method than the ``__init__`` method, meaning that it returned a class instead of modifying one. This allows a bit more flexibility, but at the cost of much harder implementation and undesired side effects.

History
=======

This used to be a competing proposal to PEP 422 by Nick Coghlan and Daniel Urban. It shares both most of the PEP text and proposed code, but has major differences in how to achieve its goals. In the meantime, PEP 422 has been withdrawn in favour of this approach.

References
==========

.. _published code:
   http://mail.python.org/pipermail/python-dev/2012-June/119878.html

.. _more than 10 years ago:
   http://mail.python.org/pipermail/python-dev/2001-November/018651.html

.. _Zope's ExtensionClass:
   http://docs.zope.org/zope_secrets/extensionclass.html

.. _uploaded to PyPI:
   https://pypi.python.org/pypi/metaclass

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

Prevent importing yourself?
by Ned Batchelder 02 Mar '16

Hi, A common question we get in the #python IRC channel is, "I tried importing a module, but I get an AttributeError trying to use the things it said it provided." Turns out the beginner named their own file the same as the module they were trying to use. That is, they want to try (for example) the "azure" package. So they make a file called azure.py, and start with "import azure". The import succeeds, but it has none of the contents the documentation claims, because they have imported themselves. It's baffling, because they have used the exact statements shown in the examples, but it doesn't work. Could we make this a more obvious failure? Is there ever a valid reason for a file to import itself? Is this situation detectable in the import machinery? --Ned.
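As a rough illustration of how the situation might be detected, here is a sketch that works from user code rather than inside the import machinery. The helper names are hypothetical, and it only covers the case where the beginner's file is run as a script, so that it is loaded as ``__main__`` and then imported a second time under its own name.

```python
import os
import sys


def is_same_source(module_a, module_b):
    """Return True if both module objects were loaded from the same file."""
    file_a = getattr(module_a, "__file__", None)
    file_b = getattr(module_b, "__file__", None)
    if not file_a or not file_b:
        # Built-in or synthetic modules have no file to compare.
        return False
    return os.path.realpath(file_a) == os.path.realpath(file_b)


def imported_itself(module):
    """Return True if `module` is a second copy of the running script."""
    main = sys.modules.get("__main__")
    return (
        main is not None
        and module is not main
        and is_same_source(module, main)
    )
```

In the azure.py example, calling ``imported_itself(azure)`` right after ``import azure`` would be expected to return True, since both module objects come from the same file.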
