Mailman 3 July 2015 - Python-ideas

@classproperty, @abc.abstractclassproperty, etc.
by K. Richard Pixley Dec. 16, 2020

There's a whole matrix of these and I'm wondering why the matrix is currently sparse rather than implementing them all. Or rather, why we can't stack them as:

    class foo(object):
        @classmethod
        @property
        def bar(cls, ...):
            ...

Essentially the permutations are, I think: {'unadorned'|abc.abstract}{'normal'|static|class}{method|property|non-callable attribute}.

concreteness | implicit first arg | type                   | name                                               | comments
-------------+--------------------+------------------------+----------------------------------------------------+------------
{unadorned}  | {unadorned}        | method                 | def foo():                                         | exists now
{unadorned}  | {unadorned}        | property               | @property                                          | exists now
{unadorned}  | {unadorned}        | non-callable attribute | x = 2                                              | exists now
{unadorned}  | static             | method                 | @staticmethod                                      | exists now
{unadorned}  | static             | property               | @staticproperty                                    | proposing
{unadorned}  | static             | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
{unadorned}  | class              | method                 | @classmethod                                       | exists now
{unadorned}  | class              | property               | @classproperty or @classmethod;@property           | proposing
{unadorned}  | class              | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | {unadorned}        | method                 | @abc.abstractmethod                                | exists now
abc.abstract | {unadorned}        | property               | @abc.abstractproperty                              | exists now
abc.abstract | {unadorned}        | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static             | method                 | @abc.abstractstaticmethod                          | exists now
abc.abstract | static             | property               | @abc.abstractstaticproperty                        | proposing
abc.abstract | static             | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | class              | method                 | @abc.abstractclassmethod                           | exists now
abc.abstract | class              | property               | @abc.abstractclassproperty                         | proposing
abc.abstract | class              | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary

I think the meanings of the new ones are pretty straightforward, but in case they are not...

@staticproperty - like @property, only without an implicit first argument. Allows the property to be called directly from the class without requiring a throw-away instance.

@classproperty - like @property, only the implicit first argument to the method is the class. Allows the property to be called directly from the class without requiring a throw-away instance.

@abc.abstractattribute - a simple, non-callable variable that must be overridden in subclasses

@abc.abstractstaticproperty - like @abc.abstractproperty, only for @staticproperty

@abc.abstractclassproperty - like @abc.abstractproperty, only for @classproperty

--rich
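To make the proposed @classproperty semantics concrete, here is a minimal read-only sketch written as an ordinary descriptor. It is not part of the standard library and not necessarily what the poster had in mind; the names classproperty, Foo, _registry, and count are all assumptions chosen for illustration.

```python
class classproperty:
    """Hypothetical descriptor: like property, but the getter receives the class."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # Accessed on the class: instance is None and owner is the class.
        # Accessed on an instance: fall back to the instance's type if needed.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class Foo:
    _registry = ["a", "b"]  # illustrative class-level data

    @classproperty
    def count(cls):
        return len(cls._registry)


class Bar(Foo):
    _registry = ["x"]
```

With this sketch, Foo.count can be read without creating a throw-away instance, and subclasses see their own class as the implicit first argument. Setting the attribute on the class would simply rebind it, so this only covers the getter case.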

10 15

Specify number of items to allocate for array.array() constructor
by Sven Rahmann Feb. 21, 2020

At the moment, the array module of the standard library allows creating arrays of different numeric types and initializing them from an iterable (e.g., another array). What's missing is the possibility to specify the final size of the array (number of items), especially for large arrays. I'm thinking of suffix arrays (a text indexing data structure) for large texts, e.g. the human genome and its reverse complement (about 6 billion characters from the alphabet ACGT). The suffix array is a long int …
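For context, one workaround available with the current array module is to repeat a one-item array until it has the desired length. The sketch below only illustrates that workaround; the helper name zeros and the size used are assumptions, and it is not the constructor argument the post asks for.

```python
from array import array


def zeros(typecode, n):
    """Return an array of n zero items by repeating a one-item array."""
    return array(typecode, [0]) * n


# 'q' is the signed long long typecode; 1000 is an arbitrary illustrative size.
suffix_array = zeros("q", 1000)
```

A size parameter on the constructor would express the same intent directly, without building the intermediate one-item array.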

14 20

Implicit string literal concatenation considered harmful?
by Guido van Rossum March 14, 2018

I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b'). This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden). Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' …
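The failure mode described above can be reproduced in a few lines. The two-argument function foo and its parameter names are assumptions made up for this sketch.

```python
def foo(first, second):
    """Hypothetical two-argument function."""
    return first + "/" + second


# Adjacent string literals are joined into a single literal by the compiler.
joined = 'a' 'b'

try:
    # The missing comma turns two arguments into the single argument 'ab'.
    foo('a' 'b')
except TypeError as exc:
    error = str(exc)
```

Here the call fails with an argument count error rather than pointing at the missing comma, which is what makes the bug hard to spot.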

51 165

solving multi-core Python
by Eric Snow June 3, 2016

tl;dr Let's exploit multiple cores by fixing up subinterpreters, exposing them in Python, and adding a mechanism to safely share objects between them.

This proposal is meant to be a shot over the bow, so to speak. I plan on putting together a more complete PEP some time in the future, with content that is more refined along with references to the appropriate online resources.

Feedback appreciated! Offers to help even more so! :)

-eric

--------

Python's multi-core story is murky at best. Not only can we be more clear on the matter, we can improve Python's support. The result of any effort must make multi-core (i.e. parallelism) support in Python obvious, unmistakable, and undeniable (and keep it Pythonic).

Currently we have several concurrency models represented via threading, multiprocessing, asyncio, concurrent.futures (plus others in the cheeseshop). However, in CPython the GIL means that we don't have parallelism, except through multiprocessing which requires trade-offs. (See Dave Beazley's talk at PyCon US 2015.)

This is a situation I'd like us to solve once and for all for a couple of reasons. Firstly, it is a technical roadblock for some Python developers, though I don't see that as a huge factor. Regardless, secondly, it is especially a turnoff to folks looking into Python and ultimately a PR issue. The solution boils down to natively supporting multiple cores in Python code.

This is not a new topic. For a long time many have clamored for death to the GIL. Several attempts have been made over the years and failed to do it without sacrificing single-threaded performance. Furthermore, removing the GIL is perhaps an obvious solution but not the only one. Others include Trent Nelson's PyParallels, STM, and other Python implementations.

Proposal
========

In some personal correspondence, Nick Coghlan summarized my preferred approach as "the data storage separation of multiprocessing, with the low message passing overhead of threading".

For Python 3.6:

* expose subinterpreters to Python in a new stdlib module: "subinterpreters"
* add a new SubinterpreterExecutor to concurrent.futures
* add a queue.Queue-like type that will be used to explicitly share objects between subinterpreters

This is less simple than it might sound, but presents what I consider the best option for getting a meaningful improvement into Python 3.6.

Also, I'm not convinced that the word "subinterpreter" properly conveys the intent, for which subinterpreters is only part of the picture. So I'm open to a better name.

Influences
==========

Note that I'm drawing quite a bit of inspiration from elsewhere. The idea of using subinterpreters to get this (more) efficient isolated execution is not my own (I heard it from Nick). I have also spent quite a bit of time and effort researching for this proposal. As part of that, a number of people have provided invaluable insight and encouragement as I've prepared, including Guido, Nick, Brett Cannon, Barry Warsaw, and Larry Hastings.

Additionally, Hoare's "Communicating Sequential Processes" (CSP) has been a big influence on this proposal. FYI, CSP is also the inspiration for Go's concurrency model (e.g. goroutines, channels, select). Dr. Sarah Mount, who has expertise in this area, has been kind enough to agree to collaborate and even co-author the PEP that I hope comes out of this proposal.

My interest in this improvement has been building for several years. Recent events, including this year's language summit, have driven me to push for something concrete in Python 3.6.

The subinterpreter Module
=========================

The subinterpreters module would look something like this (a la threading/multiprocessing):

    settrace()
    setprofile()
    stack_size()
    active_count()
    enumerate()
    get_ident()
    current_subinterpreter()
    Subinterpreter(...)
        id
        is_alive()
        running() -> Task or None
        run(...) -> Task  # wrapper around PyRun_*, auto-calls Task.start()
        destroy()
    Task(...)  # analogous to a CSP process
        id
        exception()
        # other stuff?

        # for compatibility with threading.Thread:
        name
        ident
        is_alive()
        start()
        run()
        join()
    Channel(...)  # shared by passing as an arg to the subinterpreter-running func
                  # this API is a bit uncooked still...
        pop()
        push()
        poison()  # maybe
    select()  # maybe

Note that Channel objects will necessarily be shared in common between subinterpreters (where bound). This sharing will happen when one or more of the parameters to the function passed to Task() is a Channel. Thus the channel would be open to the (sub)interpreter calling Task() (or Subinterpreter.run()) and to the new subinterpreter. Also, other channels could be fed into such a shared channel, whereby those channels would then likewise be shared between the interpreters.

I don't know yet if this module should include *all* the essential pieces to implement a complete CSP library. Given the inspiration that CSP is providing, it may make sense to support it fully. It would be interesting then if the implementation here allowed the (complete?) formalisms provided by CSP (thus, e.g. rigorous proofs of concurrent system models).

I expect there will also be a _subinterpreters module with low-level implementation-specific details.

Related Ideas and Details Under Consideration
=============================================

Some of these are details that need to be sorted out. Some are secondary ideas that may be appropriate to address in this proposal or may need to be tabled. I have some others but these should be sufficient to demonstrate the range of points to consider.

* further coalesce the (concurrency/parallelism) abstractions between threading, multiprocessing, asyncio, and this proposal
* only allow one running Task at a time per subinterpreter
* disallow threading within subinterpreters (with legacy support in C)
  + ignore/remove the GIL within subinterpreters (since they would be single-threaded)
* use the GIL only in the main interpreter and for interaction between subinterpreters (and a "Local Interpreter Lock" for within a subinterpreter)
* disallow forking within subinterpreters
* only allow passing plain functions to Task() and Subinterpreter.run() (exclude closures, other callables)
* object ownership model
  + read-only in all but 1 subinterpreter
  + RW in all subinterpreters
  + only allow 1 subinterpreter to have any refcounts to an object (except for channels)
* only allow immutable objects to be shared between subinterpreters
* for better immutability, move object ref counts into a separate table
* freeze (new machinery or memcopy or something) objects to make them (at least temporarily) immutable
* expose a more complete CSP implementation in the stdlib (or make the subinterpreters module more compliant)
* treat the main interpreter differently than subinterpreters (or treat it exactly the same)
* add subinterpreter support to asyncio (the interplay between them could be interesting)

Key Dependencies
================

There are a few related tasks/projects that will likely need to be resolved before subinterpreters in CPython can be used in the proposed manner. The proposal could be implemented either way, but it will help the multi-core effort if these are addressed first.

* fixes to subinterpreter support (there are a couple individuals who should be able to provide the necessary insight)
* PEP 432 (will simplify several key implementation details)
* improvements to isolation between subinterpreters (file descriptors, env vars, others)

Beyond those, the scale and technical scope of this project means that I am unlikely to be able to do all the work myself to land this in Python 3.6 (though I'd still give it my best shot). That will require the involvement of various experts. I expect that the project is divisible into multiple mostly independent pieces, so that will help.

Python Implementations
======================

They can correct me if I'm wrong, but from what I understand both Jython and IronPython already have subinterpreter support. I'll be soliciting feedback from the different Python implementors about subinterpreter support.

C Extension Modules
===================

Subinterpreters already isolate extension modules (and built-in modules, including sys). PEP 384 provides some help too. However, global state in C can easily leak data between subinterpreters, breaking the desired data isolation. This is something that will need to be addressed as part of the effort.
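The CSP-style message passing that the proposal leans on can be approximated today with a thread and queue.Queue. The sketch below is only a stand-in for the idea: it does not use subinterpreters, it gives no multi-core parallelism under the GIL, and it is not the proposed Channel or Task API. The names worker, inbox, outbox, and _POISON are assumptions made up for illustration.

```python
import queue
import threading

# Hypothetical sentinel standing in for the "poison" idea from the API outline.
_POISON = object()


def worker(inbox, outbox):
    """Square each received item until the poison sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _POISON:
            outbox.put(_POISON)  # pass the shutdown signal downstream
            break
        outbox.put(item * item)


inbox, outbox = queue.Queue(), queue.Queue()
task = threading.Thread(target=worker, args=(inbox, outbox))
task.start()

# The only data crossing the boundary is what is explicitly pushed.
for n in (1, 2, 3):
    inbox.put(n)
inbox.put(_POISON)
task.join()

results = []
while True:
    item = outbox.get()
    if item is _POISON:
        break
    results.append(item)
```

The design point this illustrates is that the worker shares nothing implicitly: every object it sees arrives through a queue, which is the property the proposal wants to enforce between subinterpreters.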

26 130

Learning from the shell in supporting asyncio background calls
by Nick Coghlan Aug. 21, 2015

Hi folks,

Based on the recent discussions Sven kicked off regarding the complexity of interacting with asyncio from otherwise synchronous code, I came up with an API design that I like, inspired by the way background and foreground tasks in the POSIX shell work.

My blog post about this design is at http://www.curiousefficiency.org/posts/2015/07/asyncio-background-calls.html, but the essential components are the following two APIs:

    def run_in_background(target, *, loop=None):
        """Schedules target as a background task

        Returns the scheduled task.

        If target is a future or coroutine, equivalent to asyncio.ensure_future
        If target is a callable, it is scheduled in the default executor
        """
        ...

    def run_in_foreground(task, *, loop=None):
        """Runs event loop in current thread until the given task completes

        Returns the result of the task.
        For more complex conditions, combine with asyncio.wait()
        To include a timeout, combine with asyncio.wait_for()
        """
        ...

run_in_background is akin to invoking a shell command with a trailing "&" - it puts the operation into the background, leaving the current thread to move on to the next operation (or wait for input at the REPL). When coroutines are scheduled, they won't start running until you start a foreground task, while callables delegated to the default executor will start running immediately.

To actually get the *results* of that task, you have to run it in the foreground of the current thread using run_in_foreground - this is akin to bringing a background process to the foreground of a shell session using "fg".

To relate this idea back to some of the examples Sven was discussing, here's how translating some old serialised synchronous code to use those APIs might look in practice:

    # Serial synchronous data loading
    def load_and_process_data():
        data1 = load_remote_data_set1()
        data2 = load_remote_data_set2()
        return process_data(data1, data2)

    # Parallel asynchronous data loading
    def load_and_process_data():
        future1 = asyncio.run_in_background(load_remote_data_set1_async())
        future2 = asyncio.run_in_background(load_remote_data_set2_async())
        data1 = asyncio.run_in_foreground(future1)
        data2 = asyncio.run_in_foreground(future2)
        return process_data(data1, data2)

The application remains fundamentally synchronous, but the asyncio event loop is exploited to obtain some local concurrency in waiting for client IO operations.

Regards,
Nick.

P.S. time.sleep() and asyncio.sleep() are rather handy as standins for blocking and non-blocking IO operations. I wish I'd remembered that earlier :)

--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
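The two functions described in the post are not part of asyncio, but a minimal sketch of them can be built from existing asyncio calls. As a simplifying assumption, this version requires an explicit loop argument instead of defaulting to loop=None, and the coroutine load_remote_data is a made-up stand-in that uses asyncio.sleep() in place of real IO, as the P.S. suggests.

```python
import asyncio


def run_in_background(target, *, loop):
    """Schedule target on the given loop and return the future or task."""
    if callable(target):
        # Plain callables go to the loop's default executor.
        return loop.run_in_executor(None, target)
    # Coroutines and futures are wrapped like asyncio.ensure_future does.
    return asyncio.ensure_future(target, loop=loop)


def run_in_foreground(task, *, loop):
    """Run the loop in the current thread until task completes; return its result."""
    return loop.run_until_complete(task)


async def load_remote_data(value, delay):
    """Hypothetical stand-in for a non-blocking remote load."""
    await asyncio.sleep(delay)
    return value


loop = asyncio.new_event_loop()
try:
    future1 = run_in_background(load_remote_data(1, 0.01), loop=loop)
    future2 = run_in_background(load_remote_data(2, 0.01), loop=loop)
    # Both coroutines make progress while the first one is in the foreground.
    data1 = run_in_foreground(future1, loop=loop)
    data2 = run_in_foreground(future2, loop=loop)
finally:
    loop.close()
```

The calling code stays synchronous throughout; the event loop only runs while something is in the foreground, which matches the shell analogy in the post.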

8 16

Briefer string format
by Mike Miller Aug. 9, 2015

Have long wished python could format strings easily like bash or perl do, ... and then it hit me: csstext += f'{nl}{selector}{space}{{{nl}' (This script included whitespace vars to provide a minification option.) I've seen others make similar suggestions, but to my knowledge they didn't include this pleasing brevity aspect. -Mike
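The syntax in the post was a proposal at the time; formatted string literals have since been available from Python 3.6 onward. The sketch below runs the posted line with made-up values (nl, space, and selector are assumptions, since the original script is not shown) and compares it with the str.format spelling it abbreviates.

```python
# Illustrative values; the original script's variables are not shown in the post.
nl, space = "\n", " "
selector = "body"

csstext = ""
# Doubled braces produce a literal "{"; single braces interpolate names.
csstext += f'{nl}{selector}{space}{{{nl}'

# The same result spelled with str.format, for comparison.
formatted = '{nl}{selector}{space}{{{nl}'.format(nl=nl, selector=selector, space=space)
```

The f-string reads the names from the enclosing scope, which is where the brevity comes from.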

35 163

Re: [Python-ideas] Concurrency Modules
by Sven R. Kunze Aug. 5, 2015

Hi. I am back. First of all, thanks for your eager participation. I would like to pick up on Steve's and Mark's examples as they seem to be very good illustrations of the issue I still have.

Steve explained why asyncio is great and Mark explained why threading+multiprocessing is great. Each from his own perspective and focusing on the internal implementation details.

To me, all approaches can now be fit into this sort of table. Please, correct me if it's wrong (that is very important):

# | code lives in | managed by
--+---------------+-------------
1 | processes     | os scheduler
2 | threads       | os scheduler
3 | tasks         | event loop

But the original question still stands: Which one to use?

Ignoring little details like 'shared state', 'custom prioritization', etc., they all look the same to me, and what it all comes down to are these nasty little details people try to explain so eagerly. Not saying that is a bad thing, but it has some implications for production code that I do not like, and in the following I am going to explain that.

Say we have decided on approach N because of some requirements (examples from here and there, guidelines given by smart people, customer needs etc.) and wrote a hundred thousand lines of code. What if these requirements change 6 years in the future? What if the maintainer of approach N decided to change it in such a way that it is not compatible with our requirements anymore?

From what I can see there is no easy way 'back' to use another approach. They all have different APIs, basically for: 'executing a function and returning its precious result (the cake)'.

asyncio gives us the flexibility to choose a prioritization mechanism. Nice to have, because we are now independent of the os scheduler. But do we really ever need that? What is wrong with the os scheduler? Would that not mean that Mark had better switch to asyncio? We don't know if we ever would need that in project A and project B. What now? Use asyncio just in case? Preemptively?

@Steve Thanks for that great explanation of how asyncio works and its relationship to threads/processes. But I still have a question: why can't we use threads for the cakes? (1 cake = 1 thread). Not saying that asyncio would be a bad idea to use here, but couldn't we accomplish the same functionality by using threads?

I think, after we've settled the above questions, we should change the focus from "How do they work internally and what are the tiny differences?" (answered greatly by Mark) to "When do I use which one?"

The latter question actually is what counts for production code. It actually is quite interesting to know and to ponder over all the differences, dependencies, corner cases etc. However, when it actually comes down to 'executing a piece of code and returning its result', you end up deciding which approach to choose. You won't implement all 3 different ways just because it is great to see all the nasty little details click into place.

On Thursday, July 9, 2015 at 11:54:11 PM UTC+1, Sven R. Kunze wrote:
>
> In order to make a sound decision for the question: "Which one(s) do I
> use?", at least the following items should be somehow defined clearly
> for these modules:
>
> 1) relationship between the modules
> 2) NON-overlapping usage scenarios
> 3) future development intentions
> 4) ease of usage of the modules => future syntax
> 5) examples
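To make the "different APIs for the same job" point concrete, the sketch below executes one function and returns its result under two of the three rows of the table: a thread (row 2) and an event-loop task (row 3). The functions bake_cake and bake_cake_async and the flavour value are assumptions made up for illustration; processes (row 1) are left out to keep the example short.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def bake_cake(flavour):
    """Hypothetical blocking job: 'execute a function and return its result'."""
    return f"{flavour} cake"


async def bake_cake_async(flavour):
    """The same job written as a coroutine for the event loop."""
    await asyncio.sleep(0)  # yield control once, standing in for real IO
    return f"{flavour} cake"


# Row 2: the code lives in a thread managed by the OS scheduler.
with ThreadPoolExecutor(max_workers=1) as pool:
    thread_result = pool.submit(bake_cake, "lemon").result()

# Row 3: the code lives in a task managed by the event loop.
task_result = asyncio.run(bake_cake_async("lemon"))
```

Both calls produce the same cake, but the function has to be written differently and invoked through a different API in each case, which is the switching cost the post is worried about.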

10 24

Reprs of classes and functions
by Serhiy Storchaka Aug. 5, 2015

This idea has already been mentioned in passing, but it sank deep into the threads of the discussion, so I am raising it again. Currently reprs of classes and functions look like this:

    >>> int
    <class 'int'>
    >>> int.from_bytes
    <built-in method from_bytes of type object at 0x826cf60>
    >>> open
    <built-in function open>
    >>> import collections
    >>> collections.Counter
    <class 'collections.Counter'>
    >>> collections.Counter.fromkeys
    <bound method …

7 9

A different format for PI?
by Abe Dillon July 31, 2015

Is there a forum or something similar related to python-ideas? If there isn't, I think there should be. The mailing list format is restrictive. There's no good way to search past discussions and the digests I get are disorganized and difficult to follow. I'd like to contribute, but I don't know if my ideas are topics that have already been discussed in depth or if they're actually new. -Abe Dillon

6 7

fork
by Sven R. Kunze July 30, 2015

Hi everybody,

well, during the discussion of the concurrency capabilities of Python, I found this article worth reading: http://chriskiehl.com/article/parallelism-in-one-line/

His statement "Mmm.. Smell those Java roots." basically sums the whole topic up for me.

This is sequential code (almost plain English):

    for image in images:
        create_thumbnail(image)

In order to get started with parallelism and concurrency, we need to do the following:

    pool = Pool()
    pool.map(…
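As a generic illustration of the sequential versus pool.map contrast (not a reconstruction of the truncated message), the sketch below uses the thread-backed ThreadPool from multiprocessing.pool. The create_thumbnail body, the file names, and the pool size are assumptions made up for the example.

```python
from multiprocessing.pool import ThreadPool


def create_thumbnail(image):
    """Hypothetical stand-in: a real version would read and resize the file."""
    return f"thumb-{image}"


images = ["a.png", "b.png", "c.png"]  # illustrative file names

# Sequential version: one image after another.
sequential = [create_thumbnail(image) for image in images]

# Pool version: map distributes the calls over worker threads
# and returns the results in input order.
with ThreadPool(2) as pool:
    parallel = pool.map(create_thumbnail, images)
```

A thread pool is used here only to keep the sketch self-contained; multiprocessing.Pool offers the same map method when separate processes are wanted.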

4 3