There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{unadorned | abc.abstract} x {unadorned | static | class} x {method | property | non-callable attribute}.
concreteness | implicit first arg | type | name | comments
------------ | ------------------ | ---- | ---- | --------
{unadorned} | {unadorned} | method | def foo(): | exists now
{unadorned} | {unadorned} | property | @property | exists now
{unadorned} | {unadorned} | non-callable attribute | x = 2 | exists now
{unadorned} | static | method | @staticmethod | exists now
{unadorned} | static | property | @staticproperty | proposing
{unadorned} | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
{unadorned} | class | method | @classmethod | exists now
{unadorned} | class | property | @classproperty or @classmethod;@property | proposing
{unadorned} | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | {unadorned} | method | @abc.abstractmethod | exists now
abc.abstract | {unadorned} | property | @abc.abstractproperty | exists now
abc.abstract | {unadorned} | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static | method | @abc.abstractstaticmethod | exists now
abc.abstract | static | property | @abc.abstractstaticproperty | proposing
abc.abstract | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | class | method | @abc.abstractclassmethod | exists now
abc.abstract | class | property | @abc.abstractclassproperty | proposing
abc.abstract | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property only without an implicit first
argument. Allows the property to be called directly from the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to the
method is the class. Allows the property to be called directly from the
class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses
@abc.abstractstaticproperty - like @abc.abstractproperty only for
@staticproperty
@abc.abstractclassproperty - like @abc.abstractproperty only for
@classproperty
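For what it's worth, the intended semantics of a getter-only @classproperty
can already be approximated with a small descriptor. This is only a sketch
(no setter/deleter, no abstract variant), not a claim about how the real
decorator should be implemented:

    class classproperty:
        """Like property, but the getter receives the class, not an instance."""
        def __init__(self, fget):
            self.fget = fget
        def __get__(self, obj, objtype=None):
            # Ignore any instance and always pass the class to the getter.
            return self.fget(objtype)

    class Foo:
        @classproperty
        def bar(cls):
            return cls.__name__.lower()

    assert Foo.bar == "foo"      # works on the class, no throw-away instance
    assert Foo().bar == "foo"    # and on instances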
--rich
At the moment, the array module of the standard library allows one to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could pass an iterable (as now) as the second argument, but if you pass
a single integer value, it would be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (eg, "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
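For comparison, this is roughly what the proposed integer form would look
like next to a one-shot workaround that already exists (the integer-argument
call below is hypothetical, it is only the suggested API; sequence repetition
on a one-element array builds the whole buffer in a single step):

    import array

    # Hypothetical, under this proposal: a bare integer as the second
    # argument would mean "allocate this many zero-initialized items".
    # sa = array.array("l", 6000000000)

    # One-shot alternative that works today: repeat a one-item array.
    sa = array.array("l", [0]) * 1000000
    assert len(sa) == 1000000 and sa[0] == 0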
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
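A minimal reproduction of that failure mode, for reference:

    def foo(a, b):
        return (a, b)

    foo('a', 'b')   # intended: two arguments
    foo('a' 'b')    # missing comma: the literals concatenate to the single
                    # argument 'ab', so this raises a TypeError about a
                    # missing positional argument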
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
string literal concatenation with the '+' operator at compile time, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--
--Guido van Rossum (python.org/~guido)
Handling of Paths with multiple extensions is currently not so easy with
pathlib. Specifically, I don't think there is an easy way to go from
"foo.tar.gz" to "foo.ext", because Path.with_suffix only replaces the last
suffix.
I would therefore like to suggest either
1/ add Path.replace_suffix, such that
Path("foo.tar.gz").replace_suffix(".tar.gz", ".ext") == Path("foo.ext")
(this would also provide extension-checking capabilities, raising
ValueError if the first argument is not a valid suffix of the initial
path); or
2/ add a second argument to Path.with_suffix, "n_to_strip" (although
perhaps with a better name), defaulting to 1, such that
Path("foo.tar.gz").with_suffix(".ext", 0) == Path("foo.tar.gz.ext")
Path("foo.tar.gz").with_suffix(".ext", 1) == Path("foo.tar.ext")
Path("foo.tar.gz").with_suffix(".ext", 2) == Path("foo.ext") # set
n_to_strip to len(path.suffixes) for stripping all of them.
Path("foo.tar.gz").with_suffix(".ext", 3) raises a ValueError.
Best,
Antony
I suggest implementing:
- `itertools.permutations.__getitem__`, for getting a permutation by its
index number, and possibly also slicing, and
- `itertools.permutations.index` for getting the index number of a given
permutation.
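For context, neither operation needs to iterate over all permutations: the
factorial number system gives direct random access. A sketch for the
full-length case (r == n), following the ordering itertools.permutations
uses, could look like this:

    import itertools
    from math import factorial

    def permutation_at(seq, index):
        # Return the index-th permutation of seq, in itertools order.
        items = list(seq)
        n = len(items)
        if not 0 <= index < factorial(n):
            raise IndexError(index)
        result = []
        for i in range(n, 0, -1):
            q, index = divmod(index, factorial(i - 1))
            result.append(items.pop(q))
        return tuple(result)

    assert permutation_at("abcd", 5) == list(itertools.permutations("abcd"))[5]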
What do you think?
Thanks,
Ram.
** The problem
A long-standing problem with CPython is that the peephole optimizer
cannot be completely disabled. Normally, peephole optimization is a
good thing: it improves execution speed. But in some situations, like
coverage testing, it's more important to be able to reason about the
code's execution. I propose that we add a way to completely disable the
optimizer.
To demonstrate the problem, here is continue.py:
a = b = c = 0
for n in range(100):
    if n % 2:
        if n % 4:
            a += 1
        continue
    else:
        b += 1
    c += 1
assert a == 50 and b == 50 and c == 50
If you execute "python3.4 -m trace -c -m continue.py", it produces this
continue.cover file:
    1: a = b = c = 0
  101: for n in range(100):
  100:     if n % 2:
   50:         if n % 4:
   50:             a += 1
>>>>>>         continue
           else:
   50:         b += 1
   50:     c += 1
    1: assert a == 50 and b == 50 and c == 50
This indicates that the continue line is not executed. It's true: the
byte code for that statement is not executed, because the peephole
optimizer has removed the jump to the jump. But in reasoning about the
code, the continue statement is clearly part of the semantics of this
program. If you remove the statement, the program will run
differently. If you had to explain this code to a learner, you would of
course describe the continue statement as part of the execution. So the
trace output does not match our (correct) understanding of the program.
The reason we are running trace (or coverage.py) in the first place is
to learn something about our code, but it is misleading us. The peephole
optimizer is interfering with our ability to reason about the code. We
need a way to disable the optimizer so that this won't happen. This
type of control is well-known in C compilers, for the same reasons: when
running code, optimization is good for speed; when reasoning about code,
optimization gets in the way.
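As an aside, the optimizer's effect is easy to observe from Python itself
(the exact disassembly, and which pass does the folding, vary by CPython
version):

    import dis

    # Constant expressions are folded before the code ever runs:
    dis.dis(compile("x = 3 * 7", "<demo>", "exec"))
    # The disassembly loads the constant 21 directly; the multiplication
    # itself is never executed, so no tool that observes execution sees it.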
More details are in http://bugs.python.org/issue2506, which also
includes previous discussion of the idea.
This has come up on Python-Dev, and Guido seemed supportive:
https://mail.python.org/pipermail/python-dev/2012-December/123099.html .
** Implementation
Although it may seem like a big change to be able to disable the
optimizer, the heart of it is quite simple. In compile.c is the only
call to PyCode_Optimize. That function takes a string of bytecode and
returns another. If we skip that call, the peephole optimizer is disabled.
** User Interface
Unfortunately, the -O command-line switch does not lend itself to a new
value that means, "less optimization than the default." I propose a new
switch -P, to control the peephole optimizer, with a value of -P0
meaning no optimization at all. The PYTHONPEEPHOLE environment variable
would also control the option.
There are about a dozen places internal to CPython where optimization
level is indicated with an integer, for example, in
Py_CompileStringObject. Those uses also don't allow for new values
indicating less optimization than the default: 0 and -1 already have
meanings. Unless we want to start using -2 for less than the default.
I'm not sure we need to provide for those values, or if the
PYTHONPEEPHOLE environment variable provides enough control.
** Ramifications
This switch makes no changes to the semantics of Python programs,
although clearly, if you are tracing a program, the exact sequence of
lines and bytecodes will be different (this is the whole point).
In the ticket, one objection raised is that providing this option will
complicate testing, and that optimization is a difficult enough thing to
get right as it is. I disagree: I think providing this option will help
test the optimizer, because it will give us a way to check that code runs
the same with and without the optimizer. This gives us a tool for
demonstrating that the optimizer isn't changing the behavior of programs.
Hi,
I'm trying to find the best option to make CPython faster. I would
like to discuss here a first idea of making the Python code read-only
to allow new optimizations.
Make Python code read-only
==========================
I propose to add an option to Python to make the code read-only. In
this mode, module namespaces, class namespaces and function attributes
become read-only. It is still possible to add a "__readonly__ =
False" marker to keep a module, a class and/or a function modifiable.
I chose to make the code read-only by default instead of the opposite.
In my test, almost all code can be made read-only without major issues;
only a little code requires the "__readonly__ = False" marker.
A module is only made read-only by importlib after the module is
loaded. The module is still modifiable while its code is executed, until
importlib has set all its attributes (ex: __loader__).
I have a proof of concept: a fork of Python 3.5 making code read-only
if the PYTHONREADONLY environment variable is set to 1. Commands to
try it:
hg clone http://hg.python.org/sandbox/readonly
cd readonly && ./configure && make
PYTHONREADONLY=1 ./python -c 'import os; os.x = 1'
# ValueError: read-only dictionary
Status of the standard library (Lib/*.py): 139 modules are read-only,
25 are modifiable. Except for the sys module, all modules written in C
are read-only.
I'm surprised that so little code relies on the ability to modify
everything. Most of the code can be read-only.
Optimizations possible when the code is read-only
=================================================
* Inline calls to functions.
* Replace calls to pure functions (without side effect) with the
result. For example, len("abc") can be replaced with 3.
* Constants can be replaced with their values (at least for simple
types like bytes, int and str).
It is for example possible to implement these optimizations by
manipulating the Abstract Syntax Tree (AST) during the compilation
from the source code to bytecode. See my astoptimizer project which
already implements similar optimizations:
https://bitbucket.org/haypo/astoptimizer
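To make the len("abc") example concrete, here is a toy AST pass in that
spirit. It simply assumes that the name "len" is the builtin, which is
exactly the assumption that only becomes safe once namespaces are read-only
(ast.Constant is used here; older Python versions spell it ast.Str/ast.Num):

    import ast

    class FoldLen(ast.NodeTransformer):
        # Replace len(<string literal>) with the literal's length.
        def visit_Call(self, node):
            self.generic_visit(node)
            if (isinstance(node.func, ast.Name) and node.func.id == "len"
                    and len(node.args) == 1 and not node.keywords
                    and isinstance(node.args[0], ast.Constant)
                    and isinstance(node.args[0].value, str)):
                return ast.copy_location(
                    ast.Constant(len(node.args[0].value)), node)
            return node

    tree = ast.fix_missing_locations(FoldLen().visit(ast.parse("n = len('abc')")))
    code = compile(tree, "<ast>", "exec")   # the bytecode now embeds the constant 3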
More optimizations
==================
My main motivation to make code read-only is to specialize a function:
optimize a function for a specific environment (type of parameters,
external symbols like other functions, etc.). Checking the type of
parameters can be fast (especially when implemented in C), but it
would be expensive to check that all global variables used in the
function have not been modified since the function was "specialized".
For example, if os.path.isabs(path) is called: you have to check that
the "os.path" and "os.path.isabs" attributes were not modified and that
isabs() itself was not modified. If we know that globals are read-only,
these checks are no longer needed, and so it becomes cheap to decide
whether the specialized function can be used or not.
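To illustrate the per-call cost this removes, a specialized caller without
the read-only guarantee has to guard its assumptions itself on every call
(the names below, like _specialized_isabs, are purely hypothetical):

    import os.path

    def _specialized_isabs(path):
        # Stand-in for a body specialized for the expected argument types.
        return path.startswith("/")

    _expected = os.path.isabs        # binding recorded at specialization time

    def isabs_fast(path):
        # Without read-only namespaces, every call must check that
        # os.path.isabs (and, in general, os.path itself) was not rebound.
        if os.path.isabs is _expected:
            return _specialized_isabs(path)
        return os.path.isabs(path)   # deoptimize: use the current binding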
It becomes possible to "learn" types (trace the execution of the
application, and then compile for the recorded types). Knowing the
type of function parameters, result and local variables opens an
interesting class of new optimizations, but I prefer to discuss this
later, after discussing the idea of making the code read-only.
One point remains unclear to me. There is a short time window between
when a module is loaded and when it is made read-only. During this
window, we cannot rely on the read-only property of the code.
Specialized code cannot be used safely before the module is known to
be read-only. I don't know yet how the switch from "slow" code to
optimized code should be implemented.
Issues with read-only code
==========================
* Currently, it's not possible to make a module, class or function
modifiable again; this keeps my implementation simple. With a registry of
callbacks, it may be possible to re-enable modification and call
code to disable optimizations.
* PyPy implements this, but thanks to its JIT it can re-optimize
the modified code during execution. Writing a JIT is very complex;
I'm trying to find a compromise between the fast PyPy and the slow
CPython. Adding a JIT to CPython is out of my scope, it would require
too many modifications of the code.
* With read-only code, monkey-patching cannot be used anymore, which is
annoying for running tests. An obvious solution is to disable read-only
mode to run tests, but that can be seen as unsafe, since tests are usually
what gives us confidence in the code.
* The sys module cannot be made read-only because modifying sys.stdout
and sys.ps1 is a common use case.
* The warnings module tries to add a __warningregistry__ global
variable in the module where the warning was emitted, to avoid repeating
warnings that should only be emitted once. The problem is that the
module namespace is made read-only before this variable is added. A
workaround would be to maintain these dictionaries in the warnings
module directly, but it becomes harder to clear the dictionary when a
module is unloaded or reloaded. Another workaround is to add
__warningregistry__ before making a module read-only.
* Lazy initialization of module variables does not work anymore. A
workaround is to use a mutable type, such as a dict used as a
namespace for the module's modifiable variables (see the sketch after
this list).
* The interactive interpreter sets a "_" variable in the builtins
namespace. I have no workaround for this: the "_" variable is simply no
longer created in read-only mode. Don't run the interactive interpreter
in read-only mode.
* It is not possible yet to make the namespace of packages read-only.
For example, "import encodings.utf_8" adds the symbol "utf_8" to the
encodings namespace. A workaround is to load all submodules before
making the namespace read-only. This cannot be done for some large
packages. For example, the encodings package has a lot of submodules,
only a few of which are needed.
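To sketch the lazy-initialization workaround mentioned above: the module
binds a single mutable container at import time and never rebinds a global
afterwards (the loader function here is just a stand-in):

    def _load_expensive_data():
        return list(range(1000))     # stand-in for real initialization work

    _state = {"cache": None}         # created before the namespace is frozen

    def get_cache():
        if _state["cache"] is None:
            _state["cache"] = _load_expensive_data()
        return _state["cache"]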
Read the documentation for more information:
http://hg.python.org/sandbox/readonly/file/tip/READONLY.txt
More optimizations
==================
See my notes for all ideas to optimize CPython:
http://haypo-notes.readthedocs.org/faster_cpython.html
I explain there why I prefer to optimize CPython instead of working on
PyPy or another Python implementation like Pyston, Numba or similar
projects.
Victor
Early while working on py-lmdb I noticed that a huge proportion of
runtime was being lost to PyArg_ParseTupleAndKeywords, and so I
subsequently wrote a specialization for this extension module.
In the current code[0], parse_args() is much faster than
ParseTupleAndKeywords, responsible for a doubling of performance in
several of the library's faster code paths (e.g.
Cursor.put(append=True)). Ever since adding the rewrite I've wanted to
go back and either remove it or at least reduce the amount of custom
code, but it seems there really isn't a better approach to fast argument
parsing using the bare Python C API at the moment.
[0] https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L833
In the append=True path, parse_args() yields a method that can complete
1.1m insertions/sec on my crappy Core 2 laptop, compared to 592k/sec
using the same method rewritten with PyArg_ParseTupleAndKeywords.
Looking to other 'fast' projects for precedent, and studying Cython's
output in particular, it seems that Cython completely ignores the
standard APIs and expends a huge amount of .text on almost every
imaginable C performance trick to speed up parsing (actually Cython's
output is a sheer marvel of trickery; it's worth studying). So it's clear
the standard APIs are somewhat non-ideal, and those concerned with
performance are taking other approaches.
ParseTupleAndKeywords is competitive for positional arguments (1.2m/sec
vs 1.5m/sec for "Cursor.put(k, v)"), but things go south when a kwarg
dict is provided.
The primary goal of parse_args() was to avoid the continuous temporary
allocations and hashing done by PyArg_ParseTupleAndKeywords by way of
PyDict_GetItemString(), which invokes PyString_FromString() internally,
which in turn causes an alloc / strlen() / memcpy(), one for each
possible kwarg, on every function call.
The rewrite has been hacked over time, and honestly I'm not sure which
bits are responsible for the speed improvement, and which are totally
redundant. The tricks are:
* Intern keyword arg strings once at startup, avoiding the temporary
PyString creation and also causing their hash() to be cached across
calls. This uses an incredibly ugly pair of enum/const char *[]
static globals.[3]
[3] https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L79
* Use a per-function 'static const' array of structures to describe the
expected set of arguments. Since these arrays are built at compile
time, they cannot directly reference the runtime-generated interned
PyStrings, thus the use of an enum.
A nice side effect of the array's contents being purely small integers
is that each array element is small and thus quite cache-efficient.
In the current code array elements are 4 bytes each.
* Avoid use of variable-length argument lists. I'm not sure if this
helps at all, but certainly it simplifies the parsing code and makes
the call sites much more compact.
Instead of a va_arg list of destination pointers, parsed output is
represented as a per-function structure[1][2] definition, whose
offsets are encoded into the above argspec array at build time.
[1] https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L1265
[2] https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L704
This might hurt the compiler's ability to optimize the placement of
what were previously small stack variables (e.g. I'm not sure if it
prevents the compiler from making more use of registers). In any case the
overall result is much faster than before.
And most recently, giving a further 20% boost to append=True:
* Cache a dict that maps interned kwarg -> argspec array offset,
allowing the per-call kwarg dict to be iterated, and causing only one
hash lookup per supplied kwarg. Prior to the cache, presence of
kwargs would cause one hash lookup per argspec entry (e.g.
potentially 15 lookups instead of 1 or 2).
It's obvious this approach isn't generally useful, and looking at the
CPython source we can see the interning trick is already known, and
presumably not exposed in the CPython API because the method is quite
ugly. Still it seems there is room to improve the public API to include
something like this interning trick, and that's what this mail is about.
My initial thought is for a horribly macro-heavy API like:
PyObject *my_func(PyObject *self, PyObject *args, PyObject *kwargs)
{
    Py_ssize_t foo;
    const char *some_buf;
    PyObject *list;

    Py_BEGIN_ARGS
        PY_ARG("foo", PY_ARG_SSIZE_T, NULL, PY_ARG_REQUIRED),
        PY_ARG("some_buf", PY_ARG_BUFFER, NULL, PY_ARG_REQUIRED),
        PY_ARG("list", PY_ARG_OBJECT, &PyList_Type, 0)
    Py_END_ARGS

    if(Py_PARSE_ARGS(args, kwargs, &foo, &some_buf, &list)) {
        return NULL;
    }
    /* do stuff */
}
Where:

struct py_arg_info;   /* Opaque */

struct py_arg_spec {
    const char *name;
    enum { ... } type;
    PyTypeObject *type2;
    int options;
};

#define PY_BEGIN_ARGS \
    static struct py_arg_info *_py_arg_info; \
    if(! _py_arg_info) { \
        static const struct py_arg_spec _py_args[] = {

#define PY_END_ARGS \
        }; \
        _Py_InitArgInfo(&_py_arg_info, _py_args, \
                        sizeof _py_args / sizeof _py_args[0]); \
    }

#define PY_ARG(name, type, type2, opts) {name, type, type2, opts}

#define Py_PARSE_ARGS(a, k, ...) \
    _Py_ParseArgsFromInfo(&_py_arg_info, a, k, __VA_ARGS__)
Here some implementation-internal py_arg_info structure is built up on
first function invocation, producing the cached mapping of argument
keywords to array index, and storing a reference to the py_arg_spec
array, or some version of it that has been internally transformed to a
more useful format.
You may notice this depends on variadic macros, which break at least
Visual Studio, so at the very least that part is broken.
The above also doesn't deal with all the cases supported by the existing
PyArg_ routines, such as setting the function name and custom error
message, or unpacking tuples (is this still even supported in Python 3?)
Another approach might be to use a PyArg_ParseTupleAndKeywords-alike
API, so that something like this was possible:
static PyObject *
my_method(PyObject *self, PyObject *args, PyObject *kwds)
{
    Py_ssize_t foo;
    const char *some_buf;
    Py_ssize_t some_buf_size;
    PyObject *list;

    static PyArgInfo arg_info;
    static char *keywords[] = {
        "foo", "some_buf", "list", NULL
    };

    if(! PyArg_FastParse(&arg_info, args, kwds, "ns#|O!", keywords,
                         &foo, &some_buf, &some_buf_size,
                         &PyList_Type, &list)) {
        return NULL;
    }
    /* do stuff */
}
In that case the API is very familiar, and PyArg_FastParse() builds the
cache on the first invocation itself, but the supplied va_list is full of
noise that needs to be carefully skipped somehow. The work involved in
doing the skipping might introduce complexity that slows things down all
over again.
Any thoughts on a better API? Is there a need here? I'm obviously not
the first to notice PyArg_ParseTupleAndKeywords is slow, and so I wonder
how many people have sighed and brushed off the fact their module is
slower than it could be.
David
On May 21, 2014, at 1:21 PM, python-ideas-request@python.org wrote:
> I propose that we add a way to completely disable the
> optimizer.
I think this opens a can of worms that is better left closed.
* We will have to start running tests both with and without the switch
turned on, for example (because you're exposing yet another way to
run Python with different code).
* Over time, I expect that some of the functionality of the peepholer
is going to be moved upstream into AST transformations, and then you
will have even less ability to switch something on and off.
* The code in place has been in CPython for over a decade and
the tracker item has languished for years. That provides some
evidence that the "need" here is very small.
* I sympathize with "there is an irritating dimple in coverage.py",
but that hasn't actually impaired its usability beyond creating a
curiosity. Using that as a reason to add a new CPython-only
command-line switch seems like having the tail wag the dog.
* As the other implementations of Python continue to develop,
I don't think we should tie their hands with respect to code
generation.
* Ideally, the peepholer should be thought of as part of the code
generation. As compilation improves over time, it should start
to generate the same code as we're getting now. It probably
isn't wise to expose the implementation detail that the constant
folding and jump tweaks are done in a separate second pass.
* Mostly, I don't want to open a new crack in the Python veneer
where people are switching on and off two different streams of
code generation (currently, there is one way to do it). I can't
fully articulate my instincts here, but I think we'll regret opening
this door when we didn't have to.
That being said, I know how the politics of python-ideas works
and I expect that my thoughts on the subject will quickly get
buried by a discussion of which lettercode should be used for the
command-line switch.
Hopefully, some readers will focus on the question of whether
it is worth it. Others might look at ways to improve the existing
code (without an off-switch) so that the continue-statement
jump-to-jump shows up in your coverage tool.
IMO, adding a new command-line switch is a big deal (we should
do it very infrequently, limit it to things with a big payoff, and
think about whether there are any downsides). Personally, I don't
see any big wins here and have a sense that there are downsides
that would make us regret exposing alternate code generation.
Raymond