There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{unadorned | abc.abstract} x {unadorned | static | class} x {method | property | non-callable attribute}.
concreteness | implicit first arg | type | name | comments
------------ | ------------------ | ---- | ---- | --------
{unadorned} | {unadorned} | method | def foo(): | exists now
{unadorned} | {unadorned} | property | @property | exists now
{unadorned} | {unadorned} | non-callable attribute | x = 2 | exists now
{unadorned} | static | method | @staticmethod | exists now
{unadorned} | static | property | @staticproperty | proposing
{unadorned} | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
{unadorned} | class | method | @classmethod | exists now
{unadorned} | class | property | @classproperty or @classmethod;@property | proposing
{unadorned} | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | {unadorned} | method | @abc.abstractmethod | exists now
abc.abstract | {unadorned} | property | @abc.abstractproperty | exists now
abc.abstract | {unadorned} | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static | method | @abc.abstractstaticmethod | exists now
abc.abstract | static | property | @abc.abstractstaticproperty | proposing
abc.abstract | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | class | method | @abc.abstractclassmethod | exists now
abc.abstract | class | property | @abc.abstractclassproperty | proposing
abc.abstract | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property only without an implicit first
argument. Allows the property to be called directly from the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to the
method is the class. Allows the property to be called directly from the
class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses
@abc.abstractstaticproperty - like @abc.abstractproperty only for
@staticproperty
@abc.abstractclassproperty - like @abc.abstractproperty only for
@classproperty
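For what it's worth, the intended semantics of a getter-only @classproperty
can already be approximated with a small descriptor. This is only a sketch
(no setter/deleter, no abstract variant), not a claim about how the real
decorator should be implemented:

    class classproperty:
        """Like property, but the getter receives the class, not an instance."""
        def __init__(self, fget):
            self.fget = fget
        def __get__(self, obj, objtype=None):
            # Ignore any instance and always pass the class to the getter.
            return self.fget(objtype)

    class Foo:
        @classproperty
        def bar(cls):
            return cls.__name__.lower()

    assert Foo.bar == "foo"      # works on the class, no throw-away instance
    assert Foo().bar == "foo"    # and on instances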
--rich
At the moment, the array module of the standard library allows one to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could pass an iterable (as now) as the second argument, but if you pass
a single integer value, it would be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (eg, "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
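For comparison, this is roughly what the proposed integer form would look
like next to a one-shot workaround that already exists (the integer-argument
call below is hypothetical, it is only the suggested API; sequence repetition
on a one-element array builds the whole buffer in a single step):

    import array

    # Hypothetical, under this proposal: a bare integer as the second
    # argument would mean "allocate this many zero-initialized items".
    # sa = array.array("l", 6000000000)

    # One-shot alternative that works today: repeat a one-item array.
    sa = array.array("l", [0]) * 1000000
    assert len(sa) == 1000000 and sa[0] == 0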
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
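A minimal reproduction of that failure mode, for reference:

    def foo(a, b):
        return (a, b)

    foo('a', 'b')   # intended: two arguments
    foo('a' 'b')    # missing comma: the literals concatenate to the single
                    # argument 'ab', so this raises a TypeError about a
                    # missing positional argument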
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
string literal concatenation with the '+' operator at compile time, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--
--Guido van Rossum (python.org/~guido)
Handling of Paths with multiple extensions is currently not so easy with
pathlib. Specifically, I don't think there is an easy way to go from
"foo.tar.gz" to "foo.ext", because Path.with_suffix only replaces the last
suffix.
I would therefore like to suggest either
1/ add Path.replace_suffix, such that
Path("foo.tar.gz").replace_suffix(".tar.gz", ".ext") == Path("foo.ext")
(this would also provide extension-checking capabilities, raising
ValueError if the first argument is not a valid suffix of the initial
path); or
2/ add a second argument to Path.with_suffix, "n_to_strip" (although
perhaps with a better name), defaulting to 1, such that
Path("foo.tar.gz").with_suffix(".ext", 0) == Path("foo.tar.gz.ext")
Path("foo.tar.gz").with_suffix(".ext", 1) == Path("foo.tar.ext")
Path("foo.tar.gz").with_suffix(".ext", 2) == Path("foo.ext") # set
n_to_strip to len(path.suffixes) for stripping all of them.
Path("foo.tar.gz").with_suffix(".ext", 3) raises a ValueError.
Best,
Antony
I suggest implementing:
- `itertools.permutations.__getitem__`, for getting a permutation by its
index number, and possibly also slicing, and
- `itertools.permutations.index` for getting the index number of a given
permutation.
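For context, neither operation needs to iterate over all permutations: the
factorial number system gives direct random access. A sketch for the
full-length case (r == n), following the ordering itertools.permutations
uses, could look like this:

    import itertools
    from math import factorial

    def permutation_at(seq, index):
        # Return the index-th permutation of seq, in itertools order.
        items = list(seq)
        n = len(items)
        if not 0 <= index < factorial(n):
            raise IndexError(index)
        result = []
        for i in range(n, 0, -1):
            q, index = divmod(index, factorial(i - 1))
            result.append(items.pop(q))
        return tuple(result)

    assert permutation_at("abcd", 5) == list(itertools.permutations("abcd"))[5]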
What do you think?
Thanks,
Ram.
** The problem
A long-standing problem with CPython is that the peephole optimizer
cannot be completely disabled. Normally, peephole optimization is a
good thing: it improves execution speed. But in some situations, like
coverage testing, it's more important to be able to reason about the
code's execution. I propose that we add a way to completely disable the
optimizer.
To demonstrate the problem, here is continue.py:
a = b = c = 0
for n in range(100):
    if n % 2:
        if n % 4:
            a += 1
        continue
    else:
        b += 1
    c += 1
assert a == 50 and b == 50 and c == 50
If you execute "python3.4 -m trace -c -m continue.py", it produces this
continue.cover file:
    1: a = b = c = 0
  101: for n in range(100):
  100:     if n % 2:
   50:         if n % 4:
   50:             a += 1
>>>>>>         continue
           else:
   50:         b += 1
   50:     c += 1
    1: assert a == 50 and b == 50 and c == 50
This indicates that the continue line is not executed. It's true: the
byte code for that statement is not executed, because the peephole
optimizer has removed the jump to the jump. But in reasoning about the
code, the continue statement is clearly part of the semantics of this
program. If you remove the statement, the program will run
differently. If you had to explain this code to a learner, you would of
course describe the continue statement as part of the execution. So the
trace output does not match our (correct) understanding of the program.
The reason we are running trace (or coverage.py) in the first place is
to learn something about our code, but it is misleading us. The peephole
optimizer is interfering with our ability to reason about the code. We
need a way to disable the optimizer so that this won't happen. This
type of control is well-known in C compilers, for the same reasons: when
running code, optimization is good for speed; when reasoning about code,
optimization gets in the way.
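As an aside, the optimizer's effect is easy to observe from Python itself
(the exact disassembly, and which pass does the folding, vary by CPython
version):

    import dis

    # Constant expressions are folded before the code ever runs:
    dis.dis(compile("x = 3 * 7", "<demo>", "exec"))
    # The disassembly loads the constant 21 directly; the multiplication
    # itself is never executed, so no tool that observes execution sees it.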
More details are in http://bugs.python.org/issue2506, which also
includes previous discussion of the idea.
This has come up on Python-Dev, and Guido seemed supportive:
https://mail.python.org/pipermail/python-dev/2012-December/123099.html .
** Implementation
Although it may seem like a big change to be able to disable the
optimizer, the heart of it is quite simple. In compile.c is the only
call to PyCode_Optimize. That function takes a string of bytecode and
returns another. If we skip that call, the peephole optimizer is disabled.
** User Interface
Unfortunately, the -O command-line switch does not lend itself to a new
value that means, "less optimization than the default." I propose a new
switch -P, to control the peephole optimizer, with a value of -P0
meaning no optimization at all. The PYTHONPEEPHOLE environment variable
would also control the option.
There are about a dozen places internal to CPython where optimization
level is indicated with an integer, for example, in
Py_CompileStringObject. Those uses also don't allow for new values
indicating less optimization than the default: 0 and -1 already have
meanings. Unless we want to start using -2 for less than the default.
I'm not sure we need to provide for those values, or if the
PYTHONPEEPHOLE environment variable provides enough control.
** Ramifications
This switch makes no changes to the semantics of Python programs,
although clearly, if you are tracing a program, the exact sequence of
lines and bytecodes will be different (this is the whole point).
In the ticket, one objection raised is that providing this option will
complicate testing, and that optimization is a difficult enough thing to
get right as it is. I disagree: I think providing this option will help
test the optimizer, because it will give us a way to check that code runs
the same with and without the optimizer. This gives us a tool for
demonstrating that the optimizer isn't changing the behavior of programs.
Hi,
I'm trying to find the best option to make CPython faster. I would
like to discuss here a first idea of making the Python code read-only
to allow new optimizations.
Make Python code read-only
==========================
I propose to add an option to Python to make the code read-only. In
this mode, module namespaces, class namespaces and function attributes
become read-only. It is still possible to add a "__readonly__ =
False" marker to keep a module, a class and/or a function modifiable.
I chose to make the code read-only by default instead of the opposite.
In my test, almost all code can be made read-only without major issues;
only a little code requires the "__readonly__ = False" marker.
A module is only made read-only by importlib after the module is
loaded. The module is still modifiable while its code is executed, until
importlib has set all its attributes (ex: __loader__).
I have a proof of concept: a fork of Python 3.5 making code read-only
if the PYTHONREADONLY environment variable is set to 1. Commands to
try it:
hg clone http://hg.python.org/sandbox/readonly
cd readonly && ./configure && make
PYTHONREADONLY=1 ./python -c 'import os; os.x = 1'
# ValueError: read-only dictionary
Status of the standard library (Lib/*.py): 139 modules are read-only,
25 are modifiable. Except for the sys module, all modules written in C
are read-only.
I'm surprised that so little code relies on the ability to modify
everything. Most of the code can be read-only.
Optimizations possible when the code is read-only
=================================================
* Inline calls to functions.
* Replace calls to pure functions (without side effect) with the
result. For example, len("abc") can be replaced with 3.
* Constants can be replaced with their values (at least for simple
types like bytes, int and str).
It is for example possible to implement these optimizations by
manipulating the Abstract Syntax Tree (AST) during the compilation
from the source code to bytecode. See my astoptimizer project which
already implements similar optimizations:
https://bitbucket.org/haypo/astoptimizer
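To make the len("abc") example concrete, here is a toy AST pass in that
spirit. It simply assumes that the name "len" is the builtin, which is
exactly the assumption that only becomes safe once namespaces are read-only
(ast.Constant is used here; older Python versions spell it ast.Str/ast.Num):

    import ast

    class FoldLen(ast.NodeTransformer):
        # Replace len(<string literal>) with the literal's length.
        def visit_Call(self, node):
            self.generic_visit(node)
            if (isinstance(node.func, ast.Name) and node.func.id == "len"
                    and len(node.args) == 1 and not node.keywords
                    and isinstance(node.args[0], ast.Constant)
                    and isinstance(node.args[0].value, str)):
                return ast.copy_location(
                    ast.Constant(len(node.args[0].value)), node)
            return node

    tree = ast.fix_missing_locations(FoldLen().visit(ast.parse("n = len('abc')")))
    code = compile(tree, "<ast>", "exec")   # the bytecode now embeds the constant 3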
More optimizations
==================
My main motivation to make code read-only is to specialize a function:
optimize a function for a specific environment (type of parameters,
external symbols like other functions, etc.). Checking the type of
parameters can be fast (especially when implemented in C), but it
would be expensive to check that all global variables used in the
function have not been modified since the function was "specialized".
For example, if os.path.isabs(path) is called: you have to check that
the "os.path" and "os.path.isabs" attributes were not modified and that
isabs() itself was not modified. If we know that globals are read-only,
these checks are no longer needed, and so it becomes cheap to decide
whether the specialized function can be used or not.
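To illustrate the per-call cost this removes, a specialized caller without
the read-only guarantee has to guard its assumptions itself on every call
(the names below, like _specialized_isabs, are purely hypothetical):

    import os.path

    def _specialized_isabs(path):
        # Stand-in for a body specialized for the expected argument types.
        return path.startswith("/")

    _expected = os.path.isabs        # binding recorded at specialization time

    def isabs_fast(path):
        # Without read-only namespaces, every call must check that
        # os.path.isabs (and, in general, os.path itself) was not rebound.
        if os.path.isabs is _expected:
            return _specialized_isabs(path)
        return os.path.isabs(path)   # deoptimize: use the current binding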
It becomes possible to "learn" types (trace the execution of the
application, and then compile for the recorded types). Knowing the
type of function parameters, result and local variables opens an
interesting class of new optimizations, but I prefer to discuss this
later, after discussing the idea of making the code read-only.
One point remains unclear to me. There is a short time window between
when a module is loaded and when it is made read-only. During this
window, we cannot rely on the read-only property of the code.
Specialized code cannot be used safely before the module is known to
be read-only. I don't know yet how the switch from "slow" code to
optimized code should be implemented.
Issues with read-only code
==========================
* Currently, it's not possible to make a module, class or function
modifiable again; this keeps my implementation simple. With a registry of
callbacks, it may be possible to re-enable modification and call
code to disable optimizations.
* PyPy implements this, but thanks to its JIT it can re-optimize
the modified code during execution. Writing a JIT is very complex;
I'm trying to find a compromise between the fast PyPy and the slow
CPython. Adding a JIT to CPython is out of my scope, it would require
too many modifications of the code.
* With read-only code, monkey-patching cannot be used anymore, which is
annoying for running tests. An obvious solution is to disable read-only
mode to run tests, but that can be seen as unsafe, since tests are usually
what gives us confidence in the code.
* The sys module cannot be made read-only because modifying sys.stdout
and sys.ps1 is a common use case.
* The warnings module tries to add a __warningregistry__ global
variable in the module where the warning was emitted, to avoid repeating
warnings that should only be emitted once. The problem is that the
module namespace is made read-only before this variable is added. A
workaround would be to maintain these dictionaries in the warnings
module directly, but it becomes harder to clear the dictionary when a
module is unloaded or reloaded. Another workaround is to add
__warningregistry__ before making a module read-only.
* Lazy initialization of module variables does not work anymore. A
workaround is to use a mutable type, such as a dict used as a
namespace for the module's modifiable variables (see the sketch after
this list).
* The interactive interpreter sets a "_" variable in the builtins
namespace. I have no workaround for this: the "_" variable is simply no
longer created in read-only mode. Don't run the interactive interpreter
in read-only mode.
* It is not possible yet to make the namespace of packages read-only.
For example, "import encodings.utf_8" adds the symbol "utf_8" to the
encodings namespace. A workaround is to load all submodules before
making the namespace read-only. This cannot be done for some large
packages. For example, the encodings package has a lot of submodules,
only a few of which are needed.
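To sketch the lazy-initialization workaround mentioned above: the module
binds a single mutable container at import time and never rebinds a global
afterwards (the loader function here is just a stand-in):

    def _load_expensive_data():
        return list(range(1000))     # stand-in for real initialization work

    _state = {"cache": None}         # created before the namespace is frozen

    def get_cache():
        if _state["cache"] is None:
            _state["cache"] = _load_expensive_data()
        return _state["cache"]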
Read the documentation for more information:
http://hg.python.org/sandbox/readonly/file/tip/READONLY.txt
More optimizations
==================
See my notes for all ideas to optimize CPython:
http://haypo-notes.readthedocs.org/faster_cpython.html
I explain there why I prefer to optimize CPython instead of working on
PyPy or another Python implementation like Pyston, Numba or similar
projects.
Victor
Early while working on py-lmdb I noticed that a huge proportion of
runtime was being lost to PyArg_ParseTupleAndKeywords, and so I
subsequently wrote a specialization for this extension module.
In the current code[0], parse_args() is much faster than
ParseTupleAndKeywords, responsible for a doubling of performance in
several of the library's faster code paths (e.g.
Cursor.put(append=True)). Ever since adding the rewrite I've wanted to
go back and either remove it or at least reduce the amount of custom
code, but it seems there really isn't a better approach to fast argument
parsing using the bare Python C API at the moment.
[0] https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L833
In the append=True path, parse_args() yields a method that can complete
1.1m insertions/sec on my crappy Core 2 laptop, compared to 592k/sec
using the same method rewritten with PyArg_ParseTupleAndKeywords.
Looking to other 'fast' projects for precedent, and studying Cython's
output in particular, it seems that Cython completely ignores the
standard APIs and expends a huge amount of .text on almost every
imaginable C performance trick to speed up parsing (actually Cython's
output is a sheer marvel of trickery; it's worth studying). So it's clear
the standard APIs are somewhat non-ideal, and those concerned with
performance are taking other approaches.
ParseTupleAndKeywords is competitive for positional arguments (1.2m/sec
vs 1.5m/sec for "Cursor.put(k, v)"), but things go south when a kwarg
dict is provided.
The primary goal of parse_args() was to avoid the continuous temporary
allocations and hashing done by PyArg_ParseTupleAndKeywords by way of
PyDict_GetItemString(), which invokes PyString_FromString() internally,
which in turn causes an alloc / strlen() / memcpy(), one for each
possible kwarg, on every function call.
The rewrite has been hacked over time, and honestly I'm not sure which
bits are responsible for the speed improvement, and which are totally
redundant. The tricks are:
* Intern keyword arg strings once at startup, avoiding the temporary
PyString creation and also causing their hash() to be cached across
calls. This uses an incredibly ugly pair of enum/const char *[]
static globals.[3]
[3] https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L79
* Use a per-function 'static const' array of structures to describe the
expected set of arguments. Since these arrays are built at compile
time, they cannot directly reference the runtime-generated interned
PyStrings, thus the use of an enum.
A nice side effect of the array's contents being purely small integers
is that each array element is small and thus quite cache-efficient.
In the current code array elements are 4 bytes each.
* Avoid use of variable-length argument lists. I'm not sure if this
helps at all, but certainly it simplifies the parsing code and makes
the call sites much more compact.
Instead of a va_arg list of destination pointers, parsed output is
represented as a per-function structure[1][2] definition, whose
offsets are encoded into the above argspec array at build time.
[1] https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L1265
[2] https://github.com/dw/py-lmdb/blob/master/lmdb/cpython.c#L704
This might hurt the compiler's ability to optimize the placement of
what were previously small stack variables (e.g. I'm not sure if it
prevents the compiler from making more use of registers). In any case the
overall result is much faster than before.
And most recently, giving a further 20% boost to append=True:
* Cache a dict that maps interned kwarg -> argspec array offset,
allowing the per-call kwarg dict to be iterated, and causing only one
hash lookup per supplied kwarg. Prior to the cache, presence of
kwargs would cause one hash lookup per argspec entry (e.g.
potentially 15 lookups instead of 1 or 2).
It's obvious this approach isn't generally useful, and looking at the
CPython source we can see the interning trick is already known, and
presumably not exposed in the CPython API because the method is quite
ugly. Still it seems there is room to improve the public API to include
something like this interning trick, and that's what this mail is about.
My initial thought is for a horribly macro-heavy API like:
PyObject *my_func(PyObject *self, PyObject *args, PyObject *kwargs)
{
    Py_ssize_t foo;
    const char *some_buf;
    PyObject *list;

    Py_BEGIN_ARGS
        PY_ARG("foo", PY_ARG_SSIZE_T, NULL, PY_ARG_REQUIRED),
        PY_ARG("some_buf", PY_ARG_BUFFER, NULL, PY_ARG_REQUIRED),
        PY_ARG("list", PY_ARG_OBJECT, &PyList_Type, 0)
    Py_END_ARGS

    if(Py_PARSE_ARGS(args, kwargs, &foo, &some_buf, &list)) {
        return NULL;
    }
    /* do stuff */
}
Where:

struct py_arg_info;   /* Opaque */

struct py_arg_spec {
    const char *name;
    enum { ... } type;
    PyTypeObject *type2;
    int options;
};

#define PY_BEGIN_ARGS \
    static struct py_arg_info *_py_arg_info; \
    if(! _py_arg_info) { \
        static const struct py_arg_spec _py_args[] = {

#define PY_END_ARGS \
        }; \
        _Py_InitArgInfo(&_py_arg_info, _py_args, \
                        sizeof _py_args / sizeof _py_args[0]); \
    }

#define PY_ARG(name, type, type2, opts) {name, type, type2, opts}

#define Py_PARSE_ARGS(a, k, ...) \
    _Py_ParseArgsFromInfo(&_py_arg_info, a, k, __VA_ARGS__)
Here some implementation-internal py_arg_info structure is built up on
first function invocation, producing the cached mapping of argument
keywords to array index, and storing a reference to the py_arg_spec
array, or some version of it that has been internally transformed to a
more useful format.
You may notice this depends on variadic macros, which break at least
Visual Studio, so at the very least that part is broken.
The above also doesn't deal with all the cases supported by the existing
PyArg_ routines, such as setting the function name and custom error
message, or unpacking tuples (is this still even supported in Python 3?)
Another approach might be to use a PyArg_ParseTupleAndKeywords-alike
API, so that something like this was possible:
static PyObject *
my_method(PyObject *self, PyObject *args, PyObject *kwds)
{
    Py_ssize_t foo;
    const char *some_buf;
    Py_ssize_t some_buf_size;
    PyObject *list;

    static PyArgInfo arg_info;
    static char *keywords[] = {
        "foo", "some_buf", "list", NULL
    };

    if(! PyArg_FastParse(&arg_info, args, kwds, "ns#|O!", keywords,
                         &foo, &some_buf, &some_buf_size,
                         &PyList_Type, &list)) {
        return NULL;
    }
    /* do stuff */
}
In that case the API is very familiar, and PyArg_FastParse() builds the
cache on the first invocation itself, but the supplied va_list is full of
noise that needs to be carefully skipped somehow. The work involved in
doing the skipping might introduce complexity that slows things down all
over again.
Any thoughts on a better API? Is there a need here? I'm obviously not
the first to notice PyArg_ParseTupleAndKeywords is slow, and so I wonder
how many people have sighed and brushed off the fact their module is
slower than it could be.
David
On May 21, 2014, at 1:21 PM, python-ideas-request@python.org wrote:
> I propose that we add a way to completely disable the
> optimizer.
I think this opens a can of worms that is better left closed.
* We will have to start running tests both with and without the switch
turned on, for example (because you're exposing yet another way to
run Python with different code).
* Over time, I expect that some of the functionality of the peepholer
is going to be moved upstream into AST transformations, and then you
will have even less ability to switch something on and off.
* The code in place has been in CPython for over a decade and
the tracker item has languished for years. That provides some
evidence that the "need" here is very small.
* I sympathize with "there is an irritating dimple in coverage.py",
but that hasn't actually impaired its usability beyond creating a
curiosity. Using that as a reason to add a new CPython-only
command-line switch seems like having the tail wag the dog.
* As the other implementations of Python continue to develop,
I don't think we should tie their hands with respect to code
generation.
* Ideally, the peepholer should be thought of as part of the code
generation. As compilation improves over time, it should start
to generate the same code as we're getting now. It probably
isn't wise to expose the implementation detail that the constant
folding and jump tweaks are done in a separate second pass.
* Mostly, I don't want to open a new crack in the Python veneer
where people are switching on and off two different streams of
code generation (currently, there is one way to do it). I can't
fully articulate my instincts here, but I think we'll regret opening
this door when we didn't have to.
That being said, I know how the politics of python-ideas works
and I expect that my thoughts on the subject will quickly get
buried by a discussion of which lettercode should be used for the
command-line switch.
Hopefully, some readers will focus on the question of whether
it is worth it. Others might look at ways to improve the existing
code (without an off-switch) so that the continue-statement
jump-to-jump shows up in your coverage tool.
IMO, adding a new command-line switch is a big deal (we should
do it very infrequently, limit it to things with a big payoff, and
think about whether there are any downsides). Personally, I don't
see any big wins here and have a sense that there are downsides
that would make us regret exposing alternate code generation.
Raymond