Hi,
It has taken a fair amount of work, but I have mostly gotten tagged
pointers working for small ints (currently 63 bits on a 64-bit
platform). The configure option is --with-fixedint. I'm trying to
release early and often.
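For anyone unfamiliar with the technique, here is a minimal sketch of
how such a scheme typically encodes a small int. The names and the
choice of the low bit as the tag are just for illustration, not
necessarily what my branch actually does:

    #include "Python.h"
    #include <stdint.h>

    /* Low bit set => the "pointer" is really a 63-bit integer payload. */
    #define FIXEDINT_TAG 0x1

    static inline int
    is_fixedint(PyObject *ob)
    {
        return ((uintptr_t)ob & FIXEDINT_TAG) != 0;
    }

    static inline PyObject *
    fixedint_box(int64_t value)
    {
        /* caller must ensure the value fits in 63 bits */
        return (PyObject *)(((uintptr_t)value << 1) | FIXEDINT_TAG);
    }

    static inline int64_t
    fixedint_unbox(PyObject *ob)
    {
        /* arithmetic right shift restores the sign */
        return (int64_t)(intptr_t)ob >> 1;
    }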
The whole test suite runs without crashing, which feels like some
kind of milestone to me. The following tests still fail:
test_ctypes test_fcntl test_fileio test_gdb test_inspect test_io
test_repl test_socket test_sqlite test_unicode test_userstring
Latest code is here:
https://github.com/nascheme/cpython/tree/tagged_int
Unfortunately, there is a net slowdown with the fixedint option
enabled. Full PGO benchmarks from pyperformance are below. I'm hoping
there is still good work to be done in reducing the number of times
fixed ints need to be heap-allocated. I suspect that is why the
pickle_list and pickle_dict benchmarks are slower. I should also try
to measure the memory usage difference; the fixedint version should
use less RAM.
Here is a Linux perf report for the pickle_list benchmark:
http://python.ca/nas/python/perf-fixedint-pickle-list.txt
The addition of the extra test + jmp instructions to INCREF/DECREF is
hurting a fair bit. I'm not sure there is anything to be done there.
Based on the Linux perf results, I suspect that the extra instructions
for INCREF/DECREF/_Py_TYPE are blowing up the size of
_PyEval_EvalFrameDefault. I need to investigate that more.
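For illustration, the tagged version of INCREF has to look roughly
like this (hypothetical code, not the exact macro from the branch):

    #include "Python.h"
    #include <stdint.h>

    static inline void
    tagged_incref(PyObject *op)
    {
        /* The extra test + branch: tagged values carry no refcount. */
        if (((uintptr_t)op & 0x1) == 0) {
            op->ob_refcnt++;    /* ordinary heap-allocated object */
        }
        /* tagged: nothing to do, the value is effectively immortal */
    }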
BTW, Linux perf is amazing. Anyone who does low-level optimization
work should study it.
I did consider trying to use a second tag for short strings. I'm not
sure it would help much: some quick analysis shows that only 25% of
the strings used by PyDict_GetItem are short enough to fit.
This morning I dreamed up a new idea: analyze normal Python programs
and build a list of strings commonly used for PyDict_GetItem. They
will be strings like "self", builtin function names, etc. Then use a
tagged pointer to hold these common strings, i.e. a tag to denote a
string (or interned symbol, in Lisp speak) and an integer which is the
offset into a fixed array of interned strings. The savings would have
to come from avoiding the INCREF/DECREF accounting on those strings.
Instead of a fixed set of strings, perhaps we could make the intern
process dynamically allocate the tag IDs. We could have a specialized
lookdict that works for dicts containing only interned strings.
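A rough sketch of the encoding I have in mind (entirely hypothetical;
the tag bits and the table size are arbitrary):

    #include "Python.h"
    #include <stdint.h>

    /* A second tag pattern in the low two bits marks an interned-string
       ID.  Pattern 10 is free: real pointers end in 00 and tagged
       fixed ints have the low bit set. */
    #define INTERN_TAG 0x2

    /* Fixed table of common strings: "self", builtin names, etc. */
    static PyObject *intern_table[4096];

    static inline int
    is_tagged_intern(PyObject *ob)
    {
        return ((uintptr_t)ob & 0x3) == INTERN_TAG;
    }

    static inline PyObject *
    intern_untag(PyObject *ob)
    {
        /* The payload is an index into the table; the tagged value
           itself needs no INCREF/DECREF, so the refcount accounting
           on these strings disappears. */
        return intern_table[(uintptr_t)ob >> 2];
    }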
$ ./python -m perf compare_to -G ../cpython-profile-tagged-off/base4.json fixedint5.json --min-speed 5
Slower (24):
- pickle_list: 3.06 us +- 0.04 us -> 3.74 us +- 0.03 us: 1.22x slower (+22%)
- pickle_dict: 22.2 us +- 0.1 us -> 26.2 us +- 0.2 us: 1.18x slower (+18%)
- raytrace: 501 ms +- 5 ms -> 565 ms +- 6 ms: 1.13x slower (+13%)
- crypto_pyaes: 113 ms +- 1 ms -> 126 ms +- 0 ms: 1.12x slower (+12%)
- logging_silent: 210 ns +- 4 ns -> 234 ns +- 3 ns: 1.11x slower (+11%)
- telco: 6.00 ms +- 0.09 ms -> 6.68 ms +- 0.14 ms: 1.11x slower (+11%)
- float: 111 ms +- 2 ms -> 123 ms +- 1 ms: 1.11x slower (+11%)
- nbody: 122 ms +- 1 ms -> 135 ms +- 2 ms: 1.10x slower (+10%)
- mako: 17.1 ms +- 0.1 ms -> 18.8 ms +- 0.1 ms: 1.10x slower (+10%)
- json_dumps: 12.3 ms +- 0.2 ms -> 13.5 ms +- 0.1 ms: 1.10x slower (+10%)
- scimark_monte_carlo: 103 ms +- 2 ms -> 113 ms +- 1 ms: 1.10x slower (+10%)
- pickle_pure_python: 467 us +- 3 us -> 508 us +- 6 us: 1.09x slower (+9%)
- logging_format: 10.2 us +- 0.1 us -> 11.1 us +- 2.2 us: 1.09x slower (+9%)
- chameleon: 9.27 ms +- 0.09 ms -> 10.1 ms +- 0.1 ms: 1.09x slower (+9%)
- sqlalchemy_imperative: 30.4 ms +- 0.8 ms -> 32.9 ms +- 0.9 ms: 1.08x slower (+8%)
- django_template: 122 ms +- 2 ms -> 131 ms +- 2 ms: 1.08x slower (+8%)
- sympy_str: 184 ms +- 2 ms -> 198 ms +- 5 ms: 1.07x slower (+7%)
- unpickle_pure_python: 368 us +- 5 us -> 394 us +- 9 us: 1.07x slower (+7%)
- sympy_expand: 426 ms +- 10 ms -> 452 ms +- 12 ms: 1.06x slower (+6%)
- sympy_sum: 90.4 ms +- 0.6 ms -> 96.0 ms +- 1.0 ms: 1.06x slower (+6%)
- regex_compile: 181 ms +- 7 ms -> 192 ms +- 7 ms: 1.06x slower (+6%)
- scimark_lu: 173 ms +- 6 ms -> 182 ms +- 5 ms: 1.05x slower (+5%)
- genshi_xml: 62.7 ms +- 0.8 ms -> 66.1 ms +- 0.8 ms: 1.05x slower (+5%)
- pickle: 9.11 us +- 0.13 us -> 9.59 us +- 0.06 us: 1.05x slower (+5%)
Faster (2):
- unpack_sequence: 49.1 ns +- 0.7 ns -> 45.0 ns +- 1.3 ns: 1.09x faster (-8%)
- scimark_sparse_mat_mult: 3.75 ms +- 0.05 ms -> 3.47 ms +- 0.05 ms: 1.08x faster (-8%)
Benchmark hidden because not significant (29): 2to3, chaos,
deltablue, dulwich_log, fannkuch, genshi_text, go, hexiom, html5lib,
json_loads, logging_simple, meteor_contest, nqueens, pathlib,
pidigits, python_startup, python_startup_no_site, regex_dna,
regex_effbot, regex_v8, richards, scimark_fft, scimark_sor,
spectral_norm, sqlite_synth, sympy_integrate, tornado_http,
unpickle, unpickle_list
Ignored benchmarks (5) of ../cpython-profile-tagged-off/base4.json:
sqlalchemy_declarative, xml_etree_generate, xml_etree_iterparse,
xml_etree_parse, xml_etree_process
Ignored benchmarks (4) of fixedint5.json:
xml_etree_pure_python_generate, xml_etree_pure_python_iterparse,
xml_etree_pure_python_parse,
xml_etree_pure_python_process
This is an experiment to exercise the new C-API we are trying to
design. I would like it to be able to support tagged pointers, which
requires that PyObject* be treated as an opaque pointer.
Note, I'm not suggesting that we should use tagged pointers in
CPython. I just want to see if the API would make it possible. The
logic is that some Python implementations might want to use tagged
pointers, and they would also like to provide the same C-API as
CPython. So it would be really nice if we didn't make things hard for
them.
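Concretely, a tagged-pointer implementation has to hide a tag test
behind the accessors, which only works if extensions never dereference
PyObject* themselves. A sketch (PyFixedInt_Type is hypothetical):

    #include "Python.h"
    #include <stdint.h>

    extern PyTypeObject PyFixedInt_Type;  /* hypothetical tagged-int type */

    static inline PyTypeObject *
    opaque_get_type(PyObject *ob)
    {
        if (((uintptr_t)ob & 0x1) != 0) {
            return &PyFixedInt_Type;  /* tagged: the type is implied */
        }
        return ob->ob_type;           /* untagged: a real heap object */
    }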
The patch adding the actual tagged pointer type is quite trivial.
There are a number of git commits preceding it that make it trivial.
Coccinelle has been useful. Some of my semantic patches:
    @@
    expression E;
    @@

    -E->ob_type
    +Py_TYPE(E)

    @@
    expression E, F;
    @@

    -Py_TYPE(E) = F
    +Py_SET_TYPE(E, F)
Run like:
spatch --sp-file ob_type.cocci <C source files>
To speed things up, I used grep to narrow down the set of files to process.
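Something along these lines (the directory list is illustrative):

    spatch --sp-file ob_type.cocci $(grep -rl 'ob_type' Objects Modules Python)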
Here is the source code for the fixed-int CPython. There is a new
builtin 'fixedint'. It doesn't do much yet, but at least it doesn't
immediately crash.
https://github.com/nascheme/cpython/tree/tagged_int
Not that long ago Brett, Barry, and I were talking about how to get
extension authors to move away from the C-API. Cython is the obvious
choice, but it isn't an official tool nor does it necessarily make
sense to make it one. Regardless, it would help all parties involved
if there *were* an official tool that was part of CPython (i.e. in the
repo). Cython could build on top of it and extension authors would be
encouraged to use it or Cython. If such a thing makes sense, I figure
we would follow the pattern set by asyncio (relative to twisted).
-eric
On 2018-09-07 14:47, Victor Stinner wrote:
> Inside CPython (the core and builtin modules), micro-optimizations must
> be used: abusing borrowed references, macros, direct access to all
> fields of C structures, etc.
>
> But here I'm talking about the public C API used by third party extensions.
Making a distinction between "inside CPython" and "third party
extensions" is a bad idea. Such a distinction would be problematic:
1. Complexity: we should not have two different C APIs. There should be
just one, usable internally and externally.
2. Performance: if an optimization is important for CPython, then it's
also important for third-party extensions. We don't want third-party
code to be inherently slower than CPython itself. (this reminds me of
PEP 580, regarding several internal optimizations which aren't available
to third-party classes)
Jeroen.
On 2018-09-04 23:50, Victor Stinner wrote:
> I would like to design a new C API without borrow references. So
> Py_TYPE() should go, replaced by something else.
I don't see the problem with Py_TYPE(). Consider it an accessor macro,
just a nice way to access the ob_type field.
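For reference, the definition in CPython's object.h is literally just
a field access:

    #define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)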
On 07.09.2018 10:22, Victor Stinner wrote:
> > I'm in discussion with PyPy developers, and they reported different
> > APIs which cause them troubles:
> > (...)
> > * almost all PyUnicode API functions have to go according to them.
> > PyPy3 uses UTF-8 internally, CPython uses "compact string" (array of
> > Py_UCS1, Py_UCS2 or Py_UCS4 depending on the string content).
> > https://pythoncapi.readthedocs.io/bad_api.html#pypy-requests
Le ven. 7 sept. 2018 à 10:33, M.-A. Lemburg <mal(a)egenix.com> a écrit :
> I'm -1 on removing the PyUnicode APIs. We deliberately created a
> useful and very complete C API for Unicode.
>
> The fact that PyPy chose to use a different internal representation
> is not a good reason to remove APIs and have CPython extensions take
> the hit as a result. It would be better for PyPy to rethink the
> internal representation or create a shim API which translates
> between the two worlds.
>
> Note that UTF-8 is not a good internal representation for Unicode
> if you want fast indexing and slicing. This is why we are using
> fixed code units to represent the Unicode strings.
The PyUnicode C API is not only an issue for PyPy, it's also an issue
for CPython. When PEP 393 was implemented, most of the PyUnicode API
was suddenly deprecated: all functions using the now-legacy
Py_UNICODE* type...
Python 3.7 still has to support both the legacy Py_UNICODE* API and
the new "compact string" API. It makes the CPython code base way more
complex than it should be: any function accepting a string is supposed
to call PyUnicode_Ready() and handle errors properly. I would prefer
to be able to remove the legacy PyUnicodeObject type and only use
compact strings everywhere.
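To make the burden concrete, here is a sketch of the boilerplate such
a function needs today (the helper itself is made up; PyUnicode_READY()
and PyUnicode_GET_LENGTH() are the real macros):

    #include "Python.h"

    static Py_ssize_t
    text_length(PyObject *str)
    {
        /* May convert a legacy Py_UNICODE* string to the compact
           representation; can fail, e.g. with MemoryError. */
        if (PyUnicode_READY(str) < 0) {
            return -1;
        }
        return PyUnicode_GET_LENGTH(str);
    }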
Let me elaborate on what makes a PyUnicode function good or bad.
Examples of bad APIs:
* PyUnicode_IS_COMPACT(): this API really relies on the *current* implementation
* PyUnicode_2BYTE_DATA(): should only be used internally; there is no
need to export it
* PyUnicode_READ()
* Py_UNICODE_strcmp(): uses Py_UNICODE, which is an implementation detail
Good APIs:
* PyUnicode_Concat(): C API for str + str
* PyUnicode_Split()
* PyUnicode_FindChar()
Borderline:
* PyUnicode_IS_ASCII(op): it's an O(1) operation on CPython, but it can
be O(n) on other implementations (like PyPy, which uses UTF-8). But we
also added str.isascii() in Python 3.7...
* PyUnicode_READ_CHAR()
* PyUnicode_CompareWithASCIIString(): the function name announces
ASCII but decodes the byte string from Latin-1 :-)
Victor
On 2018-09-07 15:17, Victor Stinner wrote:
> I'm not sure if we can hide some functions for
> regular C extensions, but only give access to Cython?
Which problem would that solve? If you're happy with Cython using
"internal" functions, why shouldn't other C extensions use those same
internal functions?
Le ven. 7 sept. 2018 à 12:45, M.-A. Lemburg <mal(a)egenix.com> a écrit :
>
> On 07.09.2018 11:46, Victor Stinner wrote:
> > I'm trying to remove functions from the C API which allow doing
> > things that are not possible at the Python level:
> > https://pythoncapi.readthedocs.io/bad_api.html#no-public-c-functions-if-i...
> >
> > It's a request coming from PyPy developers.
>
> There are always things which you have to be able to do at the
> C level but cannot do at the Python level and that's intentional,
> since you're working at the C level and want to have direct
> access to the data, avoiding copying things around all the
> time and creating intermediate temporary objects or buffers for
> this.
Your motivation here is basically performance, am I right?
From what I understood, such short-term optimizations become high
technical debt in the long term for PyPy, Gilectomy, and any other
Python implementation.
Inside CPython (the core and builtin modules), micro-optimizations
must be used: abusing borrowed references, macros, direct access to
all fields of C structures, etc.
But here I'm talking about the public C API used by third party extensions.
... Sadly, I haven't worked on this topic yet, so I would prefer not
to go further in the discussion, since I have no concrete examples of
issues :-)
> You also want callbacks to work from both worlds (Python into
> C and C into Python), or embed the interpreter, or provide new
> ways of working with the existing objects, or work with
> code in a way which bypasses the exception machinery (in C
> you very often raise exceptions and catch them without the
> exception object itself ever being created).
You should really replace "Python" (the language) with "CPython" in
your paragraph. I would like to move away from the current CPython.
Victor
On 2018-09-05 15:19, Hugh Fisher wrote:
> And this happens to be exactly what Apple did when they added tagged
> pointers to Objective-C. They told programmers not to access objectptr->isa
> directly, instead to use OBJECT_GETCLASS(objectptr). Then later they
> started implementing some objects as tagged pointers, and only code
> that hadn't been updated broke. OBJECT_GETCLASS was the "stable
> ABI"
This is a stable API, but I would guess not a stable ABI. Victor Stinner
wants a stable ABI.
Hi,
In my previous "Open questions about borrowed references" thread, I
asked if it's an issue or not that Py_TYPE() returns a borrowed
reference.
It seems like most people say that it's not an issue. If you want to
discuss if Py_TYPE() is an issue, please contribute to the other
thread. Here I only want to share my results. Honestly, I'm still not
100% convinced that Py_TYPE() is an issue :-)
Anyway, I experimented with a C API without Py_TYPE(). I chose to add
3 functions/features:
* ``Py_GetType()``: similar to ``Py_TYPE()`` but returns a strong
reference (see the sketch after this list)
* ``Py_TYPE_IS(ob, type)``: equivalent to ``Py_TYPE(ob) == type``
* ``%T`` format for ``PyUnicode_FromFormat()``
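``Py_GetType()`` can be as simple as wrapping ``Py_TYPE()`` with an
INCREF; a minimal sketch:

    #include "Python.h"

    static inline PyTypeObject *
    Py_GetType(PyObject *ob)
    {
        PyTypeObject *type = Py_TYPE(ob);
        Py_INCREF(type);    /* the caller owns a strong reference */
        return type;
    }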
Most usages of Py_TYPE() fall into the patterns below. For the last
one, I'm not sure that the "N" format of Py_BuildValue() should stay,
since it's also based on borrowed references. Replacing Py_TYPE()
requires a lot of changes, and I made many copy/paste mistakes. If we
choose to remove Py_TYPE(), we really need a tool to automate the
refactoring. Otherwise, the risk of introducing a regression is just
too high, and it would send the wrong signal to users.
Dealloc::

    Py_TYPE(self)->tp_free((PyObject *)self);

becomes::

    PyTypeObject *type = Py_GetType(self);
    type->tp_free((PyObject *)self);
    Py_DECREF(type);

Size::

    res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * self->ob_descr->itemsize;

becomes::

    PyTypeObject *type = Py_GetType(self);
    res = _PyObject_SIZE(type) + self->allocated * self->ob_descr->itemsize;
    Py_DECREF(type);
Error::

    PyErr_Format(PyExc_TypeError,
                 "first argument must be a type object, not %.200s",
                 Py_TYPE(arraytype)->tp_name);

becomes::

    PyErr_Format(PyExc_TypeError,
                 "first argument must be a type object, not %T",
                 arraytype);
Py_BuildValue::

    result = Py_BuildValue(
        "O(CO)O", Py_TYPE(self), typecode, list, dict);

becomes::

    result = Py_BuildValue(
        "N(CO)O", Py_GetType(self), typecode, list, dict);
Victor