Update on PEP 523 and adding a co_extra field to code objects
For quick background for those that don't remember, part of PEP 523 proposed adding a co_extra field to code objects along with making the frame evaluation function pluggable (https://www.python.org/dev/peps/pep-0523/#expanding-pycodeobject). The idea was that things like JITs and debuggers could use the field as a scratch space of sorts to store data related to the code object. People who objected to the new field did so either for memory reasons ("it adds another pointer to the struct that won't be typically used") or for conceptual reasons ("the code object is immutable and you're proposing a mutable field"). The latter is addressed by not exposing the field in Python and clearly stating that code should never expect the field to be filled.

As for the former issue of whether the memory was worth it, Dino has been testing whether the field is necessary for performance from a JIT perspective. Well, Dino found the time to test Pyjion without the co_extra field and it isn't pretty. With the field, Pyjion is faster than stock Python in 15 benchmarks (https://github.com/Microsoft/Pyjion/tree/master/Perf). Removing the co_extra field and using an unordered_map from the C++ STL drops that number to 2. Performance is even worse if we try to use a Python dictionary instead.

That means we still want to find a solution to attach arbitrary data to code objects without sacrificing performance. One proposal is what's in PEP 523 for the extra field. Another option is to make the memory allocator for code objects pluggable and introduce a new flag that signals that the object was created using a non-default allocator. Obviously we prefer the former solution due to its simplicity. :)

Anyway, we would like to get this settled this week so that I can get whatever solution we agree to (if any) in next week, in time for the Python 3.6b1 feature freeze. That would be greatly appreciated.
Hi Brett
For what it's worth, vmprof and similar tools would love such a field (there is an open question of how you can use vmprof *and* another tool, but that's for later).
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/fijall%40gmail.com
On Mon, 29 Aug 2016 at 15:12 Maciej Fijalkowski wrote:

For what it's worth, vmprof and similar tools would love such a field (there is an open question of how you can use vmprof *and* another tool, but that's for later).
That's great to hear! Glad the solution has multiple use-cases. -Brett
On 2016-08-29 23:38, Brett Cannon wrote:
That means we still want to find a solution to attach arbitrary data to code objects without sacrificing performance. One proposal is what's in PEP 523 for the extra field. Another option is to make the memory allocator for code objects pluggable and introduce a new flag that signals that the object was created using a non-default allocator. Obviously we prefer the former solution due to its simplicity. :)
May I remind you that you can have the field with no extra memory cost? :) The struct has sub-par alignments. Christian
Anyway, given the outcome of Dino's tests I have no objections to the
PEP. (Though using Christian's hack would be cool.)
--
--Guido van Rossum (python.org/~guido)
On Mon, 29 Aug 2016 at 15:51 Guido van Rossum wrote:
Anyway, given the outcome of Dino's tests I have no objections to the PEP. (Though using Christian's hack would be cool.)
Great! I'll mark the PEP as accepted and get the implementation in for 3.6.
May I remind you that you can have the field with no extra memory cost? :)
Yes you may. :)
The struct has sub-par alignments.
So the struct in question can be found at https://github.com/python/cpython/blob/2d264235f6e066611b412f7c2e1603866e0f7... . The official docs say the fields can be changed at any time, so re-arranging them shouldn't break any ABI compatibility promises: https://docs.python.org/3/c-api/code.html#c.PyCodeObject . Would grouping all the fields of the same type together, sorting them by individual field size (i.e. PyObject*, void*, int, unsigned char*), and then adding the co_extra field at the end of the grouping of PyObject * fields do what you're suggesting?
On 2016-08-30 01:14, Brett Cannon wrote:
You don't have to re-sort them all, just move co_firstlineno after co_flags, so all the int fields are together. Pointers are typically aligned to a multiple of 64 bits on a 64-bit machine. In its current shape, PyCodeObject is padded with two unused areas of 32 bits: 5 * int32 + 32 bits of padding, 9 * pointers (64 bits each), 1 * int32 + another 32 bits of padding, 3 * pointers. When you move co_firstlineno, you fill in the gap.

Christian
On 8/29/2016 5:38 PM, Brett Cannon wrote:
who objected to the new field did either for memory ("it adds another pointer to the struct that won't be typically used"), or for conceptual reasons ("the code object is immutable and you're proposing a mutable field"). The latter is addressed by not exposing the field in Python and
Am I correct in thinking that you will also not add the new field as an argument to PyCode_New?
clearly stating that code should never expect the field to be filled.
I interpret this as "The only code that should access the field should be code that put something there." -- Terry Jan Reedy
On Mon, Aug 29, 2016, 17:06 Terry Reedy wrote:
Am I correct in thinking that you will also not add the new field as an argument to PyCode_New?
Correct.
clearly stating that code should never expect the field to be filled.
I interpret this as "The only code that should access the field should be code that put something there."
Yep, seems like a reasonable rule to follow. -brett
On Tue, Aug 30, 2016 at 3:00 AM, Brett Cannon wrote:

Yep, seems like a reasonable rule to follow.
How do we make sure that multiple tools don't stomp on each other?
On 8/30/2016 4:20 AM, Maciej Fijalkowski wrote:
How do we make sure that multiple tools don't stomp on each other?
AFAIK, we can't. The multiple tool people will have to work that out, or document incompatibilities between tools. -- Terry Jan Reedy
On Tue, 30 Aug 2016 at 01:20 Maciej Fijalkowski wrote:
How do we make sure that multiple tools don't stomp on each other?
It will be up to the tool. For Pyjion we just don't use the field if someone else is using it, so if vmprof chooses to take precedence then it can. Otherwise we can work out some common practice, e.g. if the field is set and it isn't an object you put there, then create a list, push on what was already there, push on what you want to add, and set the field to the list. That lets us do a type check for the common case of only one object being set, and in the odd case of a list things don't fail: you can search the list for your object, while acknowledging that performance will suffer (or we use a dict; I don't really care, as long as we don't require a storage data structure for the field in the single-user case). My point is that we can figure this out among Pyjion, PTVS, and vmprof if we are the first users, and update the PEP accordingly as guidance.
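The convention sketched above can be illustrated in Python (purely a simulation: co_extra is a C-level field that is deliberately not exposed to Python code, so a plain attribute and made-up class/function names stand in for it here). Each tool checks whether the field holds its own object and promotes the field to a list only when a conflict is detected:

```python
class Code:
    """Stand-in for a code object; 'extra' plays the role of co_extra."""
    def __init__(self):
        self.extra = None


class ToolData:
    """Per-tool scratch object a tool would store in the field."""
    def __init__(self, tool):
        self.tool = tool


def get_tool_data(code, tool):
    """Fetch (or create) this tool's data, promoting to a list on conflict."""
    extra = code.extra
    if extra is None:
        # Common case: field unused; claim it with a single object.
        data = ToolData(tool)
        code.extra = data
        return data
    if isinstance(extra, ToolData) and extra.tool == tool:
        # Common case: we are the sole user; a single type check suffices.
        return extra
    if isinstance(extra, list):
        # A conflict was detected earlier: search the shared list.
        for ob in extra:
            if isinstance(ob, ToolData) and ob.tool == tool:
                return ob
        data = ToolData(tool)
        extra.append(data)
        return data
    # Someone else holds the field alone: promote to a shared list.
    data = ToolData(tool)
    code.extra = [extra, data]
    return data
```

For example, after a second tool touches the same code object, the field silently becomes a two-element list and both tools keep finding their own data; only the single-user fast path pays no list overhead.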
On Mon, 29 Aug 2016 21:38:19 +0000, Brett Cannon wrote:
Just a question: Maciej mentioned the field may be useful for vmprof. That's already two potential users (vmprof and Pyjion) for a single field. OTOH, the PEP says:

"""It is not recommended that multiple users attempt to use the co_extra simultaneously. While a dictionary could theoretically be set to the field and various users could use a key specific to the project, there is still the issue of key collisions as well as performance degradation from using a dictionary lookup on every frame evaluation. Users are expected to do a type check to make sure that the field has not been previously set by someone else."""

Does it mean that, in the event vmprof comes in and changes the field, Pyjion will have to recompile the function the next time it comes to execute it? And, conversely, if Pyjion compiles a function while vmprof is enabled, vmprof will lose timing information (or whatever else, because I'm not sure what vmprof plans to store there) for that code object?

Regards,
Antoine.
On Tue, 30 Aug 2016 at 09:08 Antoine Pitrou wrote:
Just a question: Maciej mentioned the field may be useful for vmprof. That's already two potential users (vmprof and Pyjion) for a single field.
PTVS has also said they would find it useful for debugging.
OTOH, the PEP says:
"""It is not recommended that multiple users attempt to use the co_extra simultaneously. While a dictionary could theoretically be set to the field and various users could use a key specific to the project, there is still the issue of key collisions as well as performance degradation from using a dictionary lookup on every frame evaluation. Users are expected to do a type check to make sure that the field has not been previously set by someone else."""
Does it mean that, in the event vmprof comes in and changes the field, Pyjion will have to recompile the function the next time it comes to execute it?
As of right now Pyjion simply doesn't JIT the function.
And, conversely, if Pyjion compiles a function while vmprof is enabled, vmprof will lose timing information (or whatever else, because I'm not sure what vmprof plans to store there) for that code object?
Depends on what vmprof chooses to do. Since the data is designed to be disposable, it could decide it should always take precedence and overwrite the data if someone beat it to using the field.

Basically I don't think we want co_extra1, co_extra2, etc. But we don't want to require a dict either, as that kills performance. Using a list where users could push on objects might work, but I have no clue what that would do to perf, as you would still have to needlessly search the list when only one piece of code uses the field.

Basically I don't see a good way to make a general solution for people who want to use the field simultaneously, so tools that use the field will need to be clear on how they choose to handle the situation, such as "we use it if it isn't set" or "we always use it no matter what". This isn't a perfect solution in all cases, and I think that's just going to have to be the way it is.
On Tue, 30 Aug 2016 17:14:31 +0000, Brett Cannon wrote:
Depends on what vmprof chooses to do. Since the data is designed to be disposable it could decide it should always take precedence and overwrite the data if someone beat it to using the field. Basically I don't think we want co_extra1, co_extra2, etc. But we don't want to require a dict either as that kills performance. Using a list where users could push on objects might work, but I have no clue what that would do to perf as you would have to still needlessly search the list when only one piece of code uses the field.
Perhaps a list would work indeed. Realistically, if there are at most 2-3 users of the field at any given time (and most probably only one or zero), a simple type check (by pointer equality) on each list item may be sufficient. Speaking about Numba, we don't have any planned use for the field, so I can't really give any further suggestion. Regards Antoine.
On Tue, 30 Aug 2016 at 10:32 Antoine Pitrou wrote:
Perhaps a list would work indeed. Realistically, if there are at most 2-3 users of the field at any given time (and most probably only one or zero), a simple type check (by pointer equality) on each list item may be sufficient.
Let's see what Maciej says, but we could standardize on switching the field to a list when a conflict of usage is detected, so the common case in the frame eval function is checking for your own type; if that fails, do a PyList_CheckExact() and look for your object, and otherwise make a list and move over to that for everyone to use. A little bit more code, but it's simple code, and it takes care of conflicts only when it calls for it.
On Tue, 30 Aug 2016 17:35:35 +0000, Brett Cannon wrote:
Let's see what Maciej says, but we could standardize on switching the field to a list when a conflict of usage is detected so the common case in the frame eval function is checking for your own type, and if that fails then doing a PyList_CheckExact() and look for your object, otherwise make a list and move over to that for everyone to use. A little bit more code, but it's simple code and takes care of conflicts only when it calls for it.
That's a bit obscure and confusing, though (I *think* the weakref module uses a similar kludge in some place). If you want to iterate on it you have to write some bizarre macro to share the loop body between the two different code-paths (list / non-list), or some equally tedious function-pointer-based code. Why not make it always a list? List objects are reasonably cheap in memory and access time... (unlike dicts) Regards Antoine.
On Tue, 30 Aug 2016 at 10:49 Antoine Pitrou wrote:
That's a bit obscure and confusing, though (I *think* the weakref module uses a similar kludge in some place). If you want to iterate on it you have to write some bizarre macro to share the loop body between the two different code-paths (list / non-list), or some equally tedious function-pointer-based code.
I don't quite follow where the complexity you're suggesting comes from. The frame evaluation function in Pyjion would just do something like the following (PyPyjion_New() and is_pyjion_object() being Pyjion-internal helpers):

    if (co_extra == NULL) {
        /* No one using the field. */
        co_extra = pyjion_cache = PyPyjion_New();
    }
    else if (!is_pyjion_object(co_extra)) {
        /* Someone other than us is using the field. */
        if (PyList_CheckExact(co_extra)) {
            /* Field is already a list. */
            /* ... look for our object in the list ... */
            if (ob_found != NULL) {
                /* We're in the list. */
                pyjion_cache = ob_found;
            }
            else {
                /* Not in the list, so add ourselves. */
                pyjion_cache = PyPyjion_New();
                PyList_Append(co_extra, pyjion_cache);
            }
        }
        else {
            /* Someone else in the field, not a list (yet). */
            other_ob = co_extra;
            co_extra = PyList_New(0);
            PyList_Append(co_extra, other_ob);
            pyjion_cache = PyPyjion_New();
            PyList_Append(co_extra, pyjion_cache);
        }
    }
    else {
        /* We're in the field. */
        pyjion_cache = co_extra;
    }
Why not make it always a list? List objects are reasonably cheap in memory and access time... (unlike dicts)
Because I would prefer to avoid any form of unnecessary performance overhead for the common case.
On Tue, 30 Aug 2016 18:12:01 +0000, Brett Cannon wrote:
Because I would prefer to avoid any form of unnecessary performance overhead for the common case.
But the performance overhead of iterating over a 1-element list is small enough (it's just an array access after a pointer dereference) that it may not be larger than the overhead of the multiple tests and conditional branches your example shows. Regards Antoine.
On 30.08.16 21:20, Antoine Pitrou wrote:
But the performance overhead of iterating over a 1-element list is small enough (it's just an array access after a pointer dereference) that it may not be larger than the overhead of the multiple tests and conditional branches your example shows.
Iterating over a tuple is even faster. It needs one pointer dereference less. And for memory efficiency we can use just a raw array of pointers.
On Tue, 30 Aug 2016 at 11:56 Serhiy Storchaka wrote:
Iterating over a tuple is even faster. It needs one pointer dereference less.
I'll talk it over with Dino and see what he thinks.
And for memory efficiency we can use just a raw array of pointers.
I would rather not do that as that leads to having to track the end of the array, special memory cleanup, etc.
On Wed, Aug 31, 2016 at 4:55 AM, Serhiy Storchaka wrote:
Iterating over a tuple is even faster. It needs one pointer dereference less.
And for memory efficiency we can use just a raw array of pointers.
Didn't all this kind of thing come up when function annotations were discussed? Insane schemes like dictionaries with UUID keys and so on. The decision then was YAGNI. The decision now, IMO, should be the same. Keep things simple. ChrisA
On 31 August 2016 at 07:11, Chris Angelico wrote:
Didn't all this kind of thing come up when function annotations were discussed? Insane schemes like dictionaries with UUID keys and so on. The decision then was YAGNI. The decision now, IMO, should be the same. Keep things simple.
Different use case - for annotations, the *reader* of the code is one of the intended audiences, so as the author of the code, you decide what you want to tell them, and that then constrains the tools you can use (or vice-versa - you pick the kinds of tools you want to use, and that constrains what you can tell your readers).

This case is different - there are no human readers involved, only automated tools, so adding a mandatory redirection through a sequence is just a small performance hit rather than a readability problem.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
So I ran the tests with both a list and a tuple. They were about 5% slower on a handful of benchmarks, and then the difference between the tuple and list again had a few benchmarks that were around 5% slower. There was one benchmark where the tuple won significantly for some reason (mako_v2), where the list was 1.4x slower. It seems to me we should go with the tuple, just because the common case will be having a single object and it'll be even less common to have these changing very frequently.
So it looks like both list and tuple are within about 5% of using co_extra directly. Using a tuple instead of a list is about a wash, except for mako_v2, where list is 1.4x slower for some reason (which I didn't dig into). I would say that using a tuple and copying the tuple on updates makes sense, as we don't expect these to change very often and we don't expect collisions to happen very often.
-----Original Message-----
From: Python-Dev [mailto:python-dev-bounces+dinov=microsoft.com@python.org] On Behalf Of Chris Angelico
Sent: Tuesday, August 30, 2016 2:11 PM
To: python-dev
Subject: Re: [Python-Dev] Update on PEP 523 and adding a co_extra field to code objects
On Wed, Aug 31, 2016 at 4:55 AM, Serhiy Storchaka
wrote: On 30.08.16 21:20, Antoine Pitrou wrote:
On Tue, 30 Aug 2016 18:12:01 +0000 Brett Cannon
wrote: Why not make it always a list? List objects are reasonably cheap in memory and access time... (unlike dicts)
Because I would prefer to avoid any form of unnecessary performance overhead for the common case.
But the performance overhead of iterating over a 1-element list is small enough (it's just an array access after a pointer dereference) that it may not be larger than the overhead of the multiple tests and conditional branches your example shows.
Iterating over a tuple is even faster. It needs one pointer dereference less.
And for memory efficiency we can use just a raw array of pointers.
Didn't all this kind of thing come up when function annotations were discussed? Insane schemes like dictionaries with UUID keys and so on. The decision then was YAGNI. The decision now, IMO, should be the same. Keep things simple.
ChrisA
On Fri, 2 Sep 2016 at 13:31 Dino Viehland via Python-Dev < python-dev@python.org> wrote:
So it looks like both list and tuple are about within 5% of using co_extra directly. Using a tuple instead of a list is about a wash except for make_v2 where list is 1.4x slower for some reason (which I didn't dig into).
I would say that using a tuple and copying the tuple on updates makes sense as we don't expect these to change very often and we don't expect collisions to happen very often.
So would making co_extra a PyTupleObject instead of a PyObject alleviate people's worry about the collision problem? You're going to have to hold the GIL anyway to interact with the tuple, so there won't be any race condition in replacing the tuple when it's grown (or initially set). -Brett
On Sat, Sep 3, 2016 at 7:56 AM, Brett Cannon
On Fri, 2 Sep 2016 at 13:31 Dino Viehland via Python-Dev
wrote: So it looks like both list and tuple are about within 5% of using co_extra directly. Using a tuple instead of a list is about a wash except for make_v2 where list is 1.4x slower for some reason (which I didn't dig into).
I would say that using a tuple and copying the tuple on updates makes sense as we don't expect these to change very often and we don't expect collisions to happen very often.
So would making co_extra a PyTupleObject instead of PyObject alleviate people's worry of a collision problem? You're going to have to hold the GIL anyway to interact with the tuple so there won't be any race condition in replacing the tuple when it's grown (or initially set).
I'm not following how this solves the collision problem. If you have a tuple, how do the two (or more) users of it know which index they're using? They'd need to keep track separately for each object, or else inefficiently search the tuple for an object of appropriate type every time. What am I missing here? ChrisA
On Fri, 2 Sep 2016 at 15:11 Chris Angelico
On Sat, Sep 3, 2016 at 7:56 AM, Brett Cannon
wrote: On Fri, 2 Sep 2016 at 13:31 Dino Viehland via Python-Dev
wrote: So it looks like both list and tuple are about within 5% of using
co_extra
directly. Using a tuple instead of a list is about a wash except for make_v2 where list is 1.4x slower for some reason (which I didn't dig into).
I would say that using a tuple and copying the tuple on updates makes sense as we don't expect these to change very often and we don't expect collisions to happen very often.
So would making co_extra a PyTupleObject instead of PyObject alleviate people's worry of a collision problem? You're going to have to hold the GIL anyway to interact with the tuple so there won't be any race condition in replacing the tuple when it's grown (or initially set).
I'm not following how this solves the collision problem. If you have a tuple, how do the two (or more) users of it know which index they're using? They'd need to keep track separately for each object, or else inefficiently search the tuple for an object of appropriate type every time. What am I missing here?
You're not missing anything; you just have to pay the search cost, otherwise we're back to square one of not worrying about the case of multiple users. I don't see how you can have multiple users of a single struct field and yet not have to search some data structure to find the relevant object you care about. We've tried maps and dicts and they were too slow, and we proposed not worrying about multiple users, but people didn't like the idea of either not caring or relying on some implicit practice that evolved around the co_extra field.

Using a tuple seems to be the best option we can come up with, short of developing a linked list, which isn't that much better than a tuple if you're simply storing PyObjects. So either we stick with the lack of coordination as outlined in the PEP, because you don't imagine people using a combination of Pyjion, vmprof, and/or some debugger simultaneously, or you do, and we just have to eat the performance degradation.
On Sat, Sep 3, 2016 at 8:45 AM, Brett Cannon
I'm not following how this solves the collision problem. If you have a tuple, how do the two (or more) users of it know which index they're using? They'd need to keep track separately for each object, or else inefficiently search the tuple for an object of appropriate type every time. What am I missing here?
You're not missing anything, you just have to pay for the search cost, otherwise we're back to square one here of not worrying about the case of multiple users. I don't see how you can have multiple users of a single struct field and yet not have to do some search of some data structure to find the relevant object you care about. We've tried maps and dicts and they were too slow, and we proposed not worrying about multiple users but people didn't like the idea of either not caring or relying on some implicit practice that evolved around the co_extra field. Using a tuple seems to be the best option we can come up with short of developing a linked list which isn't that much better than a tuple if you're simply storing PyObjects. So either we're sticking with the lack of coordination as outlined in the PEP because you don't imagine people using a combination of Pyjion, vmprof, and/or some debugger simultaneously, or you do and we have to just eat the performance degradation.
Got it, thanks. I hope the vagaries of linear search don't mess with profilers - a debugger isn't going to be bothered by whether it gets first slot or second, but profiling and performance might get subtle differences based on which thing looks at a function first. A dict would avoid that (constant-time lookups with a pre-selected key will be consistent), but costs a lot more. ChrisA
On 3 September 2016 at 08:50, Chris Angelico
Got it, thanks. I hope the vagaries of linear search don't mess with profilers - a debugger isn't going to be bothered by whether it gets first slot or second, but profiling and performance might get subtle differences based on which thing looks at a function first. A dict would avoid that (constant-time lookups with a pre-selected key will be consistent), but costs a lot more.
Profiling with a debugger enabled is going to see a lot more interference from the debugger than it is from a linear search through a small tuple for its own state :) Optimising compilers and VM profilers are clearly a case where cooperation will be desirable, as are optimising compilers and debuggers. However, that cooperation is still going to need to be worked out on a pairwise basis - the PEP can't magically make arbitrary pairs of plugins compatible, all it can do is define some rules and guidelines that make it easier for plugins to cooperate when they want to do so. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sun, Sep 4, 2016 at 2:09 AM, Nick Coghlan
On 3 September 2016 at 08:50, Chris Angelico
wrote: Got it, thanks. I hope the vagaries of linear search don't mess with profilers - a debugger isn't going to be bothered by whether it gets first slot or second, but profiling and performance might get subtle differences based on which thing looks at a function first. A dict would avoid that (constant-time lookups with a pre-selected key will be consistent), but costs a lot more.
Profiling with a debugger enabled is going to see a lot more interference from the debugger than it is from a linear search through a small tuple for its own state :)
Right; I was contrasting the debugger at one end (linear search is utterly dwarfed by other costs) with a profiler at the other end (wants minimal cost, and minimal noise, and a linear search gives cost and noise). In between, an optimizer is an example of something that could mess with the profiler based on activation ordering (and thus which one gets first slot).
Optimising compilers and VM profilers are clearly a case where cooperation will be desirable, as are optimising compilers and debuggers. However, that cooperation is still going to need to be worked out on a pairwise basis - the PEP can't magically make arbitrary pairs of plugins compatible, all it can do is define some rules and guidelines that make it easier for plugins to cooperate when they want to do so.
Obviously, but AIUI the rules sound pretty simple:

1) Base compiler: co_extra = ()
2) Modifier: co_extra += (MyState(),)
3) Repeat #2 for other tools
4) for obj in co_extra:
       if obj.__class__ is MyState: do stuff

Anyone who puts a non-tuple into co_extra is playing badly with other people. Anyone who doesn't use a custom class is risking collisions. Beyond that, it should be pretty straight-forward.
ChrisA
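To make that concrete, here is a rough Python simulation of the convention Chris describes (the C-level details are elided; `JitState`, `ProfilerState`, and `get_or_create` are made-up stand-ins, not part of any real tool or the PEP):

```python
class JitState: ...        # hypothetical per-tool state objects
class ProfilerState: ...


def get_or_create(co_extra, state_type):
    """Scan the tuple for this tool's state; append a new one if absent.

    Returns the (possibly new) tuple and the tool's state object.
    """
    for obj in co_extra:
        if type(obj) is state_type:    # exact type check avoids collisions
            return co_extra, obj
    state = state_type()
    return co_extra + (state,), state  # copy-on-update, as proposed


# Simulated usage: two independent tools sharing one co_extra slot.
co_extra = ()                                         # base compiler starts empty
co_extra, jit = get_or_create(co_extra, JitState)
co_extra, prof = get_or_create(co_extra, ProfilerState)
co_extra, jit2 = get_or_create(co_extra, JitState)    # finds the existing object
assert jit is jit2 and len(co_extra) == 2
```

The linear scan is the cost Brett refers to; with one or two entries it is just a couple of pointer comparisons.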
On 2016-09-02 23:45, Brett Cannon wrote:
On Fri, 2 Sep 2016 at 15:11 Chris Angelico <rosuav@gmail.com> wrote:
> I'm not following how this solves the collision problem. If you have a tuple, how do the two (or more) users of it know which index they're using? They'd need to keep track separately for each object, or else inefficiently search the tuple for an object of appropriate type every time. What am I missing here?
You're not missing anything, you just have to pay for the search cost, otherwise we're back to square one here of not worrying about the case of multiple users. I don't see how you can have multiple users of a single struct field and yet not have to do some search of some data structure to find the relevant object you care about. We've tried maps and dicts and they were too slow, and we proposed not worrying about multiple users but people didn't like the idea of either not caring or relying on some implicit practice that evolved around the co_extra field. Using a tuple seems to be the best option we can come up with short of developing a linked list which isn't that much better than a tuple if you're simply storing PyObjects. So either we're sticking with the lack of coordination as outlined in the PEP because you don't imagine people using a combination of Pyjion, vmprof, and/or some debugger simultaneously, or you do and we have to just eat the performance degradation.
Could the users register themselves first? They could then be told what index to use.
On Fri, 2 Sep 2016 at 17:37 MRAB
Could the users register themselves first? They could then be told what index to use.
But that requires that they register before any tuple is created, or else they run the risk of seeing a tuple that was created before they registered. To cover that case you then have to check the length, at which point it's no more expensive than just iterating through a tuple (especially in the common case of a tuple of length 1).
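A quick sketch of why the registration idea still needs that length check (purely illustrative Python; the registry, helper, and tool names are invented for this example):

```python
_registry = {}  # hypothetical global registry: tool name -> assigned index


def register(tool):
    """Assign each tool the next free index, once."""
    _registry.setdefault(tool, len(_registry))
    return _registry[tool]


def get_state(co_extra, index, factory):
    # A tuple created before this tool registered may be shorter than the
    # tool's assigned index, so the length must be checked every time --
    # the check Brett notes makes this no cheaper than a short scan.
    if index < len(co_extra):
        return co_extra, co_extra[index]
    # Grow the tuple (copy-on-update), padding any unclaimed slots.
    co_extra = co_extra + (None,) * (index - len(co_extra)) + (factory(),)
    return co_extra, co_extra[index]


jit_idx = register("jit")           # -> 0
prof_idx = register("profiler")     # -> 1
co_extra = ()                       # tuple predates the profiler's lookup
co_extra, s = get_state(co_extra, prof_idx, dict)
assert len(co_extra) == 2 and co_extra[prof_idx] is s
```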
On 31 August 2016 at 04:55, Serhiy Storchaka
On 30.08.16 21:20, Antoine Pitrou wrote:
But the performance overhead of iterating over a 1-element list is small enough (it's just an array access after a pointer dereference) that it may not be larger than the overhead of the multiple tests and conditional branches your example shows.
Iterating over a tuple is even faster. It needs one pointer dereference less.
That comes at the cost of making metadata additions a bit more complicated though - you'd have to replace the existing tuple with a new one that adds your own metadata, rather than just appending to a list.

I do think there are enough subtleties here (going from no metadata -> some metadata, and some metadata -> more metadata) that it makes sense to provide a standard API for it (excluded from the stable ABI), rather than expecting plugin developers to roll their own.

Strawman:

    PyObject * PyCode_GetExtra(PyCodeObject *code, PyTypeObject *extra_type);
    int PyCode_SetExtra(PyCodeObject *code, PyObject *extra);
    int PyCode_DelExtra(PyCodeObject *code, PyTypeObject *extra_type);

Then Brett's example code would become:

    pyjion_cache = PyCode_GetExtra(code_obj, &PyPyjion_Type);
    if (pyjion_cache == NULL) {
        pyjion_cache = PyPyjion_New();
        if (PyCode_SetExtra(code_obj, pyjion_cache) < 0) {
            /* Something went wrong, report that somehow */
        }
    }
    /* pyjion_cache is valid here */

Making those APIs fast (for an assumed small number of simultaneously active interpreter plugins) and thread-safe is then an internal CPython implementation detail, rather than being something plugin writers need to concern themselves with.

Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan schrieb am 31.08.2016 um 06:30:
On 31 August 2016 at 04:55, Serhiy Storchaka wrote:
On 30.08.16 21:20, Antoine Pitrou wrote:
But the performance overhead of iterating over a 1-element list is small enough (it's just an array access after a pointer dereference) that it may not be larger than the overhead of the multiple tests and conditional branches your example shows.
Iterating over a tuple is even faster. It needs one pointer dereference less.
That comes at the cost of making metadata additions a bit more complicated though - you'd have to replace the existing tuple with a new one that adds your own metadata, rather than just appending to a list.
I do think there are enough subtleties here (going from no metadata -> some metadata, and some metadata -> more metadata) that it makes sense to provide a standard API for it (excluded from the stable ABI), rather than expecting plugin developers to roll their own.
Strawman:
    PyObject * PyCode_GetExtra(PyCodeObject *code, PyTypeObject *extra_type);
    int PyCode_SetExtra(PyCodeObject *code, PyObject *extra);
    int PyCode_DelExtra(PyCodeObject *code, PyTypeObject *extra_type);
Then Brett's example code would become:
    pyjion_cache = PyCode_GetExtra(code_obj, &PyPyjion_Type);
    if (pyjion_cache == NULL) {
        pyjion_cache = PyPyjion_New();
        if (PyCode_SetExtra(code_obj, pyjion_cache) < 0) {
            /* Something went wrong, report that somehow */
        }
    }
    /* pyjion_cache is valid here */
Making those APIs fast (for an assumed small number of simultaneously active interpreter plugins) and thread-safe is then an internal CPython implementation detail, rather than being something plugin writers need to concern themselves with.
Looks like a good idea. New non-trivial field, new API. GetExtra() can be a macro that implements the "only one entry and type pointer matches" case for speed, then call back into the list lookup for the less common cases. Stefan
PEP 445, the C API for malloc, allows plugging in multiple wrappers, and each wrapper has its own "void* context" data. When you register a new wrapper, you store the current context and function so you can chain to them later. See the hooks example: https://www.python.org/dev/peps/pep-0445/#use-case-3-setup-hooks-on-memory-b...

Since PEP 523 also adds a function, would it be possible to somehow design a mechanism to "chain wrappers"? I know that PEP 523 has a different design, so maybe it's not possible. For example, the context could be passed to PyFrameEvalFunction. In that case, each project would register its own eval function, including vmprof. I don't know if it makes sense for vmprof to modify the behaviour at runtime (add a C frame per Python eval frame).

Victor
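For illustration only, the chaining pattern Victor describes from PEP 445 might look something like this if transliterated into Python (all names here are invented for the sketch; the real PEP 523 hook is a C function pointer on the interpreter state, not a Python function):

```python
def default_eval(frame):
    """Stand-in for the interpreter's default frame evaluator."""
    return f"result of {frame}"

current_eval = default_eval


def install_hook(make_wrapper):
    """PEP 445-style chaining: each hook captures the previously
    installed eval function and delegates to it."""
    global current_eval
    current_eval = make_wrapper(current_eval)


calls = []  # records which hook ran, in order


def profiler_hook(prev):
    def eval_frame(frame):
        calls.append(("profile", frame))  # record, then chain
        return prev(frame)
    return eval_frame


def jit_hook(prev):
    def eval_frame(frame):
        calls.append(("jit", frame))
        return prev(frame)                # fall back to the previous eval
    return eval_frame


install_hook(profiler_hook)
install_hook(jit_hook)
result = current_eval("f")
assert result == "result of f"
assert calls == [("jit", "f"), ("profile", "f")]  # last installed runs first
```

The key property is that each wrapper only needs to remember the function it replaced, so no central registry is required, which is roughly what the PEP 445 hooks example does with its saved context.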
participants (13)

- Antoine Pitrou
- Brett Cannon
- Chris Angelico
- Christian Heimes
- Dino Viehland
- Guido van Rossum
- Maciej Fijalkowski
- MRAB
- Nick Coghlan
- Serhiy Storchaka
- Stefan Behnel
- Terry Reedy
- Victor Stinner