namedtuple with ordereddict

Given that (1) dicts now always pay the price for ordering, and (2) namedtuple is being accelerated, is there any reason not to simply define it as a view on a dict, or at least as a limited proxy to one? Then constructing a specific instance from the arguments used to create it could be as simple as keeping a reference to the temporary created to pass those arguments... -jJ
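
[For illustration only, a minimal sketch of the kind of dict-backed proxy the question seems to have in mind. The factory name dictnamedtuple and every detail of its layout are hypothetical, not anything in collections.]

def dictnamedtuple(typename, field_names):
    """Hypothetical factory: instances hold only a reference to a dict."""
    field_names = tuple(field_names)

    class _View:
        __slots__ = ('_d',)

        def __init__(self, *args, **kwargs):
            d = dict(zip(field_names, args))
            d.update(kwargs)
            self._d = d                      # keep a reference to the dict

        def __getattr__(self, name):
            return self._d[name]             # attribute access goes via the dict

        def __getitem__(self, index):
            # index access has to go through the dict's insertion order
            return tuple(self._d.values())[index]

        def __repr__(self):
            args = ', '.join(f'{k}={v!r}' for k, v in self._d.items())
            return f'{typename}({args})'

    _View.__name__ = typename
    return _View

Point = dictnamedtuple('Point', ('x', 'y'))
p = Point(5, y=11)
print(p.x, p[1], p)    # 5 11 Point(x=5, y=11)

Even in this rough form it shows the trade-off: every instance carries a dict, and index access has to go through the dict's values.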

Jim J. Jewett wrote:
is there any reason not to simply define it as a view on a dict, or at least as a limited proxy to one?
Some valuable characteristics of namedtuples as they are now:

* Instances are very lightweight
* Access by index is fast
* Can be used as a dict key

All of those would be lost if namedtuple became a dict view.

-- Greg
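
[For concreteness, an illustrative check of those three properties, editorial rather than from the thread, comparing a namedtuple instance with an equivalent dict:]

import sys
from collections import namedtuple

Point = namedtuple('Point', ('x', 'y'))
p = Point(5, 11)
d = {'x': 5, 'y': 11}

print(sys.getsizeof(p) < sys.getsizeof(d))   # True: the tuple instance is smaller
print(p[0], p[1])                            # 5 11: C-level index access
print({p: 'key'})                            # hashable, so usable as a dict key
# {d: 'key'} would raise TypeError: unhashable type: 'dict'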

On Tue, Jul 18, 2017 at 06:16:26PM -0400, Jim J. Jewett wrote:
Tuples are much more memory efficient than dicts, and they support lookup by index. Besides, you'll break a whole lot of code that treats namedtuples as tuples and performs tuple operations on them, for instance tuple concatenation.
The bottleneck isn't creating the instances themselves; the expensive part is calling namedtuple() to generate the named tuple CLASS itself. Creating the instances should be fast, since they're just tuples. -- Steve
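
[An illustrative aside, not from the thread, on the "treats namedtuples as tuples" point; all names here are just example code:]

from collections import namedtuple

Point = namedtuple('Point', ('x', 'y'))
p = Point(5, 11)

print(isinstance(p, tuple))   # True: instances really are tuples
print(p + (7,))               # (5, 11, 7): concatenation gives a plain tuple
print(p * 2)                  # (5, 11, 5, 11)
print(p[::-1])                # (11, 5): slicing works too
print(len({p, (5, 11)}))      # 1: compares equal to the equivalent plain tuple

None of these behaviours would survive a switch to a dict view without reimplementing them by hand.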

On Wed, Jul 19, 2017 at 3:27 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Still much slower (~4.3x) than plain tuples though:

$ python3.7 -m timeit -s "import collections; Point = collections.namedtuple('Point', ('x', 'y'))" "Point(5, 11)"
1000000 loops, best of 5: 313 nsec per loop

$ python3.7 -m timeit "tuple((5, 11))"
5000000 loops, best of 5: 71.4 nsec per loop

Giampaolo - http://grodola.blogspot.com

[Giampaolo Rodola' <g.rodola@gmail.com>]
I believe this was pointed out earlier: in the second case,

1. (5, 11) is built at _compile_ time, so at runtime it's only measuring a LOAD_CONST to fetch it from the code's constants block.

2. The tuple() constructor does close to nothing when passed a tuple: it just increments the argument's reference count and returns it.
In other words, the second case isn't measuring tuple _creation_ time in any sense: it's just measuring how long it takes to look up the name "tuple" and increment the refcount on a tuple that was created at compile time.
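
[An illustrative demonstration of both points, editorial rather than from the thread:]

import dis

# 1. The (5, 11) literal is folded into the code object's constants,
#    so fetching it at run time is a single LOAD_CONST.
dis.dis(compile("tuple((5, 11))", "<bench>", "eval"))

# 2. tuple() passed a tuple just returns its argument.
t = (5, 11)
print(tuple(t) is t)   # True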

On Wed, Jul 19, 2017 at 5:20 PM, Tim Peters <tim.peters@gmail.com> wrote:
Oh right, I didn't realize that, sorry. Should have been something like this instead:

$ python3.7 -m timeit -s "import collections; Point = collections.namedtuple('Point', ('x', 'y')); x = [5, 1]" "Point(*x)"
1000000 loops, best of 5: 311 nsec per loop

$ python3.7 -m timeit -s "x = [5, 1]" "tuple(x)"
5000000 loops, best of 5: 89.8 nsec per loop

-- Giampaolo - http://grodola.blogspot.com

On Wed, Jul 19, 2017 at 12:10 PM, Giampaolo Rodola' <g.rodola@gmail.com> wrote:
This looks like typical Python function call overhead. Consider a toy class:

$ cat c.py
class C(tuple):
    def __new__(cls, *items):
        return tuple.__new__(cls, items)

Compared to a naked tuple, creation of a C instance is more than 3x slower:

$ python3 -m timeit -s "from c import C; x = [1, 2]" "C(*x)"
1000000 loops, best of 3: 0.363 usec per loop

$ python3 -m timeit -s "x = [1, 2]" "tuple(x)"
10000000 loops, best of 3: 0.114 usec per loop
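
[A small editorial follow-up suggesting where the cost sits: a bare tuple subclass with no Python-level __new__ at all stays on the C path. The class D below is illustrative only.]

import timeit

setup = """
class C(tuple):
    def __new__(cls, *items):
        return tuple.__new__(cls, items)

class D(tuple):        # no Python-level __new__: inherits tuple's C-level one
    pass

x = [1, 2]
"""

for stmt in ('C(*x)', 'D(x)', 'tuple(x)'):
    print(stmt, timeit.timeit(stmt, setup=setup))

On a typical CPython build, D(x) should land much closer to tuple(x) than to C(*x), which points at the Python-level __new__ (argument packing plus an extra function call), not subclassing itself, as the main cost.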

On 7/19/2017 12:10 PM, Giampaolo Rodola' wrote:
I think "x, y = 5, 1" in the setup, with "Point(x, y)" and "(x, y)" as the timed statements, better models real situations: "x, y" cannot be optimized away, and it reflects how people would construct a tuple given x and y. On my Win10 machine with a 3.7 debug win32 build (about half as fast as a release build), I get:

F:\dev\3x>python -m timeit -s "import collections as c; Point = c.namedtuple('Point', ('x','y')); x,y=5,1" "Point(x,y)"
200000 loops, best of 5: 1.86 usec per loop

F:\dev\3x>python -m timeit -s "x,y=5,1" "(x,y)"
2000000 loops, best of 5: 156 nsec per loop

If one starts with a tuple, then the Point call is pure extra overhead. If one starts with a list instead, I get 1.85 usec and 419 nsec.

-- Terry Jan Reedy

participants (8): Alexander Belopolsky, Giampaolo Rodola', Greg Ewing, Jim J. Jewett, Serhiy Storchaka, Steven D'Aprano, Terry Reedy, Tim Peters