[Python-Dev] Impact of Namedtuple on startup time

Giampaolo Rodola' g.rodola at gmail.com
Mon Jul 17 19:17:24 EDT 2017


On Mon, Jul 17, 2017 at 11:24 PM, Tim Peters <tim.peters at gmail.com> wrote:

> [Giampaolo Rodola' <g.rodola at gmail.com>]
> > ....
> > To be entirely honest, I'm not even sure why they need to be forcefully
> > declared upfront in the first place, instead of just having a first-class
> > function (builtin?) written in C:
> >
> > >>> ntuple(x=1, y=0)
> > (x=1, y=0)
> >
> > ...or even a literal as in:
> >
> > >>> (x=1, y=0)
> > (x=1, y=0)
>
> How do you propose that the resulting object T know that T.x is 1, T.y
> is 0, and T.z doesn't make sense?


I'm not sure I understand your concern. That's pretty much what
PyStructSequence already does.
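
To make that concrete, here is a small sketch using time.struct_time, which (in CPython, as far as I know) is one of the types built on PyStructSequence. It only shows the user-visible behaviour: the object is a tuple, it knows its field names, and it rejects unknown ones. The attribute name tm_zzz is made up for the example.

```python
import time

# time.gmtime(0) returns the Unix epoch as a time.struct_time instance.
t = time.gmtime(0)

# A struct sequence is still a real tuple...
is_tuple = isinstance(t, tuple)

# ...but each field is reachable both by name and by index.
by_name = t.tm_year
by_index = t[0]

# An attribute that is not one of its fields is rejected.
# "tm_zzz" is a made-up name used only for this illustration.
try:
    t.tm_zzz
except AttributeError:
    unknown_rejected = True
else:
    unknown_rejected = False

print(is_tuple, by_name, by_index, unknown_rejected)
```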


> Declaring a namedtuple up front
> allows the _class_ to know that all of its instances map attribute "x"
> to index 0 and attribute "y" to index 1.  The instances know nothing
> about that on their own


Hence why I was talking about a "(lightweight) anonymous tuple with named
attributes". The primary use case for namedtuples is accessing values by
name (obj.x). Personally I've always considered the upfront module-level
declaration only an annoyance which unnecessarily pollutes the API and adds
extra overhead. I typically end up putting all namedtuples in a private
module:
https://github.com/giampaolo/psutil/blob/8b8da39e0c62432504fb5f67c418715aad35b291/psutil/_common.py#L156-L225
...then import them from elsewhere and make sure they are not exposed
publicly, because the intermediate class returned by
collections.namedtuple() is basically useless to the end user. Also,
picking a sensible name for the namedtuple is an annoyance and kinda
weird. Consider this:

    from collections import namedtuple

    Coordinates = namedtuple('coordinates', ['x', 'y'])

    def get_coordinates():
        return Coordinates(10, 20)

...vs. this:

    def get_coordinates():
        return ntuple(x=10, y=20)

...or this:

    def get_coordinates():
        return (x=10, y=20)
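
Just to show the intended behaviour of the ntuple() spelling, here is a rough pure-Python approximation. It is only a sketch, not the C implementation I have in mind: it still builds a collections.namedtuple class behind the scenes (cached per ordered set of field names), so it removes the upfront declaration but none of the overhead. The helper name _ntuple_class is made up, and it relies on keyword-argument order being preserved (Python 3.6+).

```python
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _ntuple_class(fields):
    # Hypothetical helper: one hidden class per distinct, ordered
    # tuple of field names, created lazily on first use.
    return namedtuple('ntuple', fields)


def ntuple(**kwargs):
    # Keyword-argument order is preserved since Python 3.6 (PEP 468),
    # so ntuple(x=1, y=0) always maps x to index 0 and y to index 1.
    cls = _ntuple_class(tuple(kwargs))
    return cls(*kwargs.values())


def get_coordinates():
    return ntuple(x=10, y=20)


point = get_coordinates()
print(point.x, point[1])
```

Two calls with the same field names in the same order share one class, while ntuple(y=1, x=2) gets a different one, since the name-to-index mapping differs.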

> If your `ntuple()` returns an object implementing its own
> mapping, it loses a primary advantage (0 memory overhead) of
> namedtuples.
>

The extra memory overhead is a price I would be happy to pay, considering
that collections.namedtuple is considerably slower than a plain tuple.
Besides the additional overhead on startup / import time, instantiation
is about 4.5x slower than a plain tuple:

    $ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y'))" "nt(1, 2)"
    1000000 loops, best of 5: 313 nsec per loop

    $ python3.7 -m timeit "tuple((1, 2))"
    5000000 loops, best of 5: 68.4 nsec per loop

...and name access is 2x slower than index access:

    $ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y')); x = nt(1, 2)" "x.x"
    5000000 loops, best of 5: 41.9 nsec per loop

    $ python3.7 -m timeit -s "from collections import namedtuple; nt = namedtuple('xxx', ('x', 'y')); x = nt(1, 2)" "x[0]"
    10000000 loops, best of 5: 20.2 nsec per loop
    $ python3.7 -m timeit -s "x = (1, 2)" "x[0]"
    10000000 loops, best of 5: 20.5 nsec per loop
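
For anyone who wants to reproduce these in one go, the same measurements can be sketched with the timeit module directly. The helper name best() and the loop counts are my own choices for the example, and the absolute numbers will of course differ per machine and Python version.

```python
import timeit
from collections import namedtuple

nt = namedtuple('xxx', ('x', 'y'))
x = nt(1, 2)
plain = (1, 2)


def best(stmt, number=100000, repeat=5):
    # Best-of-N time per loop in nanoseconds, roughly what
    # `python -m timeit` reports. `best` is a made-up helper name.
    runs = timeit.repeat(stmt, globals=globals(), number=number, repeat=repeat)
    return min(runs) / number * 1e9


statements = [
    "nt(1, 2)",       # namedtuple instantiation
    "tuple((1, 2))",  # plain tuple instantiation
    "x.x",            # namedtuple access by name
    "x[0]",           # namedtuple access by index
    "plain[0]",       # plain tuple access by index
]

results = {stmt: best(stmt) for stmt in statements}

for stmt, nsec in results.items():
    print("%-15s %8.1f nsec per loop" % (stmt, nsec))
```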

-- 
Giampaolo - http://grodola.blogspot.com