[Python-Dev] Impact of Namedtuple on startup time
solipsis at pitrou.net
Mon Jul 17 08:43:19 EDT 2017
The cost of creating a namedtuple class has been identified as a
contributor to Python startup time. This affects not only the Python
core and the stdlib, but also any third-party library creating
namedtuple classes (there are many of them). An issue was created for
this:
Raymond decided to close the issue because:
1) the proposed resolution makes the "_source" attribute empty (or, at
least, something other than what it currently is). Raymond claims the
"_source" attribute is an essential feature of namedtuples.
2) optimizing startup cost is supposedly not worth the effort.
To this, I will counter-argue:
As for 1), a search for "namedtuple" and "_source" in a code search
engine (*) brings *only* false positives of different kinds:
* clones of the CPython repo
* copies of the namedtuple class instantiation source code with slight
tweaks (*not* reading the _source attribute of an existing namedtuple)
* modules using namedtuples and also using a "_source" attribute on
As for 2), startup time is actually a very important consideration
nowadays, both for small scripts *and* for interactive use with the
now very widespread use of Jupyter Notebooks. A 1 ms cost when
importing a single module can translate into a large slowdown when your
library imports (directly or indirectly) hundreds of modules, many of
which may create their own namedtuple classes.
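To get a rough feel for the per-class cost on a given machine, something
like the following sketch can be used. The helper names (create_class,
per_call_ms) and the field list are made up for illustration, and the
measured figure will vary with the Python version and the hardware, so no
particular number is claimed here.

```python
import timeit
from collections import namedtuple


def create_class():
    # Build a brand-new namedtuple class, as a module would do once
    # at import time for each namedtuple it defines.
    return namedtuple("Point", ["x", "y"])


def per_call_ms(func, number=200):
    # Average wall-clock cost of one call to func, in milliseconds.
    total_seconds = timeit.timeit(func, number=number)
    return total_seconds / number * 1000.0


if __name__ == "__main__":
    cost = per_call_ms(create_class)
    print("one namedtuple class creation: %.3f ms" % cost)
    # Hypothetical scenario: an import graph defining 300 such classes.
    print("300 classes: about %.1f ms" % (cost * 300))
```

Multiplying the per-class figure by the number of namedtuple classes
defined across an import graph gives an estimate of their share of the
startup time.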
Nick pointed out that one alternative is to make the C-written "struct
sequence" class user-visible.
My opinion is that, while better than nothing, this would complicate
things by exposing two very similar primitives in the stdlib, without
there being a clear choice for users. Should I use the well-known
namedtuple? Should I use the new-ish "struct sequence", with similar
characteristics and better performance, but worse compatibility (now I
have to write fallback code for Python versions where the "struct
sequence" isn't exposed)?
Not to mention that all third-party libraries would have to be migrated
to the newly-exposed "struct sequence" + compatibility fallback code...
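To make the burden concrete, here is a sketch of what such fallback code
might look like in every dependent library. It is purely illustrative:
the name "structseq" and its location in the collections module are
invented for this example (no such public factory exists), and it is
assumed to take the same arguments as namedtuple.

```python
try:
    # Hypothetical import: "structseq" is an invented name standing in
    # for a user-visible struct sequence factory. It does not exist, so
    # this branch fails and the fallback below is used.
    from collections import structseq as record_factory
except ImportError:
    # Fallback for Python versions without the hypothetical factory.
    from collections import namedtuple as record_factory

# Assumes both factories accept (typename, field_names).
Point = record_factory("Point", ["x", "y"])

p = Point(1, 2)
print(p.x, p.y)
```

Every library wanting the faster primitive would need to carry a block
like this for as long as it supports older Python versions.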
So my take is:
1) Usage of "_source" in open source code (as per the search above)
appears to be non-existent.
2) If the primary intent of "_source" is to show-case how to write a
tuple subclass, well, why not write a recipe or tutorial somewhere?
The Python stdlib is generally not a place where we reify tutorials or
educational snippets as public APIs.
3) The well-known namedtuple would really benefit from a performance
boost, without asking all maintainers of dependent code (that's a
*ton*) to migrate to a new idiom + compatibility fallback.