<div dir="ltr"><div class="gmail-m_8705408839792984946gmail_signature"><div dir="ltr"><div class="gmail_quote">On Tue, Jul 18, 2017 at 6:31 AM, Guido van Rossum <span dir="ltr"><<a href="mailto:guido@python.org" target="_blank">guido@python.org</a>></span> <wbr>wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><span class="gmail-m_8705408839792984946gmail-"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jul 17, 2017 at 6:25 PM, Eric Snow <span dir="ltr"><<a href="mailto:ericsnowcurrently@gmail.com" target="_blank">ericsnowcurrently@gmail.<wbr>com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, Jul 17, 2017 at 6:01 PM, Ethan Furman <<a href="mailto:ethan@stoneleaf.us" target="_blank">ethan@stoneleaf.us</a>> wrote:<br>> Guido has decreed that namedtuple shall be reimplemented with speed in mind.<br><br>FWIW, I'm sure that any changes to namedtuple will be kept as minimal<br>as possible. Changes would be limited to the underlying<br>implementation, and would not include the namedtuple() signature, or<br>using metaclasses, etc. However, I don't presume to speak for Guido<br>or Raymond. :)<span class="gmail-m_8705408839792984946gmail-m_-7103101867843701345HOEnZb"></span><br clear="all"></blockquote></div><br></div></span><div class="gmail_extra">Indeed. 
I referred people here for discussion of ideas like this:<br><br></div><div class="gmail_extra">>>> a = (x=1, y=0)<span class="gmail-m_8705408839792984946gmail-HOEnZb"><font color="#888888"><br></font></span></div><span class="gmail-m_8705408839792984946gmail-HOEnZb"><font color="#888888"><div class="gmail_extra"></div></font></span></div></blockquote></div><div><br></div><div><div><div>Thanks for bringing this up, I'm gonna summarize my idea in the form of a PEP-like draft, hoping to collect some feedback.</div><div><br></div><div>Proposal</div><div>========</div><div><br></div><div>Introduction of a new syntax and builtin function to create lightweight namedtuples "on the fly" as in:</div><div><br></div><div>  >>> (x=10, y=20)</div><div>  (x=10, y=20)</div><div><br></div><div>  >>> ntuple(x=10, y=20)</div><div>  (x=10, y=20)</div><div><br></div><div><br></div><div>Motivations</div><div>===========</div><div><br></div><div>Avoid declaration</div><div>-----------------</div><div><br></div><div>Other than the startup time cost:</div><div><a href="https://mail.python.org/pipermail/python-dev/2017-July/148592.html">https://mail.python.org/pipermail/python-dev/2017-July/148592.html</a></div><div>...the fact that namedtuples need to be declared upfront implies they mostly end up being used only in public, end-user APIs / functions. For generic functions returning more than one value it would be nice to just do:</div><div><br></div><div>  def get_coordinates():</div><div>    return (x=10, y=20)</div><div><br></div><div>...instead of:</div><div><br></div><div>  from collections import namedtuple</div><div><br></div><div>  Coordinates = namedtuple('coordinates', ['x', 'y'])</div><div><br></div><div>  def get_coordinates():</div><div>    return Coordinates(10, 20)</div><div><br></div><div>Declaration also has the drawback of unnecessarily polluting the module API with an object (Coordinates) which is rarely needed. 
AFAIU namedtuple was designed this way for efficiency of the pure-Python implementation currently in place and for serialization purposes (e.g. pickle), but I may be missing something else. Generally namedtuples are declared in a private module, imported from elsewhere and they are never exposed in the main namespace, which is kind of annoying. In the case of single-module scripts it's not uncommon to add a leading underscore, which makes __repr__ uglier. To me, this suggests that the factory function should have been a first-class function instead.</div><div><br></div><div>Speed</div><div>-----</div><div><br></div><div>Other than the startup declaration overhead, a namedtuple is slower than a tuple or a C structseq in almost every aspect:</div><div><br></div><div>- Declaration (50x slower than cnamedtuple):</div><div><br></div><div>  $ python3.7 -m timeit -s "from collections import namedtuple" \</div><div>    "namedtuple('Point', ('x', 'y'))"</div><div>  1000 loops, best of 5: 264 usec per loop</div><div><br></div><div>  $ python3.7 -m timeit -s "from cnamedtuple import namedtuple" \</div><div>    "namedtuple('Point', ('x', 'y'))"</div><div>  50000 loops, best of 5: 5.27 usec per loop</div><div><br></div><div>- Instantiation (3.5x slower than tuple):</div><div><br></div><div>  $ python3.7 -m timeit -s "import collections; Point = collections.namedtuple('Point', ('x', 'y')); x = [1, 2]" "Point(*x)"</div><div>  1000000 loops, best of 5: 310 nsec per loop</div><div><br></div><div>  $ python3.7 -m timeit -s "x = [1, 2]" "tuple(x)"</div><div>  5000000 loops, best of 5: 88 nsec per loop</div><div><br></div><div>- Unpacking (2.8x slower than tuple):</div><div><br></div><div>  $ python3.7 -m timeit -s "import collections; p = collections.namedtuple( \</div><div>    'Point', ('x', 'y'))(5, 11)" "x, y = p"</div><div>  5000000 loops, best of 5: 41.9 nsec per loop</div><div><br></div><div>  $ python3.7 -m timeit -s "p = (5, 11)" "x, y = p"</div><div>  20000000 loops, best of 5: 14.8 
nsec per loop</div><div><br></div><div>- Field access by name (1.9x slower than structseq and cnamedtuple):</div><div><br></div><div>  $ python3.7 -m timeit -s "from collections import namedtuple as nt; \</div><div>    p = nt('Point', ('x', 'y'))(5, 11)" "p.x"</div><div>  5000000 loops, best of 5: 42.7 nsec per loop</div><div><br></div><div>  $ python3.7 -m timeit -s "from cnamedtuple import namedtuple as nt; \</div><div>    p = nt('Point', ('x', 'y'))(5, 11)" "p.x"</div><div>  10000000 loops, best of 5: 22.5 nsec per loop</div><div><br></div><div>  $ python3.7 -m timeit -s "import os; p = os.times()" "p.user"</div><div>  10000000 loops, best of 5: 22.6 nsec per loop</div><div><br></div><div>- Field access by index is the same as tuple:</div><div><br></div><div>  $ python3.7 -m timeit -s "from collections import namedtuple as nt; \</div><div>    p = nt('Point', ('x', 'y'))(5, 11)" "p[0]"</div><div>  10000000 loops, best of 5: 20.3 nsec per loop</div><div><br></div><div>  $ python3.7 -m timeit -s "p = (5, 11)" "p[0]"</div><div>  10000000 loops, best of 5: 20.5 nsec per loop</div><div><br></div><div>It has been suggested that most of these complaints about speed aren't an issue, but in certain circumstances such as busy loops, getattr() being 1.9x slower could make a difference, e.g.:</div><div><a href="https://github.com/python/cpython/blob/3e2ad8ec61a322370a6fbdfb2209cf74546f5e08/Lib/asyncio/selector_events.py#L523">https://github.com/python/cpython/blob/3e2ad8ec61a322370a6fbdfb2209cf74546f5e08/Lib/asyncio/selector_events.py#L523</a></div><div>The same goes for value unpacking.</div><div><br></div><div>isinstance()</div><div>------------</div><div><br></div><div>Probably a minor complaint, I only bring this up because I recently had to do this in psutil's unit tests. 
Anyway, checking a namedtuple instance isn't exactly straightforward:</div><div><a href="https://stackoverflow.com/a/2166841">https://stackoverflow.com/a/2166841</a></div><div><br></div><div>Backward compatibility</div><div>======================</div><div><br></div><div>This is probably the biggest barrier other than the "a C implementation is less maintainable" argument. In order to avoid duplication of functionality it would be great if collections.namedtuple() could remain a (deprecated) factory function using ntuple() internally. FWIW I tried running the stdlib's unit tests against <a href="https://github.com/llllllllll/cnamedtuple">https://github.com/llllllllll/cnamedtuple</a>: after removing the ones about the "_source", "verbose" and "module" arguments I get a couple of errors about __doc__. I'm not sure about more advanced use cases (subclassing, others...?) but overall it appears pretty doable.</div><div><br></div><div>The collections.namedtuple() Python wrapper can include the necessary logic to implement the "verbose" and "rename" parameters when they're used. I'm not entirely sure about the implications of the "module" parameter though (Raymond?).</div><div><br></div><div>_make(), _asdict(), _replace() and the _fields attribute should also be exposed; as for "_source", it appears it can easily be turned into a property, which would also save some memory.</div><div><br></div><div>The biggest annoyance is probably fields' __doc__ assignment:</div><div><a href="https://github.com/python/cpython/blob/ced36a993fcfd1c76637119d31c03156a8772e11/Lib/selectors.py#L53-L58">https://github.com/python/cpython/blob/ced36a993fcfd1c76637119d31c03156a8772e11/Lib/selectors.py#L53-L58</a>  </div><div>...which would require returning a clever class object, slowing down the namedtuple declaration even when no parameters are passed, but considering that the long-term plan is to replace collections.namedtuple() with ntuple() I consider this acceptable.  
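<div><br></div><div>To make the proposal a bit more concrete, the ntuple(x=10, y=20) call form can be roughly approximated today in pure Python. This is only a sketch under stated assumptions, not the proposed implementation: ntuple() here is a hypothetical stand-in built on collections.namedtuple, _ntuple_type is a made-up helper name, and caching one class per ordered set of field names is an assumption made for illustration. It gives neither the (x=10, y=20) repr nor any of the C-level speedups discussed above.</div><div><br></div><pre>
```python
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _ntuple_type(field_names):
    # Hypothetical helper: one class per ordered tuple of field names,
    # created lazily and cached so that repeated calls with the same
    # fields pay the namedtuple() declaration cost only once.
    return namedtuple("ntuple", field_names)


def ntuple(**fields):
    # Hypothetical stand-in for the proposed builtin. Keyword order is
    # preserved for **kwargs since Python 3.6 (PEP 468).
    return _ntuple_type(tuple(fields))(*fields.values())


def get_coordinates():
    # No module-level Coordinates class has to be declared up front.
    return ntuple(x=10, y=20)


p = get_coordinates()
x, y = p  # unpacks like a plain tuple
print(p, p.x, p[1], p._fields)
```
</pre><div>In this sketch ntuple(x=10, y=20) and ntuple(y=20, x=10) end up with two distinct cached classes because field order is part of the cache key; whether a real ntuple should treat those as the same type is left open here.</div>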
</div><div><br></div><div>Thoughts?</div><div><br></div><div>--</div><div>Giampaolo - <a href="http://grodola.blogspot.com">http://grodola.blogspot.com</a></div></div><div><br></div><div><br></div></div></div></div>
</div>