[Python-ideas] namedtuple literals [Was: RE a new namedtuple]

Thu Jul 27 12:22:05 EDT 2017

> To avoid introducing a new built-in, we could do object.bag =
> SimpleNamespace

I am liking the idea of making SimpleNamespace more accessible, but maybe
we need to think a bit more about why one might want a tuple-with-names,
rather than just an easy way to create
an object-with-just-attributes.

That is -- how often do folks use a namedtuple rather than
SimpleNamespace just because they know about it, rather than because
they really need it? I know that is often the case...

But here are some reasons to want an actual tuple (or an actual
ImmutableSequence):

1) Backward compatibility with tuples.
    This may have been a common use case when they were new, and maybe
still is, but if we are forward-looking, I don't think this is the primary
use case. But maybe some of the features you get from that are important.

2) order-preserving: this makes them a good match for "records" from a
DB or CSV file or something.

3) unpacking: x, y = a_point

4) iterating: for coord in a_point:
                          ...

5) immutability: being able to use them as a key in a dict.
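
For reference, here is a quick sketch of those five properties using the
existing collections.namedtuple. The Point class and its values are made up
for illustration only.

```python
from collections import namedtuple

# Hypothetical record type, for illustration only.
Point = namedtuple("Point", ["x", "y"])
a_point = Point(x=1, y=2)

# 1) backward compatibility: it still compares equal to a plain tuple
assert a_point == (1, 2)

# 2) order-preserving: fields keep their declared order
assert a_point._fields == ("x", "y")

# 3) unpacking
x, y = a_point

# 4) iterating
coords = [coord for coord in a_point]

# 5) immutability, and therefore usable as a dict key
labels = {a_point: "some label"}
```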

What else?

So the question is -- If we want an easier way to create a namedtuple-like
object -- which of these features are desired?

Personally, I think an immutable SimpleNamespace would be good. And if you
want the other stuff, use a NamedTuple. And a quick and easy way to make
one would be nice.
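
A minimal sketch of what an immutable SimpleNamespace might look like,
assuming a subclass is acceptable. FrozenNamespace is a made-up name, not
an existing class, and the immutability is shallow: the instance __dict__
can still be modified through vars().

```python
from types import SimpleNamespace


class FrozenNamespace(SimpleNamespace):
    """Hypothetical attribute bag that rejects rebinding after creation."""

    def __setattr__(self, name, value):
        raise AttributeError("FrozenNamespace is read-only")

    def __delattr__(self, name):
        raise AttributeError("FrozenNamespace is read-only")

    def __hash__(self):
        # SimpleNamespace equality ignores attribute order, so the hash
        # does too. All values must themselves be hashable.
        return hash(frozenset(vars(self).items()))


p = FrozenNamespace(x=1, y=2)
same = FrozenNamespace(y=2, x=1)
lookup = {p: "found"}
```

Unlike a tuple-based design, there is no ordering here, so p and same
compare equal and hash alike.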

I understand that the ordering could be confusing to folks, but I'm still
thinking yes -- in the spirit of duck-typing, I think having to think
about the Type is unfortunate.

And will people really get confused if:

ntuple(x=1, y=2) == ntuple(y=2, x=1)

returns False?

If so -- then, if we are willing to introduce new syntax, we can make
that more clear.

Note that:

ntuple(x=1, y=2) == ntuple(z=1, w=2)

Should also be False.

and

ntuple(x=1, y=2) == (1, 2)

also False (this is losing tuple-compatibility)

That is, the names, and the values, and the order are all fixed.

If we use a tuple of field names, e.g. ('x', 'y'), to define the "type",
then it's easy enough to cache and compare based on that. If, indeed, you
need to cache at all.
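
Here is a rough sketch of those semantics, assuming Python 3.6+ so that
**kwargs keeps call order. The helper _ntuple_type and the NTuple subclass
are made-up names, and building on collections.namedtuple is just one
possible approach, not a proposed implementation.

```python
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _ntuple_type(fields):
    """Return one cached class per ordered tuple of field names."""
    base = namedtuple("ntuple", fields)

    class NTuple(base):
        __slots__ = ()

        def __eq__(self, other):
            # Equal only when names, order, and values all match.
            if type(other) is not type(self):
                return False
            return tuple.__eq__(self, other)

        def __ne__(self, other):
            # tuple defines its own __ne__, so it must be overridden too.
            return not self.__eq__(other)

        def __hash__(self):
            return hash((self._fields, tuple(self)))

    return NTuple


def ntuple(**kwargs):
    # The ordered field names act as the "type" key.
    return _ntuple_type(tuple(kwargs))(*kwargs.values())


a = ntuple(x=1, y=2)
assert a == ntuple(x=1, y=2)
assert a != ntuple(y=2, x=1)   # same names and values, different order
assert a != ntuple(z=1, w=2)   # same values, different names
assert a != (1, 2)             # loses plain-tuple compatibility
```

Unpacking, iteration, and attribute access still work as for any
namedtuple; only equality and hashing are changed.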

BTW, I think we need to be careful about what assumptions we are making in
terms of "dicts are order-preserving". My understanding is that the fact
that the latest dict in CPython is order-preserving should be considered an
implementation detail, and not relied on.

But that we CAN count on **kwargs being order-preserving. That is, **kwargs
is an order-preserving mapping, but the fact that it IS a dict is an
implementation detail.

Have I got that right?
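
As far as I know, the **kwargs ordering guarantee comes from PEP 468
(Python 3.6). A tiny check, with field_order as a made-up helper name:

```python
def field_order(**kwargs):
    # Iterating over kwargs yields the keyword names in call order.
    return tuple(kwargs)


first = field_order(x=1, y=2)
second = field_order(y=2, x=1)
```

Here first is ('x', 'y') and second is ('y', 'x') on Python 3.6 or later.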

Of course, this will make it hard to back-port a "ntuple" implementation....

And

ntuple(('x', 2), ('y', 3))

is unfortunate.

-CHB

On Thu, Jul 27, 2017 at 4:48 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 27 July 2017 at 10:38, Steven D'Aprano <steve at pearwood.info> wrote:
> > On Thu, Jul 27, 2017 at 11:46:45AM +1200, Greg Ewing wrote:
> >> Nick Coghlan wrote:
> >> >The same applies to the ntuple concept, except there it's the fact
> >> >that it's a *tuple* that conveys the "order matters" expectation.
> >>
> >> That assumes there's a requirement that it be a tuple in
> >> the first place. I don't see that requirement in the use
> >> cases suggested here so far.
> >
> > This is an excellent point. Perhaps we should just find a shorter name
> > for SimpleNamespace and promote it as the solution.
> >
> > I'm not sure about other versions, but in Python 3.5 it will even save
> > memory for small records:
> >
> > py> import sys
> > py> from types import SimpleNamespace
> > py> spam = SimpleNamespace(flavour='up', charge='1/3')
> > py> sys.getsizeof(spam)
> > 24
>
> sys.getsizeof() isn't recursive, so this is only measuring the
> overhead of CPython's per-object bookkeeping. The actual storage
> expense is incurred via the instance dict:
>
>     >>> sys.getsizeof(spam.__dict__)
>     240
>     >>> data = dict(charge='1/3', flavour='up')
>     >>> sys.getsizeof(data)
>     240
>
> Note: this is a 64-bit system, so the per-instance overhead is also
> higher (48 bytes rather than 24), and tuples incur a cost of 8 bytes
> per item rather than 4 bytes.
>
> It's simply not desirable to rely on dicts for this kind of use case,
> as the per-instance cost of their bookkeeping machinery is overly high
> for small data classes and key-sharing only mitigates that problem, it
> doesn't eliminate it.
>
> By contrast, tuples are not only the most memory efficient data
> structure Python offers, they're also one of the fastest to allocate:
> since they're fixed length, they can be allocated as a single
> contiguous block, rather than requiring multiple memory allocations
> per instance (and that's before taking the free list into account).
>
> As a result, "Why insist on a tuple?" has three main answers:
>
> - lowest feasible per-instance memory overhead
> - lowest feasible runtime allocation cost overhead
> - backwards compatibility with APIs that currently return a tuple
> without impacting either of the above benefits
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>
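
The memory comparison quoted above can be reproduced roughly like this.
The exact numbers depend on the Python version and platform, so none are
shown here; the variable names are only for illustration.

```python
import sys
from types import SimpleNamespace

spam = SimpleNamespace(flavour="up", charge="1/3")
as_tuple = ("up", "1/3")

# sys.getsizeof() is not recursive, so add the instance dict explicitly.
namespace_total = sys.getsizeof(spam) + sys.getsizeof(spam.__dict__)
tuple_total = sys.getsizeof(as_tuple)

print("namespace + dict:", namespace_total, "bytes")
print("plain tuple:     ", tuple_total, "bytes")
```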

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov