namedtuple literals [Was: RE a new namedtuple]
On Tue, Jul 18, 2017 at 6:31 AM, Guido van Rossum <guido@python.org> wrote:
Thanks for bringing this up, I'm gonna summarize my idea in form of a PEP-like draft, hoping to collect some feedback.

Proposal
========

Introduction of a new syntax and builtin function to create lightweight namedtuples "on the fly" as in:

    >>> (x=10, y=20)
    (x=10, y=20)
    >>> ntuple(x=10, y=20)
    (x=10, y=20)

Motivations
===========

Avoid declaration
-----------------

Other than the startup time cost:
https://mail.python.org/pipermail/python-dev/2017-July/148592.html
...the fact that namedtuples need to be declared upfront implies they mostly end up being used only in public, end-user APIs / functions. For generic functions returning more than one value it would be nice to just do:

    def get_coordinates():
        return (x=10, y=20)

...instead of:

    from collections import namedtuple

    Coordinates = namedtuple('coordinates', ['x', 'y'])

    def get_coordinates():
        return Coordinates(10, 20)

Declaration also has the drawback of unnecessarily polluting the module API with an object (Coordinates) which is rarely needed. AFAIU namedtuple was designed this way for efficiency of the pure-python implementation currently in place and for serialization purposes (e.g. pickle), but I may be missing something else. Generally namedtuples are declared in a private module, imported from elsewhere and they are never exposed in the main namespace, which is kind of annoying. In case of one module scripts it's not uncommon to add a leading underscore which makes __repr__ uglier. To me, this suggests that the factory function should have been a first-class function instead.

Speed
------

Other than the startup declaration overhead, a namedtuple is slower than a tuple or a C structseq in almost any aspect:

- Declaration (50x slower than cnamedtuple):

    $ python3.7 -m timeit -s "from collections import namedtuple" \
        "namedtuple('Point', ('x', 'y'))"
    1000 loops, best of 5: 264 usec per loop
    $ python3.7 -m timeit -s "from cnamedtuple import namedtuple" \
        "namedtuple('Point', ('x', 'y'))"
    50000 loops, best of 5: 5.27 usec per loop

- Instantiation (3.5x slower than tuple):

    $ python3.7 -m timeit -s "import collections; Point = collections.namedtuple('Point', ('x', 'y')); x = [1, 2]" "Point(*x)"
    1000000 loops, best of 5: 310 nsec per loop
    $ python3.7 -m timeit -s "x = [1, 2]" "tuple(x)"
    5000000 loops, best of 5: 88 nsec per loop

- Unpacking (2.8x slower than tuple):

    $ python3.7 -m timeit -s "import collections; p = collections.namedtuple( \
        'Point', ('x', 'y'))(5, 11)" "x, y = p"
    5000000 loops, best of 5: 41.9 nsec per loop
    $ python3.7 -m timeit -s "p = (5, 11)" "x, y = p"
    20000000 loops, best of 5: 14.8 nsec per loop

- Field access by name (1.9x slower than structseq and cnamedtuple):

    $ python3.7 -m timeit -s "from collections import namedtuple as nt; \
        p = nt('Point', ('x', 'y'))(5, 11)" "p.x"
    5000000 loops, best of 5: 42.7 nsec per loop
    $ python3.7 -m timeit -s "from cnamedtuple import namedtuple as nt; \
        p = nt('Point', ('x', 'y'))(5, 11)" "p.x"
    10000000 loops, best of 5: 22.5 nsec per loop
    $ python3.7 -m timeit -s "import os; p = os.times()" "p.user"
    10000000 loops, best of 5: 22.6 nsec per loop

- Field access by index is the same as tuple:

    $ python3.7 -m timeit -s "from collections import namedtuple as nt; \
        p = nt('Point', ('x', 'y'))(5, 11)" "p[0]"
    10000000 loops, best of 5: 20.3 nsec per loop
    $ python3.7 -m timeit -s "p = (5, 11)" "p[0]"
    10000000 loops, best of 5: 20.5 nsec per loop

It is being suggested that most of these complaints about speed aren't an issue but in certain circumstances such as busy loops, getattr() being 1.9x slower could make a difference, e.g.:
https://github.com/python/cpython/blob/3e2ad8ec61a322370a6fbdfb2209cf74546f5...
Same goes for values unpacking.

isinstance()
------------

Probably a minor complaint, I just bring this up because I recently had to do this in psutil's unit tests. Anyway, checking a namedtuple instance isn't exactly straightforward:
https://stackoverflow.com/a/2166841

Backward compatibility
======================

This is probably the biggest barrier other than the "a C implementation is less maintainable" argument. In order to avoid duplication of functionality it would be great if collections.namedtuple() could remain a (deprecated) factory function using ntuple() internally. FWIW I tried running stdlib's unittests against https://github.com/llllllllll/cnamedtuple, I removed the ones about "_source", "verbose" and "module" arguments and I get a couple of errors about __doc__. I'm not sure about more advanced use cases (subclassing, others...?) but overall it appears pretty doable.

collections.namedtuple() Python wrapper can include the necessary logic to implement "verbose" and "rename" parameters when they're used. I'm not entirely sure about the implications of the "module" parameter though (Raymond?). _make(), _asdict(), _replace() and _fields attribute should also be exposed; as for "_source" it appears it can easily be turned into a property which would also save some memory.

The biggest annoyance is probably fields' __doc__ assignment:
https://github.com/python/cpython/blob/ced36a993fcfd1c76637119d31c03156a8772...
...which would require returning a clever class object slowing down the namedtuple declaration also in case no parameters are passed, but considering that the long-term plan is to replace collections.namedtuple() with ntuple() I consider this acceptable.

Thoughts?

--
Giampaolo - http://grodola.blogspot.com
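To make the isinstance() complaint concrete, here is a minimal sketch of the kind of duck-typed check that is usually needed today. The helper name `looks_like_namedtuple` is hypothetical, and the test is only a heuristic: any tuple subclass that defines a `_fields` tuple of strings would pass it.

```python
from collections import namedtuple


def looks_like_namedtuple(obj):
    """Heuristic check (hypothetical helper, not a stdlib API).

    namedtuple classes share no common base class other than tuple,
    so the usual approach is to look for the `_fields` attribute.
    """
    cls = type(obj)
    if not isinstance(obj, tuple) or cls is tuple:
        return False
    fields = getattr(cls, "_fields", None)
    return isinstance(fields, tuple) and all(isinstance(f, str) for f in fields)


Point = namedtuple("Point", ["x", "y"])
print(looks_like_namedtuple(Point(1, 2)))  # True
print(looks_like_namedtuple((1, 2)))       # False
```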
On Thu, Jul 20, 2017 at 2:14 AM, Giampaolo Rodola' <g.rodola@gmail.com> wrote:
In case of one module scripts it's not uncommon to add a leading underscore which makes __repr__ uglier.
Actually forget about this: __repr__ is dictated by the first argument. =) -- Giampaolo - http://grodola.blogspot.com
The proposal in your email seems incomplete -- there's two paragraphs on the actual proposal, and the entire rest of your email is motivation. That may be appropriate for a PEP, but while you're working on a proposal it's probably better to focus on clarifying the spec. Regarding that spec, I think there's something missing: given a list (or tuple!) of values, how do you turn it into an 'ntuple'? That seems a common use case, e.g. when taking database results like row_factory in sqlite3. -- --Guido van Rossum (python.org/~guido)
On Wed, Jul 19, 2017 at 6:08 PM, Guido van Rossum <guido@python.org> wrote:
One obvious choice is to allow for construction from a dict with **kwargs unpacking. This actually works now that keyword arguments are ordered. This would mean either ntuple(**kwargs) or the possibly too cute (**kwargs).
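As a rough illustration of the **kwargs route, the sketch below emulates the proposed builtin on top of collections.namedtuple. The `ntuple` function here is a hypothetical stand-in, not an existing API, and the column names are made up; it relies on keyword arguments keeping their call order (Python 3.6 and later).

```python
from collections import namedtuple


def ntuple(**kwargs):
    """Hypothetical stand-in for the proposed builtin."""
    # Field order follows the order of the keyword arguments.
    cls = namedtuple("ntuple", tuple(kwargs))
    return cls(**kwargs)


# Turning a row of values into a record, e.g. for a database row factory.
columns = ("id", "name")      # assumed column names
row = [7, "spam"]
record = ntuple(**dict(zip(columns, row)))
print(record)  # ntuple(id=7, name='spam')
```

This version builds a new class on every call, which is exactly the cost the rest of the thread worries about.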
On Wed, Jul 19, 2017 at 9:08 PM, Guido van Rossum <guido@python.org> wrote:
The proposal in your email seems incomplete
The proposal does not say anything about type((x=1, y=2)). I assume it will be the same as the type currently returned by namedtuple(?, 'x y'), but will these types be cached? Will type((x=1, y=2)) is type((x=3, y=4)) be True?
Regarding that spec, I think there's something missing: given a list (or tuple!) of values, how do you turn it into an 'ntuple'?
Maybe type((x=1, y=2))(values) will work?
On 20 July 2017 at 11:35, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Right, this is one of the key challenges to be addressed, as is the question of memory consumption - while Giampaolo's write-up is good in terms of covering the runtime performance motivation, it misses that one of the key motivations of the namedtuple design is to ensure that the amortised memory overhead of namedtuple instances is *zero*, since the name/position mapping is stored on the type, and *not* on the individual instances.
From my point of view, I think the best available answers to those questions are:
- ntuple literals will retain the low memory overhead characteristics of collections.namedtuple
- we will use a type caching strategy akin to string interning
- ntuple types will be uniquely identified by their field names and order
- if you really want to prime the type cache, just create a module level instance without storing it:

      (x=1, y=2) # Prime the ntuple type cache

A question worth asking will be whether or not "collections.namedtuple" will implicitly participate in the use of the type cache, and I think the answer needs to be "No". The problem is twofold:

1. collections.namedtuple accepts an additional piece of info that won't be applicable for ntuple types: the *name*
2. collections.namedtuple has existed for years *without* implicit type caching, so adding it now would be a bit weird

That means the idiomatic way of getting the type of an ntuple would be to create an instance and take the type of it:

    type((x=1, y=2))

That could still be the same kind of type as is created by collections.namedtuple, or else a slight variant that tailors repr() and pickling support based on the fact it's a kind of tuple literal.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
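The caching strategy described above can be approximated in pure Python. This is only a sketch under assumptions: `ntuple` and `_ntuple_type` are hypothetical names, and functools.lru_cache stands in for whatever interning mechanism a real implementation would use.

```python
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _ntuple_type(field_names):
    # One class per distinct (names, order) tuple, akin to string interning.
    return namedtuple("ntuple", field_names)


def ntuple(**kwargs):
    """Hypothetical builtin: look up the cached type, then instantiate it."""
    return _ntuple_type(tuple(kwargs))(*kwargs.values())


a = ntuple(x=1, y=2)
b = ntuple(x=3, y=4)
c = ntuple(y=2, x=1)

print(type(a) is type(b))  # True: same field names in the same order
print(type(a) is type(c))  # False: the order differs

# Building an instance from a plain sequence through the type of an existing one.
d = type(a)._make([5, 6])
print(d)  # ntuple(x=5, y=6)
```

The name/position mapping lives on the cached class, so instances carry no per-instance mapping, which matches the memory argument made earlier in this message.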
I'm concerned about losing access to type information (i.e. name) in this proposal. For example, I might write some code like this now:
The proposal to define this as:
smart = (cost=18_900, hp=89, weight=949) harley = (cost=18_900, hp=89, weight=949)
Doesn't seem to leave any way to distinguish the objects of different types that happen to have the same fields. Comparing ` smart._fields==harley._fields` doesn't help here, nor does any type constructed solely from the fields. Yes, I know a Harley-Davidson only weighs about half as much as a SmartCar, although the price and HP aren't far off. I can think of a few syntax ideas for how we might mix in a "name" to the `ntuple` objects, but I don't want to bikeshed. I'd just like to have the option of giving a name or class that isn't solely derived from the field names. On Wed, Jul 19, 2017 at 9:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
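The code this message refers to did not survive in the archive. The following is an illustration of the point rather than the original snippet: with collections.namedtuple, the class name is the only thing that tells the two records apart.

```python
from collections import namedtuple

# Two distinct types that happen to share the same fields.
Car = namedtuple("Car", "cost hp weight")
Motorcycle = namedtuple("Motorcycle", "cost hp weight")

smart = Car(18_900, 89, 949)
harley = Motorcycle(18_900, 89, 949)

print(smart._fields == harley._fields)  # True
print(smart == harley)                  # True: compared as plain tuples
print(type(smart) is type(harley))      # False: the name still distinguishes them
```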
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
If the type is data, it probably belongs inside the tuple:

    smart = (type="Car", cost=18_900, hp=89, weight=949)
    harley = (type="Motorcycle", cost=18_900, hp=89, weight=949)
    both_vehicles = (type(smart) == type(harley))  # True - type+cost+hp+weight on both sides
    same_vehicles = (smart == harley)  # False - cost, hp and weight are identical, but not type

On 20/07/17 at 07:12, David Mertz wrote:
On Jul 20, 2017 1:13 AM, "David Mertz" <mertz@gnosis.cx> wrote: I'm concerned in the proposal about losing access to type information (i.e. name) in this proposal. For example, I might write some code like this now:
The proposal to define this as:
smart = (cost=18_900, hp=89, weight=949) harley = (cost=18_900, hp=89, weight=949)
Doesn't seem to leave any way to distinguish the objects of different types that happen to have the same fields. Comparing ` smart._fields==harley._fields` doesn't help here, nor does any type constructed solely from the fields.

What about making a syntax to declare a type? The ones that come to mind are

    name = (x=, y=)

Or

    name = (x=pass, y=pass)

They may not be clear enough, though.
I'm not sure why everybody is so fixated on the type. When we use regular tuples, no one cares: they are all tuples, no matter what. In that case, let's make all of those namedtuples and be done with it. If somebody really needs a type, they will either use collections.namedtuple the old way, or use a namespace or a class. If using the type "namedtuple" is an issue because it already exists, let's find a name for this new type that conveys the meaning, like labelledtuple or something. The whole point of this is to make it a literal, simple and quick to use. If you make it more than that, we already have everything we need to do this and don't need to modify the language.

On 23/07/2017 at 18:08, Todd wrote:
23.7.2017 20.59 "Michel Desmoulin" <desmoulinmichel@gmail.com> wrote:

I'm not sure why everybody is so fixated on the type. When we use regular tuples, no one cares: they are all tuples, no matter what. In that case, let's make all of those namedtuples and be done with it. If somebody really needs a type, they will either use collections.namedtuple the old way, or use a namespace or a class. If using the type "namedtuple" is an issue because it already exists, let's find a name for this new type that conveys the meaning, like labelledtuple or something. The whole point of this is to make it a literal, simple and quick to use. If you make it more than that, we already have everything we need to do this and don't need to modify the language.

+1 to this, why not just have:

    type((x=0, y=0)) == namedtuple

similar to how tuples work. If you want to go advanced, feel free to use classes.

Also, would it be crazy to suggest mixing tuples and named tuples:
Just an idea, I'm not sure if it would have any use though.
On Sun, Jul 23, 2017 at 07:47:16PM +0200, Michel Desmoulin wrote:
I'm not sure why everybody is so fixated on the type.
When we use regular tuples, no one cares: they are all tuples, no matter what.
Some people care. This is one of the serious disadvantages of ordinary tuples as a record/struct type. There's no way to distinguish between (let's say) rectangular coordinates (1, 2) and polar coordinates (1, 2), or between (name, age) and (movie_title, score). They're all just 2-tuples. [...]
I disagree: in my opinion, the whole point is to make namedtuple faster, so that Python's startup time isn't affected so badly. Creating new syntax for a new type of tuple is scope-creep.

Even if we had that new syntax, the problem of namedtuple slowing down Python startup would remain. People can't use this new syntax until they have dropped support for everything before 3.7, which might take many years. But a fast namedtuple will give them a benefit as soon as their users upgrade to 3.7.

I agree that there is a strong case to be made for a fast, built-in, easy way to make record/structs without having to pre-declare them. But as the Zen of Python says:

    Now is better than never.
    Although never is often better than *right* now.

Let's not rush into designing a poor record/struct builtin just because we have a consensus (Raymond dissenting?) that namedtuple is too slow. The two issues are, not unrelated, but orthogonal. Record syntax would be still useful even if namedtuple was accelerated, and faster namedtuple would still be necessary even if we have record syntax.

I believe that a couple of people (possibly including Guido?) are already thinking about a PEP for that. If that's the case, let's wait and see what they come up with.

In the meantime, let's get back to the original question here: how can we make namedtuple faster?

- Guido has ruled out using a metaclass as the implementation, as that makes it hard to inherit from namedtuple and another class with a different metaclass.
- Backwards compatibility is a must.
- *But* maybe we can afford to bend backwards compatibility a bit. Perhaps we don't need to generate the *entire* class using exec, just __new__.
- I don't think that the _source attribute itself makes namedtuple slow. That might affect the memory usage of the class object itself, but it's just a name binding:

      result._source = class_definition

  The expensive part is, I'm fairly sure, this:

      exec(class_definition, namespace)

  (Taken from the 3.5 collections/__init__.py.)

I asked on PythonList@python.org whether people made use of the _source attribute, and the overwhelming response was that they either didn't know it existed, or if they did know, they didn't use it.

https://mail.python.org/pipermail/python-list/2017-July/723888.html

*If* it is accurate to say that nobody uses _source, then perhaps we might be willing to make this minor backwards-incompatible change in 3.7 (but not in a bug-fix release):

- Only the __new__ method is generated by exec (my rough tests suggest that may make namedtuple four times faster);
- _source only gives the source to __new__;
- or perhaps we can save backwards compatibility by making _source generate the rest of the template lazily, when needed, even if the entire template isn't used by exec.

That risks getting the *actual* source and the *reported* source getting out of sync. Maybe it's better to just break compatibility rather than risk introducing a discrepancy between the two.

--
Steve
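The "only generate __new__ with exec" idea can be sketched roughly as follows. This is a simplified illustration, not the stdlib implementation: `sketch_namedtuple` is a hypothetical name, validation is minimal, and helpers such as _make(), _replace() and _asdict() are left out.

```python
import keyword
from operator import itemgetter


def sketch_namedtuple(typename, field_names):
    """Hypothetical factory in which only __new__ is built with exec()."""
    if isinstance(field_names, str):
        field_names = field_names.replace(",", " ").split()
    fields = tuple(field_names)
    for name in (typename,) + fields:
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"invalid name: {name!r}")

    # The only generated source: a __new__ with a real, named signature.
    arg_list = ", ".join(fields)
    values = f"({arg_list},)" if fields else "()"
    namespace = {"_tuple_new": tuple.__new__}
    exec(f"def __new__(_cls, {arg_list}): return _tuple_new(_cls, {values})",
         namespace)

    def __repr__(self):
        pairs = ", ".join(f"{n}={v!r}" for n, v in zip(fields, self))
        return f"{typename}({pairs})"

    # Everything else is ordinary Python assembled without exec().
    class_namespace = {
        "__slots__": (),
        "_fields": fields,
        "__new__": namespace["__new__"],
        "__repr__": __repr__,
    }
    for index, name in enumerate(fields):
        class_namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), class_namespace)


Point = sketch_namedtuple("Point", "x y")
p = Point(y=20, x=10)
print(p, p.x, p[1])  # Point(x=10, y=20) 10 20
```

Keeping exec() for __new__ alone preserves keyword-argument support and a readable signature while shrinking the amount of source that has to be compiled per class.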
On 24/07/2017 at 15:31, Steven D'Aprano wrote:
You are just using my figure of speech as a way to counter the argument. That's not a very useful thing to do. Of course some people care; there are always a few people who care about anything. But then you just create your namedtuple manually, or a namespace, and are done with it. Rejecting the literal syntax completely just because it doesn't improve a use case that already worked for you, albeit a bit verbosely, is very radical. Unless you have a very nice counter-proposal that makes everyone happy, accepting the current one doesn't take anything away from you.
You are in the wrong thread. This thread is specifically about namedtuple literals. Making namedtuple faster can be done in many other ways and doesn't require a literal syntax. A literal syntax, while making things slightly faster by nature, is essentially about making things faster to read and write.
Again you are mixing the two things. This is why we have two threads: the debate split.
I agree that there is a strong case to be made for a fast, built-in, easy way to make record/structs without having to pre-declare them.
Do other languages have such a thing that can be checked against types?
I agree. I don't think we need to rush it. I can live without it now. I can live without it at all.
Let's not rush into designing a poor record/struct builtin just because we have a consensus (Raymond dissenting?) that namedtuple is too slow.
We don't. We can solve the slowness problem without having the namedtuple. The literal is a convenience.
On that we agree.
Yes, but it's about making classes less verbose, if I recall. Or at least it uses the class syntax. It's nice but not the same thing. Namedtuple literals are far better suited to scripting. You really don't want to write a class in quick scripts, when you do exploratory programming or data analysis on the fly.
In the meantime, let's get back to the original question here: how can we make namedtuple faster?
Then go to the other thread for that.
On 24 July 2017 at 17:37, Michel Desmoulin <desmoulinmichel@gmail.com> wrote:
You are in the wrong thread. This thread is specifically about namedtuple literals.
In which case, did you not see Guido's post "Honestly I would like to declare the bare (x=1, y=0) proposal dead."? The namedtuple literal proposal that started this thread is no longer an option, so can we move on? Preferably by dropping the whole idea - no-one has to my mind offered any sort of "replacement namedtuple" proposal that can't be implemented as a 3rd party library on PyPI *except* the (x=1, y=0) syntax proposal, and I see no justification for adding a *fourth* implementation of this type of object in the stdlib (which means any proposal would have to include deprecation of at least one of namedtuple, structseq or types.SimpleNamespace). The only remaining discussion on the table that I'm aware of is how we implement a more efficient version of the stdlib namedtuple class (and there's not much of that to be discussed here - implementation details can be thrashed out on the tracker issue). Paul
On Mon, Jul 24, 2017 at 6:31 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Sure -- but Python is dynamically typed, and we all like to talk about it as duck typing -- so asking "is this a rect_coord or a polar_coord object?" isn't only unnecessary, it's considered non-pythonic.

Bad example, actually, as a rect_coord would likely have names like 'x' and 'y', while a polar_coord would have 'r' and 'theta' -- showing why having a named-tuple-like structure is helpful, even without types.

So back to the example before of "Motorcycle" vs "Car" -- if they have the same attributes, then who cares which it is? If there is different functionality tied to each one, then that's what classes and sub-classing are for.

I think the entire point of this proposed object is that it be as lightweight as possible -- it's just a data storage object -- if you want to switch functionality on type, then use subclasses.

As has been said, namedtuple is partly the way it is because it was desired to be a drop-in replacement for a regular tuple, and needed to be reasonably implemented in pure Python.

If we can have an object that is:

- immutable
- indexable like a tuple
- has named attributes
- is lightweight and efficient

I think that would be very useful, and would take the place of namedtuple for most use-cases, while being both more pythonic and more efficient.

Whether it gets a literal or a simple constructor makes little difference, though if it got a literal, it would likely end up seeing much wider use (kind of like the set literal).

I disagree: in my opinion, the whole point is to make namedtuple faster, so that Python's startup time isn't affected so badly. Creating new syntax for a new type of tuple is scope-creep.

I think making it easier to access and use is a worthwhile goal, too. If we are re-thinking this, a little scope creep is OK.

Even if we had that new syntax, the problem of namedtuple slowing down [...]

These aren't mutually exclusive, if 3.7 has collections.namedtuple wrap the new object. IIUC, the idea of cached types would mean that objects _would_ be a Type, even if that wasn't usually exposed -- so it could be exposed in the case where it was constructed from a collections.namedtuple().

-CHB

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Wed, Jul 19, 2017 at 9:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The problem with namedtuple's semantics is that they're perfect for its original use case (replacing legacy tuple returns without breaking backwards compatibility), but turn out to be sub-optimal for pretty much anything else, which is one of the motivations behind stuff like attrs and Eric's dataclasses PEP:
https://github.com/ericvsmith/dataclasses/blob/61bc9354621694a93b215e79a7187...
that namedtuple is already arguably *too* convenient, in the sense that it's become an attractive nuisance that gets used in places where it isn't really appropriate.

Also, what's the advantage of (x=1, y=2) over ntuple(x=1, y=2)? I.e., why does this need to be syntax instead of a library?

-n

--
Nathaniel J. Smith -- https://vorpus.org
On 20 July 2017 at 07:58, Nathaniel Smith <njs@pobox.com> wrote:
Agreed. This discussion was prompted by the fact that namedtuple class creation was slow, resulting in startup time issues. It seems to have morphed into a generalised discussion of how we design a new "named values" type. While I know that rewriting the implementation is a good time to review the semantics, it feels like we've gone too far in that direction. As has been noted, the new proposal

- no longer supports multiple named types with the same set of field names
- doesn't allow creation from a simple sequence of values

I would actually struggle to see how this can be considered a replacement for namedtuple - it feels like a completely independent beast. Certainly code intended to work on multiple Python versions would seem to have no motivation to change.
Also, what's the advantage of (x=1, y=2) over ntuple(x=1, y=2)? I.e., why does this need to be syntax instead of a library?
Agreed. Now that keyword argument dictionaries retain their order, there's no need for new syntax here. In fact, that's one of the key motivating reasons for the feature. Paul
On 20 July 2017 at 10:15, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
I don't think anyone has suggested that the instance creation time penalty for namedtuple is the issue (it's the initial creation of the class that affects interpreter startup time), so it's not clear that we need to optimise that (at this stage). However, it's also true that namedtuple instances are created from sequences, not dictionaries (because the class holds the position/name mapping, so instance creation doesn't need it). So it could be argued that the backward-incompatible means of creating instances is *also* a problem because it's slower... Paul PS Taking ntuple as "here's a neat idea for a new class", rather than as a possible namedtuple replacement, changes the context of all of the above significantly. Just treating ntuple purely as a new class being proposed, I quite like it, but I'm not sure it's justified given all of the similar approaches available, so let's see how a 3rd party implementation fares. And it's too early to justify new syntax, but if the overhead of a creation function turns out to be too high in practice, we can revisit that question. But that's *not* what this thread is about, as I understand it.
Something probably not directly related, but since we started to talk about syntactic changes... I think what would be great to eventually have is some form of pattern matching. Essentially pattern matching could be just a "tagged" unpacking protocol. For example, something like this will simplify a common pattern with a sequence of if isinstance() branches:

    class Single(NamedTuple):
        x: int

    class Pair(NamedTuple):
        x: int
        y: int

    def func(arg: Union[Single, Pair]) -> int:
        whether arg:
            Single as a:
                return a + 2
            Pair as a, b:
                return a * b
            else:
                return 0

The idea is that the expression before ``as`` is evaluated, then if ``arg`` is an instance of the result, then ``__unpack__`` is called on it. Then the resulting tuple is unpacked into the names a, b, etc.

I think named tuples could provide the __unpack__, and especially it would be great for dataclasses to provide the __unpack__ method. (Maybe we can then call it __data__?)

--
Ivan

On 20 July 2017 at 11:39, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
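For comparison, here is a sketch of the "sequence of if isinstance() branches" pattern that the message above wants to simplify, written in currently valid Python. Ordinary tuple unpacking stands in for the hypothetical `__unpack__` protocol.

```python
from typing import NamedTuple, Union


class Single(NamedTuple):
    x: int


class Pair(NamedTuple):
    x: int
    y: int


def func(arg: Union[Single, Pair]) -> int:
    # Each branch checks the "tag" (the class), then unpacks the fields.
    if isinstance(arg, Single):
        (a,) = arg
        return a + 2
    if isinstance(arg, Pair):
        a, b = arg
        return a * b
    return 0


print(func(Single(3)))   # 5
print(func(Pair(3, 4)))  # 12
```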
On Thu, Jul 20, 2017 at 9:58 AM, Nathaniel Smith <njs@pobox.com> wrote:
Well put! I agree that adding attribute names to elements in a tuple (e.g. return values) in a backwards-compatible way is where namedtuple is great.
I do think it makes sense to add a convenient way to upgrade a function to return named values. Is there any reason why that couldn't replace structseq completely? These anonymous namedtuple classes could also be made fast to create (and more importantly, cached).
Also, what's the advantage of (x=1, y=2) over ntuple(x=1, y=2)? I.e., why does this need to be syntax instead of a library?
Indeed, we might need the syntax (x=1, y=2) later for something different. However, I hope we can forget about 'ntuple', because it suggests a tuple of n elements. Maybe something like

    return tuple.named(x=foo, y=bar)

which is backwards compatible with

    return foo, bar

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
20.07.17 04:35, Alexander Belopolsky wrote:
Yes, this is the key problem with this idea. If the type of every namedtuple literal is unique, this is a waste of memory and CPU time. Creating a new type is much slower than instantiating it, even without compiling. If we support a global cache of types, we have problems with mutability and lifetime. If types are mutable (namedtuple classes are), setting the __doc__ or __name__ attributes of type((x=1, y=2)) will affect type((x=3, y=4)). How do you create two different named tuple types with different names and docstrings? In Python 2 all types are immortal; in Python 3 they can be collected as ordinary objects, and you can create types dynamically without fear of using too much memory. If types are cached, we have to take care of collecting unused types, which will significantly complicate the implementation.
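The mutability concern can be reproduced with a small emulation. This is a sketch under assumptions: `ntuple` and `_cached_type` are hypothetical names, and functools.lru_cache stands in for the global type cache being discussed.

```python
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _cached_type(field_names):
    # Stand-in for a global cache keyed on field names and order.
    return namedtuple("ntuple", field_names)


def ntuple(**kwargs):
    return _cached_type(tuple(kwargs))(*kwargs.values())


p = ntuple(x=1, y=2)
q = ntuple(x=3, y=4)

# Documenting "p's type" silently changes every other (x, y) record too.
type(p).__doc__ = "A point on the plane"
print(type(q).__doc__)  # A point on the plane
```

Because lru_cache keeps a strong reference to every class it has created, this emulation also never releases unused types, which is the lifetime half of the same problem.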
On Fri, Jul 21, 2017 at 8:49 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
How about just making a named namedtuple if you want to mutate the type? Or perhaps make help() work better for __doc__ attributes on instances. Currently,
does not show "Hello" at all. In Python 2 all types are immortal, in python 3 they can be collected as
Hmm. Good point. Even if making large amounts of arbitrary disposable anonymous namedtuples is probably not a great idea, someone might do it. Maybe having a separate type for each anonymous named tuple is not worth it. After all, keeping references to the attribute names in the object shouldn't take up that much memory. And the tuples are probably often short-lived. Given all this, the syntax for creating anonymous namedtuples efficiently probably does not really need to be super convenient on the Python side, but having it available and unified with that structseq thing would seem useful. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
Honestly I would like to declare the bare (x=1, y=0) proposal dead. Let's encourage the use of objects rather than tuples (named or otherwise) for most data exchanges. I know of a large codebase that uses dicts instead of objects, and it's a mess. I expect the bare ntuple to encourage the same chaos. -- --Guido van Rossum (python.org/~guido)
Languages since the original Pascal have had a way to define types by structure. If Python did the same, ntuples with the same structure would be typed "objects" that are not pre-declared. In Python's case, because typing of fields is not required and thus can't be used to hint the structure's type, the names and order of fields could be used. Synthesizing a (reserved) type name for (x=1, y=0) should be straightforward. In short,

    >>> isinstance((x=None, y=None), type((x=1, y=0)))
    True
That can be implemented with namedtuple with some ingenious mangling for the (quasi-anonymous) type name. Equivalence of types by structure is useful, and is very different from the mess that using dicts as records can produce. Cheers, -- Juancarlo *Añez*
On 22 July 2017 at 01:18, Guido van Rossum <guido@python.org> wrote:
That sounds sensible to me - given ordered keyword arguments, anything that bare syntax could do can be done with a new builtin instead, and be inherently more self-documenting as a result. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Le 24/07/2017 à 16:12, Nick Coghlan a écrit :
This is the people working on big code bases talking. Remember, Python is not just for Google and Dropbox. We have thousands of users who are just sysadmins, mathematicians, bankers, analysts, and who just want a quick way to make a record. They neither want nor need a class. Dictionaries and collections.namedtuple are verbose, so they just use regular tuples. They don't use mypy either, so having a type would be moot for them. In many languages we have the opposite problem: people using classes as a container for everything. It makes things very complicated with little value. Python actually has a good balance here. Yes, Python doesn't have pattern matching, which makes it harder to check whether a nested data structure matches the desired schema, but all in all, the bloat/expressiveness equilibrium is quite nice. A literal namedtuple would allow a clearer way to make a quick and simple record.
On 25 July 2017 at 02:46, Michel Desmoulin <desmoulinmichel@gmail.com> wrote:
Dedicated syntax: (x=1, y=0)
New builtin: ntuple(x=1, y=0)

So the only thing being ruled out is the dedicated syntax option, since it doesn't let us do anything that a new builtin can't do, it's harder to find help on (as compared to "help(ntuple)" or searching online for "python ntuple"), and it can't be readily backported to Python 3.6 as part of a third party library (you can't easily backport it any further than that regardless, since you'd be missing the order-preservation guarantee for the keyword arguments passed to the builtin). Having such a builtin implicitly create and cache new namedtuple type definitions so the end user doesn't need to care about pre-declaring them is still fine, and remains the most straightforward way of building a capability like this atop the underlying `collections.namedtuple` type. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 25 July 2017 at 11:57, Nick Coghlan <ncoghlan@gmail.com> wrote:
I've updated the example I posted in the other thread with all the necessary fiddling required for full pickle compatibility with auto-generated collections.namedtuple type definitions: https://gist.github.com/ncoghlan/a79e7a1b3f7dac11c6cfbbf59b189621

This shows that given ordered keyword arguments as a building block, most of the actual implementation complexity now lies in designing an implicit type cache that plays nicely with the way pickle works:

    from collections import namedtuple

    class _AutoNamedTupleTypeCache(dict):
        """Pickle compatibility helper for autogenerated collections.namedtuple type definitions"""
        def __new__(cls):
            # Ensure that unpickling reuses the existing cache instance
            self = globals().get("_AUTO_NTUPLE_TYPE_CACHE")
            if self is None:
                maybe_self = super().__new__(cls)
                self = globals().setdefault("_AUTO_NTUPLE_TYPE_CACHE", maybe_self)
            return self

        def __missing__(self, fields):
            cls_name = "_ntuple_" + "_".join(fields)
            return self._define_new_type(cls_name, fields)

        def __getattr__(self, cls_name):
            parts = cls_name.split("_")
            if not parts[:2] == ["", "ntuple"]:
                raise AttributeError(cls_name)
            fields = tuple(parts[2:])
            return self._define_new_type(cls_name, fields)

        def _define_new_type(self, cls_name, fields):
            cls = namedtuple(cls_name, fields)
            cls.__module__ = __name__
            cls.__qualname__ = "_AUTO_NTUPLE_TYPE_CACHE." + cls_name
            # Rely on setdefault to handle race conditions between threads
            return self.setdefault(fields, cls)

    _AUTO_NTUPLE_TYPE_CACHE = _AutoNamedTupleTypeCache()

    def auto_ntuple(**items):
        cls = _AUTO_NTUPLE_TYPE_CACHE[tuple(items)]
        return cls(*items.values())

But given such a cache, you get implicitly defined types that are automatically shared between instances that want to use the same field names:

    >>> p1 = auto_ntuple(x=1, y=2)
    >>> p2 = auto_ntuple(x=4, y=5)
    >>> type(p1) is type(p2)
    True
    >>>
    >>> import pickle
    >>> p3 = pickle.loads(pickle.dumps(p1))
    >>> p1 == p3
    True
    >>> type(p1) is type(p3)
    True
    >>>
    >>> p1, p2, p3
    (_ntuple_x_y(x=1, y=2), _ntuple_x_y(x=4, y=5), _ntuple_x_y(x=1, y=2))
    >>> type(p1)
    <class '__main__._AUTO_NTUPLE_TYPE_CACHE._ntuple_x_y'>

And writing the pickle out to a file and reloading it also works without needing to explicitly predefine that particular named tuple variant:

    >>> with open("auto_ntuple.pkl", "rb") as f:
    ...     p1 = pickle.load(f)
    ...
    >>> p1
    _ntuple_x_y(x=1, y=2)

In effect, implicitly named tuples would be like key-sharing dictionaries, but sharing at the level of full type objects rather than key sets. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
If an important revamp of namedtuple will happen (actually, "easy and friendly immutable structures"), I'd suggest that the new syntax not be discarded upfront, but rather be left as a final decision, after all the other forces are resolved. FWIW, there's another development thread about "easy class declarations (with typing)". From MHPOV, the threads are different enough to remain separate. Cheers! -- Juancarlo *Añez*
On 2017-07-25 02:57, Nick Coghlan wrote:
[snip] I think it's a little like function arguments. Arguments can be all positional, but you have to decide in what order they are listed. Named arguments are clearer than positional arguments when calling functions. So an ntuple would be like a tuple, but with names (attributes) instead of positions. I don't see how they could be compatible with tuples because the positions aren't fixed. You would need a NamedTuple where the type specifies the order. I think...
On Tue, Jul 25, 2017 at 7:49 PM, MRAB <python@mrabarnett.plus.com> wrote:
Most likely ntuple() will require keyword args only, whereas for collections.namedtuple they are mandatory only during declaration. The order is the same as kwargs, so:
What's less clear is how isinstance() should behave. Perhaps:
-- Giampaolo - http://grodola.blogspot.com
On 2017-07-25 19:48, Giampaolo Rodola' wrote:
Given:
nt = ntuple(x=1, y=2)
you have nt[0] == 1 because that's the order of the args. But what about:
nt2 = ntuple(y=2, x=1)
? Does that mean that nt2[0] == 2? Presumably, yes. Does nt == nt2? If it's False, then you've lost some of the advantage of using names instead of positions. It's a little like saying that functions can be called with keyword arguments, but the order of those arguments still matters!
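For comparison, this is how today's collections.namedtuple answers the same questions when the same field names are declared in opposite orders. It is only a sketch using the existing stdlib type (the proposed ntuple() does not exist), and the type names XY and YX are made up for illustration.

```python
from collections import namedtuple

# Two separately declared types with the same field names in opposite order.
XY = namedtuple("XY", ["x", "y"])
YX = namedtuple("YX", ["y", "x"])

nt = XY(x=1, y=2)
nt2 = YX(y=2, x=1)

# Indexing follows the declared field order, not the names.
first_of_nt = nt[0]    # the value of x
first_of_nt2 = nt2[0]  # the value of y

# Equality is plain tuple equality: names are ignored, positions are compared.
same_names_equal = (nt == nt2)             # compares (1, 2) with (2, 1)
same_values_equal = (nt == YX(y=1, x=2))   # compares (1, 2) with (1, 2)
```

So with the existing type, matching names do not make two instances equal, while matching positions do.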
On Tue, Jul 25, 2017 at 9:30 PM, MRAB <python@mrabarnett.plus.com> wrote:
Mmmm excellent point. I would expect "nt == nt2" to be True because collections.namedtuple() final instance works like that (compares pure values), because at the end of the day it's a tuple subclass and so should be ntuple() (meaning I expect "isinstance(ntuple(x=1, y=2), tuple)" to be True). On the other hand it's also legitimate to expect "nt == nt2" to be False because field names are different. That would be made clear in the doc, but the fact that people will have to look it up means it's not obvious. -- Giampaolo - http://grodola.blogspot.com
On Tue, Jul 25, 2017 at 08:30:14PM +0100, MRAB wrote:
It better be.
Not at all. It's a *tuple*, so the fields have a definite order. If you don't want a tuple, why are you using a tuple? Use SimpleNamespace for an unordered "bag of attributes":

    py> from types import SimpleNamespace
    py> x = SimpleNamespace(spam=4, eggs=3)
    py> y = SimpleNamespace(eggs=3, spam=4)
    py> x == y
    True
It's a little like saying that functions can be called with keyword arguments, but the order of those arguments still matters!
That's the wrong analogy and it won't work. But people will expect that it will, and be surprised when it doesn't! The real problem here is that we're combining two distinct steps into one. The *first* step should be to define the order of the fields in the record (a tuple): [x, y] is not the same as [y, x]. Once the field order is defined, then you can *instantiate* those fields either positionally, or by name in any order. But by getting rid of that first step, we no longer have the option to specify the order of the fields. We can only infer them from the order they are given when you instantiate the fields.

Technically, Nick's scheme to implicitly cache the type could work around this at the cost of making it impossible to have two types with the same field names in different orders. Given:

    ntuple(y=1, x=2)

ntuple could look up the *unordered set* {y, x} in the cache, and if found, use that type. If not found, define a new type with the fields in the stated order [y, x]. So now you can, or at least you will *think* that you can, safely write this:

    spam = ntuple(x=2, y=1, z=0)  # defines the field order [x, y, z]
    eggs = ntuple(z=0, y=1, x=2)  # instantiate using kwargs in any order
    assert spam == eggs

But this has a hidden landmine. If *any* module happens to use ntuple with the same field names as you, but in a different order, you will have mysterious bugs:

    x, y, z = spam

You expect x=2, y=1, z=0 because that's the way you defined the field order, but unknown to you some other module got in first and defined it as [z, y, x] and so your code will silently do the wrong thing. Even if the cache is per-module, the same problem will apply. If the spam and eggs assignments above are in different functions, the field order will depend on which function happens to be called first, which may not be easily predictable. I don't see any way that this proposal can be anything but a subtle source of bugs.
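The landmine can be reproduced with a few lines. The following is only a sketch under stated assumptions: unordered_ntuple and its module-level _cache are hypothetical names invented here, and the cache is deliberately keyed on the unordered set of field names, which is the variant being criticised, not Nick's posted implementation (that one keys on the ordered tuple).

```python
from collections import namedtuple

# Hypothetical cache keyed on the *unordered* set of field names.
_cache = {}

def unordered_ntuple(**items):
    key = frozenset(items)
    cls = _cache.get(key)
    if cls is None:
        # First caller wins: its keyword order becomes the field order.
        cls = _cache[key] = namedtuple("ntuple", list(items))
    return cls(**items)

# "Some other module" gets in first and fixes the order as [z, y, x].
other = unordered_ntuple(z=0, y=1, x=2)

# Our code believes it is defining the order [x, y, z] here.
spam = unordered_ntuple(x=2, y=1, z=0)
eggs = unordered_ntuple(z=0, y=1, x=2)

# Attribute access and equality look fine ...
assert spam == eggs and spam.x == 2

# ... but positional unpacking silently follows the other module's order.
x, y, z = spam
```

After the last line, x, y and z hold the values of the z, y and x fields respectively, with no exception raised anywhere.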
We have two *incompatible* requirements:

- we want to define the order of the fields according to the order we give keyword arguments;
- we want to give keyword arguments in any order without caring about the field order.

We can't have both, and we can't give up either without being a surprising source of annoyance and bugs. As far as I am concerned, this kills the proposal for me. If you care about field order, then use namedtuple and explicitly define a class with the field order you want. If you don't care about field order, use SimpleNamespace. -- Steve
On 26 July 2017 at 11:05, Steven D'Aprano <steve@pearwood.info> wrote:
I think the second stated requirement isn't a genuine requirement, as that *isn't* a general expectation. After all, one of the reasons we got ordered-by-default keyword arguments is because people were confused by the fact that you couldn't reliably do:

    mydict = collections.OrderedDict(x=1, y=2)

Now, though, that's fully supported and does exactly what you'd expect:

    >>> from collections import OrderedDict
    >>> OrderedDict(x=1, y=2)
    OrderedDict([('x', 1), ('y', 2)])
    >>> OrderedDict(y=2, x=1)
    OrderedDict([('y', 2), ('x', 1)])

In this case, the "order matters" expectation is informed by the nature of the constructor being called: it's an *ordered* dict, so the constructor argument order matters. The same applies to the ntuple concept, except there it's the fact that it's a *tuple* that conveys the "order matters" expectation.

    ntuple(x=1, y=2) == ntuple(y=1, x=2) == (1, 2)
    ntuple(x=2, y=1) == ntuple(y=2, x=1) == (2, 1)

Putting the y-coordinate first would be *weird* though, and I don't think it's an accident that we mainly discuss tuples with strong order conventions in the context of implicit typing: they're the ones where it feels most annoying to have to separately define that order rather than being able to just use it directly. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, Jul 27, 2017 at 02:05:47AM +1000, Nick Coghlan wrote:
Indeed. But the reason we got *keyword arguments* in the first place was so you didn't need to care about the order of parameters. As is often the case, toy examples with arguments x and y don't really demonstrate the problem in real code. We need more realistic, non-trivial examples. Most folks can remember the first two arguments to open: open(name, 'w') but for anything more complex, we not only want to skip arguments and rely on their defaults, but we don't necessarily remember the order of definition: open(name, 'w', newline='\r', encoding='macroman', errors='replace') Without checking the documentation, how many people could tell you whether that order matches the positional order? I know I couldn't. You say
Certainly, if you're used to the usual mathematics convention that the horizontal coordinate x comes first. But if you are used to the curses convention that the vertical coordinate y comes first, putting y first is completely natural. And how about ... ? ntuple(flavour='strange', spin='1/2', mass=95.0, charge='-1/3', isospin='-1/2', hypercharge='1/3') versus: ntuple(flavour='strange', mass=95.0, spin='1/2', charge='-1/3', hypercharge='1/3', isospin='-1/2') Which one is "weird"? This discussion has been taking place for many days, and it is only now (thanks to MRAB) that we've noticed this problem. I think it is dangerous to assume that the average Python coder will either: - always consistently specify the fields in the same order; - or recognise ahead of time (during the design phase of the program) that they should pre-declare a class with the fields in a particular order. Some people will, of course. But many won't. Instead, they'll happily start instantiating ntuples with keyword arguments in inconsistent order, and if they are lucky they'll get unexpected, tricky to debug exceptions. If they're unlucky, their program will silently do the wrong thing, and nobody will notice that their results are garbage. SimpleNamespace doesn't have this problem: the fields in SimpleNamespace aren't ordered, and cannot be packed or unpacked by position. namedtuple doesn't have this problem: you have to predeclare the fields in a certain order, after which you can instantiate them by keyword in any order, and unpacking the tuple will always honour that order.
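The namedtuple behaviour described above can be checked directly. This is a small sketch with the existing stdlib type; the type name Quark and the choice of three fields are only for illustration.

```python
from collections import namedtuple

# The field order is fixed once, at declaration time.
Quark = namedtuple("Quark", ["flavour", "spin", "mass"])

# Keyword arguments may then be given in any order.
a = Quark(flavour="strange", spin="1/2", mass=95.0)
b = Quark(mass=95.0, flavour="strange", spin="1/2")

# Positional unpacking always follows the declared order.
flavour, spin, mass = b
```

Both instances compare equal, and unpacking b yields the values in the declared order regardless of how the keywords were written at the call site.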
I don't think that's a great analogy. There's no real equivalent of packing/unpacking OrderedDicts by position to trip us up here. It is better to think of OrderedDicts as "order-preserving dicts" rather than "dicts where the order matters". Yes, it does matter, in a weak sense. But not in the important sense of binding values to keys:

    py> from collections import OrderedDict
    py> a = OrderedDict([('spam', 1), ('eggs', 2)])
    py> b = OrderedDict([('eggs', -1), ('spam', 99)])
    py> a.update(b)
    py> a
    OrderedDict([('spam', 99), ('eggs', -1)])

update() has correctly bound 99 to key 'spam', even though the keys are in the wrong order. The same applies to dict unpacking:

    a.update(**b)

In contrast, named tuples aren't just order-preserving. The field order is part of their definition, and tuple unpacking honours the field order, not the field names. While we can't update tuples in place, we can and often do unpack them into variables. When we do, we need to know the field order:

    flavour, charge, mass, spin, isospin, hypercharge = mytuple

but the risk is that the field order may not be what we expect unless we are scrupulously careful to *always* call ntuple(...) with the arguments in the same order. -- Steve
On 2017-07-26 01:10 PM, Steven D'Aprano wrote:
The main use case for ntuple literals, imo, would be to replace functions like this:
With the more convenient for the caller
Ntuple literals don't introduce a new field-ordering problem, because this problem already existed with the bare tuple literal it replaced. In the case where you need to create compatible ntuples for multiple functions to create, collections.namedtuple is still available to predefine the named tuple type. Or you can use a one-liner helper function like this:
Alex Brault
On Wed, Jul 26, 2017 at 8:47 PM, Alexandre Brault <abrault@mapgears.com> wrote:
Yes, but for the caller it's just as convenient without new namedtuple syntax. If there's new *syntax* for returning multiple values, it would indeed hopefully look more into the future and not create a tuple. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Wed, Jul 26, 2017 at 8:10 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Careful here, this is misleading. What you say applies to the normal dict since 3.6, which now *preserves* order. But in OrderedDict, order matters in quite a strong way:

    od1 = OrderedDict(a=1, b=2)
    od2 = OrderedDict(b=2, a=1)   # (kwargs order obviously matters)
    od1 == od2                    # gives False !!
    od1 == dict(a=1, b=2)         # gives True
    od2 == dict(a=1, b=2)         # gives True
    od1 == OrderedDict(a=1, b=2)  # gives True

I also think this is how OrderedDict *should* behave to earn its name. It's great that we now also have an order-*preserving* dict, because often you want that, but still dict(a=1, b=2) == dict(b=2, a=1).

But not in the important sense of binding values to keys:
The reason for this is that the order is determined by the first binding, not by subsequent updates to already-existing keys.
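A short sketch of that rule, reusing the spam/eggs data from the quoted example (the extra "ham" key is added here purely for illustration):

```python
from collections import OrderedDict

a = OrderedDict([("spam", 1), ("eggs", 2)])
b = OrderedDict([("eggs", -1), ("spam", 99)])

# Updating existing keys rebinds their values but keeps their original positions.
a.update(b)
keys_after_update = list(a)

# A brand-new key is appended at the end.
a["ham"] = 0
keys_after_insert = list(a)

# Deleting and re-adding a key moves it to the end: that is a new first binding.
del a["spam"]
a["spam"] = 99
keys_after_readd = list(a)
```

Only the final delete-and-re-add step changes the position of an existing key.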
I hope this was already clear to people in the discussion, but in case not, thanks for clarifying.
This is indeed among the reasons why the tuple api is desirable mostly for backwards compatibility in existing functions, as pointed out early in this thread. New functions will hopefully use something with only attribute access to the values, unless there is a clear reason to also have integer indexing and unpacking by order. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 27 July 2017 at 03:10, Steven D'Aprano <steve@pearwood.info> wrote:
Trivial examples in ad hoc throwaway scripts, analysis notebooks, and student exercises *are* the use case. For non-trivial applications and non-trivial data structures with more than a few fields, the additional overhead of predefining and appropriately documenting a suitable class (whether with collections.namedtuple, a data class library like attrs, or completely by hand) is going to be small relative to the overall complexity of the application, so the most sensible course of action is usually going to be to just go ahead and do that. Instead, as Alexandre describes, the use cases that the ntuple builtin proposal aims to address *aren't* those where folks are already willing to use a properly named tuple: it's those where they're currently creating a *plain* tuple, and we want to significantly lower the barrier to making such objects a bit more self-describing, by deliberately eliminating the need to predefine the related type. In an educational setting, it may even provide a gentler introduction to the notion of custom class definitions for developers just starting out.
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 07/26/2017 09:05 AM, Nick Coghlan wrote:
On 26 July 2017 at 11:05, Steven D'Aprano <steve@pearwood.info> wrote:
I have to agree with D'Aprano on this one. I certainly do not *expect* keyword argument position to matter, and it seems to me the primary reason to make it matter was not for dicts, but because a class namespace is implemented by dicts. Tuples, named or otherwise, are positional first -- order matters. Specifying point = ntuple(y=2, x=-3) and having point[0] == -3 is going to be bizarre. This will be a source of horrible bugs. -- ~Ethan~
Ethan Furman writes:
I don't see how you get that? Anyway, I expect that ntuples will *very* frequently be *written* in an order-dependent (and probably highly idiomatic) way, and *read* using attribute notation:

    def db_from_csv(sheet):
        db = []
        names = next(sheet)
        for row in sheet:
            # ** needs a mapping, so build a dict from the (name, value) pairs
            db.append(ntuple(**dict(zip(names, row))))
        return db

    my_db = []
    for sheet in my_sheets:
        my_db.extend(db_from_csv(sheet))
    # list.sort() returns None, so use sorted() to get the sorted copies
    x_index = sorted(my_db, key=lambda row: row.x)
    y_index = sorted(my_db, key=lambda row: row.y)

(untested). As far as I can see, this is just duck-typed collection data, as Chris Barker puts it. Note that the above idiom can create a non-rectangular database from sheets of arbitrary column orders as long as both 'x' and 'y' columns are present in all of my sheets. A bit hacky, but it's the kind of thing you might do in a one-off script, or when you're aggregating data collected by unreliable RAs. Sure, this can be abused, but an accidental pitfall? Seems to me just as likely that you'd get that with ordinary tuples. I can easily imagine scenarios like "Oh, these are tuples but I need even *more* performance. I know! I'll read my_db into a numpy array!" But I would consider that an abuse, or at least a hack (consider how you'd go about getting variable names for the numpy array columns).
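A runnable variant of that idiom, as a sketch only: ntuple below is a hypothetical stand-in built on collections.namedtuple (it creates a throwaway type on every call and does no caching), and the two in-memory sheets are made-up data.

```python
import csv
import io
from collections import namedtuple

def ntuple(**items):
    # Hypothetical stand-in for the proposed builtin: one throwaway type per call.
    return namedtuple("ntuple", list(items))(**items)

def db_from_csv(sheet):
    names = next(sheet)  # the header row supplies the field names
    return [ntuple(**dict(zip(names, row))) for row in sheet]

# Two made-up sheets with the same columns in different orders.
sheet1 = csv.reader(io.StringIO("x,y\n3,a\n1,b\n"))
sheet2 = csv.reader(io.StringIO("y,x\nc,2\n"))

my_db = db_from_csv(sheet1) + db_from_csv(sheet2)

# Reading by attribute works regardless of each sheet's column order.
x_index = sorted(my_db, key=lambda row: row.x)
xs = [row.x for row in x_index]
```

Attribute access gives consistent results across both sheets, while my_db[2][0] holds a y value rather than an x value, which is the positional hazard debated in the rest of the thread.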
On 07/27/2017 06:24 PM, Stephen J. Turnbull wrote:
Ethan Furman writes:
How do I get point[0] == -3? The first definition of an ntuple had the order as x, y, and since the proposal is only comparing field names (not order), this (y, x) ntuple ends up being reversed to how it was specified.
Sure, but they can also be unpacked, and order matters there. Also, as D'Aprano pointed out, if the first instance of an ntuple has the fields in a different order than expected, all subsequent ntuples that are referenced in an order-dependent fashion will be returning data from the wrong indexes. -- ~Ethan~
On Thu, Jul 27, 2017 at 7:42 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
I'm not sure there ever was a "proposal" per se, but: ntuple(x=a, y=b) had better be a different type than: ntuple(y=b, x=a) but first we need to decide if we want an easy way to make a namedtuple-like object or a SimpleNamespace-like object.... but if you are going to allow indexing by integer, then order needs to be part of the definition. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
My $0.02 on the entire series of namedtuple threads is… there *might* be value in an immutable namespace type, and a mutable namespace type, but namedtuple’s promise is that they can be used anywhere a tuple can be used. If passing in kwargs to create the potential replacement to namedtuple is sensitive to dict iteration order, it really isn’t a viable replacement for namedtuple. I do feel like there isn’t as big a use case for an immutable namespace type as there is for a namedtuple. I would rather namedtuple class creation be quicker.

From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com@python.org] On Behalf Of Chris Barker
Sent: Friday, July 28, 2017 8:27 PM
To: Ethan Furman <ethan@stoneleaf.us>
Cc: Python-Ideas <python-ideas@python.org>
Subject: Re: [Python-ideas] namedtuple literals [Was: RE a new namedtuple]
Le 29/07/2017 à 18:14, Alex Walters a écrit :
In Python 3.6, kwargs order is preserved and guaranteed. It's currently implemented by relying on the non-guaranteed dict order, but the two are not linked. The spec does guarantee that from now on, kwargs order is always preserved whether the dict order is or not.
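That guarantee (PEP 468, Python 3.6 and later) is easy to observe. A minimal sketch, where field_order is a made-up helper name:

```python
def field_order(**kwargs):
    # Since Python 3.6 (PEP 468), kwargs preserves the call-site keyword order.
    return tuple(kwargs)

xy = field_order(x=1, y=2)
yx = field_order(y=2, x=1)
```

This is the building block that lets a function such as the proposed ntuple() infer a field order from its call.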
On 30 July 2017 at 20:03, Alex Walters <tritium-list@sdamon.com> wrote:
Did you mean "MyNT(**data)" in the last line? Either way, this is just normal predefined namedtuple creation, where the field order is set when the type is defined. Rather than being about any changes on that front, these threads are mostly about making it possible to write that first line as: MyNT = type(implicitly_typed_named_tuple_factory(foo=None, bar=None)) ... (While they do occasionally veer into discussing the idea of yet-another-kind-of-data-storage-type, that is an extraordinarily unlikely outcome) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 30 July 2017 at 16:24, Nick Coghlan <ncoghlan@gmail.com> wrote:
Is that really true, though? There's a lot of discussion about whether ntuple(x=1, y=2) and ntuple(y=2, x=1) are equal (which implies they are the same type). If there's any way they can be the same type, then your definition of MyNT above is inherently ambiguous, depending on whether we've previously referred to implicitly_typed_named_tuple_factory(bar=None, foo=None). For me, the showstopper with regard to this whole discussion about ntuple(x=1, y=2) is this key point - every proposed behaviour has turned out to be surprising to someone (and not just in a "hmm, that's odd" sense, but rather in the sense that it'd almost certainly result in bugs as a result of misunderstood behaviour). Paul
On 31 July 2017 at 04:31, Paul Moore <p.f.moore@gmail.com> wrote:
No, they're different types, because the requested field order is different, just as if you made two separate calls to "collections.namedtuple". If you want them to be the same type, so that the parameter order in the second call gets ignored, then you need to ask for that explicitly (either by using "collections.namedtuple" directly, or by calling type() on an implicitly typed instance), or else by keeping the field order consistent.
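The analogy with two separate collections.namedtuple calls can be spelled out as follows. This is a sketch with the existing stdlib type only; the names XY, YX and MyNT are illustrative, and no implicit type cache is involved.

```python
from collections import namedtuple

# Two separate definitions with different field orders are different types.
XY = namedtuple("ntuple", ["x", "y"])
YX = namedtuple("ntuple", ["y", "x"])
distinct = XY is not YX

# To share a type, ask for it explicitly: reuse the type of an existing instance.
p1 = XY(x=1, y=2)
MyNT = type(p1)
p2 = MyNT(y=5, x=4)  # keyword order at the call site is ignored here

same_type = type(p1) is type(p2)
```

Here p2 keeps the [x, y] field order of the type it was built from, even though y was written first.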
This is why any implicit type definition would *have* to use the field order as given: anything else opens up the opportunity for action-at-a-distance that changes the field order based on the order in which instances are created. (Even without that concern, you'd also get a problematic combinatorial expansion when searching for matching existing field definitions as the number of field names increases)
I suspect the only way it would make sense is if the addition was made in tandem with a requirement that the builtin dictionary type be insertion ordered by default. The reason I say that is that given such a rule, it would *consistently* be true that:

    tuple(dict(x=1, y=2).items()) != tuple(dict(y=2, x=1).items())

Just as this is already reliably true in Python 3.6 today:

    >>> from collections import OrderedDict
    >>> x_first = tuple(OrderedDict(x=1, y=2).items())
    >>> y_first = tuple(OrderedDict(y=2, x=1).items())
    >>> x_first != y_first
    True
    >>> x_first
    (('x', 1), ('y', 2))
    >>> y_first
    (('y', 2), ('x', 1))

In both PyPy and CPython 3.6+, that's actually true for the builtin dict as well (since their builtin implementations are order preserving and that's now a requirement for keyword argument and class execution namespace handling). That way, the invariant that folks would need to learn would just be:

    ntuple(x=1, y=2) == tuple(dict(x=1, y=2).values())
    ntuple(y=2, x=1) == tuple(dict(y=2, x=1).values())

rather than the current:

    from collections import OrderedDict
    auto_ntuple(x=1, y=2) == tuple(OrderedDict(x=1, y=2).values())
    auto_ntuple(y=2, x=1) == tuple(OrderedDict(y=2, x=1).values())

(Using Python 3.6 and auto_ntuple from https://gist.github.com/ncoghlan/a79e7a1b3f7dac11c6cfbbf59b189621#file-auto_... ) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I've been experimenting with this:

    class QuickNamedTuple(tuple):

        def __new__(cls, **kwargs):
            inst = super().__new__(cls, tuple(kwargs.values()))
            inst._names = tuple(kwargs.keys())
            return inst

        def __getattr__(self, attr):
            if attr in self._names:
                return self[self._names.index(attr)]
            raise AttributeError(attr)

        def __repr__(self):
            values = []
            for i, name in enumerate(self._names):
                values.append(f'{name}={self[i]}')
            return f'({", ".join(values)})'

It's a quick scrap and probably not ideal code, but the idea is the point. I believe this is how the new "quick" named tuple should ideally work:

    In: ntuple = QuickNamedTuple(x=1, y=2, z=-1)
    In: ntuple
    Out: (x=1, y=2, z=-1)
    In: ntuple[1] == ntuple.y
    Out: True
    In: ntuple == (1, 2, -1)
    Out: True
    In: ntuple == QuickNamedTuple(z=-1, y=2, x=1)
    Out: False

So yeah, the order of the keyword arguments would matter in my case, and I've found it to work the best. How often do you get the keywords in a random order? And for those cases, you have SimpleNamespace, or you can just use the old namedtuple. But most of the time you always have the same attributes in the same order (think of reading a CSV for example), and this would be just a normal tuple, but with custom names for the indexes. Just my two cents and thoughts from an everyday Python developer.
On 7/30/2017 2:57 PM, Markus Meskanen wrote:
Using a name to position map:

    class QuickNamedTuple(tuple):

        def __new__(cls, **kwargs):
            inst = super().__new__(cls, tuple(kwargs.values()))
            inst._namepos = {name: i for i, name in enumerate(kwargs.keys())}
            return inst

        def __getattr__(self, attr):
            try:
                return self[self._namepos[attr]]
            except KeyError:
                raise AttributeError(attr) from None

        def __repr__(self):
            values = []
            for name, i in self._namepos.items():
                values.append(f'{name}={self[i]}')
            return f'({", ".join(values)})'

Same outputs as above. -- Terry Jan Reedy
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jul 30, 2017 at 09:57:19PM +0300, Markus Meskanen wrote:
"Random" order? Never. *Arbitrary* order? All the time. That's the whole point of keyword arguments: you don't have to care about the order.
And for those cases, you have SimpleNamespace,
Which is no good for when you need a tuple.
If you're reading from CSV, you probably aren't specifying the arguments by keyword, you're probably reading them and assigning by position. You may not even know what the columns are until you read the CSV file.

Let's think some more about reading from a CSV file. How often do you have three one-letter column names like "x", "y", "z"? I don't know about you, but for me, never. I'm more likely to have a dozen columns, or more, and I can't remember and don't want to remember what order they're supposed to be *every single time* I read a row or make a tuple of values.

The point of using keywords is to avoid needing to remember the order. If I have to remember the order, why bother naming them?

I think this proposal combines the worst of both worlds:

- like positional arguments, you have to care about the order, and if you get it wrong, your code will likely silently break in a hard to debug way;

- and like keyword arguments, you have the extra typing of having to include the field names;

- but unlike keyword arguments, you have to include every single one, in the right order.

--
Steve
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
You apparently live in a halcyon world of data cleanliness where CSV data is so well behaved. In my world, I more typically deal with stuff like

    data1.csv:
    --------------
    name,age,salaryK
    John,39,50
    Sally,52,37

    data2.csv:
    --------------
    name,salaryK,age
    Juan,47,31
    Siu,88,66

I'm likely to define different namedtuples for dealing with this:

    NameSalAge = namedtuple('NSA','name salary age')
    NameAgeSal = namedtuple('NAS','name age salary')

Then later, indeed, I might ask:

    if employee1.salary == employee2.salary: ...

And this would work even though I got the data from the different formats.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
Yep. DictReader is better for my simple example. Just pointing out that encountering attributes in different orders isn't uncommon.

On Jul 30, 2017 10:55 PM, "Chris Angelico" <rosuav@gmail.com> wrote:

On Mon, Jul 31, 2017 at 3:41 PM, David Mertz <mertz@gnosis.cx> wrote:
Then you want csv.DictReader and dictionary lookups. ChrisA _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
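As a minimal sketch of the csv.DictReader suggestion: the two in-memory strings below are hypothetical stand-ins for the data1.csv and data2.csv files quoted earlier in the thread, and the helper name read_rows is an assumption made for illustration. Each row comes back keyed by its header names, so the differing column order stops mattering.

```python
import csv
import io

# Hypothetical in-memory stand-ins for data1.csv and data2.csv;
# note that the column order differs between the two.
DATA1 = "name,age,salaryK\nJohn,39,50\nSally,52,37\n"
DATA2 = "name,salaryK,age\nJuan,47,31\nSiu,88,66\n"


def read_rows(text):
    """Return each CSV row as a dict keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


rows1 = read_rows(DATA1)
rows2 = read_rows(DATA2)

# Lookups go by column name, so the column order is irrelevant.
john, juan = rows1[0], rows2[0]
print(john["salaryK"], juan["salaryK"])
```

The values are strings, as csv always yields them; converting to numbers is left to the caller.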
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
Nick Coghlan wrote:
The same applies to the ntuple concept, except there it's the fact that it's a *tuple* that conveys the "order matters" expectation.
That assumes there's a requirement that it be a tuple in the first place. I don't see that requirement in the use cases suggested here so far. -- Greg
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Thu, Jul 27, 2017 at 11:46:45AM +1200, Greg Ewing wrote:
This is an excellent point. Perhaps we should just find a shorter name for SimpleNamespace and promote it as the solution.

I'm not sure about other versions, but in Python 3.5 it will even save memory for small records:

    py> import sys
    py> from types import SimpleNamespace
    py> spam = SimpleNamespace(flavour='up', charge='1/3')
    py> sys.getsizeof(spam)
    24
    py> from collections import namedtuple
    py> eggs = namedtuple('record', 'flavour charge')(charge='1/3', flavour='up')
    py> sys.getsizeof(eggs)
    32
    py> sys.getsizeof(('up', '1/3'))
    32

--
Steve
data:image/s3,"s3://crabby-images/29b39/29b3942a63eb62ccdbf1017071ca08bf05e5ca70" alt=""
Many times in the olden days when I needed a bag o' attributes to be passed around like a struct I'd make a dummy class, then instantiate it. (A lot harder than the JavaScript equivalent.)

Unfortunately, the modern Python solution:

    from types import SimpleNamespace as ns

is only a few characters shorter. Perhaps a 'ns()' or 'bag()' builtin alias could fit the bill.

Another idea I had not too long ago was to let an object() be writable, then no further changes would be necessary.

-Mike

On 2017-07-26 17:38, Steven D'Aprano wrote:
This is an excellent point. Perhaps we should just find a shorter name for SimpleNamespace and promote it as the solution.
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 27 July 2017 at 10:38, Steven D'Aprano <steve@pearwood.info> wrote:
sys.getsizeof() isn't recursive, so this is only measuring the overhead of CPython's per-object bookkeeping. The actual storage expense is incurred via the instance dict:

    >>> sys.getsizeof(spam.__dict__)
    240
    >>> data = dict(charge='1/3', flavour='up')
    >>> sys.getsizeof(data)
    240

Note: this is a 64-bit system, so the per-instance overhead is also higher (48 bytes rather than 24), and tuples incur a cost of 8 bytes per item rather than 4 bytes.

It's simply not desirable to rely on dicts for this kind of use case, as the per-instance cost of their bookkeeping machinery is overly high for small data classes and key-sharing only mitigates that problem, it doesn't eliminate it.

By contrast, tuples are not only the most memory efficient data structure Python offers, they're also one of the fastest to allocate: since they're fixed length, they can be allocated as a single contiguous block, rather than requiring multiple memory allocations per instance (and that's before taking the free list into account).

As a result, "Why insist on a tuple?" has three main answers:

- lowest feasible per-instance memory overhead
- lowest feasible runtime allocation cost overhead
- backwards compatibility with APIs that currently return a tuple without impacting either of the above benefits

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
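The shallow-versus-total distinction can be checked with a short sketch. It assumes CPython, and the exact byte counts vary by version and platform, so none are asserted here; the type name Record is illustrative only.

```python
import sys
from collections import namedtuple
from types import SimpleNamespace

Record = namedtuple("Record", "flavour charge")  # illustrative type name

ns = SimpleNamespace(flavour="up", charge="1/3")
nt = Record(flavour="up", charge="1/3")
plain = ("up", "1/3")

# getsizeof() is shallow: for the namespace, add the instance dict.
ns_total = sys.getsizeof(ns) + sys.getsizeof(ns.__dict__)
nt_total = sys.getsizeof(nt)        # namedtuple instances carry no __dict__
plain_total = sys.getsizeof(plain)

print("SimpleNamespace + __dict__:", ns_total)
print("namedtuple instance:       ", nt_total)
print("plain tuple:               ", plain_total)
```

Even this understates the difference in one respect and overstates it in another: the field values themselves are shared objects and are not counted on either side.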
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
To avoid introducing a new built-in, we could do object.bag = SimpleNamespace
I am liking the idea of making SimpleNamespace more accessible, but maybe we need to think a bit more about why one might want a tuple-with-names, rather than just an easy way to create an object-with-just-attributes.

That is -- how many times do folks use a namedtuple rather than SimpleNamespace just because they know about it, rather than because they really need it? I know that is often the case... but here are some reasons to want an actual tuple (or, an actual ImmutableSequence):

1) Backward compatibility with tuples. This may have been a common use case when they were new, and maybe still is, but if we are future-looking, I don't think this is the primary use case. But maybe some of the features you get from that are important.

2) order-preserving: this makes them a good match for "records" from a DB or CSV file or something.

3) unpacking: x, y = a_point

4) iterating: for coord in a_point: ...

5) immutability: being able to use them as a key in a dict.

What else?

So the question is -- If we want an easier way to create a namedtuple-like object -- which of these features are desired?

Personally, I think an immutable SimpleNamespace would be good. And if you want the other stuff, use a NamedTuple. And a quick and easy way to make one would be nice.

I understand that the ordering could be confusing to folks, but I'm still thinking yes -- in the spirit of duck-typing, I think having to think about the Type is unfortunate.

And will people really get confused if:

    ntuple(x=1, y=2) == ntuple(y=2, x=1)

returns False?

If so -- then, if we are willing to introduce new syntax, then we can make that more clear.

Note that:

    ntuple(x=1, y=2) == ntuple(z=1, w=2)

should also be False, and

    ntuple(x=1, y=2) == (1, 2)

also False (this is losing tuple-compatibility).

That is, the names, and the values, and the order are all fixed.

If we use a tuple to define the "type" == ('x','y') then it's easy enough to cache and compare based on that. If, indeed, you need to cache at all.
BTW, I think we need to be careful about what assumptions we are making in terms of "dicts are order-preserving". My understanding is that the fact that the latest dict in cpython is order preserving should be considered an implementation detail, and not relied on. But that we CAN count on **kwargs being order-preserving. That is, **kwargs is an order-preserving mapping, but the fact that it IS a dict is an implementation detail. Have I got that right? Of course, this will make it hard to back-port a "ntuple" implementation.... And ntuple(('x', 2), ('y', 3)) is unfortunate. -CHB On Thu, Jul 27, 2017 at 4:48 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
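The tuple features listed in points 3 to 5 can be illustrated with today's collections.namedtuple, next to a SimpleNamespace for contrast. This is only a sketch; Point, a_point and bag are illustrative names, not part of any proposal.

```python
from collections import namedtuple
from types import SimpleNamespace

Point = namedtuple("Point", "x y")  # illustrative type
a_point = Point(x=1, y=2)

# 3) unpacking and 4) iterating work because it is a real tuple.
x, y = a_point
coords = [coord for coord in a_point]

# 5) immutable and hashable, so usable as a dict key.
labels = {a_point: "start"}

# SimpleNamespace offers attribute access, but it is mutable and unhashable.
bag = SimpleNamespace(x=1, y=2)
try:
    hash(bag)
    bag_hashable = True
except TypeError:
    bag_hashable = False

print(x, y, coords, labels[Point(1, 2)], bag_hashable)
```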
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Fri, Jul 28, 2017 at 7:22 AM, Pavol Lisy <pavol.lisy@gmail.com> wrote:
What you're asking for is something like JavaScript's "object destructuring" syntax. It would sometimes be cool, but I haven't ever really yearned for it in Python. But you'd need to decide whether you want attributes (spam.x, spam.y) or items (spam["x"], spam["y"]). Both would be useful at different times. ChrisA
data:image/s3,"s3://crabby-images/57f17/57f172f0cf4086452e8f193e1590042b5113a553" alt=""
But you'd need to decide whether you want attributes (spam.x, spam.y) or items (spam["x"], spam["y"]). Both would be useful at different times. ChrisA

If something like this was ever added, it'd probably be items, then you could implement a custom __unpack__ (or whatever name it'd be) method that would return a dict.
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Thu, Jul 27, 2017 at 2:50 PM, Chris Angelico <rosuav@gmail.com> wrote:
Wasn't there just a big long discussion about something like that on this list? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Fri, Jul 28, 2017 at 9:31 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Yeah, and the use cases just aren't as strong in Python. I think part of it is because a JS function taking keyword arguments looks like this:

    function fetch(url, options) {
        const {method, body, headers} = options;
        // ...
    }
    fetch("http://httpbin.org/post", {method: "POST", body: "blah"});

whereas Python would spell it this way:

    def fetch(url, *, method="GET", body=None, headers=[]):
        ...
    fetch("http://httpbin.org/post", method="POST", body="blah")

So that's one big slab of use-case gone, right there.

ChrisA
data:image/s3,"s3://crabby-images/d1d84/d1d8423b45941c63ba15e105c19af0a5e4c41fda" alt=""
MRAB writes:
Sure. And if you use a dict, you've lost some of the advantage of using names instead of positions too. I'm not sure a somewhat hacky use case (see my reply to Ethan elsewhere in the thread) justifies a builtin, but I can easily see using it myself if it did exist.

Steve
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 26 July 2017 at 03:49, MRAB <python@mrabarnett.plus.com> wrote:
Python 3.6+ guarantees that keyword-argument order is preserved in the namespace passed to the called function.

This means that a function that only accepts **kwargs can reliably check the order of arguments used in the call, and hence tell the difference between:

    ntuple(x=1, y=2)
    ntuple(y=1, x=2)

So because these are implicitly typed, you *can't* put the arguments in an arbitrary order - you have to put them in the desired field order, or you're going to accidentally define a different type.

If that possibility bothers someone and they want to avoid it, then the solution is straightforward: predefine an explicit type, and use that instead of an implicitly defined one.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
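The ordering guarantee (PEP 468) is easy to observe directly. In this small sketch, field_order is a hypothetical helper that simply reports the keyword names in call order, which is the information an implicitly typed ntuple() would have to use as its field order.

```python
def field_order(**kwargs):
    """Return the keyword names in the order used at the call site."""
    # Keyword-argument order is preserved since Python 3.6 (PEP 468).
    return tuple(kwargs)


first = field_order(x=1, y=2)
second = field_order(y=1, x=2)

print(first)            # ('x', 'y')
print(second)           # ('y', 'x')
print(first == second)  # False
```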
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
Nick Coghlan wrote:
New builtin:
ntuple(x=1, y=0)
Do we really want this to be a tuple, with ordered fields? If so, what determines the order? If it's the order of the keyword arguments, this means that ntuple(x=1, y=0) and ntuple(y=0, x=1) would give objects with different behaviour. This goes against the usual expectation that keyword arguments of a constructor can be written in any order. That's one of the main benefits of using keyword arguments, that you don't have to remember a specific order for them. If we're going to have such a type, I suggest making it a pure named-fields object without any tuple aspects. In which case "ntuple" wouldn't be the right name for it, and something like "record" or "struct" would be better. Also, building a whole type object for each combination of fields seems like overkill to me. Why not have just one type of object with an attribute referencing a name-to-slot mapping? -- Greg
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Wed, Jul 26, 2017 at 11:58:44AM +1200, Greg Ewing wrote:
Guido's time machine strikes again.

    from types import SimpleNamespace

By the way: records and structs define their fields in a particular order too. namedtuple does quite well at modelling records and structs in other languages.
You mean one globally shared mapping for all ntuples? So given:

    spam = ntuple(name="fred", age=99)
    eggs = ntuple(model=2, colour="green")

we would have spam.colour == 99, and eggs.name == 2.

Personally, I think this whole proposal for implicitly deriving type information from the way we instantiate a tuple is a bad idea. I don't see this becoming anything other than a frustrating and annoying source of subtle, hard to diagnose bugs.

--
Steve
data:image/s3,"s3://crabby-images/a03e9/a03e989385213ae76a15b46e121c382b97db1cc3" alt=""
On Fri, Jul 21, 2017 at 8:18 AM, Guido van Rossum <guido@python.org> wrote:
I've seen the same sort of mess, but I think it's because folks have come down on the wrong side of "what's code, and what's data?"

Data belongs in dicts (and tuples, and lists, and...) and code belongs in objects.

With Python's dynamic nature, it is very easy to blur these lines, but the way I define it:

a_point['x'] is accessing data, and a_point.x is running code. It more or less comes down to -- "if you know the names you need when you are writing the code, then it is probably code". So be wary if you are using literals for dict keys frequently.

But from this perspective a NamedTuple (with or without a clearly defined type) is code, as it should be.

In the duck-typing spirit, you should be able to do something like:

    p = get_the_point(something)
    do_something_with(p.x, p.y)

And not know or care what type p is.

With this perspective, a NamedTuple, with a known type or otherwise, AVOIDS the chaos of passing dicts around, and thus should be encouraged.

And indeed, making it as easy as possible to create and pass an object_with_attributes around, rather than a plain tuple or dict would be a good thing.

I do agree that we have multiple goals on the table, and DON'T want to have any more similar, but slightly different, lightweight objects with named attributes.

So it makes sense to:

1) make namedtuple faster

and then, optionally:

2) make it easier to quickly whip out an (anonymous) namedtuple.

Maybe types.SimpleNamespace is the "better" solution to the above, but it hasn't gained the traction that namedtuple has. And there is a lot to be said for immutability, and the SimpleNamespace docs even say:

"... for a structured record type use namedtuple() <https://docs.python.org/3/library/collections.html#collections.namedtuple> instead."

-CHB

--
Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
data:image/s3,"s3://crabby-images/02573/025732c254c3bfef379ac4c320c4d99544742163" alt=""
On Thu, Jul 20, 2017 at 3:35 AM, Alexander Belopolsky < alexander.belopolsky@gmail.com> wrote:
I suppose that the type should be immutable at least as long as field names are the same, and the caching will occur on creation, in order to retain the zero memory footprint.

Will type((x=1, y=2)) is type((x=3, y=4)) be True?

Yes.
Maybe type((x=1, y=2))(values) will work?
It's supposed to behave like a tuple or any other primitive type (list, set, etc.), so yes.
Regarding that spec, I think there's something missing: given a list (or tuple!) of values, how do you turn it into an 'ntuple'?
As already suggested, it probably makes sense to just reuse the dict syntax:
data:image/s3,"s3://crabby-images/b3d87/b3d872f9a7bbdbbdbd3c3390589970e6df22385a" alt=""
For me, namedtuple was first used to upgrade an old API from returning a tuple to a "named" tuple. There was a hard requirement on backward compatibility: namedtuple API is a superset of the tuple API.

For new code, there is no such backward compatibility issue. If you don't need a type, types.SimpleNamespace is a good choice. Using ns=types.SimpleNamespace, you can replace (x=0, y=1) with ns(x=0, y=1). It already works, no syntax change.

*If* someone really wants (x=0, y=1) syntax sugar, I would prefer to get a namespace (no indexed (tuple) API).

Victor

On Jul 20, 2017 2:15 AM, "Giampaolo Rodola'" <g.rodola@gmail.com> wrote:

On Tue, Jul 18, 2017 at 6:31 AM, Guido van Rossum <guido@python.org> wrote:
data:image/s3,"s3://crabby-images/2f884/2f884aef3ade483ef3f4b83e3a648e8cbd09bb76" alt=""
On Thu, Jul 20, 2017 at 5:19 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
It's a minor point, but the main reason I use namedtuple is because it's far easier to get a hashable object than writing one yourself. Namespaces are not hashable. If the (x=0, y=1) sugar is accepted, IMO it should be immutable and hashable like tuples/namedtuples.

Best,
Lucas
data:image/s3,"s3://crabby-images/02573/025732c254c3bfef379ac4c320c4d99544742163" alt=""
On Thu, Jul 20, 2017 at 2:14 AM, Giampaolo Rodola' <g.rodola@gmail.com> wrote
In case of one module scripts it's not uncommon to add a leading underscore which makes __repr__ uglier.
Actually forget about this: __repr__ is dictated by the first argument. =) -- Giampaolo - http://grodola.blogspot.com
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
The proposal in your email seems incomplete -- there's two paragraphs on the actual proposal, and the entire rest of your email is motivation. That may be appropriate for a PEP, but while you're working on a proposal it's probably better to focus on clarifying the spec. Regarding that spec, I think there's something missing: given a list (or tuple!) of values, how do you turn it into an 'ntuple'? That seems a common use case, e.g. when taking database results like row_factory in sqlite3. -- --Guido van Rossum (python.org/~guido)
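For comparison, this is roughly how the sqlite3 use case is handled with today's collections.namedtuple: the field names come from cursor.description and the values arrive as a plain tuple, which is exactly the "sequence of values in, named tuple out" path the spec would need to cover. The factory name and the "Row" type name are assumptions made for this sketch.

```python
import sqlite3
from collections import namedtuple


def namedtuple_factory(cursor, row):
    """Build a namedtuple from a result row using the cursor's column names."""
    fields = [column[0] for column in cursor.description]
    Row = namedtuple("Row", fields)  # a new type per row; fine for a sketch
    return Row._make(row)


conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
row = conn.execute("SELECT 10 AS x, 20 AS y").fetchone()
conn.close()

print(row.x, row.y)
```

Creating a fresh type for every row is wasteful, which is one reason a cheap way to get from field names plus values to an instance matters here.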
data:image/s3,"s3://crabby-images/dcdbd/dcdbd8ddec664b034475bdd79a7426bde32cc735" alt=""
On Wed, Jul 19, 2017 at 6:08 PM, Guido van Rossum <guido@python.org> wrote:
One obvious choice is to allow for construction from a dict with **kwargs unpacking. This actually works now that keyword arguments are ordered. This would mean either ntuple(**kwargs) or the possibly too cute (**kwargs) .
data:image/s3,"s3://crabby-images/69c89/69c89f17a2d4745383b8cc58f8ceebca52d78bb7" alt=""
On Wed, Jul 19, 2017 at 9:08 PM, Guido van Rossum <guido@python.org> wrote:
The proposal in your email seems incomplete
The proposal does not say anything about type((x=1, y=2)). I assume it will be the same as the type currently returned by namedtuple(?, 'x y'), but will these types be cached? Will type((x=1, y=2)) is type((x=3, y=4)) be True?
Regarding that spec, I think there's something missing: given a list (or tuple!) of values, how do you turn it into an 'ntuple'?
Maybe type((x=1, y=2))(values) will work?
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
On 20 July 2017 at 11:35, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Right, this is one of the key challenges to be addressed, as is the question of memory consumption - while Giampaolo's write-up is good in terms of covering the runtime performance motivation, it misses that one of the key motivations of the namedtuple design is to ensure that the amortised memory overhead of namedtuple instances is *zero*, since the name/position mapping is stored on the type, and *not* on the individual instances.
From my point of view, I think the best available answers to those questions are:
- ntuple literals will retain the low memory overhead characteristics of collections.namedtuple
- we will use a type caching strategy akin to string interning
- ntuple types will be uniquely identified by their field names and order
- if you really want to prime the type cache, just create a module level instance without storing it:

    (x=1, y=2)  # Prime the ntuple type cache

A question worth asking will be whether or not "collections.namedtuple" will implicitly participate in the use of the type cache, and I think the answer needs to be "No". The problem is twofold:

1. collections.namedtuple accepts an additional piece of info that won't be applicable for ntuple types: the *name*
2. collections.namedtuple has existed for years *without* implicit type caching, so adding it now would be a bit weird

That means the idiomatic way of getting the type of an ntuple would be to create an instance and take the type of it:

    type((x=1, y=2))

That could still be the same kind of type as is created by collections.namedtuple, or else a slight variant that tailors repr() and pickling support based on the fact it's a kind of tuple literal.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
data:image/s3,"s3://crabby-images/c437d/c437dcdb651291e4422bd662821948cd672a26a3" alt=""
I'm concerned about losing access to type information (i.e. name) in this proposal. For example, I might write some code like this now:
The proposal to define this as:
    smart = (cost=18_900, hp=89, weight=949)
    harley = (cost=18_900, hp=89, weight=949)
Doesn't seem to leave any way to distinguish the objects of different types that happen to have the same fields. Comparing `smart._fields == harley._fields` doesn't help here, nor does any type constructed solely from the fields.

Yes, I know a Harley-Davidson only weighs about half as much as a SmartCar, although the price and HP aren't far off.

I can think of a few syntax ideas for how we might mix in a "name" to the `ntuple` objects, but I don't want to bikeshed. I'd just like to have the option of giving a name or class that isn't solely derived from the field names.

On Wed, Jul 19, 2017 at 9:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
data:image/s3,"s3://crabby-images/735d9/735d937548be7e044a6af7241efaa4feb82d7484" alt=""
If the type is data, it probably belongs inside the tuple:

    smart = (type="Car", cost=18_900, hp=89, weight=949)
    harley = (type="Motorcycle", cost=18_900, hp=89, weight=949)
    both_vehicles = (type(smart) == type(harley))  # True - type+cost+hp+weight on both sides
    same_vehicles = (smart == harley)  # False - cost, hp and weight are identical, but not type

On 20/07/17 at 07:12, David Mertz wrote:
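Since the literal syntax above does not exist, the same idea can be tried with collections.namedtuple. In this sketch, Vehicle is an illustrative type name and the field is called kind rather than type, purely to avoid confusion with the builtin.

```python
from collections import namedtuple

# "kind" is an illustrative field name standing in for type= above.
Vehicle = namedtuple("Vehicle", "kind cost hp weight")

smart = Vehicle(kind="Car", cost=18_900, hp=89, weight=949)
harley = Vehicle(kind="Motorcycle", cost=18_900, hp=89, weight=949)

print(type(smart) is type(harley))  # True: same fields, same type
print(smart == harley)              # False: the kind values differ
```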
data:image/s3,"s3://crabby-images/d82cf/d82cfdcfaa7411c61e6ca877f84970109000fbcc" alt=""
On Jul 20, 2017 1:13 AM, "David Mertz" <mertz@gnosis.cx> wrote:

I'm concerned about losing access to type information (i.e. name) in this proposal. For example, I might write some code like this now:
The proposal to define this as:
    smart = (cost=18_900, hp=89, weight=949)
    harley = (cost=18_900, hp=89, weight=949)
Doesn't seem to leave any way to distinguish the objects of different types that happen to have the same fields. Comparing `smart._fields == harley._fields` doesn't help here, nor does any type constructed solely from the fields.

What about making a syntax to declare a type? The ones that come to mind are

    name = (x=, y=)

Or

    name = (x=pass, y=pass)

They may not be clear enough, though.
data:image/s3,"s3://crabby-images/4d61d/4d61d487866c8cb290837cb7b1cd911c7420eb10" alt=""
I'm not sure why everybody has such a grip on the type.

When we use regular tuples, no one cares, it's all tuples, no matter what.

Well in that case, let's make all those namedtuple and be done with it.

If somebody really needs a type, this person will either use collections.namedtuple the old way, or use a namespace or a class.

If using the type "namedtuple" is an issue because it already exists, let's find a name for this new type that conveys the meaning, like labelledtuple or something.

The whole point of this is to make it a literal, simple and quick to use. If you make it more than it is, we already got everything to do this and don't need to modify the language.

On 23/07/2017 at 18:08, Todd wrote:
data:image/s3,"s3://crabby-images/57f17/57f172f0cf4086452e8f193e1590042b5113a553" alt=""
23.7.2017 20.59 "Michel Desmoulin" <desmoulinmichel@gmail.com> wrote:

I'm not sure why everybody has such a grip on the type. When we use regular tuples, no one cares, it's all tuples, no matter what. Well in that case, let's make all those namedtuple and be done with it. If somebody really needs a type, this person will either use collections.namedtuple the old way, or use a namespace or a class. If using the type "namedtuple" is an issue because it already exists, let's find a name for this new type that conveys the meaning, like labelledtuple or something. The whole point of this is to make it a literal, simple and quick to use. If you make it more than it is, we already got everything to do this and don't need to modify the language.

+1 to this, why not just have:

    type((x=0, y=0)) == namedtuple

similar to how tuples work. If you want to go advanced, feel free to use classes.

Also, would it be crazy to suggest mixing tuples and named tuples:
Just an idea, I'm not sure if it would have any use though.
On Sun, Jul 23, 2017 at 07:47:16PM +0200, Michel Desmoulin wrote:
I'm not sure why everybody has such a grip on the type.
When we use regular tuples, no one cares: it's all tuples, no matter what.
Some people care. This is one of the serious disadvantages of ordinary tuples as a record/struct type. There's no way to distinguish between (let's say) rectangular coordinates (1, 2) and polar coordinates (1, 2), or between (name, age) and (movie_title, score). They're all just 2-tuples. [...]
I disagree: in my opinion, the whole point is to make namedtuple faster, so that Python's startup time isn't affected so badly. Creating new syntax for a new type of tuple is scope-creep.

Even if we had that new syntax, the problem of namedtuple slowing down Python startup would remain. People can't use this new syntax until they have dropped support for everything before 3.7, which might take many years. But a fast namedtuple will give them a benefit as soon as their users upgrade to 3.7.

I agree that there is a strong case to be made for a fast, built-in, easy way to make record/structs without having to pre-declare them. But as the Zen of Python says:

    Now is better than never.
    Although never is often better than *right* now.

Let's not rush into designing a poor record/struct builtin just because we have a consensus (Raymond dissenting?) that namedtuple is too slow. The two issues are, not unrelated, but orthogonal. Record syntax would still be useful even if namedtuple was accelerated, and a faster namedtuple would still be necessary even if we had record syntax.

I believe that a couple of people (possibly including Guido?) are already thinking about a PEP for that. If that's the case, let's wait and see what they come up with.

In the meantime, let's get back to the original question here: how can we make namedtuple faster?

- Guido has ruled out using a metaclass as the implementation, as that makes it hard to inherit from namedtuple and another class with a different metaclass.

- Backwards compatibility is a must.

- *But* maybe we can afford to bend backwards compatibility a bit. Perhaps we don't need to generate the *entire* class using exec, just __new__.

- I don't think that the _source attribute itself makes namedtuple slow. That might affect the memory usage of the class object itself, but it's just a name binding:

      result._source = class_definition

  The expensive part is, I'm fairly sure, this:

      exec(class_definition, namespace)

  (Taken from the 3.5 collections/__init__.py.)

I asked on PythonList@python.org whether people made use of the _source attribute, and the overwhelming response was that they either didn't know it existed, or if they did know, they didn't use it.

https://mail.python.org/pipermail/python-list/2017-July/723888.html

*If* it is accurate to say that nobody uses _source, then perhaps we might be willing to make this minor backwards-incompatible change in 3.7 (but not in a bug-fix release):

- Only the __new__ method is generated by exec (my rough tests suggest that may make namedtuple four times faster);

- _source only gives the source to __new__;

- or perhaps we can save backwards compatibility by making _source generate the rest of the template lazily, when needed, even if the entire template isn't used by exec.

That risks the *actual* source and the *reported* source getting out of sync. Maybe it's better to just break compatibility rather than risk introducing a discrepancy between the two.

-- Steve
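For readers who want to see what "only generate __new__ with exec" could look like, here is a minimal sketch. It is not the stdlib implementation: `make_namedtuple` is a hypothetical name, field names are not validated, at least one field is assumed, and most of the real namedtuple API (`_replace`, `_asdict`, `_source`, pickling support) is left out. The rest of the class is assembled with a plain `type()` call and `property(itemgetter(i))` accessors.

```python
from operator import itemgetter


def make_namedtuple(typename, field_names):
    """Hypothetical factory: only __new__ is compiled from source text."""
    field_names = tuple(field_names)  # assumed non-empty and already valid
    arg_list = ", ".join(field_names)

    # The only exec'd code: a __new__ with a real, named-argument signature.
    source = (
        f"def __new__(_cls, {arg_list}):\n"
        f"    return _tuple_new(_cls, ({arg_list},))\n"
    )
    namespace = {"_tuple_new": tuple.__new__}
    exec(source, namespace)

    def __repr__(self):
        pairs = ", ".join(
            f"{name}={value!r}" for name, value in zip(field_names, self)
        )
        return f"{typename}({pairs})"

    # Everything else is built without exec.
    class_dict = {
        "__slots__": (),
        "_fields": field_names,
        "__new__": namespace["__new__"],
        "__repr__": __repr__,
    }
    for index, name in enumerate(field_names):
        class_dict[name] = property(itemgetter(index))
    return type(typename, (tuple,), class_dict)


Point = make_namedtuple("Point", ["x", "y"])
p = Point(5, y=11)
print(p, p.x, p[1])  # Point(x=5, y=11) 5 11
```

Keeping `__new__` as generated source preserves the keyword-argument signature, which is the part that is awkward to build without exec.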
On 24/07/2017 at 15:31, Steven D'Aprano wrote:
You are just using my figure of speech as a way to counter the argument. It's not a very useful thing to do. Of course some people care; there are always a few people who care about anything. But then you just create your namedtuple manually, or a namespace, and are done with it. Rejecting the literal syntax completely just because it doesn't improve a use case that already worked, but was a bit verbose, is very radical. Unless you have a very nice counter-proposal that makes everyone happy, accepting the current one doesn't take anything from you.
You are in the wrong thread. This thread is specifically about namedtuple literals. Making namedtuple faster can be done in many other ways and doesn't require a literal syntax. A literal syntax, while making things slightly faster by nature, is essentially about making things faster to read and write.
Again, you are mixing the two things. This is why we have two threads: the debate split.
I agree that there is a strong case to be made for a fast, built-in, easy way to make record/structs without having to pre-declare them.
Do other languages have such a thing that can be checked against types?
I agree. I don't think we need to rush it. I can live without it now. I can live without it at all.
Let's not rush into designing a poor record/struct builtin just because we have a consensus (Raymond dissenting?) that namedtuple is too slow.
We don't. We can solve the slowness problem without having the namedtuple literal. The literal is a convenience.
On that we agree.
Yes, but it's about making classes less verbose, if I recall. Or at least it uses the class syntax. It's nice, but not the same thing. Namedtuple literals are far better suited to scripting. You really don't want to write a class in quick scripts, when you do exploratory programming or data analysis on the fly.
In the meantime, lets get back to the original question here: how can we make namedtuple faster?
Then go to the other thread for that.
On 24 July 2017 at 17:37, Michel Desmoulin <desmoulinmichel@gmail.com> wrote:
You are in the wrong thread. This thread is specifically about namedtuple literals.
In which case, did you not see Guido's post "Honestly I would like to declare the bare (x=1, y=0) proposal dead."? The namedtuple literal proposal that started this thread is no longer an option, so can we move on? Preferably by dropping the whole idea - no-one has to my mind offered any sort of "replacement namedtuple" proposal that can't be implemented as a 3rd party library on PyPI *except* the (x=1, y=0) syntax proposal, and I see no justification for adding a *fourth* implementation of this type of object in the stdlib (which means any proposal would have to include deprecation of at least one of namedtuple, structseq or types.SimpleNamespace). The only remaining discussion on the table that I'm aware of is how we implement a more efficient version of the stdlib namedtuple class (and there's not much of that to be discussed here - implementation details can be thrashed out on the tracker issue). Paul
On Mon, Jul 24, 2017 at 6:31 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Sure -- but Python is dynamically typed, and we all like to talk about it as duck typing -- so asking whether this is a "rect_coord" or a "polar_coord" object isn't only unnecessary, it's considered non-Pythonic. Bad example, actually, as a rect_coord would likely have names like 'x' and 'y', while a polar_coord would have 'r' and 'theta' -- showing why having a named-tuple-like structure is helpful, even without types.

So back to the earlier example of "Motorcycle" vs "Car" -- if they have the same attributes, then who cares which it is? If there is different functionality tied to each one, then that's what classes and sub-classing are for. I think the entire point of this proposed object is that it be as lightweight as possible -- it's just a data storage object -- if you want to switch functionality on type, then use subclasses.

As has been said, namedtuple is partly the way it is because it was meant to be a drop-in replacement for a regular tuple, and needed to be reasonably implementable in pure Python. If we can have an object that:

- is immutable
- is indexable like a tuple
- has named attributes
- is lightweight and efficient

I think that would be very useful, and would take the place of namedtuple for most use cases, while being both more Pythonic and more efficient. Whether it gets a literal or a simple constructor makes little difference, though if it got a literal, it would likely end up seeing much wider use (kind of like the set literal).

I disagree: in my opinion, the whole point is to make namedtuple faster, so that Python's startup time isn't affected so badly. Creating new syntax for a new type of tuple is scope-creep.

I think making it easier to access and use is a worthwhile goal, too. If we are re-thinking this, a little scope creep is OK.

Even if we had that new syntax, the problem of namedtuple slowing down

These aren't mutually exclusive, if 3.7 has collections.namedtuple wrap the new object. IIUC, the idea of cached types would mean that objects _would_ be a type, even if that wasn't usually exposed -- so it could be exposed in the case where it was constructed from a collections.namedtuple().

-CHB

--
Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Wed, Jul 19, 2017 at 9:06 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The problem with namedtuple's semantics is that they're perfect for its original use case (replacing legacy tuple returns without breaking backwards compatibility), but turn out to be sub-optimal for pretty much anything else, which is one of the motivations behind stuff like attrs and Eric's dataclasses PEP: https://github.com/ericvsmith/dataclasses/blob/61bc9354621694a93b215e79a7187... that namedtuple is already arguably *too* convenient, in the sense that it's become an attractive nuisance that gets used in places where it isn't really appropriate. Also, what's the advantage of (x=1, y=2) over ntuple(x=1, y=2)? I.e., why does this need to be syntax instead of a library? -n -- Nathaniel J. Smith -- https://vorpus.org
On 20 July 2017 at 07:58, Nathaniel Smith <njs@pobox.com> wrote:
Agreed. This discussion was prompted by the fact that namedtuple class creation was slow, resulting in startup time issues. It seems to have morphed into a generalised discussion of how we design a new "named values" type. While I know that rewriting the implementation is a good time to review the semantics, it feels like we've gone too far in that direction. As has been noted, the new proposal

- no longer supports multiple named types with the same set of field names
- doesn't allow creation from a simple sequence of values

I would actually struggle to see how this can be considered a replacement for namedtuple - it feels like a completely independent beast. Certainly code intended to work on multiple Python versions would seem to have no motivation to change.
Also, what's the advantage of (x=1, y=2) over ntuple(x=1, y=2)? I.e., why does this need to be syntax instead of a library?
Agreed. Now that keyword argument dictionaries retain their order, there's no need for new syntax here. In fact, that's one of the key motivating reasons for the feature. Paul
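As an illustration of the point about ordered keyword arguments, a library-level `ntuple()` needs very little code. This is a minimal sketch, not the proposed builtin: it assumes Python 3.6+ (where `**kwargs` preserves call order), and it builds a fresh class on every call, which is exactly the slow step the thread is worried about.

```python
from collections import namedtuple


def ntuple(**fields):
    """Hypothetical helper: field order follows keyword order at the call site."""
    cls = namedtuple("ntuple", list(fields))  # a new class per call (slow)
    return cls(*fields.values())


p = ntuple(x=10, y=20)
print(p)          # ntuple(x=10, y=20)
print(p.x, p[1])  # 10 20
```

A practical version would need some way to reuse the generated classes, which is where the caching questions raised elsewhere in the thread come in.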
On 20 July 2017 at 10:15, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
I don't think anyone has suggested that the instance creation time penalty for namedtuple is the issue (it's the initial creation of the class that affects interpreter startup time), so it's not clear that we need to optimise that (at this stage). However, it's also true that namedtuple instances are created from sequences, not dictionaries (because the class holds the position/name mapping, so instance creation doesn't need it). So it could be argued that the backward-incompatible means of creating instances is *also* a problem because it's slower... Paul PS Taking ntuple as "here's a neat idea for a new class", rather than as a possible namedtuple replacement, changes the context of all of the above significantly. Just treating ntuple purely as a new class being proposed, I quite like it, but I'm not sure it's justified given all of the similar approaches available, so let's see how a 3rd party implementation fares. And it's too early to justify new syntax, but if the overhead of a creation function turns out to be too high in practice, we can revisit that question. But that's *not* what this thread is about, as I understand it.
Something probably not directly related, but since we started to talk about syntactic changes... I think what would be great to eventually have is some form of pattern matching. Essentially, pattern matching could be just a "tagged" unpacking protocol. For example, something like this will simplify a common pattern with a sequence of if isinstance() branches:

    class Single(NamedTuple):
        x: int

    class Pair(NamedTuple):
        x: int
        y: int

    def func(arg: Union[Single, Pair]) -> int:
        whether arg:
            Single as a:
                return a + 2
            Pair as a, b:
                return a * b
            else:
                return 0

The idea is that the expression before ``as`` is evaluated, then if ``arg`` is an instance of the result, then ``__unpack__`` is called on it. Then the resulting tuple is unpacked into the names a, b, etc. I think named tuples could provide the __unpack__, and especially it would be great for dataclasses to provide the __unpack__ method. (Maybe we can then call it __data__?)

-- Ivan

On 20 July 2017 at 11:39, Clément Pit-Claudel <cpitclaudel@gmail.com> wrote:
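For comparison, the "sequence of if isinstance() branches" that the proposed syntax would replace can be written today roughly as follows. This is only a sketch of the existing idiom: ordinary tuple unpacking stands in for the hypothetical `__unpack__` protocol, and the `whether` statement itself does not exist in Python.

```python
from typing import NamedTuple, Union


class Single(NamedTuple):
    x: int


class Pair(NamedTuple):
    x: int
    y: int


def func(arg: Union[Single, Pair]) -> int:
    # Each branch tests the type, then unpacks the fields by position.
    if isinstance(arg, Single):
        (a,) = arg
        return a + 2
    if isinstance(arg, Pair):
        a, b = arg
        return a * b
    return 0


print(func(Single(3)))   # 5
print(func(Pair(2, 4)))  # 8
```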
On Thu, Jul 20, 2017 at 9:58 AM, Nathaniel Smith <njs@pobox.com> wrote:
Well put! I agree that adding attribute names to elements in a tuple (e.g. return values) in a backwards-compatible way is where namedtuple is great.
I do think it makes sense to add a convenient way to upgrade a function to return named values. Is there any reason why that couldn't replace structseq completely? These anonymous namedtuple classes could also be made fast to create (and more importantly, cached).
Also, what's the advantage of (x=1, y=2) over ntuple(x=1, y=2)? I.e., why does this need to be syntax instead of a library?
Indeed, we might need the syntax (x=1, y=2) later for something different. However, I hope we can forget about 'ntuple', because it suggests a tuple of n elements. Maybe something like

    return tuple.named(x=foo, y=bar)

which is backwards compatible with

    return foo, bar

-- Koos

-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 20.07.17 04:35, Alexander Belopolsky wrote:
Yes, this is the key problem with this idea. If the type of every namedtuple literal is unique, this is a waste of memory and CPU time. Creating a new type is much slower than instantiating it, even without compiling. If we support a global cache of types, we have problems with mutability and lifetime. If types are mutable (namedtuple classes are), setting the __doc__ or __name__ attributes of type((x=1, y=2)) will affect type((x=3, y=4)). How do you create two different named tuple types with different names and docstrings? In Python 2 all types are immortal; in Python 3 they can be collected as ordinary objects, and you can create types dynamically without fear of using too much memory. If types are cached, we have to take care of collecting unused types, which will significantly complicate the implementation.
On Fri, Jul 21, 2017 at 8:49 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
How about just making a named namedtuple if you want to mutate the type? Or perhaps make help() work better for __doc__ attributes on instances. Currently,
does not show "Hello" at all. In Python 2 all types are immortal, in python 3 they can be collected as
Hmm. Good point. Even if making large amounts of arbitrary disposable anonymous namedtuples is probably not a great idea, someone might do it. Maybe having a separate type for each anonymous named tuple is not worth it. After all, keeping references to the attribute names in the object shouldn't take up that much memory. And the tuples are probably often short-lived. Given all this, the syntax for creating anonymous namedtuples efficiently probably does not really need to be super convenient on the Python side, but having it available and unified with that structseq thing would seem useful. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
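One way to read "keeping references to the attribute names in the object" is a single tuple subclass whose instances each carry their own names, so no per-shape type is ever created. The following is a rough sketch under that reading; `NamedValues` is a made-up name, instances are not fully immutable (they have a `__dict__`), and fields that collide with tuple methods such as `count` or `index` are not handled.

```python
class NamedValues(tuple):
    """Hypothetical single type; field names live on each instance."""

    def __new__(cls, **fields):
        self = super().__new__(cls, fields.values())
        self._names = tuple(fields)  # stored in the instance __dict__
        return self

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        names = self.__dict__.get("_names", ())
        if name in names:
            return self[names.index(name)]
        raise AttributeError(name)

    def __repr__(self):
        pairs = ", ".join(f"{n}={v!r}" for n, v in zip(self._names, self))
        return f"({pairs})"


a = NamedValues(x=1, y=2)
b = NamedValues(r=1.0, theta=0.5)
print(a, a.x, a[1])        # (x=1, y=2) 1 2
print(type(a) is type(b))  # True
```

The trade-off is that attribute access becomes a linear search over the names instead of a descriptor lookup on the class.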
Honestly I would like to declare the bare (x=1, y=0) proposal dead. Let's encourage the use of objects rather than tuples (named or otherwise) for most data exchanges. I know of a large codebase that uses dicts instead of objects, and it's a mess. I expect the bare ntuple to encourage the same chaos. -- --Guido van Rossum (python.org/~guido)
Languages since the original Pascal have had a way to define types by structure. If Python did the same, ntuples with the same structure would be typed "objects" that are not pre-declared. In Python's case, because typing of fields is not required and thus can't be used to hint the structure's type, the names and order of fields could be used. Synthesizing a (reserved) type name for (x=1, y=0) should be straightforward. In short,
>>> isinstance((x=None, y=None), type((x=1, y=0)))
True
That can be implemented with namedtuple with some ingenious mangling for the (quasi-anonymous) type name. Equivalence of types by structure is useful, and is very different from the mess that using dicts as records can produce. Cheers, -- Juancarlo *Añez*
On 22 July 2017 at 01:18, Guido van Rossum <guido@python.org> wrote:
That sounds sensible to me - given ordered keyword arguments, anything that bare syntax could do can be done with a new builtin instead, and be inherently more self-documenting as a result. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 24/07/2017 at 16:12, Nick Coghlan wrote:
This is people working on big code bases talking. Remember, Python is not just for Google and Dropbox. We have thousands of users who are just sysadmins, mathematicians, bankers, or analysts, and who just want a quick way to make a record. They neither want nor need a class. Dictionaries and collections.namedtuple are verbose, so they just use regular tuples. They don't use mypy either, so having a type would be moot for them. In many languages we have the opposite problem: people using classes as a container for everything. It makes things very complicated with little value. Python actually has a good balance here. Yes, Python doesn't have pattern matching, which makes it harder to check whether a nested data structure matches the desired schema, but all in all, the bloat/expressiveness equilibrium is quite nice. A literal namedtuple would allow a clearer way to make a quick and simple record.
On 25 July 2017 at 02:46, Michel Desmoulin <desmoulinmichel@gmail.com> wrote:
Dedicated syntax:

    (x=1, y=0)

New builtin:

    ntuple(x=1, y=0)

So the only thing being ruled out is the dedicated syntax option, since it doesn't let us do anything that a new builtin can't do, it's harder to find help on (as compared to "help(ntuple)" or searching online for "python ntuple"), and it can't be readily backported to Python 3.6 as part of a third party library (you can't easily backport it any further than that regardless, since you'd be missing the order-preservation guarantee for the keyword arguments passed to the builtin).

Having such a builtin implicitly create and cache new namedtuple type definitions so the end user doesn't need to care about pre-declaring them is still fine, and remains the most straightforward way of building a capability like this atop the underlying `collections.namedtuple` type.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 25 July 2017 at 11:57, Nick Coghlan <ncoghlan@gmail.com> wrote:
I've updated the example I posted in the other thread with all the necessary fiddling required for full pickle compatibility with auto-generated collections.namedtuple type definitions: https://gist.github.com/ncoghlan/a79e7a1b3f7dac11c6cfbbf59b189621

This shows that given ordered keyword arguments as a building block, most of the actual implementation complexity now lies in designing an implicit type cache that plays nicely with the way pickle works:

    from collections import namedtuple

    class _AutoNamedTupleTypeCache(dict):
        """Pickle compatibility helper for autogenerated collections.namedtuple type definitions"""
        def __new__(cls):
            # Ensure that unpickling reuses the existing cache instance
            self = globals().get("_AUTO_NTUPLE_TYPE_CACHE")
            if self is None:
                maybe_self = super().__new__(cls)
                self = globals().setdefault("_AUTO_NTUPLE_TYPE_CACHE", maybe_self)
            return self

        def __missing__(self, fields):
            cls_name = "_ntuple_" + "_".join(fields)
            return self._define_new_type(cls_name, fields)

        def __getattr__(self, cls_name):
            parts = cls_name.split("_")
            if not parts[:2] == ["", "ntuple"]:
                raise AttributeError(cls_name)
            fields = tuple(parts[2:])
            return self._define_new_type(cls_name, fields)

        def _define_new_type(self, cls_name, fields):
            cls = namedtuple(cls_name, fields)
            cls.__module__ = __name__
            cls.__qualname__ = "_AUTO_NTUPLE_TYPE_CACHE." + cls_name
            # Rely on setdefault to handle race conditions between threads
            return self.setdefault(fields, cls)

    _AUTO_NTUPLE_TYPE_CACHE = _AutoNamedTupleTypeCache()

    def auto_ntuple(**items):
        cls = _AUTO_NTUPLE_TYPE_CACHE[tuple(items)]
        return cls(*items.values())

But given such a cache, you get implicitly defined types that are automatically shared between instances that want to use the same field names:

    >>> p1 = auto_ntuple(x=1, y=2)
    >>> p2 = auto_ntuple(x=4, y=5)
    >>> type(p1) is type(p2)
    True
    >>>
    >>> import pickle
    >>> p3 = pickle.loads(pickle.dumps(p1))
    >>> p1 == p3
    True
    >>> type(p1) is type(p3)
    True
    >>>
    >>> p1, p2, p3
    (_ntuple_x_y(x=1, y=2), _ntuple_x_y(x=4, y=5), _ntuple_x_y(x=1, y=2))
    >>> type(p1)
    <class '__main__._AUTO_NTUPLE_TYPE_CACHE._ntuple_x_y'>

And writing the pickle out to a file and reloading it also works without needing to explicitly predefine that particular named tuple variant:

    >>> with open("auto_ntuple.pkl", "rb") as f:
    ...     p1 = pickle.load(f)
    ...
    >>> p1
    _ntuple_x_y(x=1, y=2)

In effect, implicitly named tuples would be like key-sharing dictionaries, but sharing at the level of full type objects rather than key sets.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
If an important revamp of namedtuple is going to happen (actually, "easy and friendly immutable structures"), I'd suggest that the new syntax not be discarded upfront, but rather be left as a final decision, after all the other forces are resolved. FWIW, there's another development thread about "easy class declarations (with typing)". From MHPOV, the threads are different enough to remain separate. Cheers! -- Juancarlo *Añez*
On 2017-07-25 02:57, Nick Coghlan wrote:
[snip] I think it's a little like function arguments. Arguments can be all positional, but you have to decide in what order they are listed. Named arguments are clearer than positional arguments when calling functions. So an ntuple would be like a tuple, but with names (attributes) instead of positions. I don't see how they could be compatible with tuples because the positions aren't fixed. You would need a NamedTuple where the type specifies the order. I think...
On Tue, Jul 25, 2017 at 7:49 PM, MRAB <python@mrabarnett.plus.com> wrote:
Most likely ntuple() will require keyword args only, whereas for collections.namedtuple they are mandatory only during declaration. The order is the same as kwargs, so:
What's less clear is how isinstance() should behave. Perhaps:
-- Giampaolo - http://grodola.blogspot.com
On 2017-07-25 19:48, Giampaolo Rodola' wrote:
Given:
nt = ntuple(x=1, y=2)
you have nt[0] == 1 because that's the order of the args. But what about:
nt2 = ntuple(y=2, x=1)
? Does that mean that nt2[0] == 2? Presumably, yes. Does nt == nt2? If it's False, then you've lost some of the advantage of using names instead of positions. It's a little like saying that functions can be called with keyword arguments, but the order of those arguments still matters!
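The behaviour being asked about can be checked with today's collections.namedtuple, assuming an ntuple() would simply take its field order from the keyword order. In this sketch the two field orders are spelled out by hand, and the shared type name "ntuple" is only illustrative.

```python
from collections import namedtuple

# Same names and values, declared in opposite orders.
nt = namedtuple("ntuple", ["x", "y"])(x=1, y=2)
nt2 = namedtuple("ntuple", ["y", "x"])(y=2, x=1)

print(nt[0], nt2[0])  # 1 2
print(nt == nt2)      # False: compared positionally, (1, 2) vs (2, 1)

# Compared by name instead, the two records agree.
print(dict(zip(nt._fields, nt)) == dict(zip(nt2._fields, nt2)))  # True
```

Equality is inherited from tuple, so it looks only at positions and ignores both the field names and the class.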
participants (32)
- Alex Walters
- Alexander Belopolsky
- Alexandre Brault
- Brice PARENT
- Chris Angelico
- Chris Barker
- Clément Pit-Claudel
- David Mertz
- Ethan Furman
- Giampaolo Rodola'
- Greg Ewing
- Guido van Rossum
- Ivan Levkivskyi
- Juancarlo Añez
- Koos Zevenhoven
- Lucas Wiman
- Markus Meskanen
- Michel Desmoulin
- Mike Miller
- MRAB
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Pavol Lisy
- Serhiy Storchaka
- Stephan Hoyer
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- Todd
- Victor Stinner
- אלעזר