A mutable alternative to namedtuple
Sometimes we need a simple class to hold some mutable attributes, provide a nice repr, support == for testing, and support iterable unpacking, so you can write:
    p = Point(3, 4)
    x, y = p
That's very much like the classes built by namedtuple, but mutable. I propose we add to the collections module another class factory. I am calling it plainclass, but perhaps we can think of a better name. Here is how it would be used:
    import collections
    Point = collections.plainclass('Point', 'x y')
The signature of the plainclass function would be exactly the same as namedtuple, supporting the same alternative ways of naming the attributes. The semantics of the generated Point class would be like this code:

https://gist.github.com/ramalho/fd3d367e9d3b2a659faf

What do you think?

Cheers,
Luciano

PS. I am aware that there are "Namespace" classes in the standard library (e.g. [2]). They solve a different problem.

[2] https://docs.python.org/3/library/argparse.html#argparse.Namespace

--
Luciano Ramalho
| Author of Fluent Python (O'Reilly, 2015)
| http://shop.oreilly.com/product/0636920032519.do
| Teacher at: http://python.pro.br
| Twitter: @ramalhoorg
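For readers who cannot open the gist, here is a minimal sketch of what such a factory might look like. It is not the code from the gist: the body of `plainclass` below is an assumption built only from the requirements listed in the proposal (mutable attributes, a nice repr, `==`, and iterable unpacking).

```python
def plainclass(typename, field_names):
    """Hypothetical factory: build a small mutable record class."""
    # Accept 'x y' or 'x, y' like namedtuple does, or any iterable of names.
    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()
    fields = tuple(field_names)

    def __init__(self, *args):
        if len(args) != len(fields):
            raise TypeError(
                f'{typename} expects {len(fields)} arguments, got {len(args)}')
        for name, value in zip(fields, args):
            setattr(self, name, value)

    def __iter__(self):
        # Iteration in field order is what makes `x, y = p` work.
        return (getattr(self, name) for name in fields)

    def __repr__(self):
        args = ', '.join(f'{n}={getattr(self, n)!r}' for n in fields)
        return f'{typename}({args})'

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return tuple(self) == tuple(other)

    namespace = dict(__slots__=fields, __init__=__init__, __iter__=__iter__,
                     __repr__=__repr__, __eq__=__eq__, __hash__=None)
    return type(typename, (), namespace)


Point = plainclass('Point', 'x y')
p = Point(3, 4)
p.x = 10          # mutable, unlike a namedtuple instance
x, y = p          # iterable unpacking
```

Using `__slots__` keeps the set of attributes fixed, and setting `__hash__` to `None` makes instances unhashable, which is the usual choice for mutable objects that define `__eq__`.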
On Tue, Mar 17, 2015 at 10:52 AM, Luciano Ramalho <luciano@ramalho.org> wrote:
Should it also have all the same methods as a namedtuple class (e.g. the tuple methods, _make, _replace)? What about the other namedtuple attrs (_fields, etc.)? What's the motivation for parity with namedtuple (particularly iteration)?

I suppose I see the desire for a fixed set of mutable attributes specific to the generated type. However, there have been numerous discussions on this list about alternative approaches to namedtuple which apply here. Simply adapting namedtuple may not be the right thing.

Regardless, this is the sort of thing that should bake outside the stdlib for a while to prove its approach and its worth, much as namedtuple did. It would also help if there were a concrete use case in the stdlib that this new class (factory) would satisfy.
Don't forget types.SimpleNamespace. :) -eric
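As a quick illustration of what `types.SimpleNamespace` already offers and where it stops short of the proposal, here is a small sketch. The attribute names are arbitrary examples.

```python
from types import SimpleNamespace

ns = SimpleNamespace(x=3, y=4)
ns.x = 10                      # attributes are mutable
ns.z = 0                       # ...and the set of attributes is not fixed
same = ns == SimpleNamespace(x=10, y=4, z=0)   # equality compares attributes

try:
    x, y, z = ns               # no __iter__, so unpacking fails
except TypeError as exc:
    unpack_error = exc
```

So it covers mutability, repr and equality, but not a fixed field set or iterable unpacking.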
On 03/17/2015 12:52 PM, Luciano Ramalho wrote:
https://pypi.python.org/pypi/namedlist It also adds default values to the generated constructor, which may or may not be desirable. But if used exactly like collections.namedtuple, it ignores the default values. Eric.
On Tuesday, March 17, 2015 at 20:21:01 UTC+3, Eric V. Smith wrote:
attribute access. The mutable alternative could be considered as an array with attribute access. An array in this context is a tuple-like object that supports item assignment. Since Python has no such object, there are different approaches to mutable named tuple alternatives. One should note that a particular property of named tuple is memory saving, so one can expect a similar property from a mutable named tuple too.
On Mar 19, 2015, at 12:04 AM, Zaur Shibzukhov <szport@gmail.com> wrote:
Python definitely does have such an object: list. A list is effectively the same as a tuple but mutable; it's the paradigm MutableSequence while tuple is the paradigm Sequence. Under the covers they have very similar headers that both use the same storage (a C array of pointers to Python objects, in CPython), and C API functions like PySequence_Fast_GET_ITEM don't distinguish between the two. However, list is resizable, and presumably a "namedlist" would not be. That makes things more complicated for both the interface (there's no is-a relationship; a type without append is not a list--and, worse, a type that has __setitem__ but can't handle slice replacement is not a list but that's very hard to detect...) and the implementation (e.g., a list reserves extra space at the end to avoid having to reallocate on every append). (Python _also_ has an array type, which is for homogenous simple types (like 32-bit int) which can store the values directly, as opposed to tuple and list, which store (pointers to) heterogenous normal Python objects.)
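The distinctions drawn above can be checked from Python itself. This is only a sketch using the standard `collections.abc` ABCs and the `array` module; the sample values are arbitrary.

```python
from array import array
from collections.abc import MutableSequence, Sequence

checks = {
    'list is a MutableSequence': isinstance([], MutableSequence),
    'tuple is a Sequence': isinstance((), Sequence),
    'tuple is a MutableSequence': isinstance((), MutableSequence),
}

row = [3, 4]
row[0] = 10          # item assignment, which tuple lacks
row[0:1] = [1, 2]    # slice replacement can resize the list
row.append(5)        # and so can append

ints = array('i', [1, 2, 3])   # homogeneous C ints stored directly
try:
    ints.append('x')           # a non-integer is rejected
except TypeError:
    rejected = True
else:
    rejected = False
```

A fixed-size "namedlist" would support the first assignment but neither of the two resizing operations, which is why it would not be a true `list` substitute.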
One should note that a particular property of named tuple is memory saving, so one can expect a similar property from a mutable named tuple too.
If you don't need to access the items by index for whatever reason, you don't need a namedtuple, and using one as a misguided misoptimization is a bad idea. Besides the fact that a normal class with __slots__ is also small, and even a normal class with a dict (in newer CPython versions and PyPy) not that much bigger, besides the fact that you can eliminate the row overhead rather than just slightly reducing it by using, e.g., a 2D array, you're optimizing the wrong thing in the first place--if your rows have 9 elements, reducing the row overhead is focusing on fixing 10% of your overhead, while reducing or eliminating the element overhead by using, e.g., a 2D numpy array of low-level values fixes the 90% (along with the 10%).
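To see the per-row sizes being compared, one can measure a `__slots__` class against a plain class. This is a rough sketch for CPython only: `sys.getsizeof` reports shallow sizes, the numbers differ between versions and platforms, and the class names are made up for the example.

```python
import sys


class SlotPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


class DictPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


sp, dp = SlotPoint(3.5, 4.5), DictPoint(3.5, 4.5)

sizes = {
    # Row overhead: the container object itself (plus its __dict__, if any).
    'slots row': sys.getsizeof(sp),
    'dict row': sys.getsizeof(dp) + sys.getsizeof(dp.__dict__),
    # Element overhead: each value is a separate Python object.
    'one float element': sys.getsizeof(3.5),
}
```

Multiplying the element size by the number of fields shows how quickly the elements, not the row container, come to dominate as rows get wider.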
That's all right. But I want to note that `collections.namedtuple` has several properties that make it distinctive:

1. Fast creation;
2. Minimal memory footprint;
3. Fast sequence interface;
4. Attribute access to elements via properties.

Different namedtuple alternatives have different sets of properties that make them more or less suitable depending on the use case. So if someone is looking for an alternative to collections.namedtuple that also supports assignment, it could be built on top of an array (actually a "tuple" with assignment support, but Python seems to have no such array type).

--- *Zaur Shibzukhov* 2015-03-19 11:37 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
    print(sys.getsizeof(list([])), sys.getsizeof(tuple([])))
    64 48
Yes. But:

First.

    print(sys.getsizeof(list([1,2])), sys.getsizeof(tuple([1,2])))
    104 64

Second. A tuple object allocates its memory once; a list object allocates its memory twice. That is why
Certainly this may or may not matter, depending on the use case. --- *Zaur Shibzukhov* 2015-03-19 13:07 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
On Mar 19, 2015, at 3:43 AM, Zaur Shibzukhov <szport@gmail.com> wrote:
As I explained, list leaves room for expansion, so it doesn't have to keep reallocating with every append. A 2-element list actually has room for... off the top of my head, I think 8 elements; then it multiplies the capacity every time it runs out on an append. In a C extension, you can create a list with whatever specific capacity you want, but that isn't exposed to Python (presumably because such a micro-optimization is rarely useful, and would complicate the API, as well as being an attractive nuisance to people who insist on optimizing the wrong things). So, if you really needed this optimization, you could implement your namedlist in C. But what use case are you imagining where the extra 40 bytes per row matter, but the 64 bytes per row don't, and neither do the 56 or so bytes per element per row? I'm having a hard time imagining any case where using tuples would be necessary where it wouldn't be woefully insufficient.
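The slack space is easy to observe. The sketch below is CPython-specific (it relies on `sys.getsizeof`); the exact sizes and growth points vary by version, so none are asserted here.

```python
import sys

items = []
sizes = []
for i in range(32):
    items.append(i)
    sizes.append(sys.getsizeof(items))   # shallow size after each append

# Lengths at which the reported size changed, i.e. where a reallocation
# happened. The first append is always counted.
growth_points = [1] + [n + 1 for n in range(1, len(sizes))
                       if sizes[n] != sizes[n - 1]]
```

If the list were reallocated on every append, `growth_points` would contain all 32 lengths; with over-allocation it contains only a handful.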
If you're going to show benchmarks, you really should use %timeit rather than %time, and also understand that "i for i in range(15)" is just going to give you a slower but equivalent iterable to just using "range(15)" in the first place. But more importantly, what use case are you considering where this extra 0.8us for construction of each row object will matter, but the difference between a namedtuple (or, presumably, namedlist) vs. a tuple won't? And, even if you did, you're focusing on something that accounts for at worst 25% of the construction time, and 0% of the time for all the actual work you do with the object after construction. Again, unless you have many millions of these, neither the memory nor the construction time is going to matter--and if you do, a new type that's more like tuple isn't going to be anywhere near sufficient to make a noticeable difference, because you're optimizing the wrong part of both the memory and the time.
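Outside IPython, the standard `timeit` module does the same job as `%timeit`. The following sketch compares the two equivalent iterables mentioned above; the repeat counts are arbitrary and the timings depend on the machine, so no numbers are claimed.

```python
import timeit

# Both expressions build the same tuple.
direct = tuple(range(15))
via_genexp = tuple(i for i in range(15))

# Take the minimum of several repeats, as %timeit's "best of" does.
t_direct = min(timeit.repeat('tuple(range(15))', number=20_000, repeat=5))
t_genexp = min(timeit.repeat('tuple(i for i in range(15))',
                             number=20_000, repeat=5))

per_call_us = {
    'range': t_direct / 20_000 * 1e6,
    'genexp': t_genexp / 20_000 * 1e6,
}
```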
2015-03-19 14:22 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
    for i in [4, 12, 28, 68]:
        print(sys.getsizeof(list(range(i))), end=' ')
    120 216 360 720
Yes, you're right. Actual optimization could be achieved with the help of Cython if necessary. But what use case are you imagining where the extra 40 bytes per row matter, but the 64 bytes per row don't, and neither do the 56 or so bytes per element per row? I'm having a hard time imagining any case where using tuples would be necessary where it wouldn't be woefully insufficient. That is why
Certainly this may or may not matter, depending on the use case.
import namedlist
Possibly a basic gain in memory savings, especially in the context of long-running processes, memory limits on hosting, long loops plus memory fragmentation in the Python heap, ... --- *Zaur Shibzukhov* 2015-03-19 13:07 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
And what's your point? That's exactly what you should expect if lists keep extra slack space at the end to avoid reallocation on expansion.
Not if you're trying to put this in the stdlib; you have to write the C extension. But if the person who thinks this is necessary (you) prefers to use Cython instead of C, that seems like another argument against putting it in the stdlib...
But what use case are you imagining where the extra 40 bytes per row matter, but the 64 bytes per row don't, and neither do the 56 or so bytes per element per row? I'm having a hard time imagining any case where using tuples would be necessary where it wouldn't be woefully insufficient.
I've now asked this twice, and you've avoided answering it both times to present further irrelevancies. This is the key question. If you don't have any use case for which this matters, then it doesn't matter.
Despite the name, the third-party module "namedlist" does not appear to make classes that inherit from list, or use a list for storage. You just get a normal class (using an OrderedDict __dict__ by default, or optionally __slots__) that provides a __getitem__ method implemented in terms of __getattr__. So I don't see how this is even vaguely relevant here. And, even if that module were relevant, how would it have anything to do with answering "Where is the use case where this construction time matters?" You don't even _mention_ construction time here.
No. This is basic computer science. Optimizing parts of your code that aren't actually relevant because they contribute only a tiny percentage to the waste doesn't actually improve anything.
On Tue, Mar 17, 2015 at 7:52 PM, Luciano Ramalho <luciano@ramalho.org> wrote:
+1, but I think that the core problem with such proposals is that they lack use cases. The only reason for me to have such a class is to work with tabular data. For example, for querying the capability of the system, I need to build an in-memory table of features, and then set parameters for each feature one by one. sqlite/SQL is overkill for that, and dicts are just not enough to do readable lookups and updates to specific cells, so I'd more appreciate a full table class than just its "named row" model. Practical example that I came up with: https://bitbucket.org/techtonik/discovery/src/c4f3d306bb43772dcf3c03be8db941... -- anatoly t.
On Wed, Mar 18, 2015 at 3:05 AM, anatoly techtonik <techtonik@gmail.com> wrote:
Thanks for your input and example, Anatoly. What would the full table class offer that could not be easily done with a simple list of items produced with a plainclass class?
Practical example that I came up with: https://bitbucket.org/techtonik/discovery/src/c4f3d306bb43772dcf3c03be8db941...
Oh, I see you have an update method, so that's the key value add of the Table class, right? I see the Table and plainclass as complementary ideas. You could use plainclass to implement Table more easily. The Table.__iter__ could return plainclass class instances instead of OrderedDict.

Cheers,
Luciano

--
Luciano Ramalho
| Author of Fluent Python (O'Reilly, 2015)
| http://shop.oreilly.com/product/0636920032519.do
| Teacher at: http://python.pro.br
| Twitter: @ramalhoorg
On Wed, Mar 18, 2015 at 2:36 PM, Luciano Ramalho <luciano@ramalho.org> wrote:
Lookup operations. Attaching new columns to existing table or providing a supplementary table with additional columns for existing ones. If Python had first class tables, things right now could be very different. We could have a better structured data handling. Take logging for example. It is not extensible. You have message and level. Then it got component. Then timestamp. But you can't add anything yourself to that message - error code, binary dump, selfie or some read/ unread flags. If logging used a table, there could be an ability to add your own data to its events. And this would be interoperable between programs.
Right. The update method says "change the value of cell with column name==name, in row where column name idname==idvalue". It is basically lookup method with cell modification. So it adds a second dimension to 1D "mutable namedtuple".
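The described update semantics can be sketched as follows. This is not the code from the linked repository: the `Table` class, its `insert` helper, and the column names are all hypothetical, written only to illustrate "set column `name` in the row where `idname == idvalue`".

```python
class Table:
    """Hypothetical in-memory table: rows that share a set of column names."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def insert(self, *values):
        if len(values) != len(self.columns):
            raise ValueError('wrong number of values for this table')
        self.rows.append(dict(zip(self.columns, values)))

    def update(self, name, value, idname, idvalue):
        """Set column `name` to `value` in every row where `idname` == `idvalue`.

        Returns the number of rows changed.
        """
        changed = 0
        for row in self.rows:
            if row[idname] == idvalue:
                row[name] = value
                changed += 1
        return changed


features = Table(['feature', 'supported'])
features.insert('unicode', None)
features.insert('color', None)
n_changed = features.update('supported', True, 'feature', 'color')
```

The rows here are plain dicts; a mutable record class could be dropped in as the row type without changing the lookup logic.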
That's one of the uses. But I am concerned that it is the only example where this "mutable namedtuple" is useful. I don't like the dynamic class construction as with namedtuple. I believe you can not serialize it reliably, and there are problems with static analysis tools to deal with it (like locate the definition). -- anatoly t.
On Tuesday, March 17, 2015 at 19:52:28 UTC+3, Luciano Ramalho wrote:
There is an attempt to make such an alternative: recordarray <https://pypi.python.org/pypi/recordarray>.
--- *Zaur Shibzukhov* 2015-03-27 7:08 GMT+03:00 Joonas Liik <liik.joonas@gmail.com>:
The name 'namedlist' is already used by https://pypi.python.org/pypi/namedlist, and 'mutabletuple' is taken too, by https://pypi.python.org/pypi/mutabletuple. The name recordarray means that it is 1) an array of objects and 2) a record with access to fields by attributes.
On Mar 26, 2015, at 21:26, Zaur Shibzukhov <szport@gmail.com> wrote:
If you're trying to provide the same concept, why use a completely unrelated name? That's like saying 'I want an ordered set, but there's already an "orderedset" on PyPI so I went with "sortedsequence"'.
The name recordarray means that it is 1) an array of objects and 2) a record with access to fields by attributes.
But how is being "an array of objects" any different from what a tuple, list, array.array, bytearray, bytes, str, etc. already are? What's specifically array-like about this type as opposed to all of those? And what's specifically record-like about your type compared to namedtuple, Struct, or SimpleNamespace?
--- *Zaur Shibzukhov* 2015-03-27 7:40 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
I am inclined to think that it's better to rename `objectarray` to 'mutabletuple' in order to be explicit about what it is. 'recordarray' is a factory function that does exactly what the 'namedtuple' factory does (except that it creates a subclass of 'mutabletuple' and makes '_replace' update 'self' instead of making a copy). So maybe it's better to call it 'record' or 'recordtype', or even 'recordfactory'?
On 27 March 2015 at 01:40, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
Actually, on my understanding, the request on this thread is for something that is quite concrete, existing in other languages, and that can be done in Python in a few lines, but is not in the stdlib: the Python equivalent of a C struct. Just that. An easy to create class, with named fields, with possible type-enforcement for those fields.

Or maybe it _does_ exist in Python, and it is a matter of having a nice example in the docs: for example a "blank" class with "__slots__" would do it. Or a blank class with slots that could serialize and deserialize itself to a sequence in a seamless way.

    class Base:
        __slots__ = ()

        def __init__(self, seq=None):
            if not seq:
                return
            for attr, val in zip(self.__slots__, seq):
                setattr(self, attr, val)

        def __iter__(self):
            for attr in self.__slots__:
                yield getattr(self, attr)

    def NamedList(name, fields):
        # split string with space separated fields; other niceties go here
        if isinstance(fields, str):
            fields = fields.split()
        return type(name, (Base,), dict(__slots__=tuple(fields)))

And 10-15 more lines if one wants type-checking, default values, __repr__ into that. I think getting a proper recipe for this, and publicizing it on the documentation ecosystem is enough - maybe a PyPI module adding some more goodies - and if that would get any traction - the usual consideration for inclusion could apply.
On Mar 27, 2015, at 06:22, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
But a C struct is not a subtype of, or substitutable for, a C array. It's not indexable. And the same is true with the equivalents in other languages. In fact, the dichotomy between struct--heterogeneous collection of fixed named fields--and array--homogeneous collection of indexed fields--goes back way before C. So, if you want the equivalent of a C struct, there's no reason to make it an iterable in Python. And a class already is the Python equivalent of a C struct, it's just that it can do _more_ than a C struct. A language like C++ that wants to share code and values with C has to bend over backward to make it possible to write a C++ struct (or class) that doesn't use any of its extra features and is therefore exactly equivalent to a C struct, but Python has no need for that. (Except when you actually do want to share values with C code--but for that case, we've got ctypes.Structure, which is exactly what you want in that situation.)
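For the share-values-with-C case, the standard library class is `ctypes.Structure`. A small sketch (the `Point` layout is just an example) shows that it has named, typed, mutable fields and, like a C struct, is not iterable:

```python
import ctypes


class Point(ctypes.Structure):
    # Field names and C types, in memory order.
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]


p = Point(3, 4)
p.x = 10                      # fields are mutable
size = ctypes.sizeof(p)       # laid out like the C struct

try:
    x, y = p                  # no iteration protocol, as in C
except TypeError:
    iterable = False
else:
    iterable = True
```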
Just that. An easy to create class, with named fields,
Which is easy to do: just create a class, and create its fields in the __init__ method (or, in some cases, it's acceptable to use class attributes as "default values" for instance attributes).
with possible type-enforcement for those fields.
Of course namedtuple doesn't have type-enforcement for the fields. I'm not sure whether you're talking about MyPy static type checking, or runtime checking, but either way, it's easier to add onto a regular class than to a namedtuple-like class factory.
A blank class without __slots__ can also do it. There are times when __slots__ are useful, but usually you're fine with just a plain __dict__. Encouraging people to use it when they have no need for it just because it's more like idiomatic C would be a bad idea. (It's like encouraging people to use @property to get something more like idiomatic .NET or ObjC, when actually they should just be writing idiomatic Python and using attributes directly.)
Why do you want to serialize and deserialize to a sequence? A C struct can't do that, and neither can equivalent types in other languages.
Default values and __repr__ are _also_ not part of a C struct. So, again, if what you're looking for is the equivalent of a C struct, you can replace all of the above with:

    class Base: pass

If you want other features that C structs don't have, then yes, you may have to write them, but the same is true in C (and, in fact, it's clumsier and more difficult in C).
On Fri, Mar 27, 2015 at 8:13 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
So, if you want the equivalent of a C struct, there's no reason to make it an iterable in Python.
Yes, there is: iterable unpacking.
Which is easy to do: just create a class, and create its fields in the __init__ method (or, in some cases, it's acceptable to use class attributes as "default values" for instance attributes).
Boilerplate with lots of repetition, with little added value. For example, in a basic __init__ each attribute name usually occurs three times: as an argument name in the method declaration, and then twice when it's assigned to self. Ruby does much better, for example. Best, Luciano
On Mar 27, 2015, at 16:29, Luciano Ramalho <luciano@ramalho.org> wrote:
Why? You can't do the equivalent in C or any of its descendants (or most other languages with a struct/record type, or most pedagogical or theoretical struct/record concepts). Nor can you do anything even vaguely similar. So why would anyone expect that "the equivalent of a C struct" in Python should be able to do something that a C struct, and its equivalents in other languages, can't? Also, the desire to _not_ have to use iterable unpacking is why we have namedtuple (and structseq in the C API) in the first place: to make tuples that can be used as records, not the other way around. A namedtuple stat result allows your users to access the fields by name instead of by index, which not only makes their code more readable, it also means stat can return different sets of extra fields on different platforms and in new versions without breaking their code. Even in C, this is important: because you access the PyObject fields by name, I can hand you a PyList* cast to a PyObject* and you can use it; if C allowed you to access it by iterable unpacking and you did so, I'd instead have to copy the PyObject fields of the PyList into a new PyObject that didn't have any extra fields.
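The stat result mentioned above makes the point concrete. In this sketch the first ten fields are reachable both by index and by name, while any platform-specific extras are only reachable by name. `st_blksize` is used as an example of such an extra and may be absent on some platforms, hence the `getattr` default.

```python
import os

st = os.stat(os.curdir)

mode_by_index, mode_by_name = st[0], st.st_mode
size_by_index, size_by_name = st[6], st.st_size

tuple_length = len(st)                         # only the classic fields
block_size = getattr(st, 'st_blksize', None)   # extra field, name access only
```

Code that unpacks the tuple positionally is tied to the classic layout; code that uses the names keeps working when extra fields appear.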
Let's compare some C code and the equivalent Python:

    struct Person {
        const char *name;
        int age;
    };

    struct Person person_make(const char *name, int age) {
        struct Person p;
        p.name = strdup(name);
        p.age = age;
        return p;
    }

    class Person:
        def __init__(self, name: str, age: int):
            self.name = name
            self.age = age

You really think that this is not like a C struct because it has too much boilerplate compared to the C equivalent?

Of course it's trivial to wrap up that boilerplate if you're going to create 20 of these. And to add in other functionality that C structs (and, except for the first, Python namedtuples) don't have that your project needs, like a nice repr, default values, runtime type checking, a JSON serialization schema, an ORM mapping, an HTML form representation, etc.

If you really want to add in being a sequence, you can add that too--but again, what's the use case for that? It's certainly not being more like a C struct.
On 27 March 2015 at 22:09, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
Of course it's trivial to wrap up that boilerplate if you're going to create 20 of these. And to add in other functionality that C structs (and, except for the first, Python namedtuples) don't have that your project needs, like a nice repr, default values, runtime type checking, a JSON serialization schema, an ORM mapping, an HTML form representation, etc.
So - that is the point - it is trivial to do away with the boilerplate, as I've shown in the other message, - but there is no way to do it in the stdlib, so it is a wheel that is reinvented every time. NamedTuples are a way to do _almost_ that: make it trivial to create a class with only this fixed set of attributes, with as little boilerplate as one can think of - but one can't change the attributes on an instance of it. That is why the start of the thread is about a "mutable named tuple" - not because it is a tuple - but because it creates a basic class with fixed attributes that works nicely, with minimal boilerplate. Rethinking my example, I think it does fit exactly in "too small to be an external dependency in itself, and with too many subtle behaviors to get done right again and again in several projects". js -><-
On Sat, Mar 28, 2015 at 12:09 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
Here's a crazy thought: you could use functools.wraps() to abuse **kwargs.

    import functools

    def make_attributes(func):
        @functools.wraps(func)
        def inner(self, **args):
            self.__dict__.update(args)
            func(self, **args)  # call the wrapped function, not inner itself
        return inner

    class Person:
        @make_attributes
        def __init__(self, *, name: str, age: int):
            pass

Thanks to wraps(), you still have your parameter names for introspection and help() and so on. Thanks to **args, you can do bulk operations on all the args. It's a bit naughty (and it does preclude positional args, though a little bit more work in the decorator could support that too), but it would work..... ChrisA
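The "little bit more work" for positional arguments could look something like the sketch below. It is one possible approach, not ChrisA's: it assumes `inspect.signature` is used to bind positional and keyword arguments to parameter names before copying them onto the instance.

```python
import functools
import inspect


def make_attributes(func):
    sig = inspect.signature(func)
    self_name = next(iter(sig.parameters))   # normally 'self'

    @functools.wraps(func)
    def inner(self, *args, **kwargs):
        # Map every argument (positional or keyword) to its parameter name.
        bound = sig.bind(self, *args, **kwargs)
        bound.apply_defaults()
        attrs = dict(bound.arguments)
        del attrs[self_name]
        self.__dict__.update(attrs)
        return func(self, *args, **kwargs)
    return inner


class Person:
    @make_attributes
    def __init__(self, name: str, age: int = 0):
        pass


p = Person('Ana', age=30)
q = Person(name='Bo')
```

Because `bind` raises `TypeError` for missing or unexpected arguments, the wrapper keeps the same calling errors as the undecorated `__init__`.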
On Fri, Mar 27, 2015 at 04:13:46PM -0700, Andrew Barnert wrote:
Joao said "The Python equivalent of a C struct", not "a C struct". Python is not C, and Python data types are not limited to what C does. Python strings aren't limited to C null-delimited strings, and Python ints aren't limited to what C ints can do. I think the original thread was quite explicit about what is wanted: something like a mutable equivalent to namedtuple. Namedtuples are used in Python where C would use a struct, or Pascal a record, except that namedtuples (being tuples) are immutable. I think it's quite reasonable to want a mutable version. Effectively, namedtuple is just a convenience function for wrapping up a bunch of nice-to-have but not essential functionality around an immutable struct. Python got by with unnamed tuples for over a decade, so it's not like we *have* to have namedtuples. But having got them, would we go back to using regular tuples as a struct? Hell no. Having named fields is so much better.
And a class already is the Python equivalent of a C struct, it's just that it can do _more_ than a C struct.
This is why it is unfair to insist that a Python equivalent of a C struct be limited to what C structs do.
If this is so easy, why do we have namedtuple *and* SimpleNamespace in the standard library? Are they both mistakes?

SimpleNamespace is especially interesting. The docs say: "However, for a structured record type use namedtuple() instead."

https://docs.python.org/3/library/types.html#types.SimpleNamespace

which is great if you want an *immutable* structured record type, but not if you want a mutable one.

Which brings us back to where this thread started: a request for a mutable version of namedtuple. That's trickier than namedtuple, because we don't have a mutable version of a tuple to inherit from. Lists won't do the job, because they have a whole lot of functionality that is inappropriate, e.g. sort, reverse, pop methods. That makes it harder to create a mutable structured record type, not simpler.

Think about the functional requirements:

- it should be semantically a struct, not a list or array;
- with a fixed set of named fields;
- fields should be ordered: a record with fields foo and bar is not the same as a record with fields bar and foo;
- accessing fields by index would be a Nice To Have, but not essential;
- but iteration is essential, for sequence unpacking;
- values in the fields must be mutable;
- it should support equality, but not hashing (since it is mutable);
- it must have a nice repr and/or str;
- being mutable, it may directly or indirectly contain a reference to itself (e.g. x.field = x) so it needs to deal with that correctly;
- support for pickle;
- like namedtuple, it may benefit from a handful of methods such as '_asdict', '_fields', '_make', '_replace' or similar.

Does this sound easy to write? Well, sure, in the big picture, it's hardly a 100,000 line application. But it's not a trivial class.

-- Steve
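Two of the subtler requirements in that list, self-reference in the repr and equality without hashing, can be illustrated with a short sketch. The `Record` class and its field names are hypothetical; `reprlib.recursive_repr` is the standard library helper for the first problem.

```python
from reprlib import recursive_repr


class Record:
    __slots__ = ('left', 'right')   # fixed, ordered set of fields
    __hash__ = None                 # mutable, so explicitly unhashable

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __iter__(self):             # for sequence unpacking
        yield self.left
        yield self.right

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return tuple(self) == tuple(other)

    @recursive_repr()               # substitutes '...' for a recursive reference
    def __repr__(self):
        return f'Record(left={self.left!r}, right={self.right!r})'


r = Record(1, 2)
r.right = r                         # the record now contains itself
text = repr(r)                      # 'Record(left=1, right=...)'
```

Without the decorator, `repr(r)` would recurse until it hit the recursion limit. Pickle support and the namedtuple-style helper methods are left out of this sketch.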
On Sat, Mar 28, 2015 at 7:37 AM, Steven D'Aprano <steve@pearwood.info> wrote:
+1, though it doesn't *necessarily* follow that a mutable equivalent is a good idea. This (and related threads) imply there is at least some support for the new type in principle. I haven't followed the threads too closely so I've missed any mention of solid pythonic use cases that would give the idea much more solid footing. However, I have seen references to prior art on the cheeseshop which may be used to provide harder evidence (both of support and of solid use cases). Regardless, I concur that there are many cases where types and functions have been added to the stdlib that weren't strictly necessary. Perhaps if those proposals had come from someone else or when the mood on python-dev was different then they would not have been added. That is what has happened with numerous other we-have-gotten-by-without-it-so-why-add-it ideas (which may also have proven themselves as namedtuple has). Ultimately we have to be careful in this space because, as Raymond often reminds us, it really is important to make the effort to keep Python small enough to fit in people's brains (and in *some* regard we've probably failed there already). With the great efforts in the last year to improve packaging, the cheeseshop is increasingly the better place for new types and helpers to live. With that in mind, perhaps we should start adding a section to the bottom of relevant docs that contains links to vetted PyPI packages (and recipes) that provide extended capabilities. We've done this already in a few select places (e.g. the 3.2+ namedtuple docs).
As the person who wrote that I'll point out that I added it to help make the distinction clearer between the two. At the time there were concerns about the similarities and with users getting confused about which to use. I will argue that "record type" implies an archive of data, ergo immutable. "Structured" refers to being ordered and having attribute access. IMHO that statement is clear and helpful, but if it has proven otherwise we should consider improving it.

In contrast, I see the proposal here as somewhat of a middle ground. Folks are looking for a factory mechanism that produces classes with slots and have both iteration (for unpacking) and index lookup. So something like this:

    class FixedClassMeta(type):
        # Ideally this would be a "classonly" method (like classmethod but
        # class-only) method on FixedClass and not need a metaclass.
        def subclass(base, name, *fields):
            # XXX validate fields first
            args = ', '.join(fields)
            body = '\n    '.join('self.{0} = {0}'.format(f) for f in fields)
            code = """def __init__(self, {}):\n    {}""".format(args, body)
            ns = {}
            exec(code, ns)

            class X(base):
                __slots__ = fields
                __init__ = ns['__init__']

            X.__name__ = name
            X.__qualname__ = X.__qualname__.replace('X', name, 1)
            X.__doc__ = """..."""
            return X

    class FixedClass(metaclass=FixedClassMeta):
        __slots__ = ()

        def __repr__(self):
            items = ("{}={!r}".format(f, getattr(self, f))
                     for f in self.__slots__)
            return "{}({})".format(self.__class__.__name__, ', '.join(items))

        def __iter__(self):  # for unpacking
            return (getattr(self, f) for f in self.__slots__)

        def __getitem__(self, index):
            field = self.__slots__[index]
            try:
                return getattr(self, field)
            except AttributeError:
                raise IndexError(index)

        # Index lookup exists for convenience, but assignment & deletion
        # are not in scope.

    def fixedClass(name, field_names):
        """A factory that produces classes with fixed, ordered attributes.

        The returned class has __slots__ set to the field names, as well
        as __iter__ (for unpacking) and __getitem__ implemented.

        """
        if isinstance(field_names, str):
            fields = field_names.replace(',', ' ').split()
        else:
            fields = field_names
        return FixedClass.subclass(name, *fields)

That said, I'm still not clear on what the use cases are.
This is the key point. It is a fixed-size class with iteration for unpacking and index lookup for convenience. A full-fledged mutable namedtuple doesn't make sense (without clear use cases).
Ah, my example above would have to grow __eq__ then.
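A minimal sketch of what growing `__eq__` might look like for a slots-based class of this kind. `Point` here is a hypothetical hand-written stand-in rather than something produced by the factory, and restricting equality to instances of the same class is an assumption, not something settled in the thread.

```python
class Point:
    """Hypothetical fixed-field mutable record (stand-in for a generated class)."""
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __iter__(self):  # for unpacking
        return (getattr(self, f) for f in self.__slots__)

    def __eq__(self, other):
        # Assumption: only instances of the same class compare equal.
        if type(other) is not type(self):
            return NotImplemented
        return tuple(self) == tuple(other)

    # Defining __eq__ without __hash__ sets __hash__ to None,
    # which suits a mutable type.


p = Point(3, 4)
p.x = 10          # fields are mutable
x, y = p          # unpacking works through __iter__
```

Because `__eq__` returns `NotImplemented` for foreign types, a `Point` never compares equal to a plain tuple with the same values.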
- accessing fields by index would be a Nice To Have, but not essential;
Exactly. Not Essential.
- but iteration is essential, for sequence unpacking;
This brings to mind a different proposal that has come up in the past (a separate "dunder" method for unpacking). Iteration seems out of place here, but we need it for sequence unpacking.
Ah, yes. "RuntimeError: maximum recursion depth exceeded". :)
Perhaps. I think there are a few things we can learn from namedtuple that can be applied for this hypothetical new type/factory. And to add to your list: - performance should be a consideration since the apparent use cases relate to handling many of these as "records". Again, I'm not sold on the benefit of this over the existing alternatives. For records use namedtuple (with the _replace method for "mutation"). -eric
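For reference, a small sketch of the namedtuple `_replace` approach mentioned above. The `Point` type is only an illustration.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')

p = Point(3, 4)
# _replace returns a new instance; the original is left untouched.
q = p._replace(x=10)
```

Any other reference to `p` still sees the old values, which is the main practical difference from a truly mutable record.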
On Sat, Mar 28, 2015 at 12:11 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Whatever became of that proposal? If I recall correctly, it had moderate support but the champion wasn't able to continue pursuing it. -eric
On Mar 28, 2015, at 06:37, Steven D'Aprano <steve@pearwood.info> wrote:
Sure, but nobody just invents random new features to add to int and then justifies them by saying "I want the equivalent of a C int" even though C int doesn't have those features. People invent features (like bit_length) to solve actual use cases, and justify them based on those use cases. Multiple people have asked "what do you want this for?", and the best answer anyone's given has been "the equivalent of a C struct". (That, and to prematurely and badly optimize memory usage.) Even worse, when I ask why specifically anyone wants this thing to be iterable, the answer is "to be the equivalent of a C struct", and that doesn't answer the question. [snip]
I'm not saying a record type shouldn't be allowed to have any features that C structs don't, just that equivalency with C structs isn't an argument for features that C structs don't have. Some of the extra features are so obviously desirable that they probably don't need any argument--if you're going to build this thing, having a nice repr or not breaking pickle seems hard to argue against. But iterability is not that kind of obvious win. Also, how is it "unfair" to suggest that this thing should be limited in some ways? For example, two instances of the same class can have completely different fields; presumably two instances of the same record type really shouldn't. There's no reason it _couldn't_ be completely open like a general class, it's just that you usually don't want it to be. Similarly, there's no reason it couldn't be a sequence, but I don't think you usually want it to be. [snip]
Sure. And that's the problem. If you want something that's "just like a sequence whose elements can be replaced but whose shape is fixed, except that the elements are also named", you run into the problem that Python doesn't have such a sequence type. It's a perfectly coherent concept, and there's no reason you couldn't design a language around immutable, fixed-shape-mutable, and mutable-shape sequences instead of just the first and last, but that's not the way Python was designed. Should that be changed? Or is the only use for such a type to underlie this new type?
Note that namedtuples are nominatively typed, not structurally--a record with fields foo and bar is not necessarily the same as another record with fields foo and bar. Ordering doesn't enter into it; they were defined separately, so they're separate types. Do you want the same behavior here, or the behavior your description implies instead?
- accessing fields by index would be a Nice To Have, but not essential;
Why would that be nice to have? The record/sequence dichotomy has been fundamental to the design of languages since the earliest days, and it's still there in almost all languages. Maybe every language in the world is wrong--but if so, surely you can explain why? For structseq, there was a good reason: a stat result is a 7-tuple as well as being a record with 13-odd fields, because there was a pre-existing mass of code that used stat results as 7-tuples, but people also wanted to be able to access the newer or not-100%-portable fields. That's a great use case. And people have used structseq in other similar examples to migrate users painlessly from an early API that turned out to be too simple and limited. And namedtuple gives you a way to write APIs in a similar style that previously could only be (easily) written with a C extension, which is an obvious win. That's clearly not the case here--nobody has existing APIs that use a fixed-length but mutable sequence that they want to expand into something more flexible, because Python doesn't come with such a sequence type. Of course that's not the only use anyone's ever found for, respectively, structseq and namedtuple--e.g., converting to namedtuple turns out to be handy for cases where you want a record but some external API like SQL gives you a sequence, and that would probably be a good enough justification for namedtuple too. But what is the use that justifies this addition? (For example, if you need to take SQL rows as a sequence, mutate them by name, and then do something I can't imagine with them that requires them to still be a sequence, that would be a pretty good answer.)
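The SQL case mentioned above can be sketched with the standard `sqlite3` module. The `person` table and the `Person` record type are made up for illustration.

```python
import sqlite3
from collections import namedtuple

Person = namedtuple('Person', 'name age')  # hypothetical record type

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE person (name TEXT, age INTEGER)')
conn.execute("INSERT INTO person VALUES ('Ada', 36)")

# Each row comes back as a plain tuple; _make wraps it in the record type.
people = [Person._make(row)
          for row in conn.execute('SELECT name, age FROM person')]
conn.close()
```

The rows stay usable as sequences by whatever code expects them, while new code can read `people[0].name`.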
- but iteration is essential, for sequence unpacking;
Again, why is that essential? TOOWTDI isn't an iron-clad rule, but it's a good rule of thumb; adding a second way to access the members of a record that's both unique to Python and less Pythonic seems like a bad idea, unless there's some good reason that overbalances it in the other direction. Think of stat code: it's a lot more readable when you access the fields by name instead of by unpacking. Why wouldn't the same be true for, say, a Person record, or an Address record, or an ImageHeader record, or almost anything else you can imagine? (I can think of one particular special case where it might be nice: small, homogenous, essentially-sequence-like records like a Vector or Point or... Well, really just a Vector or Point. And they're clearly special. Both in C and in Python, you're often torn between storing them as an array or as a sequence, and you'll find different apps doing it each way. That isn't true for a Person or Address etc.)
It is possible that this is recordclass <https://pypi.python.org/pypi/recordclass> <wink> A short example <http://nbviewer.ipython.org/urls/bitbucket.org/intellimath/recordclass/raw/d...> to illustrate that fact.
On Saturday, March 28, 2015 at 16:37:40 UTC+3, Steven D'Aprano wrote:
On Mar 26, 2015, at 21:08, Joonas Liik <liik.joonas@gmail.com> wrote:
namedlist perhaps? :)
if you want:"namedtuple, but mutable. " then namedlist seems like the obvious alternative..
But, as discussed earlier in the thread, a list isn't right, or at least isn't obviously right, because lists can change size, and what would it mean for a namedlist to, say, delete element 3? And that's really the problem: Python's entire infrastructure is designed around things which are reshapable like lists, or immutable like tuples, and this doesn't fit either one.
The name "array" seems really unfortunate. It doesn't give you any clue that this thing is halfway between a tuple and a list. Also, we've already got a bytearray, array.array, and the NumPy array types, all of which hold homogeneous simple-value types and can be accessed as buffers. And bytearray and array.array are resizable. And np.ndarray--like a C array, in effect--returns views when sliced rather than copies. Just about everything this name implies is misleading. And as for recordarray, that doesn't exactly scream "like a namedtuple, but with array instead of tuple". Also, in the weeks this thing has been discussed, no one has yet come up with a use case where. Look at the motivating example, Point--do you think location[2] meaning the same thing as location.z is a good thing? That's neither easy nor common in C and its OO descendants, or SmallTalk and its, or most other languages, and I can't remember ever being bothered by that. The best anyone has come up with is that it might be more space-efficient than a SimpleNamespace or a standard __slots__ class, for all those times when you really need a billion points and can afford to needlessly waste 140 bytes per point instead of 12, but can't afford to waste 188.
On Mar 27, 2015 5:33 AM, "Andrew Barnert" <abarnert@yahoo.com.dmarc.invalid> wrote:
The name "array" seems really unfortunate. It doesn't give you any clue that this thing is halfway between a tuple and a list. Also, we've already got a bytearray, array.array, and the NumPy array types, all of which hold homogeneous simple-value types and can be accessed as buffers. And bytearray and array.array are resizable. And np.ndarray--like a C array, in effect--returns views when sliced rather than copies. Just about everything this name implies is misleading.
And as for recordarray, that doesn't exactly scream "like a namedtuple, but with array instead of tuple".
NumPy already has the concept of a "record array", which they call "recarray", which is a NumPy array that is partially accessible in a similar manner to named tuples. However, otherwise they are the same as NumPy arrays, which means they have a fixed size (usually) but mutable contents. Whether you consider this a point in favor or a point against, however, probably depends on your point of view.
On Tue, Mar 17, 2015 at 10:52 AM, Luciano Ramalho <luciano@ramalho.org> wrote:
Should it also have all the same methods as a namedtuple class (e.g. the tuple methods, _make, _replace)? What about the other namedtuple attrs (_fields, etc.)? What's the motivation for parity with namedtuple (particularly iteration)? I suppose I see the desire for a fixed set of mutable attributes specific to the generated type. However, there have been numerous discussions on this list about alternative approaches to namedtuple which apply here. Simply adapting namedtuple may not be the right thing. Regardless, this is the sort of thing that should bake outside the stdlib for a while to prove its approach and its worth, much as namedtuple did. It would also help if there were a concrete use case in the stdlib that this new class (factory) would satisfy.
Don't forget types.SimpleNamespace. :) -eric
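A quick sketch of how `types.SimpleNamespace` compares with what the thread is asking for: it is mutable and has a repr and `==`, but it does not support unpacking and does not fix the set of fields.

```python
from types import SimpleNamespace

p = SimpleNamespace(x=3, y=4)
p.x = 10                                  # mutable
same = p == SimpleNamespace(x=10, y=4)    # compares by attributes
text = repr(p)

try:
    x, y = p                              # no __iter__, so no unpacking
    unpackable = True
except TypeError:
    unpackable = False

p.z = 0                                   # fields are not fixed either
```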
On 03/17/2015 12:52 PM, Luciano Ramalho wrote:
https://pypi.python.org/pypi/namedlist It also adds default values to the generated constructor, which may or may not be desirable. But if used exactly like collections.namedtuple, it ignores the default values. Eric.
On Tuesday, March 17, 2015 at 20:21:01 UTC+3, Eric V. Smith wrote:
attribute access. The mutable alternative could be considered an array with attribute access. An array in this context is a tuple-like object that supports item assignment. Since Python has no such object, there are different approaches to mutable named tuple alternatives. One should note that a particular property of named tuples is memory saving, so one can expect a similar property from a mutable named tuple too.
On Mar 19, 2015, at 12:04 AM, Zaur Shibzukhov <szport@gmail.com> wrote:
Python definitely does have such an object: list. A list is effectively the same as a tuple but mutable; it's the paradigm MutableSequence while tuple is the paradigm Sequence. Under the covers they have very similar headers that both use the same storage (a C array of pointers to Python objects, in CPython), and C API functions like PySequence_Fast_GET_ITEM don't distinguish between the two. However, list is resizable, and presumably a "namedlist" would not be. That makes things more complicated for both the interface (there's no is-a relationship; a type without append is not a list--and, worse, a type that has __setitem__ but can't handle slice replacement is not a list but that's very hard to detect...) and the implementation (e.g., a list reserves extra space at the end to avoid having to reallocate on every append). (Python _also_ has an array type, which is for homogenous simple types (like 32-bit int) which can store the values directly, as opposed to tuple and list, which store (pointers to) heterogenous normal Python objects.)
One should note that a particular property of named tuples is memory saving, so one can expect a similar property from a mutable named tuple too.
If you don't need to access the items by index for whatever reason, you don't need a namedtuple, and using one as a misguided misoptimization is a bad idea. Besides the fact that a normal class with __slots__ is also small, and even a normal class with a dict (in newer CPython versions and PyPy) not that much bigger, besides the fact that you can eliminate the row overhead rather than just slightly reducing it by using, e.g., a 2D array, you're optimizing the wrong thing in the first place--if your rows have 9 elements, reducing the row overhead is focusing on fixing 10% of your overhead, while reducing or eliminating the element overhead by using, e.g., a 2D numpy array of low-level values fixes the 90% (along with the 10%).
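A rough sketch of the comparison between a normal class and a `__slots__` class. The two `Point*` classes are illustrative only, and the byte counts printed by `sys.getsizeof` vary by interpreter and version, so none are quoted here.

```python
import sys


class PointDict:
    """Ordinary class: attributes live in a per-instance __dict__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Same fields, stored in fixed slots with no per-instance __dict__."""
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


d = PointDict(3, 4)
s = PointSlots(3, 4)

# getsizeof is shallow: it does not count the x and y objects themselves.
dict_based = sys.getsizeof(d) + sys.getsizeof(d.__dict__)
slot_based = sys.getsizeof(s)
print(dict_based, slot_based)

try:
    s.z = 5          # not a declared slot
    accepts_new_field = True
except AttributeError:
    accepts_new_field = False
```

Neither figure includes the values stored in the fields, which is the per-element overhead the message above argues dominates.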
That's all right. But I want to note that `collections.namedtuple` has several properties that make it unique: 1. Fast creation; 2. Minimal memory footprint; 3. Fast sequence interface; 4. Attribute access to elements via properties. Different namedtuple alternatives have different sets of properties that make them more or less suitable depending on the use case. So if someone is looking for an alternative to collections.namedtuple that also supports assignment, it could be built on top of an array (actually a "tuple" with assignment support, but Python does not seem to have such an array type). --- *Zaur Shibzukhov* 2015-03-19 11:37 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
    print(sys.getsizeof(list([])), sys.getsizeof(tuple([])))
    64 48
Yes. But:
First:
    print(sys.getsizeof(list([1,2])), sys.getsizeof(tuple([1,2])))
    104 64
Second: a tuple object allocates its memory once, while a list object allocates its memory twice. That is why
Certainly this can have or not have a value depending on use case. --- *Zaur Shibzukhov* 2015-03-19 13:07 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
On Mar 19, 2015, at 3:43 AM, Zaur Shibzukhov <szport@gmail.com> wrote:
As I explained, list leaves room for expansion, so it doesn't have to keep reallocating with every append. A 2-element list actually has room for... off the top of my head, I think 8 elements; then it multiplies the capacity every time it runs out on an append. In a C extension, you can create a list with whatever specific capacity you want, but that isn't exposed to Python (presumably because such a micro-optimization is rarely useful, and would complicate the API, as well as being an attractive nuisance to people who insist on optimizing the wrong things). So, if you really needed this optimization, you could implement your namedlist in C. But what use case are you imagining where the extra 40 bytes per row matter, but the 64 bytes per row don't, and neither do the 56 or so bytes per element per row? I'm having a hard time imagining any case where using tuples would be necessary where it wouldn't be woefully insufficient.
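The slack space can be observed from Python, at least on CPython. This is a small sketch; the exact sizes and growth pattern are implementation details and differ between versions, so none are asserted here.

```python
import sys

items = []
sizes = []
for i in range(32):
    items.append(i)
    sizes.append(sys.getsizeof(items))

# The reported size only changes when the list actually reallocates,
# so there are far fewer distinct sizes than appends.
distinct_sizes = len(set(sizes))
print(sizes)
print(distinct_sizes, 'distinct sizes for', len(sizes), 'appends')
```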
If you're going to show benchmarks, you really should use %timeit rather than %time, and also understand that "i for i in range(15)" is just going to give you a slower but equivalent iterable to just using "range(15)" in the first place. But more importantly, what use case are you considering where this extra 0.8us for construction of each row object will matter, but the difference between a namedtuple (or, presumably, namedlist) vs. a tuple won't? And, even if you did, you're focusing on something that accounts for at worst 25% of the construction time, and 0% of the time for all the actual work you do with the object after construction. Again, unless you have many millions of these, neither the memory nor the construction time is going to matter--and if you do, a new type that's more like tuple isn't going to be anywhere near sufficient to make a noticeable difference, because you're optimizing the wrong part of both the memory and the time.
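Outside IPython, the same kind of measurement can be sketched with the standard `timeit` module. The iteration count is arbitrary and the timings depend entirely on the machine, so no numbers are shown.

```python
import timeit

# Build a 15-element tuple directly from range ...
direct = timeit.timeit('tuple(range(15))', number=100_000)
# ... and through a generator expression that adds nothing but overhead.
via_genexp = timeit.timeit('tuple(i for i in range(15))', number=100_000)

print('range(15):             {:.4f}s'.format(direct))
print('(i for i in range(15)): {:.4f}s'.format(via_genexp))
```

Both statements produce the same tuple, so any difference is purely the cost of the extra generator.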
2015-03-19 14:22 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
    for i in [4, 12, 28, 68]:
        print(sys.getsizeof(list(range(i))), end=' ')
    120 216 360 720
Yes, you are right. The actual optimization could be achieved with the help of Cython if necessary.
But what use case are you imagining where the extra 40 bytes per row matter, but the 64 bytes per row don't, and neither do the 56 or so bytes per element per row? I'm having a hard time imagining any case where using tuples would be necessary where it wouldn't be woefully insufficient.
That is why
Certainly this can have or not have a value depending on use case.
import namedlist
Possibly the basic gain is in memory savings, especially in the context of long-running processes, memory limits on hosting, long loops plus memory fragmentation in the Python heap, ... --- *Zaur Shibzukhov* 2015-03-19 13:07 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
And what's your point? That's exactly what you should expect if lists keep extra slack space at the end to avoid reallocation on expansion.
Not if you're trying to put this in the stdlib; you have to write the C extension. But if the person who thinks this is necessary (you) prefers to use Cython instead of C, that seems like another argument against putting it in the stdlib...
But what use case are you imagining where the extra 40 bytes per row matter, but the 64 bytes per row don't, and neither do the 56 or so bytes per element per row? I'm having a hard time imagining any case where using tuples would be necessary where it wouldn't be woefully insufficient.
I've now asked this twice, and you've avoided answering it both times to present further irrelevancies. This is the key question. If you don't have any use case for which this matters, then it doesn't matter.
Despite the name, the third-party module "namedlist" does not appear to make classes that inherit from list, or use a list for storage. You just get a normal class (using an OrderedDict __dict__ by default, or optionally __slots__) that provides a __getitem__ method implemented in terms of __getattr__. So I don't see how this is even vaguely relevant here. And, even if that module were relevant, how would it have anything to do with answering "Where is the use case where this construction time matters?" You don't even _mention_ construction time here.
No. This is basic computer science. Optimizing parts of your code that aren't actually relevant because they contribute only a tiny percentage to the waste doesn't actually improve anything.
On Tue, Mar 17, 2015 at 7:52 PM, Luciano Ramalho <luciano@ramalho.org> wrote:
+1, but I think that the core problem with such proposals is that they lack the use cases. The only reason for me to have such a class is to work with tabular data. For example, for querying the capability of the system, I need to build an in-memory table of features, and then set parameters for each feature one by one. sqlite/SQL is overkill for that, and dicts are just not enough to do readable lookups and updates to specific cells, so I'd appreciate a full table class more than just its "named row" model. Practical example that I came up with: https://bitbucket.org/techtonik/discovery/src/c4f3d306bb43772dcf3c03be8db941... -- anatoly t.
On Wed, Mar 18, 2015 at 3:05 AM, anatoly techtonik <techtonik@gmail.com> wrote:
Thanks for your input and example, Anatoly. What would the full table class offer that could not be easily done with a simple list of items produced with a plainclass class?
Practical example that I came up with: https://bitbucket.org/techtonik/discovery/src/c4f3d306bb43772dcf3c03be8db941...
Oh, I see you have an update method, so that's the key value add of the Table class, right? I see the Table and plainclass as complementary ideas. You could use plainclass to implement Table more easily. The Table.__iter__ could return plainclass class instances instead of OrderedDict. Cheers, Luciano -- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Professor em: http://python.pro.br | Twitter: @ramalhoorg
On Wed, Mar 18, 2015 at 2:36 PM, Luciano Ramalho <luciano@ramalho.org> wrote:
Lookup operations. Attaching new columns to an existing table or providing a supplementary table with additional columns for existing ones. If Python had first-class tables, things right now could be very different. We could have better structured data handling. Take logging for example. It is not extensible. You have message and level. Then it got component. Then timestamp. But you can't add anything yourself to that message - error code, binary dump, selfie or some read/unread flags. If logging used a table, there could be an ability to add your own data to its events. And this would be interoperable between programs.
Right. The update method says "change the value of cell with column name==name, in row where column name idname==idvalue". It is basically lookup method with cell modification. So it adds a second dimension to 1D "mutable namedtuple".
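A minimal sketch of the kind of `update` described above. This `Table` class, its constructor, and the `update(name, value, idname, idvalue)` signature are hypothetical and written only to illustrate the description; it is not the code from the linked repository.

```python
class Table:
    """Hypothetical minimal table: a list of dict rows sharing the same columns."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def add(self, *values):
        if len(values) != len(self.columns):
            raise ValueError('expected {} values'.format(len(self.columns)))
        self.rows.append(dict(zip(self.columns, values)))

    def update(self, name, value, idname, idvalue):
        """Set column `name` to `value` in every row where row[idname] == idvalue."""
        changed = 0
        for row in self.rows:
            if row[idname] == idvalue:
                row[name] = value
                changed += 1
        return changed


features = Table(['feature', 'supported'])
features.add('unicode', None)
features.add('color', None)
changed = features.update('supported', True, 'feature', 'color')
```

The lookup by `idname`/`idvalue` is the second dimension: a single mutable record only ever addresses one row.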
That's one of the uses. But I am concerned that it is the only example where this "mutable namedtuple" is useful. I don't like the dynamic class construction as with namedtuple. I believe you can not serialize it reliably, and there are problems with static analysis tools to deal with it (like locate the definition). -- anatoly t.
On Tuesday, March 17, 2015 at 19:52:28 UTC+3, Luciano Ramalho wrote:
There is an attempt to make such an alternative: recordarray <https://pypi.python.org/pypi/recordarray>.
--- *Zaur Shibzukhov* 2015-03-27 7:08 GMT+03:00 Joonas Liik <liik.joonas@gmail.com>:
The name 'namedlist' is already used in https://pypi.python.org/pypi/namedlist, and 'mutabletuple' is taken too, in https://pypi.python.org/pypi/mutabletuple. The name recordarray means that it is 1) an array of objects and 2) a record with access to fields by attributes.
On Mar 26, 2015, at 21:26, Zaur Shibzukhov <szport@gmail.com> wrote:
If you're trying to provide the same concept, why use a completely unrelated name? That's like saying 'I want an ordered set, but there's already an "orderedset" on PyPI so I went with "sortedsequence"'.
The name recordarray means that it is 1) an array of objects and 2) a record with access to fields by attributes.
But how is being "an array of objects" any different from what a tuple, list, array.array, bytearray, bytes, str, etc. already are? What's specifically array-like about this type as opposed to all of those? And what's specifically record-like about your type compared to namedtuple, Struct, or SimpleNamespace?
--- *Zaur Shibzukhov* 2015-03-27 7:40 GMT+03:00 Andrew Barnert <abarnert@yahoo.com>:
I am inclined to think that it's better to rename `objectarray` to 'mutabletuple' in order to be explicit about what it is. 'recordarray' is a factory function that does exactly what the 'namedtuple' factory does (except that it creates a subclass of 'mutabletuple' and makes '_replace' update 'self' rather than make a copy). So maybe it's better to call it 'record' or 'recordtype', or even 'recordfactory'?
On 27 March 2015 at 01:40, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
Actually, in my understanding, the request on this thread is for something that is quite concrete, existing in other languages, and that can be done in Python in a few lines, but is not in the stdlib: The Python equivalent of a C Struct. Just that. An easy to create class, with named fields, with possible type-enforcement for those fields. Or maybe it _does_ exist in Python, and it is a matter of having a nice example in the docs: for example a "blank" class with "__slots__" would do it. Or a blank class with slots that could serialize and deserialize itself to a sequence in a seamless way.

    class Base:
        __slots__ = ()

        def __init__(self, seq=None):
            if not seq:
                return
            for attr, val in zip(self.__slots__, seq):
                setattr(self, attr, val)

        def __iter__(self):
            for attr in self.__slots__:
                yield getattr(self, attr)

    def NamedList(name, fields):
        ...  # split string with space separated fields, and other niceties here
        return type(name, (Base,), dict(__slots__=fields))

And 10-15 more lines if one wants type-checking, default values, __repr__ into that. I think getting a proper recipe for this, and publicizing it on the documentation ecosystem is enough - maybe a PyPI module adding some more goodies - and if that would get any traction - the usual consideration for inclusion could apply.
On Mar 27, 2015, at 06:22, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
But a C struct is not a subtype of, or substitutable for, a C array. It's not indexable. And the same is true with the equivalents in other languages. In fact, the dichotomy between struct--heterogeneous collection of fixed named fields--and array--homogeneous collection of indexed fields--goes back way before C. So, if you want the equivalent of a C struct, there's no reason to make it an iterable in Python. And a class already is the Python equivalent of a C struct, it's just that it can do _more_ than a C struct. A language like C++ that wants to share code and values with C has to bend over backward to make it possible to write a C++ struct (or class) that doesn't use any of its extra features and is therefore exactly equivalent to a C struct, but Python has no need for that. (Except when you actually do want to share values with C code--but for that case, we've got ctypes.Structure, which is exactly what you want in that situation.)
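For the case of sharing values with C code, a minimal sketch using the ctypes structure type (spelled `ctypes.Structure` in the standard library). The `Point` layout is an illustration only.

```python
import ctypes


class Point(ctypes.Structure):
    # Field names and C types, laid out like `struct { int x; int y; }`.
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]


p = Point(3, 4)
p.x = 10                      # fields are mutable, accessed by name
size = ctypes.sizeof(p)

try:
    iter(p)                   # like a C struct, it is not a sequence
    iterable = True
except TypeError:
    iterable = False
```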
Just that. An easy to create class, with named fields,
Which is easy to do: just create a class, and create its fields in the __init__ method (or, in some cases, it's acceptable to use class attributes as "default values" for instance attributes).
with possible type-enforcement for those fields.
Of course namedtuple doesn't have type-enforcement for the fields. I'm not sure whether you're talking about MyPy static type checking, or runtime checking, but either way, it's easier to add onto a regular class than to a namedtuple-like class factory.
A blank class without __slots__ can also do it. There are times when __slots__ are useful, but usually you're fine with just a plain __dict__. Encouraging people to use it when they have no need for it just because it's more like idiomatic C would be a bad idea. (It's like encouraging people to use @property to get something more like idiomatic .NET or ObjC, when actually they should just be writing idiomatic Python and using attributes directly.)
Why do you want to serialize and deserialize to a sequence? A C struct can't do that, and neither can equivalent types in other languages.
Default values and __repr__ are _also_ not part of a C struct. So, again, if what you're looking for is the equivalent of a C struct, you can replace all of the above with:

    class Base: pass

If you want other features that C structs don't have, then yes, you may have to write them, but the same is true in C (and, in fact, it's clumsier and more difficult in C).
On Fri, Mar 27, 2015 at 8:13 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
So, if you want the equivalent of a C struct, there's no reason to make it an iterable in Python.
Yes, there is: iterable unpacking.
Which is easy to do: just create a class, and create its fields in the __init__ method (or, in some cases, it's acceptable to use class attributes as "default values" for instance attributes).
Boilerplate with lots of repetition, with little added value. For example, in a basic __init__ each attribute name usually occurs three times: as an argument name in the method declaration, and then twice when it's assigned to self. Ruby does much better, for example. Best, Luciano
On Mar 27, 2015, at 16:29, Luciano Ramalho <luciano@ramalho.org> wrote:
Why? You can't do the equivalent in C or any of its descendants (or most other languages with a struct/record type, or most pedagogical or theoretical struct/record concepts). Nor can you do anything even vaguely similar. So why would anyone expect that "the equivalent of a C struct" in Python should be able to do something that a C struct, and its equivalents in other languages, can't? Also, the desire to _not_ have to use iterable unpacking is why we have namedtuple (and structseq in the C API) in the first place: to make tuples that can be used as records, not the other way around. A namedtuple stat result allows your users to access the fields by name instead of by index, which not only makes their code more readable, it also means stat can return different sets of extra fields on different platforms and in new versions without breaking their code. Even in C, this is important: because you access the PyObject fields by name, I can hand you a PyList* cast to a PyObject* and you can use it; if C allowed you to access it by iterable unpacking and you did so, I'd instead have to copy the PyObject fields of the PyList into a new PyObject that didn't have any extra fields.
Let's compare some C code and the equivalent Python:

    struct Person {
        const char *name;
        int age;
    };

    struct Person person_make(const char *name, int age) {
        struct Person p;
        p.name = strdup(name);
        p.age = age;
        return p;
    }

    class Person:
        def __init__(self, name: str, age: int):
            self.name = name
            self.age = age

You really think that this is not like a C struct because it has too much boilerplate compared to the C equivalent? Of course it's trivial to wrap up that boilerplate if you're going to create 20 of these. And to add in other functionality that C structs (and, except for the first, Python namedtuples) don't have that your project needs, like a nice repr, default values, runtime type checking, a JSON serialization schema, an ORM mapping, an HTML form representation, etc. If you really want to add in being a sequence, you can add that too--but again, what's the use case for that? It's certainly not being more like a C struct.
On 27 March 2015 at 22:09, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
Of course it's trivial to wrap up that boilerplate if you're going to create 20 of these. And to add in other functionality that C structs (and, except for the first, Python namedtuples) don't have that your project needs, like a nice repr, default values, runtime type checking, a JSON serialization schema, an ORM mapping, an HTML form representation, etc.
So that is the point: it is trivial to do away with the boilerplate, as I've shown in the other message, but there is no way to do it in the stdlib, so it is a wheel that is reinvented every time. NamedTuples are a way to do _almost_ that: make it trivial to create a class with only this fixed set of attributes, with as little boilerplate as one can think of - but one can't change the attributes on an instance of it. That is why the start of the thread is about a "mutable named tuple" - not because it is a tuple - but because it creates a basic class with fixed attributes that works nicely, with minimal boilerplate. Rethinking my example, I think it does fit exactly in "too small to be an external dependency in itself, and with too many subtle behaviors to get done right again and again in several projects". js -><-
On Sat, Mar 28, 2015 at 12:09 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
Here's a crazy thought: you could use functools.wraps() to abuse **kwargs.

    import functools

    def make_attributes(func):
        @functools.wraps(func)
        def inner(self, **args):
            # Store every keyword argument as an instance attribute,
            # then call the real __init__.
            self.__dict__.update(args)
            func(self, **args)
        return inner

    class Person:
        @make_attributes
        def __init__(self, *, name: str, age: int):
            pass

Thanks to wraps(), you still have your parameter names for introspection and help() and so on. Thanks to **args, you can do bulk operations on all the args. It's a bit naughty (and it does preclude positional args, though a little bit more work in the decorator could support that too), but it would work..... ChrisA
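The "little bit more work" for positional arguments might look like the sketch below. This is not Chris's code: it is one possible variant that assumes inspect.signature() is used to bind the call to the wrapped __init__'s real signature, and the default value for age is made up for illustration.

```python
import functools
import inspect


def make_attributes(func):
    """Store every argument of the wrapped __init__ as an instance attribute."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def inner(self, *args, **kwargs):
        # bind() checks the call against the real signature, so positional
        # arguments, keywords and missing-argument errors behave as usual.
        bound = sig.bind(self, *args, **kwargs)
        bound.apply_defaults()
        # The first bound argument is self; everything else becomes an attribute.
        for name, value in list(bound.arguments.items())[1:]:
            setattr(self, name, value)
        return func(self, *args, **kwargs)

    return inner


class Person:
    @make_attributes
    def __init__(self, name: str, age: int = 0):
        pass


p = Person("Alice", age=30)
print(p.name, p.age)
```

Because the decorator uses setattr() rather than touching __dict__ directly, the same sketch should also work for classes that define __slots__.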
On Fri, Mar 27, 2015 at 04:13:46PM -0700, Andrew Barnert wrote:
Joao said "The Python equivalent of a C struct", not "a C struct". Python is not C, and Python data types are not limited to what C does. Python strings aren't limited to C null-delimited strings, and Python ints aren't limited to what C ints can do. I think the original thread was quite explicit about what is wanted: something like a mutable equivalent to namedtuple. Namedtuples are used in Python where C would use a struct, or Pascal a record, except that namedtuples (being tuples) are immutable. I think it's quite reasonable to want a mutable version. Effectively, namedtuple is just a convenience function for wrapping up a bunch of nice-to-have but not essential functionality around an immutable struct. Python got by with unnamed tuples for over a decade, so it's not like we *have* to have namedtuples. But having got them, would we go back to using regular tuples as a struct? Hell no. Having named fields is so much better.
And a class already is the Python equivalent of a C struct, it's just that it can do _more_ than a C struct.
This is why it is unfair to insist that a Python equivalent of a C struct be limited to what C structs do.
If this is so easy, why do we have namedtuple *and* SimpleNamespace in the standard library? Are they both mistakes? SimpleNamespace is especially interesting. The docs say: "However, for a structured record type use namedtuple() instead." https://docs.python.org/3/library/types.html#types.SimpleNamespace which is great if you want an *immutable* structured record type, but not if you want a mutable one. Which brings us back to where this thread started: a request for a mutable version of namedtuple. That's trickier than namedtuple, because we don't have a mutable version of a tuple to inherit from. Lists won't do the job, because they have a whole lot of functionality that is inappropriate, e.g. sort, reverse, pop methods. That makes it harder to create a mutable structured record type, not simpler. Think about the functional requirements:
- it should be semantically a struct, not a list or array;
- with a fixed set of named fields;
- fields should be ordered: a record with fields foo and bar is not the same as a record with fields bar and foo;
- accessing fields by index would be a Nice To Have, but not essential;
- but iteration is essential, for sequence unpacking;
- values in the fields must be mutable;
- it should support equality, but not hashing (since it is mutable);
- it must have a nice repr and/or str;
- being mutable, it may directly or indirectly contain a reference to itself (e.g. x.field = x) so it needs to deal with that correctly;
- support for pickle;
- like namedtuple, it may benefit from a handful of methods such as '_asdict', '_fields', '_make', '_replace' or similar.
Does this sound easy to write? Well, sure, in the big picture, it's hardly a 100,000 line application. But it's not a trivial class. -- Steve
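Several of the requirements listed above can be sketched with a small base class built on __slots__. This is only an illustration: the names Record and Point are hypothetical, self-referencing records are not handled, and the namedtuple-style helper methods are left out.

```python
import copy


class Record:
    """Hypothetical base class for small mutable records."""

    __slots__ = ()
    __hash__ = None  # mutable, so deliberately unhashable

    def __init__(self, *values):
        if len(values) != len(self.__slots__):
            raise TypeError("expected %d values, got %d"
                            % (len(self.__slots__), len(values)))
        for name, value in zip(self.__slots__, values):
            setattr(self, name, value)

    def __iter__(self):
        # Iteration in field order is what makes "x, y = p" work.
        return (getattr(self, name) for name in self.__slots__)

    def __eq__(self, other):
        # Only records of exactly the same type compare equal.
        if type(other) is not type(self):
            return NotImplemented
        return tuple(self) == tuple(other)

    def __repr__(self):
        fields = ", ".join("{}={!r}".format(name, getattr(self, name))
                           for name in self.__slots__)
        return "{}({})".format(type(self).__name__, fields)

    def __reduce__(self):
        # Hook used by both copy and pickle to rebuild the instance.
        return (type(self), tuple(self))


class Point(Record):
    __slots__ = ("x", "y")


p = Point(3, 4)
p.x = 10          # fields are mutable
x, y = p          # unpacking goes through __iter__
print(p)
```

Because neither class has an instance __dict__, assigning an unknown attribute such as p.z raises AttributeError, which gives the "fixed set of named fields" behavior.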
On Sat, Mar 28, 2015 at 7:37 AM, Steven D'Aprano <steve@pearwood.info> wrote:
+1, though it doesn't *necessarily* follow that a mutable equivalent is a good idea. This (and related threads) imply there is at least some support for the new type in principle. I haven't followed the threads too closely so I've missed any mention of solid pythonic use cases that would give the idea much more solid footing. However, I have seen references to prior art on the cheeseshop which may be used to provide harder evidence (both of support and of solid use cases). Regardless, I concur that there are many cases where types and functions have been added to the stdlib that weren't strictly necessary. Perhaps if those proposals had come from someone else or when the mood on python-dev was different then they would not have been added. That is what has happened with numerous other we-have-gotten-by-without-it-so-why-add-it ideas (which may also have proven themselves as namedtuple has). Ultimately we have to be careful in this space because, as Raymond often reminds us, it really is important to make the effort to keep Python small enough to fit in people's brains (and in *some* regard we've probably failed there already). With the great efforts in the last year to improve packaging, the cheeseshop is increasingly the better place for new types and helpers to live. With that in mind, perhaps we should start adding a section to the bottom of relevant docs that contains links to vetted PyPI packages (and recipes) that provide extended capabilities. We've done this already in a few select places (e.g. the 3.2+ namedtuple docs).
As the person who wrote that I'll point out that I added it to help make the distinction clearer between the two. At the time there were concerns about the similarities and with users getting confused about which to use. I will argue that "record type" implies an archive of data, ergo immutable. "Structured" refers to being ordered and having attribute access. IMHO that statement is clear and helpful, but if it has proven otherwise we should consider improving it. In contrast, I see the proposal here as somewhat of a middle ground. Folks are looking for a factory mechanism that produces classes with slots and have both iteration (for unpacking) and index lookup. So something like this:

    class FixedClassMeta(type):
        # Ideally this would be a "classonly" method (like classmethod but
        # class-only) method on FixedClass and not need a metaclass.
        def subclass(base, name, *fields):
            # XXX validate fields first
            args = ', '.join(fields)
            body = '\n    '.join('self.{0} = {0}'.format(f) for f in fields)
            code = """def __init__(self, {}):\n    {}""".format(args, body)
            ns = {}
            exec(code, ns)
            class X(base):
                __slots__ = fields
                __init__ = ns['__init__']
            X.__name__ = name
            X.__qualname__ = X.__qualname__.replace('X', name, 1)
            X.__doc__ = """..."""
            return X

    class FixedClass(metaclass=FixedClassMeta):
        __slots__ = ()
        def __repr__(self):
            items = ("{}={!r}".format(f, getattr(self, f)) for f in self.__slots__)
            return "{}({})".format(self.__class__.__name__, ', '.join(items))
        def __iter__(self):  # for unpacking
            return (getattr(self, f) for f in self.__slots__)
        def __getitem__(self, index):
            field = self.__slots__[index]
            try:
                return getattr(self, field)
            except AttributeError:
                raise IndexError(index)
        # Index lookup exists for convenience, but assignment & deletion
        # are not in scope.

    def fixedClass(name, field_names):
        """A factory that produces classes with fixed, ordered attributes.

        The returned class has __slots__ set to the field names, as well as
        __iter__ (for unpacking) and __getitem__ implemented.

        """
        if isinstance(field_names, str):
            fields = field_names.replace(',', ' ').split()
        else:
            fields = field_names
        return FixedClass.subclass(name, *fields)

That said, I'm still not clear on what the use cases are.
This is the key point. It is a fixed-size class with iteration for unpacking and index lookup for convenience. A full-fledged mutable namedtuple doesn't make sense (without clear use cases).
Ah, my example above would have to grow __eq__ then.
- accessing fields by index would be a Nice To Have, but not essential;
Exactly. Not Essential.
- but iteration is essential, for sequence unpacking;
This brings to mind a different proposal that has come up in the past (a separate "dunder" method for unpacking). Iteration seems out of place here, but we need it for sequence unpacking.
Ah, yes. "RuntimeError: maximum recursion depth exceeded". :)
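One way to avoid that recursion in a repr is the standard library's reprlib.recursive_repr() decorator. A minimal sketch, using a hypothetical Node class rather than anything proposed in this thread:

```python
import reprlib


class Node:
    __slots__ = ("value", "link")

    def __init__(self, value, link=None):
        self.value = value
        self.link = link

    @reprlib.recursive_repr()
    def __repr__(self):
        # A nested call to repr() on the same object returns '...'
        # instead of recursing until the interpreter gives up.
        return "Node(value={!r}, link={!r})".format(self.value, self.link)


n = Node(1)
n.link = n        # the record now refers to itself
print(repr(n))    # Node(value=1, link=...)
```

This only covers repr; equality and any recursive helper methods would need their own guard.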
Perhaps. I think there are a few things we can learn from namedtuple that can be applied for this hypothetical new type/factory. And to add to your list: - performance should be a consideration since the apparent use cases relate to handling many of these as "records". Again, I'm not sold on the benefit of this over the existing alternatives. For records use namedtuple (with the _replace method for "mutation"). -eric
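For reference, "mutation" through _replace looks like the following sketch. It builds a new tuple each time and leaves the original untouched, which is the trade-off being discussed:

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")

p = Point(3, 4)
moved = p._replace(x=10)   # returns a new Point; p itself is unchanged

print(p)      # Point(x=3, y=4)
print(moved)  # Point(x=10, y=4)
```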
On Sat, Mar 28, 2015 at 12:11 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
What ever became of that proposal? If I recall correctly, it had moderate support but the champion wasn't able to continue pursuing it. -eric
On Mar 28, 2015, at 06:37, Steven D'Aprano <steve@pearwood.info> wrote:
Sure, but nobody just invents random new features to add to int and then justifies them by saying "I want the equivalent of a C int" even though C int doesn't have those features. People invent features (like bit_length) to solve actual use cases, and justify them based on those use cases. Multiple people have asked "what do you want this for?", and the best answer anyone's given has been "the equivalent of a C struct". (That, and to prematurely and badly optimize memory usage.) Even worse, when I ask why specifically anyone wants this thing to be iterable, the answer is "to be the equivalent of a C struct", and that doesn't answer the question. [snip]
I'm not saying a record type shouldn't be allowed to have any features that C structs don't, just that equivalency with C structs isn't an argument for features that C structs don't have. Some of the extra features are so obviously desirable that they probably don't need any argument--if you're going to build this thing, having a nice repr or not breaking pickle seems hard to argue against. But iterability is not that kind of obvious win. Also, how is it "unfair" to suggest that this thing should be limited in some ways? For example, two instances of the same class can have completely different fields; presumably two instances of the same record type really shouldn't. There's no reason it _couldn't_ be completely open like a general class, it's just that you usually don't want it to be. Similarly, there's no reason it couldn't be a sequence, but I don't think you usually want it to be. [snip]
Sure. And that's the problem. If you want something that's "just like a sequence whose elements can be replaced but whose shape is fixed, except that the elements are also named", you run into the problem that Python doesn't have such a sequence type. It's a perfectly coherent concept, and there's no reason you couldn't design a language around immutable, fixed-shape-mutable, and mutable-shape sequences instead of just the first and last, but that's not the way Python was designed. Should that be changed? Or is the only use for such a type to underlie this new type?
Note that namedtuples are nominatively typed, not structurally--a record with fields foo and bar is not necessarily the same as another record with fields foo and bar. Ordering doesn't enter into it; they were defined separately, so they're separate types. Do you want the same behavior here, or the behavior your description implies instead?
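A short sketch of that distinction, using two hypothetical types with identical fields. The classes are separate, although instances still compare equal by value because namedtuples inherit tuple's __eq__:

```python
from collections import namedtuple

# Two separately defined record types with the same field names.
Pair = namedtuple("Pair", "foo bar")
Span = namedtuple("Span", "foo bar")

a = Pair(1, 2)
b = Span(1, 2)

print(type(a) is type(b))   # False: the types are distinct
print(isinstance(a, Span))  # False
print(a == b)               # True: comparison is plain tuple equality
```

A mutable record type would have to decide for itself whether equality should also require the same type.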
- accessing fields by index would be a Nice To Have, but not essential;
Why would that be nice to have? The record/sequence dichotomy has been fundamental to the design of languages since the earliest days, and it's still there in almost all languages. Maybe every language in the world is wrong--but if so, surely you can explain why? For structseq, there was a good reason: a stat result is a 10-tuple as well as being a record with 13-odd fields, because there was a pre-existing mass of code that used stat results as 10-tuples, but people also wanted to be able to access the newer or not-100%-portable fields. That's a great use case. And people have used structseq in other similar examples to migrate users painlessly from an early API that turned out to be too simple and limited. And namedtuple gives you a way to write APIs in a similar style that previously could only be (easily) written with a C extension, which is an obvious win. That's clearly not the case here--nobody has existing APIs that use a fixed-length but mutable sequence that they want to expand into something more flexible, because Python doesn't come with such a sequence type. Of course that's not the only use anyone's ever found for, respectively, structseq and namedtuple--e.g., converting to namedtuple turns out to be handy for cases where you want a record but some external API like SQL gives you a sequence, and that would probably be a good enough justification for namedtuple too. But what is the use that justifies this addition? (For example, if you need to take SQL rows as a sequence, mutate them by name, and then do something I can't imagine with them that requires them to still be a sequence, that would be a pretty good answer.)
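The SQL case mentioned above can be sketched with the standard sqlite3 module. The table and the Person type are made up for illustration; the point is only that each row arrives as a plain sequence and _make() turns it into a record:

```python
import sqlite3
from collections import namedtuple

Person = namedtuple("Person", "name age")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.execute("INSERT INTO person VALUES (?, ?)", ("Alice", 30))

# Each row comes back as a plain tuple; _make() wraps it in a named record.
rows = conn.execute("SELECT name, age FROM person")
people = [Person._make(row) for row in rows]
conn.close()

print(people[0].name, people[0].age)
```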
- but iteration is essential, for sequence unpacking;
Again, why is that essential? TOOWTDI isn't an iron-clad rule, but it's a good rule of thumb; adding a second way to access the members of a record that's both unique to Python and less Pythonic seems like a bad idea, unless there's some good reason that overbalances it in the other direction. Think of stat code: it's a lot more readable when you access the fields by name instead of by unpacking. Why wouldn't the same be true for, say, a Person record, or an Address record, or an ImageHeader record, or almost anything else you can imagine? (I can think of one particular special case where it might be nice: small, homogenous, essentially-sequence-like records like a Vector or Point or... Well, really just a Vector or Point. And they're clearly special. Both in C and in Python, you're often torn between storing them as an array or as a sequence, and you'll find different apps doing it each way. That isn't true for a Person or Address etc.)
It is possible that this is recordclass <https://pypi.python.org/pypi/recordclass> <wink> A short example <http://nbviewer.ipython.org/urls/bitbucket.org/intellimath/recordclass/raw/d...> to illustrate that fact. On Saturday, 28 March 2015, 16:37:40 UTC+3, Steven D'Aprano wrote:
On Mar 26, 2015, at 21:08, Joonas Liik <liik.joonas@gmail.com> wrote:
namedlist perhaps? :)
If you want "namedtuple, but mutable", then namedlist seems like the obvious alternative.
But, as discussed earlier in the thread, a list isn't right, or at least isn't obviously right, because lists can change size, and what would it mean for a namedlist to, say, delete element 3? And that's really the problem: Python's entire infrastructure is designed around things which are reshapable like lists, or immutable like tuples, and this doesn't fit either one.
The name "array" seems really unfortunate. It doesn't give you any clue that this thing is halfway between a tuple and a list. Also, we've already got a bytearray, array.array, and the NumPy array types, all of which hold homogeneous simple-value types and can be accessed as buffers. And bytearray and array.array are resizable. And np.ndarray--like a C array, in effect--returns views when sliced rather than copies. Just about everything this name implies is misleading. And as for recordarray, that doesn't exactly scream "like a namedtuple, but with array instead of tuple". Also, in the weeks this thing has been discussed, no one has yet come up with a use case where. Look at the motivating example, Point--do you think location[2] meaning the same thing as location.z is a good thing? That's neither easy nor common in C and its OO descendants, or SmallTalk and its, or most other languages, and I can't remember ever being bothered by that. The best anyone has come up with is that it might be more space-efficient than a SimpleNamespace or a standard __slots__ class, for all those times when you really need a billion points and can afford to needlessly waste 140 bytes per point instead of 12, but can't afford to waste 188.
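The space argument can be checked roughly with sys.getsizeof(). The sketch below uses a hypothetical SlotsPoint class; it measures only the container objects (not the field values), and the byte counts depend on the Python version and platform, so no particular numbers should be expected.

```python
import sys
from types import SimpleNamespace


class SlotsPoint:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z


slots_point = SlotsPoint(1.0, 2.0, 3.0)
ns_point = SimpleNamespace(x=1.0, y=2.0, z=3.0)

# getsizeof() is shallow, so the namespace's attribute dict is added by hand.
slots_size = sys.getsizeof(slots_point)
ns_size = sys.getsizeof(ns_point) + sys.getsizeof(ns_point.__dict__)

print("__slots__ instance:", slots_size, "bytes")
print("SimpleNamespace + dict:", ns_size, "bytes")
```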
On Mar 27, 2015 5:33 AM, "Andrew Barnert" <abarnert@yahoo.com.dmarc.invalid> wrote:
The name "array" seems really unfortunate. It doesn't give you any clue that this thing is halfway between a tuple and a list. Also, we've already got a bytearray, array.array, and the NumPy array types, all of which hold homogeneous simple-value types and can be accessed as buffers. And bytearray and array.array are resizable. And np.ndarray--like a C array, in effect--returns views when sliced rather than copies. Just about everything this name implies is misleading.
And as for recordarray, that doesn't exactly scream "like a namedtuple, but with array instead of tuple".
NumPy already has the concept of a "record array", which they call "recarray", which is a NumPy array that is partially accessible in a similar manner to named tuples. However, otherwise they are the same as NumPy arrays, which means they have a fixed size (usually) but mutable contents. Whether you consider this a point in favor or a point against, however, probably depends on your point of view.
participants (11)
- anatoly techtonik
- Andrew Barnert
- Chris Angelico
- Eric Snow
- Eric V. Smith
- Joao S. O. Bueno
- Joonas Liik
- Luciano Ramalho
- Steven D'Aprano
- Todd
- Zaur Shibzukhov