Mailman 3 Add recordclass to collections module - Python-ideas - python.org


Add recordclass to collections module


Martin Bammer

Sept. 1, 2018

7:47 a.m.

Hi,

what about adding recordclass (https://bitbucket.org/intellimath/recordclass) to the collections module?

It is like namedtuple, but elements are writable and it is written in C and thus much faster.

And for convenience it could be named as namedlist.

Regards,
Martin


Steven D'Aprano

September 2018

8:25 a.m.

On Sat, Sep 01, 2018 at 09:47:04AM +0200, Martin Bammer wrote:

> Hi,
>
> what about adding recordclass
> (https://bitbucket.org/intellimath/recordclass) to the collections module

The first thing you need to do is ask the author of that library whether or not he or she is willing to donate the library to the Python stdlib, which (among other things) means keeping to the same release schedule as the rest of the stdlib.

> It is like namedtuple, but elements are writable and it is written in C
> and thus much faster.

Faster than what?

> And for convenience it could be named as namedlist.

Why? Is it a list?

How or why is it better than dataclasses?

--
Steve


Jonathan Fine

4:10 p.m.

Hi Martin

Summary: Thank you. Your suggestion has good points. To advance it, I suggest that you (i) provide a pure Python implementation of namedlist, and (ii) ask that the Python docs for namedtuple provide a link to namedlist.

Thank you, Martin, for bringing https://bitbucket.org/intellimath/recordclass to this list's attention. Here are my first impressions.

Here are the good things I've noticed (although not closely examined).

1. This is released software, available through pip.
2. There's a link on the page to an example in a Jupyter notebook.
3. That page gives performance statistics for the C-implementation.
4. The key idea is simple and well expressed.
5. The promoter (you) is not the package's author.

Of all the suggestions made to this list, I'd say based on the above that this one is in the top quarter. The credit for this belongs mostly, of course, to its author Zaur Shibzukhov. By the way, there's a mirror of the bitbucket repository here: https://github.com/intellimath/recordclass.

Here are my suggestions for going forward. They're based on my guess that there's some need for a mutable variant of named tuple, but not the same need for a C implementation. And they're based on what I like, rather than the opinions of many.

1. Produce a pure Python implementation of recordclass.
2. Instead, as you said, call it namedlist.
3. Write some docs for the new class, similar to https://docs.python.org/3/library/collections.html#collections.namedtuple
4. Once you've done 1-3 above, request that the Python docs reference the new class in the "See also" for named tuple.

Mutable and immutable is, for me, a key concept in Python. Here's an easy way to 'modify' a tuple:
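The example that followed this sentence did not survive in this copy of the message. A minimal sketch of the idea, using only the standard library (the `Point` type and its values are made up for illustration and are not from the original post):

```python
from collections import namedtuple

# A plain tuple cannot be changed in place; "modifying" it means
# building a new tuple from slices of the old one.
original = (1, 2, 3)
modified = original[:1] + (20,) + original[2:]

# namedtuple offers the same idea by field name through _replace(),
# which returns a new object and leaves the original untouched.
Point = namedtuple("Point", "x y")  # hypothetical example type
p1 = Point(1, 2)
p2 = p1._replace(y=20)

print(original, modified)
print(p1, p2)
```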

Of course, 'modify' means create a new one, changed in some way. And if the original is a namedtuple, then it makes sense to use namedlist.

Here are some final remarks. (All my own opinions, not deep truth.)

1. Focus on getting and meeting the expressed needs of users. A link from the Python docs will help here.
2. Don't worry about performance of the pure Python implementation. It won't hold back progress.
3. I'd personally like to see something like numpy, but for combinatorial rather than numerical computing. Perhaps the memoryslots.c (on which recordclass depends) might be useful here. But that's further in the future.

Once again, thank you, Martin, for bringing this to our attention. And to Zaur for writing the software.

--
best regards

Jonathan


Steven D'Aprano

12:18 a.m.

On Sat, Sep 01, 2018 at 05:10:49PM +0100, Jonathan Fine wrote:

Before Martin (and you) get carried away doing these things, there's a lot more to do first. For starters, how about answering the questions I asked? Recapping:

- The package author describes this as a record class, not a list, and it doesn't seem to support any list operations, so why do you and Martin want to change the name to namedlist?

- What would it mean to insert, sort, append etc named items in a list? When would you want to do it?

- Have you asked the author what he thinks about putting it in the standard library? (The author calls it a "proof of concept", and it is version 0.5. That doesn't sound like the author considers this a mature product.)

- How is this different from data classes? If the answer is that this supports iteration, why not add iteration to data classes? See this thread:

  https://mail.python.org/pipermail/python-ideas/2018-August/052683.html

[...]

> 1. Focus on getting and meeting the expressed needs of users. A link
> from the Python docs will help here.

It's not the job of the Python docs to link to every and any third-party package that somebody might find useful. It might -- perhaps -- make sense for the docs to mention or link to third-party libraries such as numpy which are widely recognised as "best of breed". (Not that numpy needs a link from the std lib.) But in general it is hardly fair for us to single out some arbitrary third-party libraries for official recognition while other libraries, perhaps better or more worthy, are ignored.

Put yourself in the shoes of somebody who has worked hard to get a package into a mature state, and then the Python docs start linking to a competing alpha-quality package just because by pure chance, that was the package that got mentioned on Python-Ideas first.

--
Steve


Martin Bammer

8:56 p.m.

Hi,

the intention of my first mail was to start a discussion about this topic, about the pros and cons and possible alternatives. As long as it is not clear that recordclass or something like it will be accepted into the collections module, I do not want to spend any effort on this.

My wish that the collections module gets something like namedtuple, but writable, is based on my personal experience: when projects become bigger and data structures more complex, it is sometimes useful to have named items and not just an index. This improves readability and makes development and maintenance of the code easier.

Another important topic for me is performance. When I write applications, they should finish their tasks quickly. The performance of recordclass was one reason for me to use it (some benchmarks with Python 2 can be found here: https://gist.github.com/grantjenks/a06da0db18826be1176c31c95a6ee572). I've done some more recent and additional benchmarks with Python 3.7 on Linux, which you can find here: https://github.com/brmmm3/compare-recordclass. These new benchmarks show that namedtuple is as fast as recordclass in all cases except named attribute access, which is faster with recordclass.

Compared to dataclass: dataclass wins only on object size. When it comes to speed and functionality (indexing, sorting), dataclass would be my last choice. Yes, it is possible to make dataclass fast by using __slots__, but this is always an extra programming effort. namedtuple and recordclass are easy to use with small effort.

Adding new items: This is not possible with namedtuple and also not possible with recordclass. I see no reason why a namedlist should support this, because with these object types you define new object types and these types should not change.

I hope 3.8 will get a namedlist, and maybe it will be the recordclass module (currently my choice). As the author of this module has already responded to this discussion, I hope he is willing to contribute his code to the Python project.

Best regards,
Martin
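For readers comparing the options, here is a rough sketch of the "extra programming effort" mentioned above: a dataclass that declares `__slots__` by hand and adds index access on top. The `Person` class and its fields are hypothetical, and this is not a benchmark; it only shows the shape of the code under those assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:  # hypothetical example type
    # Declaring __slots__ by hand removes the per-instance __dict__.
    # Fields must not have default values here, because a class-level
    # default would clash with the slot of the same name.
    __slots__ = ("name", "city")
    name: str
    city: str

    def __getitem__(self, index):
        # Index access written in Python, following the field order.
        return getattr(self, fields(self)[index].name)


person = Person("Fred", "Vienna")
person.city = "Graz"  # attributes stay writable
print(person, person[1])
```

With a namedtuple or recordclass the slots and index access come for free, which is the convenience being argued for.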


Steven D'Aprano

11:49 p.m.

On Sun, Sep 02, 2018 at 10:56:50PM +0200, Martin Bammer wrote:

> Compared to dataclass: dataclass wins only on the topic object size.
> When it comes to speed and functionality (indexing, sorting) dataclass
> would be my last choice.

I see no sign that recordclass supports sorting. (But I admit that I haven't tried it.)

What would it mean to sort a recordclass?

    Person = recordclass('Person', 'personalname familyname address')
    fred = Person("Fred", "Anderson", "123 Main Street")
    fred.sort()
    print(fred)

    => output:
    Person(personalname='123 Main Street', familyname='Anderson', address='Fred')

[...]

> Adding new items: This is not possible with namedtuple and also not
> possible with recordclass. I see no reason why a namedlist should
> support this,

If you want to change the name and call it a "list", then it needs to support the same things that lists support.

> because with these object types you define new object types and these
> types should not change.

Sorry, I don't understand that. How do you get "no insertions" from "can't change the type"? A list remains a list when you insert into it.

In case it isn't clear, I think there is zero justification for renaming recordclass to namedlist. I don't think "named list" makes sense as a concept, and recordclass surely doesn't implement a list-like interface.

As for the idea of adding a recordclass or mutable-namedtuple or whatever to the stdlib, the idea seems reasonable but it's not clear to me that dataclass wouldn't be suitable.

--
Steve
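To make the distinction concrete, here is a small sketch using a standard-library dataclass rather than recordclass (an assumption made so the example has no third-party dependency; the second person is invented). Sorting a collection of records by a field is well defined, while mutating a single record in place is the feature actually being asked for.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Person:
    personalname: str
    familyname: str
    address: str


people = [
    Person("Fred", "Anderson", "123 Main Street"),
    Person("Alice", "Zimmer", "9 Side Road"),  # made-up second record
]

# Sorting a list of records by one field keeps each record intact.
by_first_name = sorted(people, key=attrgetter("personalname"))

# A single record can be mutated in place, unlike a namedtuple.
people[0].address = "456 High Street"

print([p.personalname for p in by_first_name])
print(people[0])
```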


Jacco van Dorp

7:23 a.m.

This feels really useful to me to make some quick changes to a database - perhaps a database layer could return a class of type Recordclass, and then you just simply mutate it and shove it back into the database. Pseudocode:

    record = database.execute("SELECT * FROM mytable WHERE primary_key = 15")
    record.mostRecentLoggedInTime = time.time()
    database.execute(f"UPDATE mytable SET mostRecentLoggedInTime = {record.mostRecentLoggedInTime} WHERE primary_key = {record.primary_key}")

Or any smart database wrapper might just go:

    database.updateOrInsert(table = mytable, record = record)

And be smart enough to figure out that we already have a primary key unequal to some sentinel value like None, and do an update, while it could do an insert if the primary key WAS some kind of sentinel value.

This is something I really wanted to do in the past with namedTuples, but had to use dicts for instead.

Also, it's rather clear that namedList is a really bad name for a Recordclass. It's clearly not intended to be a list. It's a record you can take out from somewhere, mutate, and push back in. We often use namedTuples as records now, but we can't just mutate those to shove them back in - you have to make new ones, and unless you write a smart wrapper for database handling yourself, you can't just shove them in either.

Recordclass could be the gateway drug to a smart database access layer that reduces the amount of SQL we need to write - and that's a good thing in my opinion.
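The fetch, mutate, write-back pattern in the pseudocode can be tried with the standard-library `sqlite3` module. This is only a sketch under stated assumptions: `UserRecord` is a hypothetical stand-in for a recordclass (a plain dataclass here), the table layout is invented, and SQL parameters are passed as `?` placeholders rather than interpolated with an f-string, which avoids quoting and injection problems.

```python
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class UserRecord:  # hypothetical mutable record type
    primary_key: int
    most_recent_login: float


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (primary_key INTEGER PRIMARY KEY, most_recent_login REAL)"
)
conn.execute("INSERT INTO mytable VALUES (?, ?)", (15, 0.0))

# Fetch one row and wrap it in the mutable record.
row = conn.execute(
    "SELECT primary_key, most_recent_login FROM mytable WHERE primary_key = ?",
    (15,),
).fetchone()
record = UserRecord(*row)

# Mutate the record in place, then push the change back.
record.most_recent_login = time.time()
conn.execute(
    "UPDATE mytable SET most_recent_login = ? WHERE primary_key = ?",
    (record.most_recent_login, record.primary_key),
)
conn.commit()

stored = conn.execute(
    "SELECT most_recent_login FROM mytable WHERE primary_key = ?", (15,)
).fetchone()[0]
print(stored == record.most_recent_login)
```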


Jonathan Goble

7:41 a.m.

On Mon, Sep 3, 2018 at 3:25 AM Jacco van Dorp <j.van.dorp@deonet.nl> wrote:

So call it "namedrecord", perhaps?


Wes Turner

7:59 a.m.

On Mon, Sep 3, 2018 at 3:25 AM Jacco van Dorp <j.van.dorp@deonet.nl> wrote:

SQLAlchemy.orm solves for this (with evented objects with evented attributes):

http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#sessi...
- Transient, Pending, Persistent, Deleted, Detached

http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.att...
- flag_modified isn't necessary in most cases because attribute mutation on mapped classes deriving from Base(declarative_base()) is evented

http://docs.sqlalchemy.org/en/latest/orm/session_events.html#attribute-chang...

http://docs.sqlalchemy.org/en/latest/orm/tutorial.html

There are packages for handling attribute states with the Django ORM, as well:

- https://github.com/romgar/django-dirtyfields
- https://github.com/Suor/django-dirty

What would be the performance impact of instead subclassing from recordclass? IDK.

pyrsistent.PRecord(PMap) is immutable and supports .attribute access: https://github.com/tobgu/pyrsistent#precord


Chris Angelico

8:16 a.m.

On Mon, Sep 3, 2018 at 5:23 PM, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:

In its purest form, what you're asking for is an "upsert" or "merge" operation: https://en.wikipedia.org/wiki/Merge_(SQL) In a multi-user transactional database, there are some fundamentally hard problems to implementing a merge. I'm not 100% certain, so I won't say "impossible", but it is certainly *extremely difficult* to implement an operation like this in application-level software without some form of race condition. ChrisA


Wes Turner

8:31 a.m.

On Mon, Sep 3, 2018 at 4:17 AM Chris Angelico <rosuav@gmail.com> wrote:

http://docs.sqlalchemy.org/en/latest/orm/contextual.html#contextual-thread-l...
- scoped_session

http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#mergi...

http://docs.sqlalchemy.org/en/latest/orm/session_basics.html

    obj = ExampleObject(attr='value')
    assert obj.id is None
    session.add(obj)
    session.flush()
    assert obj.id is not None
    session.commit()


Chris Angelico

8:39 a.m.

On Mon, Sep 3, 2018 at 6:31 PM, Wes Turner <wes.turner@gmail.com> wrote:

Yep. What does it do if it's on a back-end database that doesn't provide a merge/upsert intrinsic? What if you have a multi-column primary key?

There are, of course, easier sub-forms of this (e.g. you mandate that the PK be a single column and be immutable), but if there is any chance that any other client might simultaneously be changing the PK of your row, a perfectly reliable upsert/merge basically depends on the DB itself providing that functionality.

ChrisA
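As an illustration of letting the database provide the operation, here is a sketch using SQLite's `INSERT OR REPLACE` through the standard-library `sqlite3` module. The `users` table and the `upsert_user` helper are hypothetical, and the syntax is SQLite-specific; other back ends spell this differently or do not offer it at all, which is exactly the concern raised above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def upsert_user(conn, user_id, name):
    """Insert the row, or replace the existing row with the same id."""
    # The database resolves the primary-key conflict inside a single
    # statement, so application code has no separate
    # "SELECT, then decide" step that another client could race against.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
            (user_id, name),
        )


upsert_user(conn, 15, "fred")
upsert_user(conn, 15, "frederick")
rows = conn.execute("SELECT id, name FROM users").fetchall()
print(rows)
```

Note that `INSERT OR REPLACE` deletes the conflicting row and inserts a fresh one, so any columns not named in the statement fall back to their defaults; it is not a field-by-field merge.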


Wes Turner

9:22 a.m.

On Mon, Sep 3, 2018 at 4:40 AM Chris Angelico <rosuav@gmail.com> wrote:

> Yep. What does it do if it's on a back-end database that doesn't
> provide a merge/upsort intrinsic? What if you have a multi-column
> primary key?
>
> There are, of course, easier sub-forms of this (eg you mandate that
> the PK be a single column and be immutable), but if there is any
> chance that any other client might simultaneously be changing the PK
> of your row, a perfectly reliable upsert/merge basically depends on
> the DB itself providing that functionality.

There's yet another argument for, indeed, immutable surrogate primary keys. With appropriate foreign key constraints, changing any part of the [composite] PK is a really expensive operation because all references must also be updated (w/ e.g. ON UPDATE CASCADE), and that doesn't fix e.g. existing URLs or serialized references in cached JSON documents. Far better, IMHO, to just enforce a UNIQUE constraint on those column(s).

UUIDs don't require a central key allocation service (such as AUTOINCREMENT, which is now fixed in MySQL AFAIU).

Should the __hash__() of a recordclass change when attributes are modified? http://www.attrs.org/en/stable/hashing.html has a good explanation.

In general, neither .__hash__() nor id(obj) is a good candidate for a database primary key, because when/if there is a collision (birthday paradox), e.g. when an INSERT or UPSERT or INSERT OR REPLACE fails, the key has to change.

Sorry, getting OT: something like COW immutability is actually desirable with SQL databases, too. Database backups generally require offline intervention in order to roll back, if there's even a backup which contains those transactions.

https://en.wikipedia.org/wiki/Temporal_database#Implementations_in_notable_p... (SELECT, )

https://django-reversion.readthedocs.io/en/stable/
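On the `__hash__()` question, the standard library's dataclasses show one possible answer, sketched below with two made-up point classes: a mutable dataclass that compares by value is left unhashable, while a frozen one hashes by its field values. Whether recordclass does or should behave the same way is not something this example settles.

```python
from dataclasses import dataclass


@dataclass
class MutablePoint:  # eq=True by default, so __hash__ is set to None
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:  # immutable, so a value-based hash is safe
    x: int
    y: int


try:
    hash(MutablePoint(1, 2))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

frozen_hashes_match = hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2))

print(mutable_is_hashable, frozen_hashes_match)
```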


Zaur Shibzukhov

6:24 p.m.

(In reply to Martin Bammer's message of Saturday, September 1, 2018, 10:48:07 UTC+3.)

As the author of `recordclass` I would like to shed some light...

Recordclass originated as a response to the [question](https://stackoverflow.com/questions/29290359/existence-of-mutable-named-tupl...) on stackoverflow.

`Recordclass` was conceived and implemented as a type that, by API, memory and speed, would be completely identical to `namedtuple`, except that it would support assignment in which any element could be replaced without creating a new instance, as `namedtuple` requires. That is, it would be almost identical to `namedtuple` and support assignment (`__setitem__` / `__setslice__`).

The effectiveness of namedtuple is based on the effectiveness of the `tuple` type in Python. In order to achieve the same efficiency it was necessary to create a type `memoryslots`. Its structure (`PyMemorySlotsObject`) is identical to the structure of `tuple` (`PyTupleObject`) and therefore takes up the same amount of memory as `tuple`.

`Recordclass` is defined on top of `memoryslots` just like `namedtuple` on top of `tuple`. Attributes are accessed via a descriptor (`itemgetset`), which supports both `__get__` and `__set__` by the element index.

The class generated by `recordclass` is:

```
from recordclass import memoryslots, itemgetset

class C(memoryslots):
    __slots__ = ()

    _fields = ('attr_1', ..., 'attr_m')

    attr_1 = itemgetset(0)
    ...
    attr_m = itemgetset(m-1)

    def __new__(cls, attr_1, ..., attr_m):
        'Create new instance of {typename} ({arg_list})'
        return memoryslots.__new__(cls, attr_1, ..., attr_m)
```

etc., following the `namedtuple` definition scheme.

As a result, `recordclass` takes up as much memory as `namedtuple`, and it supports quick access by `__getitem__` / `__setitem__` and by attribute name via the descriptor protocol.

Regards,

Zaur
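The descriptor scheme described above can be imitated in pure Python to see how the pieces fit. This is only a sketch and not the recordclass implementation: `ItemGetSet` and `make_record` are hypothetical names, and a `list` subclass stands in for the C-level `memoryslots` type, so the result is neither as compact nor as fast, and it still inherits list methods such as `append`.

```python
class ItemGetSet:
    """Data descriptor that maps an attribute name to a fixed index."""

    def __init__(self, index):
        self.index = index

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj[self.index]

    def __set__(self, obj, value):
        obj[self.index] = value


def make_record(typename, field_names):
    """Build a list-backed class with named, writable fields."""
    fields = tuple(field_names.split())

    def __init__(self, *args):
        if len(args) != len(fields):
            raise TypeError(f"{typename} expects {len(fields)} values")
        list.__init__(self, args)

    def __repr__(self):
        body = ", ".join(f"{n}={v!r}" for n, v in zip(fields, self))
        return f"{typename}({body})"

    namespace = {
        "__slots__": (),  # no per-instance __dict__
        "_fields": fields,
        "__init__": __init__,
        "__repr__": __repr__,
    }
    for index, name in enumerate(fields):
        namespace[name] = ItemGetSet(index)
    return type(typename, (list,), namespace)


Point = make_record("Point", "x y")
p = Point(1, 2)
p.x = 10   # write by name goes through the descriptor
p[1] = 5   # write by index goes straight to the storage
print(p)
```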


Greg Ewing

11:09 p.m.

Zaur Shibzukhov wrote:

I'm not sure why you need a new C-level type for this. Couldn't you get the same effect just by using __slots__? e.g.

    class C:

        __slots__ = ('attr_1', ..., 'attr_m')

        def __init__(self, attr_1, ..., attr_m):
            self.attr_1 = attr_1
            ...
            self.attr_m = attr_m

--
Greg


Zaur Shibzukhov

6:17 p.m.

On Monday, September 3, 2018 at 2:11:06 UTC+3, Greg Ewing wrote:

[...] slow. So if you don't need fast access by index but only by name, then using __slots__ is enough. Recordclass is actually a fixed array with named access to the elements, in the same manner as namedtuple is actually a tuple with named access to its elements.
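A sketch of what index access on top of `__slots__` could look like, to show where the extra cost would come from. `SlotRecord` is a hypothetical class written for this illustration; every indexed read or write passes through a Python-level method and a name lookup, whereas a tuple-like layout indexes its storage directly. No timing claim is made here.

```python
class SlotRecord:
    """Slots-based record with index access layered on in Python."""

    __slots__ = ("attr_1", "attr_2")

    def __init__(self, attr_1, attr_2):
        self.attr_1 = attr_1
        self.attr_2 = attr_2

    def __getitem__(self, index):
        # Translate the index to a slot name, then read the attribute.
        return getattr(self, self.__slots__[index])

    def __setitem__(self, index, value):
        setattr(self, self.__slots__[index], value)

    def __len__(self):
        return len(self.__slots__)


r = SlotRecord(1, 2)
r[1] = 20         # index write, routed through __setitem__
print(r.attr_2, r[0], len(r))
```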

Reply

Sign in to reply online Use email software

Steven D'Aprano

September 2018

1:25 a.m.

On Sat, Sep 01, 2018 at 09:47:04AM +0200, Martin Bammer wrote:

Hi,

what about adding recordclass (https://bitbucket.org/intellimath/recordclass) to the collections module

The first thing you need to do is ask the author of that library whether or not he or she is willing to donate the library to the Python stdlib, which (among other things) means keeping to the same release schedule as the rest of the stdlib.

It is like namedtuple, but elements are writable and it is written in C and thus much faster.

Faster than what?

And for convenience it could be named as namedlist.

Why? Is it a list? How or why is it better than dataclasses? -- Steve

Reply

Sign in to reply online Use email software

Jonathan Fine

9:10 a.m.

Hi Martin Summary: Thank you. Your suggestion has good points. I suggest to advance it (i) provide a pure Python implementation of namedlist, and (ii) ask that the Python docs for namedtuple provide a link to namedlist. Thank you, Martin, for bringing https://bitbucket.org/intellimath/recordclass to this list's attention. Here's my first impressions. Here's the good things I've noticed (although not closely examined). 1. This is released software, available through pip. 2. There's a link on the page to an example in a Jupyter notebook. 3. That page gives performance statistics for the C-implementation. 4. The key idea is simple and well expressed. 5. The promoter (you) is not the package's author. Of all the suggestions made to this list, I'd say based on the above that this one is in the top quarter. The credit for this belong mostly, of course its author Zaur Shibzukhov. By the way, there's a mirror of the bitbucket repository here https://github.com/intellimath/recordclass. Here's my suggestions for going forward. They're based on my guess that there's some need for a mutable variant of named tuple, but not the same need for a C implementation. And they're based on what I like, rather than the opinions of many. 1. Produce a pure Python implementation of recordclass. 2. Instead, as you said, call it namedlist. 3. Write some docs for the new class, similar to https://docs.python.org/3/library/collections.html#collections.namedtuple 4. Once you've done 1-3 above, request that the Python docs reference the new class in the "See also" for named tuple. Mutable and immutable is, for me, a key concept in Python. Here's an easy way to 'modify' a tuple:

Of course, 'modify' means create a new one, changed in some way. And if the original is a namedtuple, that it makes sense to use namedlist. Here are some final remarks. (All my own opinions, not deep truth.) 1. Focus on getting and meeting the expressed needs of users. A link from the Python docs will help here. 2. Don't worry about performance of the pure Python implementation. It won't hold back progress. 3. I'd personally like to see something like numpy, but for combinatorial rather than numerical computing. Perhaps the memoryslots.c (on which recordclass depends) might be useful here. But that's further in the future. Once again, thank you for Martin, for bringing this to our attention. And to Zaur for writing the software. -- best regards Jonathan

Reply

Sign in to reply online Use email software

Steven D'Aprano

5:18 p.m.

On Sat, Sep 01, 2018 at 05:10:49PM +0100, Jonathan Fine wrote:

Before Martin (and you) get carried away doing these things, there's a lot more to do first. For starters, how about answering the questions I asked? Recapping: - The package author describes this as a record class, not a list, and it doesn't seem to support any list operations, so why do you and Martin want to change the name to namedlist? - What would it mean to insert, sort, append etc named items in a list? When would you want to do it? - Have you asked the author what he thinks about putting it in the standard library? (The author calls it a "proof of concept", and it is version 0.5. That doesn't sound like the author considers this a mature product.) - How is this different from data classes? If the answer is that this supports iteration, why not add iteration to data classes? See this thread: https://mail.python.org/pipermail/python-ideas/2018-August/052683.html [...]

1. Focus on getting and meeting the expressed needs of users. A link from the Python docs will help here.

It's not the job of the Python docs to link to every and any third-party package that somebody might find useful. It might -- perhaps -- make sense for the docs to mention or link to third-party libraries such as numpy which are widely recognised as "best of breed". (Not that numpy needs a link from the std lib.) But in general it is hardly fair for us to single out some arbitrary third- party libraries for official recognition while other libraries, perhaps better or more worthy, are ignored. Put yourself in the shoes of somebody who has worked hard to get a package into a mature state, and then the Python docs start linking to a competing alpha-quality package just because by pure chance, that was the package that got mentioned on Python-Ideas first. -- Steve

Reply

Sign in to reply online Use email software

Martin Bammer

1:56 p.m.

Hi, then intention of my first mail was to start a discussion about this topic about the pros and cons and possible alternatives. As long as it is not clear that recordclass or something like that is accepted to be implemented to the collections module I do not want to spend any effort on this. My wish that the collections module gets something like namedtuple, but writable, is based on my personal experience when projects are becoming bigger and data structures more complex it is sometimes useful to named items and not just an index. This improves the readability and makes development and maintenance of the code easier. Another important topic for me is performance. When I write applications then they should finish their tasks quickly. The performance of recordclass was one reason for me to use it (some benchmarks with Python 2 can be found on here https://gist.github.com/grantjenks/a06da0db18826be1176c31c95a6ee572). I've done some more recent and additional benchmarks with Python 3.7 on Linux which you can find here https://github.com/brmmm3/compare-recordclass. These new benchmarks show that namedtuple is as fast as recordclass in all cases, but with named attribute access. Named attribute access is faster with recordclass. Compared to dataclass: dataclass wins only on the topic object size. When it comes to speed and functionality (indexing, sorting) dataclass would be my last choice. Yes it is possible to make dataclass fast by using __slots__, but this always an extra programming effort. namedtuple and recordclass are easy to use with small effort. Adding new items: This is not possible with namedtuple and also not possible with recordclass. I see no reason why a namedlist should support this, because with these object types you define new object types and these types should not change. I hope 3.8 will get a namedlist and maybe it will be the recordclass module (currently my choice). 
As the author of this module already has responded to this discussion I hope he willing to contribute his code to the Python project. Best regards, Martin

Reply

Sign in to reply online Use email software

Steven D'Aprano

4:49 p.m.

On Sun, Sep 02, 2018 at 10:56:50PM +0200, Martin Bammer wrote:

Compared to dataclass: dataclass wins only on the topic object size. When it comes to speed and functionality (indexing, sorting) dataclass would be my last choice.

I see no sign that recordclass supports sorting. (But I admit that I haven't tried it.) What would it mean to sort a recordclass? Person = recordclass('Person', 'personalname familyname address') fred = Person("Fred", "Anderson", "123 Main Street") fred.sort() print(fred) => output: Person(personalname='123 Main Street', familyname='Anderson', address='Fred') [...]

Adding new items: This is not possible with namedtuple and also not possible with recordclass. I see no reason why a namedlist should support this,

If you want to change the name and call it a "list", then it needs to support the same things that lists support.

because with these object types you define new object types and these types should not change.

Sorry, I don't understand that. How do you get "no insertions" from "can't change the type"? A list remains a list when you insert into it. In case it isn't clear, I think there is zero justification for renaming recordclass to namedlist. I don't think "named list" makes sense as a concept, and recordclass surely doesn't implement a list-like interface. As for the idea of adding a recordclass or mutable-namedtuple or whatever to the stdlib, the idea seems reasonable but its not clear to me that dataclass wouldn't be suitable. -- Steve

Reply

Sign in to reply online Use email software

Jacco van Dorp

12:23 a.m.

This feels really useful to me to make some quick changes to a database - perhaps a database layer could return an class of type Recordclass, and then you just simply mutate it and shove it back into the database. Pseudocode: record = database.execute("SELECT * FROM mytable WHERE primary_key = 15") record.mostRecentLoggedInTime = time.time() database.execute(f"UPDATE mytable SET mostRecentLoggedInTime = {record.mostRecentLoggedInTime} WHERE primary_key = {record.primary_key}":) Or any smart database wrapper might just go: database.updateOrInsert(table = mytable, record = record) And be smart enough to figure out that we already have a primary key unequal to some sentinel value like None, and do an update, while it could do an insert if the primary key WAS some kind of sentinel value. which is something I really wanted to do in the past with namedTuples, but had to use dicts for instead. Also, it's rather clear that namedList is a really bad name for a Recordclass. It's cleary not intended to be a list. It's a record you can take out from somewhere, mutate, and push back in. We often use namedTuples as records now, but we can't just mutate those to shove them back in - you have to make new ones, and unless you write a smart wrapper for database handling yourself, you can't just shove them in either. Recordclass could be the gateway drug to a smart database access layer that reduces the amount of SQL we need to write - and that's a good thing in my opinion.

Reply

Sign in to reply online Use email software

Jonathan Goble

September 2018

7:41 a.m.

On Mon, Sep 3, 2018 at 3:25 AM Jacco van Dorp <j.van.dorp@deonet.nl> wrote:

So call it "namedrecord", perhaps?

Reply

Sign in to reply online Use email software

Wes Turner

7:59 a.m.

On Mon, Sep 3, 2018 at 3:25 AM Jacco van Dorp <j.van.dorp@deonet.nl> wrote:

SQLAlchemy.orm solves for this (with evented objects with evented attributes): http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#sessi... - Transient, Pending, Persistent, Deleted, Detached http://docs.sqlalchemy.org/en/latest/orm/session_api.html#sqlalchemy.orm.att... - flag_modified isn't necessary in most cases because attribute mutation on mapped classes deriving from Base(declarative_base()) is evented http://docs.sqlalchemy.org/en/latest/orm/session_events.html#attribute-chang... http://docs.sqlalchemy.org/en/latest/orm/tutorial.html There are packages for handling attribute states with the Django ORM, as well: - https://github.com/romgar/django-dirtyfields - https://github.com/Suor/django-dirty What would be the performance impact of instead subclassing from recordclass? IDK. pyrsistent.PRecord(PMap) is immutable and supports .attribute access: https://github.com/tobgu/pyrsistent#precord

Chris Angelico

8:16 a.m.

On Mon, Sep 3, 2018 at 5:23 PM, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:

In its purest form, what you're asking for is an "upsert" or "merge" operation:

https://en.wikipedia.org/wiki/Merge_(SQL)

In a multi-user transactional database, there are some fundamentally hard problems in implementing a merge. I'm not 100% certain, so I won't say "impossible", but it is certainly *extremely difficult* to implement an operation like this in application-level software without some form of race condition.

ChrisA
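As a point of comparison, here is a minimal sketch of letting the database perform the upsert in a single statement instead of deciding between INSERT and UPDATE in application code. It assumes SQLite 3.24 or newer (the first release with the `ON CONFLICT ... DO UPDATE` clause) and an invented two-column table; other databases spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (primary_key INTEGER PRIMARY KEY, last_login REAL)"
)

# One statement: insert the row, or update it if the key already exists.
UPSERT = (
    "INSERT INTO mytable (primary_key, last_login) VALUES (?, ?) "
    "ON CONFLICT(primary_key) DO UPDATE SET last_login = excluded.last_login"
)

conn.execute(UPSERT, (15, 100.0))  # no row 15 yet, so this inserts
conn.execute(UPSERT, (15, 200.0))  # row 15 exists, so this updates
conn.commit()

rows = conn.execute("SELECT primary_key, last_login FROM mytable").fetchall()
```

Because the insert-or-update decision is made inside one statement, there is no window between a "does it exist?" check and the write, which is the gap an application-level implementation has to worry about.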

Wes Turner

8:31 a.m.

On Mon, Sep 3, 2018 at 4:17 AM Chris Angelico <rosuav@gmail.com> wrote:

http://docs.sqlalchemy.org/en/latest/orm/contextual.html#contextual-thread-l...
- scoped_session

http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#mergi...
http://docs.sqlalchemy.org/en/latest/orm/session_basics.html

    obj = ExampleObject(attr='value')
    assert obj.id is None
    session.add(obj)
    session.flush()
    assert obj.id is not None
    session.commit()

Chris Angelico

8:39 a.m.

On Mon, Sep 3, 2018 at 6:31 PM, Wes Turner <wes.turner@gmail.com> wrote:

Yep. What does it do if it's on a back-end database that doesn't provide a merge/upsert intrinsic? What if you have a multi-column primary key? There are, of course, easier sub-forms of this (e.g. you mandate that the PK be a single column and be immutable), but if there is any chance that any other client might simultaneously be changing the PK of your row, a perfectly reliable upsert/merge basically depends on the DB itself providing that functionality.

ChrisA

Wes Turner

9:22 a.m.

On Mon, Sep 3, 2018 at 4:40 AM Chris Angelico <rosuav@gmail.com> wrote:

There's yet another argument for immutable surrogate primary keys, indeed. With appropriate foreign key constraints, changing any part of the [composite] PK is a really expensive operation because all references must also be updated (with e.g. ON UPDATE CASCADE), and that doesn't fix e.g. existing URLs or serialized references in cached JSON documents. Far better, IMHO, to just enforce a UNIQUE constraint on those column(s). UUIDs don't require a central key allocation service (such as AUTOINCREMENT, which is now fixed in MySQL AFAIU).

Should the __hash__() of a recordclass change when attributes are modified? http://www.attrs.org/en/stable/hashing.html has a good explanation.

In general, neither .__hash__() nor id(obj) is a good candidate for a database primary key, because when/if there are collisions (birthday paradox), e.g. when an INSERT or UPSERT or INSERT OR REPLACE fails, it has to change.

Sorry, getting OT: something like COW immutability is actually desirable with SQL databases, too. Database backups generally require offline intervention in order to roll back, if there's even a backup which contains those transactions.

https://en.wikipedia.org/wiki/Temporal_database#Implementations_in_notable_p... (SELECT, )
https://django-reversion.readthedocs.io/en/stable/
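The hashing question can be illustrated with the standard library's `dataclasses`, used here only as a stand-in since recordclass's own behaviour is not described in this thread. By default a mutable dataclass that defines equality is made unhashable, while a frozen one hashes by value. The class names are hypothetical, and the UUID at the end is simply one way to get a surrogate key that does not depend on the record's contents.

```python
import uuid
from dataclasses import dataclass


@dataclass
class MutableRecord:
    # eq=True (the default) without frozen=True sets __hash__ to None.
    name: str


@dataclass(frozen=True)
class FrozenRecord:
    # Frozen plus eq=True: a value-based __hash__ is generated.
    name: str


try:
    hash(MutableRecord("a"))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False

# Equal frozen records hash equally, which is what dicts and sets rely on.
frozen_hashes_match = hash(FrozenRecord("a")) == hash(FrozenRecord("a"))

# A surrogate key generated client-side, independent of the record's fields.
surrogate_key = uuid.uuid4()
```

The default refusal to hash mutable value objects avoids the case where an object is mutated after being stored in a dict or set and can no longer be found.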

Zaur Shibzukhov

September 2018

6:24 p.m.

As the author of `recordclass` I would like to shed some light...

Recordclass originated as a response to the [question](https://stackoverflow.com/questions/29290359/existence-of-mutable-named-tupl...) on Stack Overflow.

`Recordclass` was conceived and implemented as a type that, in API, memory use and speed, would be completely identical to `namedtuple`, except that it would support assignment, so that any element could be replaced without creating a new instance, as `namedtuple` requires. That is, it would be almost identical to `namedtuple` while also supporting assignment (`__setitem__` / `__setslice__`).

The efficiency of `namedtuple` is based on the efficiency of the `tuple` type in Python. In order to achieve the same efficiency it was necessary to create a type `memoryslots`. Its structure (`PyMemorySlotsObject`) is identical to the structure of `tuple` (`PyTupleObject`) and therefore it takes up the same amount of memory as `tuple`.

`Recordclass` is defined on top of `memoryslots` just as `namedtuple` is defined on top of `tuple`. Attributes are accessed via a descriptor (`itemgetset`), which supports both `__get__` and `__set__` by the element index.

The class generated by `recordclass` is:

```
from recordclass import memoryslots, itemgetset

class C(memoryslots):
    __slots__ = ()

    _fields = ('attr_1', ..., 'attr_m')

    attr_1 = itemgetset(0)
    ...
    attr_m = itemgetset(m-1)

    def __new__(cls, attr_1, ..., attr_m):
        'Create new instance of {typename} ({arg_list})'
        return memoryslots.__new__(cls, attr_1, ..., attr_m)
```

etc., following the `namedtuple` definition scheme.

As a result, `recordclass` takes up as much memory as `namedtuple`, and it supports quick access by `__getitem__` / `__setitem__` and by attribute name via the descriptor protocol.

Regards,
Zaur

On Saturday, September 1, 2018 at 10:48:07 UTC+3, Martin Bammer wrote:
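The descriptor scheme described in this message can be approximated in pure Python to show how attribute access maps onto an index. This is only a sketch: it subclasses the built-in `list` instead of the C-level `memoryslots` type, so it has neither the fixed size nor the memory layout described above, and `ItemGetSet` and `Point` are hypothetical names, not part of the `recordclass` package.

```python
class ItemGetSet:
    """Data descriptor that maps an attribute name to a fixed index."""

    def __init__(self, index):
        self.index = index

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        return obj[self.index]

    def __set__(self, obj, value):
        obj[self.index] = value


class Point(list):
    __slots__ = ()  # no per-instance __dict__
    _fields = ("x", "y")

    x = ItemGetSet(0)
    y = ItemGetSet(1)

    def __init__(self, x, y):
        super().__init__((x, y))


p = Point(1, 2)
p.x = 10   # attribute write goes through __set__ to index 0
p[1] = 20  # index write is visible through the attribute y
```

Both access paths reach the same storage, which is the property that distinguishes this design from a class whose fields live only in named slots.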

Greg Ewing

11:09 p.m.

Zaur Shibzukhov wrote:

I'm not sure why you need a new C-level type for this. Couldn't you get the same effect just by using __slots__? e.g.

    class C:
        __slots__ = ('attr_1', ..., 'attr_m')

        def __init__(self, attr_1, ..., attr_m):
            self.attr_1 = attr_1
            ...
            self.attr_m = attr_m

--
Greg

Zaur Shibzukhov

6:17 p.m.

On Monday, September 3, 2018 at 2:11:06 UTC+3, Greg Ewing wrote:

slow. So if you don't need fast access by index but only by name, then using __slots__ is enough. Recordclass is actually a fixed array with named access to its elements, in the same manner as namedtuple is actually a tuple with named access to its elements.
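To make the trade-off concrete, here is a minimal sketch of a `__slots__` class that also offers index access. `SlotsPoint` is a hypothetical example, not code from this thread or from `recordclass`. Access by name is direct, while access by index has to translate the index into a name first and then do an attribute lookup, all in Python-level code.

```python
class SlotsPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        # Index access: look the name up first, then fetch the attribute.
        return getattr(self, self.__slots__[index])

    def __setitem__(self, index, value):
        setattr(self, self.__slots__[index], value)

    def __len__(self):
        return len(self.__slots__)


p = SlotsPoint(1, 2)
p[0] = 10  # routed through __setitem__ and setattr to the slot named "x"
```

No timings are claimed here; measuring with `timeit` on the target interpreter would be the way to compare the two approaches.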

participants (9)

Chris Angelico
Greg Ewing
Jacco van Dorp
Jonathan Fine
Jonathan Goble
Martin Bammer
Steven D'Aprano
Wes Turner
Zaur Shibzukhov