On 17 May 2016 at 10:35, R. David Murray wrote:
Feel free to tell me this is the wrong forum, and if so we can find a new forum to continue the conversation if enough people are interested.
I think it's the right forum, although a SIG may turn out to be appropriate eventually (then again, there are plenty of arcane topics where we've managed to get by without ever creating a dedicated SIG). [snip background info]
At the language level, support would mean two things, I think: there would need to be a way to declare which backing store an object belongs to, and there would need to be a notion of a memory hierarchy in which, at least in the case of persistent vs non-persistent RAM, an object higher in the hierarchy could not point to a mutable object lower in the hierarchy, while immutable objects would instead be copied from lower to higher. (Here I'm notionally defining "lower" as "closer to the CPU", since DRAM is 'just there' whereas persistent memory goes through a driver layer before direct access can be exposed, and fast on-package DRAM would be closer still to the CPU :).
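The pointer rule described above can be sketched in a few lines of Python. This is purely illustrative: the tier constants, `ManagedObject` class, and `store_reference` helper are hypothetical names invented for this sketch, not any real CPython API.

```python
# Hypothetical sketch of the hierarchy rule: an object in a higher
# (persistent) tier may only reference a lower-tier (DRAM) object if it
# is immutable, in which case the referenced object is copied up.

DRAM, NVRAM = 0, 1  # lower value = notionally "closer to the CPU"

class ManagedObject:
    def __init__(self, value, tier, mutable=True):
        self.value = value
        self.tier = tier
        self.mutable = mutable

def store_reference(container, target):
    """Return the object 'container' should actually point to."""
    if target.tier >= container.tier:
        return target  # same tier or higher: reference is fine as-is
    if target.mutable:
        raise TypeError(
            "a persistent object cannot reference a mutable DRAM object")
    # immutable lower-tier object: copy it up into the container's tier
    return ManagedObject(target.value, container.tier, mutable=False)
```

Under this sketch, storing an immutable DRAM value into a persistent container silently produces a persistent copy, while storing a mutable DRAM object raises `TypeError`.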
In CPython I envision this being implemented by having every object be associated with a 'storage manager', with the DRAM storage manager obviously being the default. I think that which storage manager an object belongs to can be deduced from its address at runtime, although whether that would be better than storing a pointer is an open question. Object method implementations would then need to be expanded as follows: (1) any operations that comprise an 'atomic' operation would be wrapped in 'start transaction' and 'end transaction' calls on the storage manager; (2) any memory management function (malloc, etc.) would be called indirectly through the storage manager; (3) any object pointer to be stored or retrieved would be passed through an appropriate call on the storage manager, giving it an opportunity to block or transform it; and (4) any memory range to be modified would be reported to the storage manager so that it can be registered with the transaction.
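The four hooks above can be sketched as a minimal storage-manager interface. This is a Python toy model of what would really be a C-level indirection table; all class and method names (`StorageManager`, `register_modified_range`, etc.) are assumptions made for the sketch.

```python
# Toy model of the storage-manager indirection described in points (1)-(4).
# The default DRAM manager makes every hook a no-op; a persistent manager
# records transactions and modified ranges (here, just into a log).

class StorageManager:
    def start_transaction(self): pass           # hook (1)
    def end_transaction(self): pass             # hook (1)
    def allocate(self, size):                   # hook (2)
        return bytearray(size)
    def check_pointer_store(self, obj):         # hook (3): may block/transform
        return obj
    def register_modified_range(self, start, length):  # hook (4)
        pass

class DRAMManager(StorageManager):
    pass  # the default: every hook is effectively a no-op

class PersistentManager(StorageManager):
    def __init__(self):
        self.log = []
    def start_transaction(self):
        self.log.append("begin")
    def end_transaction(self):
        self.log.append("commit")
    def register_modified_range(self, start, length):
        self.log.append(("dirty", start, length))

def atomic_update(manager, buf, offset, data):
    """Mutate buf inside a transaction, reporting the modified range."""
    manager.start_transaction()
    manager.register_modified_range(offset, len(data))
    buf[offset:offset + len(data)] = data
    manager.end_transaction()
```

The point of the sketch is the cost structure: even the DRAM path now pays for the indirect calls, which is exactly the efficiency concern raised below.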
I *think* that's the minimum set of changes. Clearly, however, making such calls is going to be less efficient in the DRAM case than the current code, even when the DRAM implementation of each hook is a no-op.
The three main multiple-kinds-of-memory systems I'm personally familiar with are heap management in DSP/BIOS (now TI-RTOS), C++ memory allocators, and Rust's tiered memory management model (mainly for thread locals vs global heap objects).

The DSP/BIOS model is probably too simplistic to help us out - it's just "allocate this memory on that heap", and then you're on your own at the semantic level.

Rust's tiered memory management is really nice, but also completely foreign to the C-style memory management model that underpins CPython. However, it does suggest "thread local" as an allocator type worth considering alongside "normal process heap" and "non-volatile storage".

That leaves the C++ custom object allocator model, and if I recall correctly, allocators there are managed at the *type* level - if you want to use a different allocator, you need to declare a new type. For Python, I'd suggest looking at going one step further and associating storage management with the *metaclass* - the idea there would be to take advantage of the metaclass conflict if someone attempts to combine data types with conflicting storage management expectations. So you might have "PersistentType", which would itself be a normal type instance (stored in RAM), and the classes it created would also be stored in RAM, but they'd be configured to handle the storage for individual instances differently from normal classes.

I'm not sure if the current descriptor protocol and the GC management machinery would give you sufficient access to control all the things you care about for this purpose, but it should let you control quite a lot of it, and trying it out would give you more info on what's currently missing.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
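The metaclass idea above can be demonstrated with ordinary Python today: because a subclass's metaclass must be compatible with the metaclasses of all its bases, classes created by conflicting storage metaclasses cannot be combined. In this sketch, "PersistentType" is hypothetical and merely tags instances rather than doing any real persistent allocation.

```python
# Sketch of metaclass-based storage management: the storage policy lives
# on the metaclass, so mixing classes with conflicting policies trips
# Python's normal metaclass-conflict check at class-creation time.

class PersistentType(type):
    def __call__(cls, *args, **kwargs):
        instance = super().__call__(*args, **kwargs)
        instance._storage = "nvram"  # stand-in for persistent allocation
        return instance

class VolatileType(type):
    def __call__(cls, *args, **kwargs):
        instance = super().__call__(*args, **kwargs)
        instance._storage = "dram"
        return instance

class PersistentRecord(metaclass=PersistentType):
    pass

class VolatileRecord(metaclass=VolatileType):
    pass
```

Attempting `class Mixed(PersistentRecord, VolatileRecord)` raises `TypeError: metaclass conflict`, which is precisely the guard rail being proposed: incompatible storage expectations fail loudly when the class is defined, not later at allocation time.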