[Python-ideas] Application awareness of memory storage classes

Tue May 24 10:17:55 EDT 2016

On Tue, 24 May 2016 09:59:42 +0200, "M.-A. Lemburg" <mal at egenix.com> wrote:
> As soon as you have memory in use which is not fully managed
> by Python, I don't think there's any way to implement
> transactions on memory in a meaningful way. The possible side
> effect in the unmanaged blocks would render such transactions
> meaningless, since a rollback in those would still leave you
> with the changes in the unmanaged blocks (other parts of the
> system).

In this case all the memory will be "managed" at the direction of the Python
program (and the extension module).  The issue is that while we have
transactions on the NVRAM objects, the regular Python objects don't get
their state restored if the transaction block aborts.  That is part of
why I was wondering what it might look like to integrate awareness
of storage classes into the language itself.
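
To make the mismatch concrete, here is a toy sketch.  It does not use the
real library: ToyPersistentList and transaction() are hypothetical
stand-ins, and an ordinary in-memory snapshot plays the role of the NVRAM
rollback machinery.  The point is only that the abort restores the
"persistent" object and leaves the regular Python list as it was.

```python
import copy
from contextlib import contextmanager


class ToyPersistentList:
    """Hypothetical stand-in for an NVRAM-backed list (plain memory here)."""

    def __init__(self):
        self.items = []


@contextmanager
def transaction(*persistent_objects):
    # Only the objects handed to the transaction are protected.
    snapshots = [copy.deepcopy(obj.items) for obj in persistent_objects]
    try:
        yield
    except Exception:
        # Abort: restore the protected objects, then re-raise.
        for obj, snapshot in zip(persistent_objects, snapshots):
            obj.items = snapshot
        raise


plist = ToyPersistentList()
regular = []  # an ordinary Python object, unknown to the transaction

try:
    with transaction(plist):
        plist.items.append(1)
        regular.append(1)
        raise RuntimeError("simulated failure inside the transaction")
except RuntimeError:
    pass

print(plist.items)  # the persistent object was rolled back
print(regular)      # the regular object still carries the change
```

After the abort the two objects disagree, which is exactly the state a
program has to clean up by hand today.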

> Now, back on topic: for writing to NVRAM, having a transaction
> mechanism in place does make sense, but it would have to
> be clear that only the bits stored in NVRAM are subject
> to the transaction.

Yes, exactly.

> The reason here being that a failure while writing to NVRAM
> could potentially cause your machine to no longer boot.

I think you misunderstand.  We're not talking about "regular" NVRAM;
we're talking about memory banks that are exposed to user space via a DAX
driver.  The driver uses file system semantics to set up the mapping from
user space to the NVRAM, and once that mapping exists the kernel lets the
user space program write directly to the NVRAM.  We're not doing this with
the NVRAM involved in booting the machine; it is separate, dedicated storage.

> For volatile RAM, at worst, the process will die, but not have
> much effect on other running parts of the system, so there
> is less incentive to have transactions (unless, of course,
> you are deep into STM and want to work around the GIL :-)).

STM is a different approach, and equally valid, but not the one the
underlying library takes.

> Given that Armin Rigo has been working on STM for years,
> I'd suggest to talk to him about challenges and solutions
> for transactions on memory.

He's looking at what we might call the reverse of the type of
transaction I'm dealing with.  An STM transaction makes all changes
pending, and throws them away on conflict.  Our transaction makes all
changes immediately, and *rolls them back* on *failure*.  No conflicts
are involved, so the things you have to worry about are different from
the things you have to worry about in the STM case.  I'm sure there
are some commonalities, so it may well be worth talking to Armin, since
he's thought deeply about this stuff.  I'm being handed the transaction
machinery by the underlying library, though, so I "only" have to think
about how it impacts the Python level :)
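
For contrast, the "pending until commit, thrown away on conflict" style
described above might look roughly like the sketch below.  This is a
single-threaded toy with made-up names (VersionedStore, PendingTransaction,
Conflict) and a simple version counter for conflict detection; it is not
meant to describe how Armin's STM work is actually implemented.

```python
class Conflict(Exception):
    """Raised when the store changed while a transaction was pending."""


class VersionedStore:
    """Hypothetical shared store with a version counter."""

    def __init__(self, **values):
        self.values = dict(values)
        self.version = 0


class PendingTransaction:
    def __init__(self, store):
        self.store = store
        self.start_version = store.version
        self.pending = {}

    def write(self, key, value):
        # Buffered only; nothing is visible in the store yet.
        self.pending[key] = value

    def commit(self):
        if self.store.version != self.start_version:
            # Conflict: someone else committed first, so discard our changes.
            self.pending.clear()
            raise Conflict("store changed since the transaction began")
        self.store.values.update(self.pending)
        self.store.version += 1
        self.pending.clear()


store = VersionedStore(x=0)
first = PendingTransaction(store)
second = PendingTransaction(store)

first.write("x", 1)
second.write("x", 2)
visible_before_commit = store.values["x"]  # neither write is visible yet

first.commit()
try:
    second.commit()
    second_committed = True
except Conflict:
    second_committed = False
```

Nothing ever has to be undone here, because a losing transaction never
touched the visible state in the first place.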

> My take on all this would be to work with NVRAM as block
> rather than single memory cells:
> 
> allocate a lock on the NVRAM block
> try:
>  copy the block into DRAM
>  run manipulations in DRAM block
>  write back DRAM block
> finally:
>  release lock on NVRAM block
> 
> so instead of worrying about a transaction failing while
> manipulating NVRAM, you only make sure that you can lock
> the NVRAM block and provide an atomic "block write to NVRAM"
> functionality.

Which is the reverse of what the library actually does.  It copies the
existing data into an NVRAM rollback log, and then makes the changes
to the visible memory (that is, the changes are immediately visible to
all threads).  The rollback log is then used to undo those changes if
the transaction fails.  And yes, this means that you need locks around
your persistent object updates when doing threaded programming, as I
mentioned in my original post.
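
A minimal sketch of that undo-log discipline, assuming nothing about the
real library's API: ToyPersistentDict is a hypothetical class, an ordinary
dict stands in for the NVRAM, and the log lives in regular memory rather
than in NVRAM.  The lock is taken by the application, not by the
transaction, since changes are visible to other threads as soon as they
are made.

```python
import threading
from contextlib import contextmanager

_MISSING = object()  # marks "key did not exist before the write"


class ToyPersistentDict:
    """Hypothetical stand-in for NVRAM-backed storage (a plain dict here)."""

    def __init__(self, **values):
        self.data = dict(values)
        self._undo_log = None

    def __setitem__(self, key, value):
        if self._undo_log is not None:
            # Save the old value first, then change the visible memory.
            self._undo_log.append((key, self.data.get(key, _MISSING)))
        self.data[key] = value

    @contextmanager
    def transaction(self):
        self._undo_log = []
        try:
            yield self
        except BaseException:
            # Failure: replay the log backwards to undo every change.
            while self._undo_log:
                key, old = self._undo_log.pop()
                if old is _MISSING:
                    del self.data[key]
                else:
                    self.data[key] = old
            raise
        finally:
            self._undo_log = None


account = ToyPersistentDict(balance=100)
account_lock = threading.Lock()  # the application's job, not the library's


def withdraw(amount):
    with account_lock:
        with account.transaction():
            account["balance"] = account.data["balance"] - amount
            if account.data["balance"] < 0:
                # The negative balance is already in visible memory here.
                raise ValueError("insufficient funds")


threads = [threading.Thread(target=withdraw, args=(30,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

try:
    withdraw(50)
    failed = False
except ValueError:
    failed = True
```

Without account_lock, another thread could read or overwrite the balance
between the in-place write and the rollback, and the single shared undo
log in this toy would be corrupted as well.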

I'm personally also interested in the STM-style case, since that allows
you to write multi-access, potentially distributed, DB-like applications.
However, that's not what this particular project is about.  A language
that supports persistent storage should support both models, I think,
because they are useful for different applications.  But the primary
difference is what happens during a transaction, so at the language
syntax level there is probably no difference.

I guess that means there are two different classes of persistent memory
from the application's point of view, even if they can be backed by the
same physical memory: rollback persistent, and STM persistent.

--David