[Python-ideas] Application awareness of memory storage classes
R. David Murray
rdmurray at bitdance.com
Wed May 18 13:11:20 EDT 2016
On Tue, 17 May 2016 17:07:59 -0400, Terry Reedy <tjreedy at udel.edu> wrote:
> On 5/16/2016 8:35 PM, R. David Murray wrote:
> > I'm currently working on a project for Intel involving Python and directly
> > addressable non-volatile memory. See https://nvdimm.wiki.kernel.org/
> > for links to the Linux features underlying this work, and http://pmem.io/
> > for the Intel initiated open source project I'm working with whose intent
> > is to bring support for DAX NVRAM to the application programming layer
> > (nvml currently covers C, with C++ support in process, and there are
> > python bindings for the lower level (non-libpmemobj) libraries).
> How is non-volatile NVRAM different from static SRAM?
NVRAM retains the data even if it loses power. However, the programming
issues involved in using direct memory access to battery backed SRAM
should be similar, I think.
> > tldr: In the future (and the future is now) application programs will
> > want to be aware of what class of memory they are allocating:
> What I want is to be, if anything, less aware. I remember fiddling with
> register declarations in C. Then it was discovered that compilers can
> allocate registers better than most people, so that 'register' is
> deprecated for most C programmers. I have never had to worry about the
> L1, L2, L3 on chip caches, though someone has to.
Yes, and I think for the hypothetical "fast memory" class that's
probably what a dynamic language like python would ideally want to do,
even if compiled languages didn't. So really I'm talking about NVRAM;
the "fast memory" was just a hypothetical example of another class of
memory besides normal DRAM.
For RAM of whatever flavor that retains its value between program
invocations, the programmer has to be aware that the data is persistent
and program accordingly.
Indeed, to some degree it does not matter what the persistent store
is; the issue is programming for persistence. What NVRAM brings
into the picture is the motivation to do persistence *not*
via serialization and deserialization, but via direct random access to
memory that retains its state. Adding support for persistence to the
language is actually more generically useful than just the NVRAM realm
(consider the hoops the ZODB has to go through to support persistence
of Python objects, for example). For current persistence schemes the
persistence is file-system based, and the time costs of serialization
are swamped by the time costs of the file system access (even when the
file system is as fast as possible and/or SRAM or NVRAM based). What is
different now, and what makes thinking about this now worthwhile, is
that *direct* access (i.e., not driver mediated) to the NVRAM memory
locations makes the time cost of serialization swamp the time cost of
accessing the persistent data.
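The contrast can be sketched with stdlib tools; here a plain file stands
in for a DAX-mapped NVRAM region, and pickle/mmap/struct are just
illustrative stand-ins for the two persistence models:

```python
import mmap
import os
import pickle
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")

# Serialization model: the whole object is converted to a byte
# stream and pushed through the file system on every save/load.
record = {"count": 42, "label": "preamble"}
with open(path, "wb") as f:
    pickle.dump(record, f)
with open(path, "rb") as f:
    assert pickle.load(f) == record

# Direct-access model: a fixed layout updated in place via mmap.
# On a DAX-mounted NVRAM file the same stores would go straight to
# persistent memory, with no driver-mediated I/O in the hot path.
with open(path, "wb") as f:
    f.write(b"\x00" * 8)
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 8) as mem:
        struct.pack_into("<q", mem, 0, 42)   # store an int in place
        (value,) = struct.unpack_from("<q", mem, 0)
print(value)  # -> 42
```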
> I have long thought that I should be able to randomly access data on
> disk the same way I would access that same data in RAM, and not have to
> fiddle with seek and so on. Virtual memory is sort of like this, except
mmap + memoryview allows you to do this (in python3), does it not?
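For example (a rough sketch with a throwaway file; the path and contents
are made up):

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "constitution.txt")
with open(path, "wb") as f:
    f.write(b"We the People of the United States ...")

with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)
    view = memoryview(mem)
    # Random access and slicing, no seek/read fiddling needed:
    preamble = bytes(view[:13])
    view.release()
    mem.close()
print(preamble)  # -> b'We the People'
```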
> that it uses the disk as a volatile* cache for RAM objects. (* Volatile
> in the sense that when the program ends, the disk space is freed for
> other use, and is inaccessible even if not.) Whereas I am thinking of
> using RAM as a cache for a persistent disk object. A possible user API
> (assuming txt stored as a python string with k bytes per char):
> cons = str(None, path="c:/user/terry/gutenburg/us_constitution.txt")
> # now to slice like it was in ram
> preamble = cons[:cons.find(section_marker)]
> Perhaps you are pointing to being able to make this possible, from the
> implementer side.
> The generic interfaces would be bytes(None, path=) (read only) and
> bytearray(None, path=) (read-write).
This is already possible by using the pynvm bindings to nvml and
memoryviews, but it would indeed be interesting to provide a more
convenient API, and we've discussed this a bit. There wouldn't be much
motivation for any changes to python for that level of support, though,
since it could be provided by a couple of new bytes-like classes
defined in an external module.
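A minimal sketch of what such an external bytes-like class might look
like, using plain mmap as a stand-in for the pynvm-backed storage (the
class name and layout here are invented for illustration):

```python
import mmap
import os
import tempfile


class pbytearray:
    """Hypothetical read-write bytes-like object backed by a file.

    A real version would sit on top of pynvm/libpmem rather than mmap.
    """

    def __init__(self, path):
        self._f = open(path, "r+b")
        self._mem = mmap.mmap(self._f.fileno(), 0)

    def __getitem__(self, index):
        return self._mem[index]

    def __setitem__(self, index, value):
        self._mem[index] = value   # the write lands in the backing store

    def __len__(self):
        return len(self._mem)

    def close(self):
        self._mem.flush()
        self._mem.close()
        self._f.close()


# Usage: a write made through one instance survives into the next.
path = os.path.join(tempfile.mkdtemp(), "pool.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 16)

buf = pbytearray(path)
buf[0] = 0xFF
buf.close()

buf = pbytearray(path)
print(buf[0])  # -> 255
buf.close()
```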
> A list does not seem like a good candidate for static mem, unless insert
> and delete are suppressed/unused.
Why not? The point isn't that the memory is *static*, it's that it is
*persistent*. So whatever state your objects are in when your program
ends, that's the state they are in when you next connect your program
to that pool of objects. It's perfectly sensible to want to update a
list and have your changes persist, that's why the ZODB, for example,
provides a PersistentList class.
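To illustrate the semantics (not the mechanism: shelve still serializes
on close, which is exactly the cost direct NVRAM access avoids), a
hypothetical "pool of objects" shared across two program runs might look
like:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pool")

# First "program run": mutate a list and persist it.
with shelve.open(path, writeback=True) as pool:
    pool.setdefault("todo", [])
    pool["todo"].append("ratify")

# Second "program run": the updated list is still there.
with shelve.open(path) as pool:
    print(pool["todo"])  # -> ['ratify']
```

With an NVRAM-backed PersistentList each mutation would be durable in
place, instead of being re-serialized when the pool is closed.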
> If static objects were **always** aligned in 4-byte boundaries, then the
> lowest 2 bits could be used to indicate memory type. To not slow down
> programs, this should be supported by the CPU address decoder. Isn't
> Intel thinking/working on something like this?
That's an interesting thought, thanks :) I'm not clear how the CPU
address decoder would support Python in this context, but I'll
ask the Intel folks if there's anything relevant.
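Just to make the idea concrete in software terms (this is only a sketch
of pointer tagging, not anything CPython or the hardware actually does):

```python
# With 4-byte alignment the two low bits of every object address are
# always zero, so they are free to carry a memory-class tag.
DRAM, NVRAM = 0b00, 0b01


def tag(addr, kind):
    assert addr % 4 == 0, "address must be 4-byte aligned"
    return addr | kind


def untag(tagged):
    # Mask the tag back out to recover the real address.
    return tagged & ~0b11, tagged & 0b11


p = tag(0x1000, NVRAM)
addr, kind = untag(p)
print(hex(addr), kind)  # -> 0x1000 1
```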