[Python-Dev] const data (was: Unicode patches checked in)

Greg Stein gstein@lyra.org
Thu, 16 Mar 2000 04:08:43 -0800 (PST)


On Wed, 15 Mar 2000, Vladimir Marangozov wrote:
> > [me]
> > > 
> > > Perhaps it would make sense to move the Unicode database on the
> > > Python side (write it in Python)? Or init the database dynamically
> > > in the unicodedata module on import? It's quite big, so if it's
> > > possible to avoid the static declaration (and if the unicodedata module
> > > is enabled by default), I'd vote for a dynamic initialization of the
> > > database from reference (Python ?) file(s).
> 
> [Marc-Andre]
> > 
> > The unicodedatabase module contains the Unicode database
> > as static C data - this makes it shareable among (Python)
> > processes.
> 
> The static data is shared if the module is a shared object (.so).
> If unicodedata is not a .so, then you'll have a separate copy of the
> database in each process.

Nope. A shared module means that multiple executables can share the code.
Whether the const data resides in an executable or a .so, the OS will map
it into readonly memory and share it across all processes.
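
A minimal sketch of what "const data" means here, assuming a made-up
record layout. The names demo_record, demo_db, demo_lookup and
demo_category are hypothetical and are not the real unicodedatabase
tables. The point is only that a const table with static storage can be
placed by the compiler in a read-only section of the binary, which the OS
maps from the file rather than copying into each process:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout, for illustration only. */
typedef struct {
    unsigned char category;
    unsigned char combining;
} demo_record;

/* const + static storage duration: eligible for a read-only section.
   Nothing is allocated or copied at startup. */
static const demo_record demo_db[] = {
    { 1,   0 },
    { 2,   0 },
    { 3, 230 },
};

#define DEMO_DB_LEN (sizeof demo_db / sizeof demo_db[0])

/* Returns a pointer into the const table, or NULL when out of range. */
const demo_record *demo_lookup(size_t index)
{
    return index < DEMO_DB_LEN ? &demo_db[index] : NULL;
}

/* Same lookup for callers that guarantee a valid index. */
unsigned char demo_category(size_t index)
{
    assert(index < DEMO_DB_LEN);
    return demo_db[index].category;
}
```

Tables of plain integers like this one share best. A const table that
holds pointers may need relocation when it lives in a shared object, and
relocated pages are no longer shared.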

> > Python modules don't provide this feature: instead a dictionary
> > would have to be built on import which would increase the heap
> > size considerably. Those dicts would *not* be shareable.
> 
> I haven't mentioned dicts, have I? I suggested that the entries in the
> C version of the database be rewritten in Python (or a text file).
> The unicodedata module would, in its init function, allocate memory
> for the database and would populate it before returning "import okay"
> to Python -- this is one way to init the db dynamically, among others.

This would place all that data into the per-process heap. Definitely not
shared, and definitely a big hit for each Python process.
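
For contrast, a rough sketch of the dynamic approach, again with
hypothetical names (heap_record, heap_db_init, heap_db_lookup,
heap_db_free). The fill loop is only a stand-in for parsing a reference
file. Every process that runs the init function gets its own malloc'ed
copy of the table:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record layout, for illustration only. */
typedef struct {
    unsigned char category;
    unsigned char combining;
} heap_record;

static heap_record *heap_db = NULL;   /* per-process heap copy */
static size_t heap_db_len = 0;

/* Builds the table on the heap. Returns 0 on success, -1 on failure. */
int heap_db_init(size_t count)
{
    size_t i;
    heap_record *db;

    if (count == 0)
        return -1;
    db = malloc(count * sizeof *db);
    if (db == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        /* Stand-in for values parsed from a reference file. */
        db[i].category = (unsigned char)(i % 30);
        db[i].combining = 0;
    }
    free(heap_db);                    /* drop any earlier table */
    heap_db = db;
    heap_db_len = count;
    return 0;
}

/* Returns a pointer into the heap table, or NULL when out of range. */
const heap_record *heap_db_lookup(size_t index)
{
    assert(heap_db != NULL);          /* heap_db_init must run first */
    return index < heap_db_len ? &heap_db[index] : NULL;
}

/* Releases the table. */
void heap_db_free(void)
{
    free(heap_db);
    heap_db = NULL;
    heap_db_len = 0;
}
```

The lookup looks the same as with a const table, but the memory is
private to the process and the population cost is paid on every import.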

> As to sharing the database among different processes, this is a classic
> IPC pb, which has nothing to do with the static C declaration of the db.
> Or, hmmm, one of us is royally confused <wink>.

This isn't IPC. It is sharing of some constant data. The most effective
way to manage this is through const C data. The OS will properly manage
it.

And sorry, David, but mmap'ing a file will simply add complexity. As jcw
mentioned, the OS is pretty much doing this anyhow when it deals with a
const data segment in your executable.

I don't believe this is Linux specific. This kind of stuff has been done
for a *long* time on other platforms, too.

Side note: the most effective way of exposing this const data up to Python
(without shoving it onto the heap) is through buffers created via:
   PyBuffer_FromMemory(ptr, size)
This allows the data to reside in const, shared memory while it is also
exposed up to Python.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/