castironpi at gmail.com
Sun Jun 14 10:27:29 EDT 2009
Hi, please forgive the multi-posting on this general topic.
Some time ago, I recommended pursuing 'persistent
composite' types kept on disk, to be read and updated at other times
by other processes. Databases provide this functionality, with the
restriction that the field types in any given table must be
uniform. Python has no such restriction.
I tried out an implementation of composite collections, specifically
lists, sets, and dicts, using 'sqlite3' as a persistence back-end.
It's significantly slower, but we might argue that attempting to do it
by hand qualifies as a premature optimization; it is easy to optimize
later.
The essentials of the implementation are:
- each 'object' gets its own table.
= this includes immutable types
- a reference count table
= when an object's ref. count reaches zero, its table is dropped
- a type-map table
= maps object's table ID to a string of its type
- a single 'entry point' table, with the table ID of the entry-point
= the entry point is the only data structure available to new
connections. (I imagine it will usually be a list or dict.)
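The layout above can be sketched as a schema. The 'refcounts' and
'types' table and column names match the snippets further down; the
'entry' table name and its one-row convention are my assumption.

```python
import sqlite3

def create_schema( conn ):
    # One row per object: its table ID and current reference count.
    conn.execute( '''CREATE TABLE refcounts ( tableid INTEGER, count INTEGER )''' )
    # Maps each object's table ID to the string name of its type.
    conn.execute( '''CREATE TABLE types ( tableid INTEGER, type TEXT )''' )
    # Single row holding the table ID of the entry-point object.
    conn.execute( '''CREATE TABLE entry ( tableid INTEGER )''' )
    conn.commit( )

conn = sqlite3.connect( ':memory:' )
create_schema( conn )
```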
I will be sure to kill any interest you might have by now, by
"revealing" a snippet of code.
The object creation procedure:
def new_table( self, type ):
    ''' 'type' is a string, the name of the class the object is an
    instance of '''
    cur= self.conn.cursor( )
    cur.execute( '''SELECT max( tableid ) FROM refcounts''' )
    rec= cur.fetchone( )
    if rec[ 0 ] is None:
        obid= 0
    else:
        obid= rec[ 0 ]+ 1
    cur.execute( '''INSERT INTO types VALUES( ?, ? )''', ( obid, type ) )
    cur.execute( '''INSERT INTO refcounts VALUES( ?, ? )''', ( obid, 1 ) )
    return obid
The increment ref. count procedure:
def incref( self, obid ):
    cur= self.conn.cursor( )
    recs= cur.execute( '''SELECT count FROM refcounts WHERE tableid = ?''', ( obid, ) )
    rec= recs.fetchone( )
    newct= rec[ 0 ]+ 1
    cur.execute( '''UPDATE refcounts SET count = ? WHERE tableid = ?''', ( newct, obid ) )
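For completeness, the decrement counterpart, which drops an object's
tables when the count reaches zero as described above, might look like
the following; 'decref', the DELETE statements, and the per-object
table name 'obj<N>' are my sketch, not the original code.

```python
def decref( self, obid ):
    cur= self.conn.cursor( )
    recs= cur.execute( '''SELECT count FROM refcounts WHERE tableid = ?''', ( obid, ) )
    newct= recs.fetchone( )[ 0 ]- 1
    if newct == 0:
        # Ref. count reached zero: forget the object entirely.
        cur.execute( '''DELETE FROM refcounts WHERE tableid = ?''', ( obid, ) )
        cur.execute( '''DELETE FROM types WHERE tableid = ?''', ( obid, ) )
        cur.execute( 'DROP TABLE IF EXISTS obj%d' % obid )  # per-object table name assumed
    else:
        cur.execute( '''UPDATE refcounts SET count = ? WHERE tableid = ?''', ( newct, obid ) )
```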
The top-level structure contains these two procedures, as well as
'setentry', 'getentry', and 'revive' procedures.
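A sketch of what 'setentry' and 'getentry' might look like, assuming a
one-row 'entry' table; the table name and the single-row convention
are my guess, not the original code.

```python
def setentry( self, obid ):
    cur= self.conn.cursor( )
    cur.execute( '''DELETE FROM entry''' )  # keep the table single-row
    cur.execute( '''INSERT INTO entry VALUES( ? )''', ( obid, ) )

def getentry( self ):
    cur= self.conn.cursor( )
    cur.execute( '''SELECT tableid FROM entry''' )
    rec= cur.fetchone( )
    return None if rec is None else rec[ 0 ]
```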
Most notably, user-defined types are possible; an instance's dict is
merely a persistent dict. 'revive' looks the original type up by name
in the global namespace, subject to the same restrictions that we all
know and love from 'pickle'.
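The name lookup inside 'revive' boils down to something like this; the
helper name and the explicit namespace argument are my own
illustration, not the original code.

```python
def lookup_type( name, namespace= None ):
    # Find the original class by name, as 'pickle' does: the class
    # must be visible under the same name at revival time.
    if namespace is None:
        namespace= globals( )
    try:
        return namespace[ name ]
    except KeyError:
        raise ValueError( 'unknown persisted type: %r' % name )
```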
Deadlocks and cyclic garbage pose the usual problems. The approach I
used last time was to maintain a graph of acquired locks, kept in a
separate table, and check it for cycles to avert deadlocks. For
garbage, I can't find a better solution than the one Python already
uses.
From the 3.0 docs:
A list of objects which the collector found to be unreachable but
could not be freed (uncollectable objects).
Python doesn’t collect such [garbage] cycles automatically because, in
general, it isn’t possible for Python to guess a safe order in which
to run the __del__() methods. If you know a safe order, you can force
the issue by examining the garbage list, and explicitly breaking
cycles due to your objects within the list.
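The deadlock check described earlier amounts to cycle detection in the
waits-for graph; here is a minimal sketch, where the dict-of-sets
graph representation is my assumption.

```python
def would_deadlock( edges, waiter, holder ):
    # 'edges' maps each process to the set of processes it is waiting
    # on. Adding waiter -> holder deadlocks iff 'holder' can already
    # reach 'waiter', i.e. the new edge would close a cycle.
    seen= set( )
    stack= [ holder ]
    while stack:
        node= stack.pop( )
        if node == waiter:
            return True
        if node in seen:
            continue
        seen.add( node )
        stack.extend( edges.get( node, ( ) ) )
    return False
```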
Before I go and flesh out the full interfaces for the provided
types, does anyone have a use for this?