anydbm safe for simultaneous writes?
Eric S. Johansson
esj at harvee.org
Sat Mar 1 06:28:35 CET 2008
> I need simple data persistence for a cgi application that will be used
> potentially by multiple clients simultaneously. So I need something
> that can handle locking among writes. Sqlite probably does this, but
> I am using Python 2.4.4, which does not include sqlite. The dbm-style
> modules would probably be fine, but I have no idea if they are "write
> safe" (I have no experience with the underlying unix stuff). Any tips
The often-repeated answer that you need locking is correct but incomplete. It
really depends on which DBM you are using. If you are using a fairly recent
Berkeley DB (bsddb, a.k.a. Sleepycat), it does have the kind of locking you need,
up to fairly complex transactions. Unfortunately, the API is sufficiently
unintelligible that it will take more than an afternoon to figure out how to
even start to use it.
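
For a DBM that does no locking of its own, the "you need locking" answer can be as small as an advisory lock file held around every open. This is only a rough sketch under some assumptions: it is written for current Python (where anydbm has become the dbm module; on 2.4 you would use anydbm and a plain try/finally), it is Unix-only because of fcntl, and the locked_db name and the ".lock" suffix are made up for illustration.

```python
import contextlib
import dbm
import fcntl


@contextlib.contextmanager
def locked_db(path, flag="c"):
    """Open a DBM while holding an exclusive advisory lock on a side file."""
    # Hypothetical convention: the lock lives next to the database file.
    lock_file = open(path + ".lock", "w")
    try:
        # Blocks until every other process holding the lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        db = dbm.open(path, flag)
        try:
            yield db
        finally:
            db.close()
    finally:
        # Closing the lock file releases the lock.
        lock_file.close()
```

Every process has to go through the same helper for this to work, since flock is advisory: a process that opens the DBM directly is not stopped by it.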
gdbm is a nice DBM that permits single writer/multiple readers. If you open a
DBM for read, any writer is locked out. When you open it for read, sometimes
multiple readers can get in, but not always (or at least that's the way it seems
here in practice). When the DBM is busy, you will get an exception with an error
value of: (11, 'Resource temporarily unavailable'). Just busy-wait until this
exception goes away and you'll get access to the DBM file. Yes, this officially
sucks, but at least it's a workaround for the problem.
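
The retry workaround described above might look roughly like the following. It is a sketch, not the code I actually run: open_with_retry and its parameters are names invented here, it sleeps briefly between attempts rather than spinning in a pure busy-wait, and it assumes the busy condition shows up as an exception whose first argument is EAGAIN (11 on Linux).

```python
import errno
import time


def open_with_retry(opener, retries=50, delay=0.1, busy_errors=(OSError,)):
    """Call opener() until it stops failing with 'Resource temporarily unavailable'."""
    for _ in range(retries):
        try:
            return opener()
        except busy_errors as exc:
            # Only EAGAIN means "someone else has the file"; re-raise anything else.
            if not exc.args or exc.args[0] != errno.EAGAIN:
                raise
            time.sleep(delay)
    raise RuntimeError("database still busy after %d attempts" % retries)
```

With gdbm, the call would be something like open_with_retry(lambda: gdbm.open("data.db", "w"), busy_errors=(gdbm.error,)), where "data.db" is a placeholder file name. In current Python the module is dbm.gnu instead of gdbm.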
Another way to solve this particular problem with DBM files is to stick the DBM
inside a Pyro daemon. Performance won't be too bad and you should be able to get it
working relatively easily. I will warn you that the RPC model for Pyro does
take some getting used to if you're familiar with more traditional RPC
environments. Once you wrap your head around the Pyro model, it's pretty nice.
If you want, I can send you a copy of my Pyro daemon I use to wrap a DBM so I
don't have to worry about multiple processes accessing the same DBM.
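
To give a feel for the idea without depending on Pyro, here is the same "one process owns the DBM, everyone else talks to it over RPC" arrangement sketched with the standard library's XML-RPC server as a stand-in. This is not my Pyro daemon and not Pyro's API. The DbmService and make_server names are hypothetical, dbm.dumb stands in for whichever DBM you actually use, and the code is written for current Python.

```python
import dbm.dumb
from xmlrpc.server import SimpleXMLRPCServer


class DbmService:
    """Holds the only open handle to the DBM; clients never touch the file."""

    def __init__(self, path):
        self._db = dbm.dumb.open(path, "c")

    def get(self, key):
        # Return "" for a missing key, since XML-RPC does not marshal None by default.
        return self._db.get(key.encode("utf-8"), b"").decode("utf-8")

    def put(self, key, value):
        self._db[key.encode("utf-8")] = value.encode("utf-8")
        return True


def make_server(path, host="127.0.0.1", port=0):
    # SimpleXMLRPCServer handles one request at a time, so writes are serialized.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(DbmService(path))
    return server
```

The daemon process calls make_server(...).serve_forever(), and each CGI request connects with xmlrpc.client.ServerProxy and calls put and get. Because only one process ever opens the file, the question of multiple writers goes away.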
The one thing that really bothers me about the DBM interfaces is that the two
main DBMs are really quite full-featured but the documentation presents a very
sketchy description of what they support and how. As a result, I suspect that
DBMs don't get used as often as they could and people are pushed into more
complex databases because they don't understand what DBMs are capable of.
Other folks have recommended some form of SQL, and while SQLite is a very nice
small database, personally, I find SQL unintelligible and I have lost more days
than I care to think about trying to figure out how to do something in SQL. As a
result, I tend to go more towards databases such as metakit and buzhug
(http://buzhug.sourceforge.net/). The former is like gdbm and only handles a
single writer. It's really intended for single-process use, but I don't know if
you can put it in a Pyro-controlled daemon. The latter looks pretty interesting
because the documentation implies that it supports concurrent access on a
per-record level (menu item: concurrency control).
Given that I'm currently replacing a DBM for very much the same reason you are,
I'm going to try using buzhug rather than facing SQL again. I would be glad to
compare notes with you if you care to go the same route. Just let me know off list.
I wish you the best of luck in your project.
Speech-recognition in use. It makes mistakes, I correct some.