OT Re: Parallel insert to postgresql with thread
erik at myemma.com
Fri Oct 26 06:04:49 CEST 2007
If you're not Scott Daniels, beware that this conversation has gone
horribly off topic and, unless you have an interest in PostreSQL, you
may not want to bother reading on...
On Oct 25, 2007, at 9:46 PM, Scott David Daniels wrote:
> Erik Jones wrote:
>> On Oct 25, 2007, at 7:28 AM, Scott David Daniels wrote:
>>> Diez B. Roggisch wrote:
>>>> Abandoned wrote:
>>>>> I use the threading module for the fast operation. But ....
>>> [in each thread]
>>>>> def save(a,b,c):
>>>>> cursor.execute("INSERT INTO ...
>>>>> How can i insert data to postgresql the same moment ?...
>>>> DB modules aren't necessarily thread-safe. Most of the times, a
>>>> connection (and ... cursor) can't be shared between threads.
>>>> So open a connection for each thread.
>>> Note that your DB server will have to "serialize" your inserts, so
>>> ... a single thread through a single connection to the DB is the way
>>> to go. Of course it (the DB server) may be clever enough to behave
>>> "as if" they are serialized, but most of your work parallelizing at
>>> your end simply creates new work at the DB server end.
>> Fortunately, in his case, that's not necessarily true.... If he
>> goes the recommended route with a separate connection for each
>> then Postgres will not serialize multiple inserts coming from
>> connections unless there is something like and ALTER TABLE or REINDEX
>> concurrently happening on the table.
>> The whole serialized inserts thing is strictly something popularized
>> by MySQL and is by no means necessary or standard (as with a lot of
> But he commits after every insert, which _does_ force serialization
> only to provide safe transaction boundaries). I understand you can
> clever at how to do it, _but_ preserving ACID properties is exactly
> I mean by "serialize,"
First, bad idea to work with your own definition of a very domain
specific and standardized term. Especially when Postgres's Multi-
Version Concurrency Control mechanisms are designed specifically for
the purpose of preserve ACID compliance without forcing serialized
transactions on the user.
Second, unless he specifically sets his transaction level to
serializable, he will be working in read-committed mode. What this
specifically means is that two (or more) transactions writing to the
same table will not block any of the others. Let's say the user has
two concurrent inserts to run on the same table that, for whatever
reason, take a while to run (for example, they insert the results of
some horribly complex or inefficient select), if either is run in
serializable mode then which ever one starts a fraction of a second
sooner will run until completion before the second is even allowed to
begin. In (the default) read-committed mode they will both begin
executing as soon as they are called and will write their data
regardless of conflicts. At commit time (which may be sometime later
for transactions with multiple statements are used) is when conflicts
are resolved. So, if between the two example transactions there does
turn out to be a conflict betwen their results, whichever commits
second will roll back and, since the data written by the second
transaction will not be marked as committed, it will never be visible
to any other transactions and the space will remain available for
Here's the relevant portion of the Postgres docs on all of this:
> and while I like to bash MySQL as well as the
> next person, I most certainly am not under the evil sway of the vile
> MySQL cabal.
Good to hear ;)
> The server will have to be able to abort each transaction
> _independently_ of the others, and so must serialize any index
> updates that share a page by, for example, landing in the same node
> of a B-Tree.
There is nothing inherent in B-Trees that prevents identical datum
from being written in them. If there was the only they'd be good for
would be unique indexes. Even if you do use a unique index, as noted
above, constraints and conflicts are only enforced at commit time.
Software Developer | Emma®
erik at myemma.com
800.595.4401 or 615.292.5888
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
More information about the Python-list