[ python-Bugs-881522 ] Shelve slow after 7/8000 key
SourceForge.net
noreply at sourceforge.net
Thu Jan 22 19:16:08 EST 2004
Bugs item #881522, was opened at 2004-01-21 17:09
Message generated for change (Comment added) made by jkew
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=881522&group_id=5470
Category: Windows
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Marco Beri (marcoberi)
Assigned to: Gregory P. Smith (greg)
Summary: Shelve slow after 7/8000 key
Initial Comment:
After about 8,000 insertions, shelve becomes really, really
slow.
This happens only with 2.3.3 #51 on Windows, not with
2.2, and not with 2.3 on Linux.
I tried with writeback True and False: same problem.
Help! :-))
----------------------------------------------------------------------
Comment By: James Kew (jkew)
Date: 2004-01-23 00:16
Message:
Logged In: YES
user_id=598066
FWIW2, on skip's "miserable hack" comment below, vis-a-vis
running shelve on btree: isn't this exactly the sort of thing
shelve.Shelf is intended for?
import bsddb
import shelve
db = bsddb.btopen("temp.db")
sh = shelve.Shelf(db)
# do stuff with sh
sh.close()
# automatically calls close() on the underlying db
(Not sure why Shelf and friends are documented in
shelve's "Restrictions" subsection...)
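A minimal sketch of the same idea for current Python, where the bsddb module is no longer in the standard library. It relies only on shelve.Shelf accepting any dict-like object. RecordingDict is a hypothetical stand-in for the object that bsddb.btopen() would return, used here so the close() pass-through can be observed.

```python
import shelve


class RecordingDict(dict):
    """Hypothetical stand-in for a db object such as the one btopen() returns."""

    closed = False

    def close(self):
        # Shelf.close() calls this when the underlying object provides it.
        self.closed = True


db = RecordingDict()
sh = shelve.Shelf(db)

sh["spam"] = {"count": 3}      # the value is pickled before it reaches db
value = sh["spam"]             # and unpickled on the way back out
raw_keys = list(db.keys())     # keys are stored encoded as bytes

sh.close()                     # also calls close() on the underlying db
```

Because Shelf only needs the mapping interface, swapping the storage backend does not change the code that uses sh.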
----------------------------------------------------------------------
Comment By: James Kew (jkew)
Date: 2004-01-22 23:53
Message:
Logged In: YES
user_id=598066
FWIW, to throw another use case into the pot: I (used to)
run Roundup (roundup.sf.net) trackers on anydbm/Win2K and
experienced a significant drop in performance between 2.2.x
(bsddb185) and 2.3.x (dbhash).
I understand that this is a third-party issue, and that there
were significant known problems with bsddb 1.85, but it did
cause me a bit of a double-take after having heard so much
about Python 2.3 being faster...
I say "used to" because the slowdown prompted me to
migrate to Roundup's sqlite backend, solving my problem.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-01-22 21:11
Message:
Logged In: YES
user_id=44345
If we wanted speed and didn't care about corruption, my vote
would be bsddb185. ;-)
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-01-22 20:36
Message:
Logged In: YES
user_id=31435
Greg, I didn't expect you to fix it <wink>, I just didn't want
the bug report closed based on misunderstanding what it was
about.
I've unassigned this item, and if nobody volunteers to dig into
it within a few weeks, it should indeed be closed as "3rd
Party" and "Wont Fix".
Skip, maybe we should try to force spambayes to use a btree
mapping too -- then maybe we could get a whole new class
of intractable corruption errors <wink -- but it might be a lot
faster>.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-01-22 20:28
Message:
Logged In: YES
user_id=44345
Whoops, sorry about polluting the waters with the btree stuff.
Dang time lag.
Looking at just the hashopen times between 2.2, 2.3 and 2.4 does
show that hash file times have gotten worse since Berkeley 1.85
days.
Whether or not btree times muddy these particular waters,
figuring out a way to switch to a different db type and still use the
shelve module may be Marco's best bet for a short term
performance improvement.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-01-22 20:22
Message:
Logged In: YES
user_id=44345
I guess I get similar results on Mac OS X after looking at it a bit.
The differences are just not as dramatic (or disappointing) as they
are on Windows. Here's the output of a little shell script which
runs test3skip.py with various Python interpreters and Berkeley
DB versions:
Python version: (2, 4, 0, 'alpha', 0)
Berkeley DB version: 4.2.4
hashopen: 0m1.621s
btopen: 0m0.608s
Python version: (2, 3, 3, 'final', 0)
Berkeley DB version: 4.2.0
hashopen: 0m1.359s
btopen: 0m0.450s
Python version: (2, 2, 0, 'final', 0)
Berkeley DB version: ???
hashopen: 0m0.514s
btopen: 0m0.202s
Only real (wall clock) times are displayed.
Marco,
Unfortunately, there doesn't seem to be much we can do at this
end to remedy the situation with hash files. If you want to use
shelve but switch to bsddb.btopen as the underlying db file open
call, try posting to comp.lang.python. Anything you do will
probably be a miserable hack, but we can probably figure
something out.
----------------------------------------------------------------------
Comment By: Gregory P. Smith (greg)
Date: 2004-01-22 19:12
Message:
Logged In: YES
user_id=413
python 2.2 and earlier on windows linked against some form
of bsddb 1.85.
python 2.3 and later link against modern BerkeleyDB (not
really related to bsddb 1.85 much at all other than by name
and a legacy api). They are very different libraries with
very different capabilities and performance.
regardless, i don't have a windows development platform
anymore. someone who does, please take this.
i suspect this is not something we can fix. try asking
sleepycat why modern DB_HASH databases might be slower than
bsddb 1.85 hash databases on windows and see what they say.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-01-22 18:56
Message:
Logged In: YES
user_id=31435
The original question is why a BDB hash is some 30x slower
under 2.3 than under 2.2 or 2.1, and that does appear
specific to Windows.
Skip threw btrees into this too, but that complication doesn't
appear relevant to the original report (despite marcoberi's
hearsay 2004-01-21 18:57 comment -- others posted actual
output, making clear that dbhash is used under all Python
versions in test1skip).
I'll note in passing that the test case inserts keys in already-
mostly-sorted order, which is a friendly order for a btree-
based mapping. To get back to the original report, ignore
everything here concerning test3skip and btrees.
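A small sketch of how a test could avoid favoring a btree with already-sorted keys. The key format below is an assumption (the test scripts themselves are not shown in this thread); the point is only that the same keys can be inserted in a repeatable shuffled order.

```python
import random

# Hypothetical key set: zero-padded counters, which are already sorted.
sorted_keys = ["%08d" % i for i in range(10000)]

# Shuffle a copy with a fixed seed so that runs stay repeatable.
shuffled_keys = list(sorted_keys)
random.Random(12345).shuffle(shuffled_keys)
```

Inserting shuffled_keys instead of sorted_keys keeps the workload identical in size while removing the insertion-order advantage.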
----------------------------------------------------------------------
Comment By: Gregory P. Smith (greg)
Date: 2004-01-22 18:32
Message:
Logged In: YES
user_id=413
This problem is not specific to windows. hashopen in the
test3skip.py test case is 10x slower than btopen on my
linux-alpha system.
I don't know why BerkeleyDB hash databases are so much
slower than B-Tree ones. My best suggestion is: if it
hurts, don't do that. Use a btree rather than a hash database.
Running the python process under strace on linux reveals
nothing obvious (no system calls are being made during the
time hash open is consuming lots of CPU).
You'll have to ask sleepycat themselves if you want a real
answer as to why hash databases don't perform well.
----------------------------------------------------------------------
Comment By: Marco Beri (marcoberi)
Date: 2004-01-22 18:16
Message:
Logged In: YES
user_id=588604
I get the same results as you under normal cmd: 7.07 seconds vs
0.46.
[c:\tmp]timer & \python23\python test3skip.py hashopen &
timer
Timer 1 on: 19.13.22
Timer 1 off: 19.13.29 Elapsed: 0.00.07,07
[c:\tmp]timer & \python23\python test3skip.py btopen & timer
Timer 1 on: 19.13.45
Timer 1 off: 19.13.45 Elapsed: 0.00.00,46
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-01-22 18:02
Message:
Logged In: YES
user_id=44345
Try test3skip.py. You run it like this:
python test3skip.py hashopen
python test3skip.py btopen
I ran it on win2k under cygwin so I could use the time command
(but ran the Windows version of Python). Using btopen was much
faster. I got rid of shelve to eliminate it and pickle as possible
sources of problems.
$ time /cygdrive/c/Python23/python test3skip.py hashopen
real 0m6.801s
user 0m0.015s
sys 0m0.000s
Administrator at CYCLOPS ~/tmp
$ time /cygdrive/c/Python23/python test3skip.py btopen
real 0m0.345s
user 0m0.015s
sys 0m0.015s
I don't know if the relationship between real, user and sys time
means anything on cygwin, but the reported real times are very
repeatable and match my subjective feel of the elapsed time. This
suggests there's something fishy with either the underlying library
or with __setitem__ when using hash files.
I'm assigning to Greg so he can take a peek. As the bsddb/
pybsddb guy he might have some better insight (certainly better
than me).
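A rough sketch of a timing harness in the spirit of the test described above. It is not the attached test3skip.py, whose contents are not shown in this thread. The function name time_inserts, the key format, and the value size are all assumptions, and a plain dict stands in for a real db open call so the sketch runs anywhere.

```python
import time


def time_inserts(open_func, count):
    """Time `count` insertions into the mapping returned by open_func().

    open_func is an assumption: any zero-argument callable that returns
    a dict-like object, for example a wrapper around a db open call.
    """
    db = open_func()
    start = time.perf_counter()
    for i in range(count):
        key = ("%08d" % i).encode("ascii")
        db[key] = b"x" * 100
    elapsed = time.perf_counter() - start
    return db, elapsed


# A plain dict keeps the sketch runnable; swap in a real db opener to compare.
db, elapsed = time_inserts(dict, 10000)
print("%d keys in %.3f s" % (len(db), elapsed))
```

Timing only the loop separates insertion cost from interpreter startup, which the wall-clock figure from the time command includes.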
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-01-22 17:29
Message:
Logged In: YES
user_id=31435
FYI, on a Win98SE box, test1skip.py took about 30 seconds
under 2.3.3, and about 1 second under both 2.2.3 and 2.1.3.
Under 2.3.3, no significant time is taken by a.close(), so it's
all in the loop. It prints "dbhash" under all versions.
----------------------------------------------------------------------
Comment By: Marco Beri (marcoberi)
Date: 2004-01-22 07:30
Message:
Logged In: YES
user_id=588604
I tried your version: 31.36 seconds vs 0.65.
Just to be sure I tried on three different computers with
Windows 2000: same gap.
[c:\tmp]timer & \Python23\python test1skip.py & timer
Timer 1 on: 8.21.58
dbhash
Timer 1 off: 8.22.29 Elapsed: 0.00.31,36
[c:\tmp]timer & \Python22\python test1skip.py & timer
Timer 1 on: 8.22.40
dbhash
Timer 1 off: 8.22.41 Elapsed: 0.00.00,65
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-01-22 00:28
Message:
Logged In: YES
user_id=44345
Can't reproduce on Mac OS X. I tried with 2.2, 2.3 and CVS using
attached test1skip.py (no writeback - 2.2 doesn't support it, no
import pickle - not used, no key prints - just muddies the water,
print whichdb's result).
The times are close enough to not worry me:
montanaro:tmp% time python2.3 test1.py
dbhash
real 0m1.927s
user 0m1.720s
sys 0m0.080s
montanaro:tmp% time python2.2 test1.py
dbhash
real 0m1.250s
user 0m0.850s
sys 0m0.360s
montanaro:tmp% time python test1.py
dbhash
real 0m2.179s
user 0m1.950s
sys 0m0.120s
Please try this modified version just to make sure we are both
looking at the same thing.
----------------------------------------------------------------------
Comment By: Marco Beri (marcoberi)
Date: 2004-01-21 23:57
Message:
Logged In: YES
user_id=588604
Skip Montanaro discovered that whichdb reports bsddb185
with python 2.2 and dbhash with 2.3.3.
So why is it so slow after a few thousand keys?
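For reference, a minimal sketch of the same kind of check on current Python, where the function is dbm.whichdb (in Python 2 it lived in the whichdb module). The dbm.dumb backend and the file name are assumptions chosen so the example runs without Berkeley DB installed.

```python
import dbm
import dbm.dumb
import os
import tempfile

# Create a small throwaway database with the pure-Python dbm.dumb backend.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example")
db = dbm.dumb.open(path, "c")
db[b"key"] = b"value"
db.close()

# whichdb() inspects the files on disk and names the module that can open them.
backend = dbm.whichdb(path)
print(backend)
```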
----------------------------------------------------------------------
Comment By: Thomas Heller (theller)
Date: 2004-01-21 18:24
Message:
Logged In: YES
user_id=11105
Hm, are windows bugs automatically assigned to me ;-)??
----------------------------------------------------------------------