PyPy/python newb - need dbm other than dumbdbm and dbm
Hi all, I didn't see a pypy-user list, so I'm posting here.

I came across PyPy while trying to find a way to speed up cvs2svn while converting a large CVS repo. I've pulled the latest via svn, got it built and have translated the interpreter, but cvs2svn run with pypy-c fails with the output:

    debug: WARNING: library path not found, using compiled-in sys.path
    ERROR: cvs2svn uses the anydbm package, which depends on lower level dbm libraries. Your system has dumbdbm, with which cvs2svn is known to have problems. To use cvs2svn, you must install a Python dbm library other than dumbdbm or dbm. See http://python.org/doc/current/lib/module-anydbm.html for more information.

Now, when we've run into this issue with Python, someone just rebuilt Python with built-in libgdbm support, but I haven't found any way to do that or to include some other dbm within PyPy.

Since I'm not that familiar with Python to start with, I'm hoping that I'm just missing something quite basic, and that some kind soul here can point me in the right direction...

Should I be trying to translate cvs2svn instead of running it with pypy-c? (This seems to involve building a target file or something.)

Am I just out of luck?

--
Kelly F. Hickel
Senior Product Architect
MQSoftware, Inc.
952-345-8677 Office
952-345-8721 Fax
kfh@mqsoftware.com
www.mqsoftware.com
Certified IBM SOA Specialty
Your Full Service Provider for IBM WebSphere
Learn more at www.mqsoftware.com
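The error above comes from a check in cvs2svn that refuses to run when the only dbm backends available are dumbdbm or dbm. Below is a minimal sketch of a similar check, not cvs2svn's actual code. It assumes a modern Python 3 interpreter, where the backends live in the `dbm` package as `dbm.gnu` (GNU gdbm), `dbm.ndbm` and `dbm.dumb`. In the Python 2 of this thread they were the separate modules `gdbm`, `dbm`, `dbhash` and `dumbdbm` behind `anydbm`. The helper names `available_backends` and `has_acceptable_backend` are hypothetical.

```python
import importlib

# Python 3 names of the dbm backends. In Python 2, anydbm chose among
# dbhash, gdbm, dbm and dumbdbm instead.
CANDIDATES = ("dbm.gnu", "dbm.ndbm", "dbm.dumb")

# The cvs2svn message rejects "dumbdbm or dbm"; their Python 3
# counterparts are dbm.dumb and dbm.ndbm.
REJECTED = {"dbm.dumb", "dbm.ndbm"}


def available_backends(candidates=CANDIDATES):
    """Map each backend module name to True if it can be imported."""
    found = {}
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # e.g. the interpreter was built without the gdbm library
            found[name] = False
        else:
            found[name] = True
    return found


def has_acceptable_backend(found):
    """True if some importable backend is not in the rejected set."""
    return any(ok and name not in REJECTED for name, ok in found.items())


if __name__ == "__main__":
    found = available_backends()
    for name, ok in found.items():
        print(f"{name:10} {'available' if ok else 'missing'}")
    if not has_acceptable_backend(found):
        print("No acceptable dbm backend; rebuild with gdbm support.")
```

On a CPython build linked against libgdbm, `dbm.gnu` should be reported as available. Whether a given PyPy build provides these modules depends on its version and how it was built.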
On Feb 11, 2009, at 4:28 PM, Kelly F. Hickel wrote:
> Hi all, I didn't see a pypy-user list, so I'm posting here.
> I came across PyPy while trying to find a way to speed up cvs2svn while converting a large CVS repo. I've pulled the latest via svn, got it built and have translated the interpreter, but cvs2svn run with pypy-c fails with the output:
> debug: WARNING: library path not found, using compiled-in sys.path
> ERROR: cvs2svn uses the anydbm package, which depends on lower level dbm libraries. Your system has dumbdbm, with which cvs2svn is known to have problems. To use cvs2svn, you must install a Python dbm library other than dumbdbm or dbm. See http://python.org/doc/current/lib/module-anydbm.html for more information.
> Now, when we've run into this issue with Python, someone just rebuilt Python with built-in libgdbm support, but I haven't found any way to do that or to include some other dbm within PyPy.
> Since I'm not that familiar with Python to start with, I'm hoping that I'm just missing something quite basic, and that some kind soul here can point me in the right direction...
> Should I be trying to translate cvs2svn instead of running it with pypy-c? (This seems to involve building a target file or something.)
> Am I just out of luck?
Try using Psyco to speed up cvs2svn. I'm afraid PyPy right now would not give you any performance improvement over CPython for this (maybe even Psyco will not help).

PS: Isn't cvs2svn a one-time thing?

--
Leonardo Santagada
santagada at gmail.com
>> I'm just missing something quite basic, and that some kind soul here can point me in the right direction...
>> Should I be trying to translate cvs2svn instead of running it with pypy-c? (This seems to involve building a target file or something.)
>> Am I just out of luck?
>
> Try using Psyco to speed up cvs2svn. I'm afraid PyPy right now would not give you any performance improvement over CPython for this (maybe even Psyco will not help).
> PS: Isn't cvs2svn a one-time thing?
> --
> Leonardo Santagada
> santagada at gmail.com
Thanks for the quick reply, I'll give it a try. I had looked at Psyco back in June, but got the impression that I'd be better off with PyPy....

Yes, cvs2svn is a one-time thing in principle, but I've been running it a number of times, trying different things, as a way of validating the switch to git (cvs2svn also does cvs2git). Since it takes 6.5 days on a 2.4 GHz AMD machine with 32 GB of RAM to convert our repo, even a small tweak to the process is quite expensive....

-Kelly
>> Try using Psyco to speed up cvs2svn. I'm afraid PyPy right now would not give you any performance improvement over CPython for this (maybe even Psyco will not help).
>> PS: Isn't cvs2svn a one-time thing?
>> --
>> Leonardo Santagada
>> santagada at gmail.com
>
> Thanks for the quick reply, I'll give it a try. I had looked at Psyco back in June, but got the impression that I'd be better off with PyPy....
> Yes, cvs2svn is a one-time thing in principle, but I've been running it a number of times, trying different things, as a way of validating the switch to git (cvs2svn also does cvs2git). Since it takes 6.5 days on a 2.4 GHz AMD machine with 32 GB of RAM to convert our repo, even a small tweak to the process is quite expensive....
> -Kelly
Ahh, yes, now I remember. Psyco ONLY works with 32-bit Python, whereas I MUST use 64-bit because the memory footprint of cvs2svn with our repo is just too large for a 32-bit process.

So, dead in the water again?

-Kelly
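Whether the interpreter running cvs2svn is a 32-bit or a 64-bit build can be checked from inside Python. A minimal sketch using only the standard `struct` and `sys` modules; the helper names `pointer_bits` and `is_64bit` are hypothetical:

```python
import struct
import sys


def pointer_bits():
    """Size of a C pointer in this interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def is_64bit():
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit ones.
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print(f"{pointer_bits()}-bit interpreter")
    # A 32-bit process has at most 4 GiB of address space (often less in
    # practice), which is the limit that rules out a 32-bit run here.
    print("64-bit:", is_64bit())
```

Running this under the same interpreter used for cvs2svn confirms which kind of build is in use, and therefore whether a 32-bit-only tool such as Psyco is an option at all.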
On Wednesday 11 February 2009 19:49:59, Kelly F. Hickel wrote:
> Ahh, yes, now I remember. Psyco ONLY works with 32-bit Python, whereas I MUST use 64-bit because the memory footprint of cvs2svn with our repo is just too large for a 32-bit process.
> So, dead in the water again?
Hmm, just a quick question: are you sure your problem is CPU-bound (as opposed to I/O-bound)? If it is not, you are looking in the wrong place to solve your performance issue, and you should instead look for a faster disk, a faster network, etc. Also, from what you say above, maybe adding some RAM could help.

--
Alexandre Fayolle
LOGILAB, Paris (France)
Python, Zope, Plone, Debian training: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
Scientific computing: http://www.logilab.fr/science
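One rough way to answer the CPU-bound vs I/O-bound question is to compare the CPU time a process consumes with the wall-clock time that elapses. For a single-threaded program, a ratio near 1 means it is busy computing, while a much lower ratio means it spends most of its time waiting (on disk, network, or swap). A minimal sketch using only the standard `time` module; `cpu_to_wall_ratio`, `busy` and `idle` are hypothetical helpers, with `idle` standing in for I/O waits:

```python
import time


def cpu_to_wall_ratio(func, *args, **kwargs):
    """Run func(*args, **kwargs); return (result, cpu_seconds / wall_seconds).

    process_time() counts CPU time of the whole process (user + system, all
    threads), so multithreaded code can exceed 1.0.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, (cpu / wall if wall > 0 else 0.0)


def busy(seconds):
    """Spin the CPU for roughly `seconds` of wall time."""
    end = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < end:
        count += 1
    return count


def idle(seconds):
    """Stand-in for waiting on I/O: sleeping uses almost no CPU."""
    time.sleep(seconds)


if __name__ == "__main__":
    for label, fn in (("busy", busy), ("idle", idle)):
        _, ratio = cpu_to_wall_ratio(fn, 0.2)
        print(f"{label}: cpu/wall = {ratio:.2f}")
```

For a job that runs for days, the same comparison can be made from outside the process with standard tools such as `top` (a CPU pinned near 100%) or the shell's `time` command (user + sys time versus real time).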
On Feb 12, 2009, at 6:14 AM, Alexandre Fayolle wrote:
> On Wednesday 11 February 2009 19:49:59, Kelly F. Hickel wrote:
>> Ahh, yes, now I remember. Psyco ONLY works with 32-bit Python, whereas I MUST use 64-bit because the memory footprint of cvs2svn with our repo is just too large for a 32-bit process.
>> So, dead in the water again?
> Hmm, just a quick question: are you sure your problem is CPU-bound (as opposed to I/O-bound)? If it is not, you are looking in the wrong place to solve your performance issue, and you should instead look for a faster disk, a faster network, etc. Also, from what you say above, maybe adding some RAM could help.
I'm answering this because we already talked about it in some private emails. He has already determined that the problem is CPU-bound: one of the CPUs goes to 100% and stays there almost the whole time for 6.5 days. His machine has 32 GB of RAM and the Python conversion program uses a good part of it, which is why he can't run it in 32-bit mode. His repo is only 6 GB in size, so I think the problem is either a poorly performing algorithm in cvs2svn or an inefficient CVS library (or whatever is used to access CVS). If it is the CVS library, there is nothing anyone can do (even Psyco or PyPy would not help). If the problem is a poor pure-Python algorithm, it should probably be changed (if a better one exists, of course). As for why a program reading a 6 GB repo, even one using a database file for intermediate data, would still use tons of RAM for this conversion, I don't know.

--
Leonardo Santagada
santagada at gmail.com
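Telling apart time spent in a slow pure-Python algorithm from time spent in the code that reads CVS data is what a profiler is for. A minimal sketch using the standard `cProfile` and `pstats` modules. `profile_call`, `slow_unique` and `fast_unique` are hypothetical; `slow_unique` is a deliberately quadratic stand-in for "a bad performing algorithm", not code from cvs2svn:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_by="cumulative", limit=10, **kwargs):
    """Run func under cProfile; return (result, text report of top functions)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_by).print_stats(limit)
    return result, buffer.getvalue()


def slow_unique(items):
    """Order-preserving de-duplication in O(n^2): `x not in seen` scans a list."""
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen


def fast_unique(items):
    """Same result in O(n): dict keys keep insertion order (Python 3.7+)."""
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    data = list(range(1000)) * 2
    result, report = profile_call(slow_unique, data)
    print(report)
    assert result == fast_unique(data)
```

For a whole program, `python -m cProfile -s cumulative script.py` prints a similar report without code changes, ideally run on a smaller repository. If most of the cumulative time lands inside the CVS-reading code, the profile points at the library rather than the algorithm.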
participants (3)
- Alexandre Fayolle
- Kelly F. Hickel
- Leonardo Santagada