Question about PyPI full mirror (38,000+ packages) using devpi
I've downloaded all packages from PyPI and I'd like to create a local mirror using devpi. I had the following idea:

1. create a new index, say /root/pypimirror, based on /root/pypi
2. upload the entire folder containing 38,000+ packages onto /root/pypimirror
3. eventually setting /root/pypimirror to NotVolatile (if this can be done, somehow)

/myuser/myindex
  +-- /root/pypimirror
        +-- /root/pypi

Another idea would be creating a "parallel" index to /root/pypi and exposing a third index which derives from both.

/myuser/myindex
  +-- /root/pypi
  +-- /root/pypimirror

Does this idea make sense? Is there a better way of doing it, in particular without having to download everything again from PyPI?

I have another concern: performance. 38,000+ packages imply large directories in the file system. Do you think devpi will have trouble managing that many packages?

Thanks a lot,

-- Richard
On Fri, Jan 24, 2014 at 04:02 -0800, Richard Gomes wrote:
I've downloaded all packages from PyPI and I'd like to create a local mirror using devpi.
If you want to have a full non-lazy mirror, did you consider using bandersnatch?
I had the following idea:
1. create a new index say /root/pypimirror based on /root/pypi
2. upload the entire folder containing 38,000+ packages onto /root/pypimirror
This might fail if your system uses a 32K limit on directory entries. I've just fixed it for /root/pypi (not released yet) and i can also fix it for private indices.
3. eventually setting /root/pypimirror to NotVolatile (if this can be done, somehow)
You can change index volatility any time.
/myuser/myindex +-- /root/pypimirror +-- /root/pypi
Another idea would be creating a "parallel" index to /root/pypi and exposing a third index which derives from both.
/myuser/myindex +-- /root/pypi +-- /root/pypimirror
Does this idea make sense? Is there a better way of doing it, in particular without having to download everything again from PyPI?
Could you state more clearly what you want to achieve in the first place? I can see a number of possible motivations but would like to understand your particular ones. (Did i mention that i was always very bad in school at understanding textual questions in math questions although others seemed to be able to guess the correct meaning of the question? :)
I have another concern: performance. 38,000+ packages imply large directories in the file system. Do you think devpi will have trouble managing that many packages?
see above. /root/pypi handles >32K fine on trunk. And private indices can also be made to do so. (So far there wasn't a use case for >32K private packages -- let's see if we have one here) best, holger
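The 32K-entries-per-directory limit mentioned above is commonly worked around by spreading files over hashed subdirectories. The following is only a minimal sketch of that general technique; it is not the fix applied in devpi, and the function name `sharded_path` and its parameters are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, filename, levels=2, width=2):
    """Return root/<aa>/<bb>/filename, with aa and bb taken from a hash.

    With width=2 every directory level holds at most 256 subdirectories,
    so no single directory has to store tens of thousands of entries.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters each.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *shards, filename)
```

The mapping is deterministic, so a file can be located again from its name alone without scanning the tree.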
-- You received this message because you are subscribed to the Google Groups "devpi-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to devpi-dev+...@googlegroups.com. To post to this group, send email to devp...@googlegroups.com. Visit this group at http://groups.google.com/group/devpi-dev. For more options, visit https://groups.google.com/groups/opt_out.
Hello Holger,

No, I haven't tried bandersnatch. I think devpi is perfect for my workflow and I'm not willing to try other things.

Sorry for not being very clear about my intentions. My fault. I made the question more complicated than it should be.

In a nutshell, I just want a full PyPI mirror, but I'm not sure how I should load all those 38K packages into the devpi cache. I guess that I should make /root/pypi volatile and upload package by package onto it. Does that make sense?

I'm basically confused about how devpi decides (or detects) that a package must be updated from PyPI once I've uploaded it by hand. I'm not sure whether uploading by hand would work well, or whether it would eventually confuse devpi about when new updates should be downloaded from PyPI.

Thanks

Richard Gomes
http://rgomes.info
http://www.linkedin.com/in/rgomes
mobile: +44(77)9955-6813
inum <http://www.inum.net/>: +883(5100)0800-9804
sip:r...@ippi.fr
Hi Richard,

On Fri, Jan 24, 2014 at 21:51 +0000, Richard Gomes wrote:
Hello Holger,
No, I haven't tried bandersnatch. I think devpi is perfect for my workflow and I'm not willing to try other things.
ok.
Sorry for not being very clear on my intentions. My fault. I made the question more complicated than it should be.
In a nutshell, I just wanted a full PyPI mirror, but I'm not sure how I should load all those 38K packages into devpi cache. I guess that I should make /root/pypi volatile and I should upload package by package onto it. Does it make sense?
There are 38K projects but many more archive files. If you want a full mirror, that's around 40 GB of storage (and the corresponding network traffic). What i think makes more sense is to trigger all index pages by iterating over all projects, and maybe also fetch (and thereby cache) the first (highest version) archive file of each project. You can get some ideas how to do that from server/extra/compare_devpi_server.py in the repository, e.g. through the getnames() function.
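A minimal sketch of "triggering all index pages" might look like the following. It does not reproduce compare_devpi_server.py or its getnames() function; the helper names are hypothetical, and it assumes the server exposes the project list at `<index>/+simple/` as an HTML page of links. Only the index pages are requested here; archive files are not fetched.

```python
from html.parser import HTMLParser
from urllib.parse import quote
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.hrefs.append(value)


def extract_project_names(simple_html):
    """Return project names from a simple-index page, in order, without repeats."""
    parser = _LinkCollector()
    parser.feed(simple_html)
    seen, names = set(), []
    for href in parser.hrefs:
        # The project name is the last path component of the link.
        name = href.rstrip("/").rsplit("/", 1)[-1]
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def project_page_url(index_url, name):
    """Build the simple-index URL of one project below an index URL."""
    return index_url.rstrip("/") + "/+simple/" + quote(name) + "/"


def warm_index(index_url, timeout=30):
    """Request every project page once; return (ok_count, failed_names)."""
    with urlopen(index_url.rstrip("/") + "/+simple/", timeout=timeout) as resp:
        names = extract_project_names(resp.read().decode("utf-8", "replace"))
    ok, failed = 0, []
    for name in names:
        try:
            with urlopen(project_page_url(index_url, name), timeout=timeout) as resp:
                resp.read()
            ok += 1
        except OSError:  # URLError and HTTPError are subclasses of OSError
            failed.append(name)
    return ok, failed
```

With the server from this thread, a call would be `warm_index("http://pypi.localdomain:8080/root/pypi")`. Expect it to take a long time for 38K projects.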
I'm basically confused about how devpi decides (or detects) that eventually a package must be updated from PyPI since I've uploaded it by hand. I'm not sure if this idea of uploading by hand would work well or would eventually make devpi confused about when new updates should be downloaded from PyPI.
Once you have touched/retrieved all project index pages, devpi-server will automatically keep those index pages up to date. HTH, holger
Hello Holger,

I understood the point about triggering all index pages. But I guess that it would download everything [again] from PyPI, correct?

So, in order to avoid another wasteful full download, since I've already done that, I'm trying to upload packages from a local folder.

Unless you tell me it is not going to work, I'd like to persist on this route. You know... I can take this chance to gain some mileage with devpi :)

First thing would be making /root/pypi volatile (if I understood properly!).

# let's play with /root/test first, just to see how it works
(py276)rgomes@pypi:/srv/pypi$ devpi index /root/test volatile=False
/root/test changing volatile: False
http://pypi.localdomain:8080/root/test:
  type=stage
  bases=root/pypi
  volatile=False
  uploadtrigger_jenkins=None
  acl_upload=root
[87112 refs]

(py276)rgomes@pypi:/srv/pypi$ devpi index /root/test volatile=True
/root/test changing volatile: True
http://pypi.localdomain:8080/root/test:
  type=stage
  bases=root/pypi
  volatile=True
  uploadtrigger_jenkins=None
  acl_upload=root
[87112 refs]

# now let's try with /root/pypi
(py276)rgomes@pypi:/srv/pypi$ devpi index /root/pypi volatile=True
/root/pypi changing volatile: True
http://pypi.localdomain:8080/root/pypi:
  type=mirror
  bases=
  volatile=False
Traceback (most recent call last):
  File "/home/rgomes/.virtualenvs/py276/bin/devpi", line 9, in <module>
    load_entry_point('devpi-common==1.2', 'console_scripts', 'devpi')()
  File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/main.py", line 29, in main
    return method(hub, hub.args)
  File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 65, in main
    return index_modify(hub, url, kvdict)
  File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 14, in index_modify
    index_show(hub, url)
  File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 36, in index_show
    ixconfig["uploadtrigger_jenkins"],))
KeyError: 'uploadtrigger_jenkins'
[87112 refs]

Hum... not very good :(

I've opened issue #82
https://bitbucket.org/hpk42/devpi/issue/82/cannot-make-root-pypi-volatile

Meanwhile, I will play with another index and let you know how it goes.

Thanks

Richard Gomes
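Reading the traceback, the mirror index configuration seems to lack the `uploadtrigger_jenkins` key that stage indexes carry, and the display code indexes it unconditionally. The sketch below only illustrates that failure mode and a tolerant way to print such a configuration. It is not devpi's code or its eventual fix, and `format_index_config` is a hypothetical name.

```python
def format_index_config(url, ixconfig):
    """Render an index configuration, skipping keys the index does not have."""
    lines = [url + ":"]
    for key in ("type", "bases", "volatile", "uploadtrigger_jenkins", "acl_upload"):
        if key not in ixconfig:
            # A mirror index may not define every key a stage index has.
            continue
        value = ixconfig[key]
        if isinstance(value, (list, tuple)):
            value = ",".join(value)
        lines.append("  %s=%s" % (key, value))
    return "\n".join(lines)
```

Calling it with `{"type": "mirror", "bases": [], "volatile": False}` prints the three known settings instead of raising KeyError.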
Hello Richard,

On Sun, Jan 26, 2014 at 02:39 +0000, Richard Gomes wrote:
Hello Holger,
I understood the point about triggering all index pages. But I guess that it would download everything [again] from PyPI, correct?
When a fresh devpi-server starts up, nothing is done except getting the list of all projects from pypi.python.org. Only the triggering of index pages will cache them, and only accessing a specific archive file will cache that.
So, in order to avoid another wasteful full download, since I've already done that, I'm trying to upload packages from a local folder.
If you want to upload pypi.python.org packages you have to do that into a private index and that will not be automatically kept up to date via the syncing mechanism. You would have to do any updates by hand. There is no way around that, volatile index or not. /root/pypi is an index where you cannot change properties like volatile. The error message could be better, agreed. The only thing we could consider is adding an option to devpi-server that allows looking up archive files from a local directory (structure) before trying a remote operation. We know the md5 checksum and filename and so can match precisely with already downloaded files. But don't hold your breath on that. cheers, holger
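The local-directory lookup idea floated above could be sketched roughly as follows: given the filename and md5 checksum known from the index, search an already downloaded tree for a matching file before going to the network. No such option exists in the devpi-server discussed in this thread; the function names are hypothetical.

```python
import hashlib
import os
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_local_archive(local_dir, filename, expected_md5):
    """Return the path of a local file with this name and md5, else None."""
    wanted = expected_md5.lower()
    for dirpath, _dirnames, filenames in os.walk(local_dir):
        if filename in filenames:
            candidate = Path(dirpath) / filename
            # Only accept the file if its content matches the checksum.
            if md5_of(candidate) == wanted:
                return candidate
    return None
```

A caller would fall back to the remote download whenever `find_local_archive` returns `None`.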
Hello Holger,

Thanks a lot for your prompt answer. Yes, I understand that we'd better keep /root/pypi as it is.

I've created an inherited index based on /root/pypi, as shown below:

(py276)rgomes@pypi:/srv/pypi$ devpi index /root/pypimirror
http://pypi.localdomain:8080/root/pypimirror:
  type=stage
  bases=root/pypi
  volatile=True
  uploadtrigger_jenkins=None
  acl_upload=root

I'm currently uploading packages onto this index. I will let you know later how it goes, both the difficulties I run into and the performance I observe.

Am I correct to think that, if pip retrieves a package which was updated on PyPI, it (pip) will get the updated package because /root/pypimirror inherits from /root/pypi, which keeps itself in sync with PyPI?

Thanks

Richard Gomes
hi Richard,

On Sun, Jan 26, 2014 at 13:55 +0000, Richard Gomes wrote:
Hello Holger,
Thanks a lot for your prompt answer. Yes, I understand that we'd better keep /root/pypi as it is.
I've created an inherited index based on /root/pypi, like shown below:
(py276)rgomes@pypi:/srv/pypi$ devpi index /root/pypimirror http://pypi.localdomain:8080/root/pypimirror: type=stage bases=root/pypi volatile=True uploadtrigger_jenkins=None acl_upload=root
I'm currently uploading packages onto this index. I will let you know later how it goes in regards to difficulties I had and performance observed.
I think you might run into a 32K entries per directory restriction when uploading so many packages to a private index.
Am I correct to think that, if pip retrieves a packages which was updated on PyPI, it (pip) will get the updated package because /root/pypimirror inherits from /root/pypi which keeps itself in sync with PyPI?
I think this should work, yes. Are you doing this only because you want to avoid forgetting about your downloads? best, holger
Thanks
Richard Gomes http://rgomes.info http://www.linkedin.com/in/rgomes mobile: +44(77)9955-6813 inum <http://www.inum.net/>: +883(5100)0800-9804 sip:r...@ippi.fr
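The inheritance question in this exchange (an index with `bases=root/pypi` still seeing newer releases from PyPI) can be pictured with a toy model. This is a sketch under stated assumptions, not devpi-server's real lookup logic: the `Index` class, `versions`, `best_version`, and the purely numeric version comparison are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Index:
    """Toy index: files stored directly on it plus a list of base indexes."""
    name: str
    bases: List["Index"] = field(default_factory=list)
    # project name -> version strings uploaded or cached on this index
    files: Dict[str, List[str]] = field(default_factory=dict)

    def versions(self, project: str) -> List[str]:
        """Collect versions from this index and, recursively, from its bases."""
        found = list(self.files.get(project, []))
        for base in self.bases:
            for version in base.versions(project):
                if version not in found:
                    found.append(version)
        return found


def version_key(version: str) -> Tuple[int, ...]:
    # Simplification: only handles purely numeric dotted versions like "2.2.0".
    return tuple(int(part) for part in version.split("."))


def best_version(index: Index, project: str) -> str:
    """Pick the highest version visible through the inheritance chain."""
    return max(index.versions(project), key=version_key)


pypi = Index("root/pypi", files={"requests": ["2.1.0", "2.2.0"]})
mirror = Index("root/pypimirror", bases=[pypi], files={"requests": ["2.1.0"]})
print(best_version(mirror, "requests"))
```

In this model the hand-uploaded 2.1.0 on the derived index does not hide the newer 2.2.0 known to the base, which matches the "I think this should work, yes" answer in spirit only.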
On 26/01/14 07:56, holger krekel wrote:
Hello Richard,
On Sun, Jan 26, 2014 at 02:39 +0000, Richard Gomes wrote:
> Hello Holger,
> I understood the point about triggering all index pages. But I guess that it would download everything [again] from PyPI, correct?
When a fresh devpi-server starts up, nothing is done except getting the list of all projects from pypi.python.org. Only the triggering of index pages will cache them, and only accessing a specific archive file will cache that.
> So, in order to avoid another wasteful full download, cos I've already done that... I'm trying to upload packages from a local folder.
If you want to upload pypi.python.org packages you have to do that into a private index and that will not be automatically kept up to date via the syncing mechanism. You would have to do any updates by hand. There is no way around that, volatile index or not. /root/pypi is an index where you cannot change properties like volatile. The error message could be better, agreed.
The only thing we could consider is adding an option to devpi-server that allows looking up archive files from a local directory (structure) before trying a remote operation. We know the md5 checksum and filename and so can match precisely with already downloaded files. But don't hold your breath on that.
cheers,
holger
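The option floated in this reply (checking a local directory for an archive before going to the network, matching on filename plus md5 checksum) does not exist in the thread; it is only an idea. A rough sketch of how such a lookup could work is given below. The function names `md5_of`, `build_local_index`, and `find_local` are hypothetical and are not part of devpi-server.

```python
import hashlib
import os
from typing import Dict, Optional, Tuple


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_local_index(root: str) -> Dict[Tuple[str, str], str]:
    """Map (filename, md5) to the path of every file found below root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            index[(filename, md5_of(path))] = path
    return index


def find_local(index: Dict[Tuple[str, str], str],
               filename: str, md5: str) -> Optional[str]:
    """Return the local path if both name and checksum match, else None."""
    return index.get((filename, md5.lower()))
```

A caller would build the index once over the already-downloaded folder and fall back to a remote fetch only when `find_local` returns `None`.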
> Unless you tell me it is not going to work, I'd like to persist on this route. You know... I can take this chance to gain some mileage with devpi :)
> First thing would be making /root/pypi volatile (if I understood properly!).
>     # let's play with /root/test first, just to see how it works
>     (py276)rgomes@pypi:/srv/pypi$ devpi index /root/test volatile=False
>     /root/test changing volatile: False
>     http://pypi.localdomain:8080/root/test:
>       type=stage
>       bases=root/pypi
>       volatile=False
>       uploadtrigger_jenkins=None
>       acl_upload=root
>     [87112 refs]
>     (py276)rgomes@pypi:/srv/pypi$ devpi index /root/test volatile=True
>     /root/test changing volatile: True
>     http://pypi.localdomain:8080/root/test:
>       type=stage
>       bases=root/pypi
>       volatile=True
>       uploadtrigger_jenkins=None
>       acl_upload=root
>     [87112 refs]
>     # now let's try with /root/pypi
>     (py276)rgomes@pypi:/srv/pypi$ devpi index /root/pypi volatile=True
>     /root/pypi changing volatile: True
>     http://pypi.localdomain:8080/root/pypi:
>       type=mirror
>       bases=
>       volatile=False
>     Traceback (most recent call last):
>       File "/home/rgomes/.virtualenvs/py276/bin/devpi", line 9, in <module>
>         load_entry_point('devpi-common==1.2', 'console_scripts', 'devpi')()
>       File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/main.py", line 29, in main
>         return method(hub, hub.args)
>       File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 65, in main
>         return index_modify(hub, url, kvdict)
>       File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 14, in index_modify
>         index_show(hub, url)
>       File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 36, in index_show
>         ixconfig["uploadtrigger_jenkins"],))
>     KeyError: 'uploadtrigger_jenkins'
>     [87112 refs]
> Hum... not very good :(
> I've opened issue #82 https://bitbucket.org/hpk42/devpi/issue/82/cannot-make-root-pypi-volatile
> Meanwhile, I will play with another index and let you know how it goes.
> Thanks
> Richard Gomes
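The traceback in this message ends in `ixconfig["uploadtrigger_jenkins"]`, which suggests that the client indexes a key the `type=mirror` configuration does not contain. The sketch below shows one defensive way to format such a configuration. It is an illustration only: `describe_index` is a hypothetical function, not the devpi client's code and not necessarily the fix that went into devpi.

```python
from typing import Any, Dict


def describe_index(url: str, ixconfig: Dict[str, Any]) -> str:
    """Format an index summary without assuming stage-only keys exist."""
    parts = [
        "type=%s" % ixconfig["type"],
        "bases=%s" % ",".join(ixconfig.get("bases", [])),
        "volatile=%s" % ixconfig.get("volatile", False),
    ]
    # Stage-only settings: printed only when the server reported them,
    # so a mirror index without these keys does not raise KeyError.
    for key in ("uploadtrigger_jenkins", "acl_upload"):
        if key in ixconfig:
            value = ixconfig[key]
            if isinstance(value, (list, tuple)):
                value = ",".join(value)
            parts.append("%s=%s" % (key, value))
    return "%s: %s" % (url, " ".join(parts))


mirror_config = {"type": "mirror", "bases": [], "volatile": False}
print(describe_index("http://pypi.localdomain:8080/root/pypi", mirror_config))
```

Treating optional keys as optional only avoids the crash; the server would still refuse to change `volatile` on /root/pypi, as explained in the reply above.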
On 25/01/14 09:29, holger krekel wrote:
Hi Richard,
On Fri, Jan 24, 2014 at 21:51 +0000, Richard Gomes wrote:
> Hello Holger,
> No, I haven't tried bandersnatch. I think devpi is perfect for my workflow and I'm not willing to try other things.
ok.
> Sorry for not being very clear on my intentions. My fault. I made the question more complicated than it should be.
> In a nutshell, I just wanted a full PyPI mirror, but I'm not sure how I should load all those 38K packages into devpi cache. I guess that I should make /root/pypi volatile and I should upload package by package onto it. Does it make sense?
There are 38K projects but many more archive files. If you want a full mirror, that's around 40 Gbytes of storage (and the corresponding network traffic).
What i think makes more sense is to go for triggering all index pages by iterating over all projects and maybe get and thereby cache the first (highest version) archive files. You can get some ideas how to do that in the repository at server/extra/compare_devpi_server.py e.g. through the getnames() function.
> I'm basically confused about how devpi decides (or detects) that eventually a package must be updated from PyPI since I've uploaded it by hand. I'm not sure if this idea of uploading by hand would work well or would eventually make devpi confused about when new updates should be downloaded from PyPI.
Once you touched/retrieved all project index pages, devpi-server will auto-update the index pages for those projects.
HTH, holger
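The suggestion in this reply, triggering every project's index page so that devpi-server caches it, could be scripted roughly as follows. This is a sketch under assumptions: the `+simple/<project>/` URL layout and the helper names `simple_page_url`, `http_status`, and `touch_index_pages` are mine, the project list is passed in by the caller, and the `getnames()` function from server/extra/compare_devpi_server.py is not reproduced here.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen


def simple_page_url(index_url: str, project: str) -> str:
    """Build the per-project index page URL (URL layout is an assumption)."""
    return "%s/+simple/%s/" % (index_url.rstrip("/"), quote(project))


def http_status(url: str) -> int:
    """GET a URL and return the HTTP status code, including error codes."""
    try:
        with urlopen(url) as response:
            return response.status
    except HTTPError as error:
        return error.code


def touch_index_pages(index_url: str,
                      projects: Iterable[str],
                      fetch: Callable[[str], int] = http_status) -> Dict[str, int]:
    """Request each project's index page once and record the status code."""
    return {name: fetch(simple_page_url(index_url, name)) for name in projects}
```

Against the server from this thread, a call might look like `touch_index_pages("http://pypi.localdomain:8080/root/pypi", ["requests", "Django"])`. Passing `fetch` as a parameter keeps the loop testable without network access.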
Hi Holger,
> I think you might run into a 32K entries per directory restriction when uploading so many packages to a private index.
I've installed your most recent changes from bitbucket in regards to this subject. Let's see how it goes!
> Are you doing this only because you want to avoid forgetting about your downloads?
I'm doing this because I've already had my fair share of troubles with PyPI and I'd like to rule out details about whether PyPI is available or not. I simply cannot depend on external factors when dealing with systems in production. As you mentioned before, I could employ bandersnatch... but I definitely don't see a reason for adopting more than one tool for exactly one purpose. Besides, I like the idea of having multiple indexes whilst developing applications. So, again, devpi fits the bill very well. I just need to upload those 38K+ packages into devpi once in a lifetime and probably never think about this again.
Cheers :)
Richard Gomes
On Sun, Jan 26, 2014 at 14:29 +0000, Richard Gomes wrote:
> Hi Holger,
>> I think you might run into a 32K entries per directory restriction when uploading so many packages to a private index.
> I've installed your most recent changes from bitbucket in regards to this subject. Let's see how it goes!
the >32K entries at this point only applies to /root/pypi.
>> Are you doing this only because you want to avoid forgetting about your downloads?
> I'm doing this because I've already had my fair share of troubles with PyPI and I'd like to rule out details about whether PyPI is available or not. I simply cannot depend on external factors when dealing with systems in production. As you mentioned before, I could employ bandersnatch... but I definitely don't see a reason for adopting more than one tool for exactly one purpose. Besides, I like the idea of having multiple indexes whilst developing applications. So, again, devpi fits the bill very well. I just need to upload those 38K+ packages into devpi once in a lifetime and probably never think about this again.
good luck then :)
holger
Hi Holger,
> the >32K entries at this point only applies to /root/pypi.
True. I've forgotten that. Would it be easy to apply the change to all indexes?
Regarding performance: keeping the current pace, I expect that the entire upload will take some 18 more days or so. Yes, it's very slow. In 10 hours it was able to upload some 900 packages. There are 37K+ more waiting! :)
If you can implement the fix for all indexes in the next two weeks, I will be able to test it when my script fails. Apparently, I don't have to start from scratch: I can simply upload more packages from the point where it stopped. Even if I had to upload everything from scratch... this is not a high price to pay at this point.
Thanks
Richard Gomes
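The figures in this message (about 900 packages in 10 hours, 37K+ still waiting) allow a quick back-of-the-envelope check of the remaining time, and the "continue from where it stopped" idea amounts to skipping what has already been uploaded. The sketch below assumes a total of 38,000 packages (the thread only says "38,000+") and uses hypothetical helper names; it is not the upload script used in the thread.

```python
from typing import Iterable, List


def days_remaining(total: int, done: int, hours_elapsed: float) -> float:
    """Extrapolate the remaining upload time from the pace observed so far."""
    rate_per_hour = done / hours_elapsed
    return (total - done) / rate_per_hour / 24.0


def still_pending(all_files: Iterable[str], uploaded: Iterable[str]) -> List[str]:
    """Return the files not yet uploaded, keeping their original order."""
    finished = set(uploaded)
    return [name for name in all_files if name not in finished]


# Assumed total of 38,000; 900 done in 10 hours is the pace quoted above.
estimate = days_remaining(total=38000, done=900, hours_elapsed=10)
print("about %.1f days left" % estimate)
```

At 90 packages per hour the estimate comes out a little over 17 days, which is in line with the "18 more days or so" quoted in the message.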
Hi Holger,
I think you might run into a 32K entries per directory restriction when uploading so many packages to a private index. I've installed your most recent changes from bitbucket in regards to this subject. Let's see how it goes!
On Sun, Jan 26, 2014 at 14:29 +0000, Richard Gomes wrote: the >32K entries at this point only applies to /root/pypi.
Are you doing this only because you want to avoid forgetting about your downloads? I'm doing this because I had already my fair share of troubles with PyPI and I'd like to rule out details about whether PyPI is available or not. I simply cannot depend on external factors when dealing with systems in production. As you mentioned before, I could employ bandersnatch... but I definitely don't see reason for adopting more than one tool for exactly one purpose. Besides, I like the idea of having multiple indexes whilst developing applications. So, again, devpi fits the bill very well. I just need to upload those 38K+ packages in devpi once in a lifetime and probably never to think about this again. good luck then :)
holger
Cheers :)
Richard Gomes http://rgomes.info http://www.linkedin.com/in/rgomes mobile: +44(77)9955-6813 inum <http://www.inum.net/>: +883(5100)0800-9804 sip:r...@ippi.fr
On 26/01/14 14:06, holger krekel wrote:
hi Richard,
On Sun, Jan 26, 2014 at 13:55 +0000, Richard Gomes wrote:
Hello Holger,
Thanks a lot for your prompt answer. Yes, I understand that we'd better keep /root/pypi as it is.
I've created an inherited index based on /root/pypi, like shown below:
(py276)rgomes@pypi:/srv/pypi$ devpi index /root/pypimirror http://pypi.localdomain:8080/root/pypimirror: type=stage bases=root/pypi volatile=True uploadtrigger_jenkins=None acl_upload=root
I'm currently uploading packages onto this index. I will let you know later how it goes in regards to difficulties I had and performance observed. I think you might run into a 32K entries per directory restriction when uploading so many packages to a private index.
Am I correct to think that, if pip retrieves a packages which was updated on PyPI, it (pip) will get the updated package because /root/pypimirror inherits from /root/pypi which keeps itself in sync with PyPI? I think this should work, yes.
Are you doing this only because you want to avoid forgetting about your downloads?
best, holger
Thanks
Richard Gomes http://rgomes.info http://www.linkedin.com/in/rgomes mobile: +44(77)9955-6813 inum <http://www.inum.net/>: +883(5100)0800-9804 sip:r...@ippi.fr
On 26/01/14 07:56, holger krekel wrote:
Hello Richard,

On Sun, Jan 26, 2014 at 02:39 +0000, Richard Gomes wrote:
> Hello Holger,
>
> I understood the point about triggering all index pages.
> But I guess that it would download everything [again] from PyPI, correct?

When a fresh devpi-server starts up, nothing is done except getting the list of all projects from pypi.python.org. Only the triggering of index pages will cache them, and only accessing a specific archive file will cache that.

> So, in order to avoid another wasteful full download, since I've already
> done that, I'm trying to upload packages from a local folder.

If you want to upload pypi.python.org packages you have to do that into a private index, and that will not be automatically kept up to date via the syncing mechanism. You would have to do any updates by hand. There is no way around that, volatile index or not. /root/pypi is an index where you cannot change properties like volatile. The error message could be better, agreed.
The only thing we could consider is adding an option to devpi-server that allows looking up archive files from a local directory (structure) before trying a remote operation. We know the md5 checksum and filename and so can match precisely with already downloaded files. But don't hold your breath on that.
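The local-lookup idea described above could look roughly like the following sketch. This is not devpi-server code, and no such option is shown anywhere in this thread: the names `build_local_index` and `find_local_archive`, and the choice to walk a directory tree, are assumptions made purely for illustration. The sketch indexes already-downloaded files by filename and MD5 digest, so that a request for a known (filename, md5) pair could be answered from disk before any remote fetch.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_local_index(root_dir):
    """Map (filename, md5) -> absolute path for every file under root_dir."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[(name, md5_of_file(path))] = os.path.abspath(path)
    return index


def find_local_archive(index, filename, md5):
    """Return the local path for an archive, or None if it must be fetched."""
    return index.get((filename, md5.lower()))
```

A caller would build the index once over the folder of downloaded archives and consult `find_local_archive` before each remote request. Hashing 40 GB of files takes a while, so a real implementation would probably want to persist the index.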
cheers,
holger
> Unless you tell me it is not going to work, I'd like to persist on this
> route. You know... I can take this chance to gain some mileage with devpi :)
>
> First thing would be making /root/pypi volatile (if I understood properly!).
>
> # let's play with /root/test first, just to see how it works
> (py276)rgomes@pypi:/srv/pypi$ devpi index /root/test volatile=False
> /root/test changing volatile: False
> http://pypi.localdomain:8080/root/test:
>   type=stage
>   bases=root/pypi
>   volatile=False
>   uploadtrigger_jenkins=None
>   acl_upload=root
> [87112 refs]
>
> (py276)rgomes@pypi:/srv/pypi$ devpi index /root/test volatile=True
> /root/test changing volatile: True
> http://pypi.localdomain:8080/root/test:
>   type=stage
>   bases=root/pypi
>   volatile=True
>   uploadtrigger_jenkins=None
>   acl_upload=root
> [87112 refs]
>
> # now let's try with /root/pypi
> (py276)rgomes@pypi:/srv/pypi$ devpi index /root/pypi volatile=True
> /root/pypi changing volatile: True
> http://pypi.localdomain:8080/root/pypi:
>   type=mirror
>   bases=
>   volatile=False
> Traceback (most recent call last):
>   File "/home/rgomes/.virtualenvs/py276/bin/devpi", line 9, in <module>
>     load_entry_point('devpi-common==1.2', 'console_scripts', 'devpi')()
>   File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/main.py", line 29, in main
>     return method(hub, hub.args)
>   File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 65, in main
>     return index_modify(hub, url, kvdict)
>   File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 14, in index_modify
>     index_show(hub, url)
>   File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 36, in index_show
>     ixconfig["uploadtrigger_jenkins"],))
> KeyError: 'uploadtrigger_jenkins'
> [87112 refs]
>
> Hum... not very good :(
>
> I've opened issue #82
> https://bitbucket.org/hpk42/devpi/issue/82/cannot-make-root-pypi-volatile
>
> Meanwhile, I will play with another index and let you know how it goes.
>
> Thanks
>
> Richard Gomes
>
> On 25/01/14 09:29, holger krekel wrote:
>> Hi Richard,
>>
>> On Fri, Jan 24, 2014 at 21:51 +0000, Richard Gomes wrote:
>>> Hello Holger,
>>>
>>> No, I haven't tried bandersnatch.
>>> I think devpi is perfect for my workflow and I'm not willing to try
>>> other things.
>>
>> ok.
>>
>>> Sorry for not being very clear on my intentions.
>>> My fault. I made the question more complicated than it should be.
>>>
>>> In a nutshell, I just wanted a full PyPI mirror, but I'm not sure how I
>>> should load all those 38K packages into devpi cache.
>>> I guess that I should make /root/pypi volatile and I should upload
>>> package by package onto it.
>>> Does it make sense?
>>
>> There are 38K projects but many more archive files. If you want a full
>> mirror, that's around 40 GBytes of storage (and the corresponding network traffic).
>>
>> What i think makes more sense is to go for triggering all index pages
>> by iterating over all projects and maybe get and thereby cache the first
>> (highest version) archive files. You can get some ideas how to do that
>> in the repository at server/extra/compare_devpi_server.py e.g. through
>> the getnames() function.
>>
>>> I'm basically confused about how devpi decides (or detects) that
>>> eventually a package must be updated from PyPI since I've uploaded it by
>>> hand. I'm not sure if this idea of uploading by hand would work well or
>>> would eventually make devpi confused about when new updates should be
>>> downloaded from PyPI.
>>
>> Once you touched/retrieved all project index pages, devpi-server will
>> auto-update the index pages for those projects.
>>
>> HTH,
>> holger
hi Richard,

On Sun, Jan 26, 2014 at 14:48 +0000, Richard Gomes wrote:
> Hi Holger,
>
>> the >32K entries at this point only applies to /root/pypi.
>
> True. I've forgotten that. Would it be easy to apply the change to all indexes?

Not too hard, i guess, but it would mean we need a major version increase in devpi-server because the underlying storage layout changes as well.

> Regarding performance: at the current pace, I expect the entire upload to take another 18 days or so. Yes, it's very slow. In 10 hours it was able to upload some 900 packages. There are 37K+ more waiting! :)

It checks the pypi.python.org index pages for each upload, i am afraid. Not sure if this can be optimized. 18 days seems very slow, though.
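As a quick check on that estimate, the arithmetic can be reproduced from the two figures quoted in this exchange (about 900 packages uploaded in 10 hours, roughly 37,000 still to go). The helper name `remaining_days` is made up for illustration, and the calculation assumes the upload rate stays constant.

```python
def remaining_days(done, hours_elapsed, remaining):
    """Estimate the days left, assuming the observed upload rate stays constant."""
    rate_per_hour = done / hours_elapsed
    return remaining / rate_per_hour / 24


# Figures from the thread: ~900 packages in 10 hours, ~37,000 remaining.
estimate = remaining_days(done=900, hours_elapsed=10, remaining=37000)
print(round(estimate, 1))
```

At 90 packages per hour this works out to a little over 17 days, which is in line with the "18 days or so" quoted.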
> If you can implement the fix for all indexes in the next two weeks, I will be able to test it when my script fails. Apparently I don't have to start from scratch: I can simply upload more packages from the point where it stopped. Even if I had to upload everything from scratch, this is not a high price to pay at this point.

I am wondering then why you don't simply go for a little scraper script that touches all index pages on /root/pypi and the 3-or-so top-most archive files so that you have an archive for each project. This is probably faster than 18 days and works with devpi-server today.

best,
holger
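A minimal sketch of such a scraper, under several assumptions that are not confirmed in this thread: that the mirror index exposes a simple index page at `/root/pypi/+simple/` with one link per project, that each project page lists its archive links, and that "top-most" can be read as "first in page order". The names `extract_links` and `warm_cache` are invented for this example, and there is no retry or error handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, page_url):
    """Return absolute URLs for all links on a page, in page order."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.hrefs]


def fetch(url):
    """GET a URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def warm_cache(simple_url, top_n=3):
    """Touch every project page and download its first top_n archive links."""
    index_html = fetch(simple_url).decode("utf-8", "replace")
    for project_url in extract_links(index_html, simple_url):
        page = fetch(project_url).decode("utf-8", "replace")
        for archive_url in extract_links(page, project_url)[:top_n]:
            # The body is discarded; the point is that the server caches it.
            fetch(archive_url)
```

With the host name used earlier in the thread, a run might be started with `warm_cache("http://pypi.localdomain:8080/root/pypi/+simple/")`. Fetching sequentially keeps the load on both the local server and pypi.python.org modest, at the cost of a longer run.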
> Thanks
>
> Richard Gomes
>
> On 26/01/14 14:31, holger krekel wrote:
>> On Sun, Jan 26, 2014 at 14:29 +0000, Richard Gomes wrote:
>>> Hi Holger,
>>>
>>>> I think you might run into a 32K entries per directory restriction when uploading so many packages to a private index.
>>>
>>> I've installed your most recent changes from bitbucket in regards to this subject. Let's see how it goes!
>>
>> the >32K entries at this point only applies to /root/pypi.
Hi Holger,

> I am wondering then why you don't simply go for a little scraper script that touches all index pages on /root/pypi and the 3-or-so top-most archive files so that you have an archive for each project. This is probably faster than 18 days and works with devpi-server today.

If you meant that I should have a workable index for the moment, for my most pressing needs... yeah! I already have another instance of devpi-server running on another port, on top of another repository, which is fully functional and serves my current development process. That's why I can wait 20 days or even 60 days until I have PyPI fully mirrored in devpi.

Thanks :)

Richard Gomes
Hello Holger,

> Not too hard, i guess, but it would mean we need a major version
> increase in devpi-server because the underlying storage layout
> changes as well.

Maybe the underlying storage layout could be discovered dynamically.
This way it would be possible to keep old indexes in the current format
whilst newly created indexes could adopt a new format.
If you think this can be done, please gimme some pointers and I will
change my working copy of devpi and let you know how it goes.

Thanks

Richard Gomes

On 26/01/14 14:52, holger krekel wrote:
> hi Richard,
>
> On Sun, Jan 26, 2014 at 14:48 +0000, Richard Gomes wrote:
>> Hi Holger,
>>
>>> the >32K entries at this point only applies to /root/pypi.
>>
>> True. I've forgotten that.
>> Would it be easy to apply the change to all indexes?
>
> Not too hard, i guess, but it would mean we need a major version
> increase in devpi-server because the underlying storage layout
> changes as well.
>
>> Regarding performance: keeping the current pace, I expect that the
>> entire upload will take some 18 more days or so. Yes, it's very slow.
>> In 10 hours it was able to upload some 900 packages.
>> There are 37K+ more waiting! :)
>
> It checks the pypi.python.org index pages for each upload, i am afraid.
> Not sure if this can be optimized. 18 days seems very slow, though.
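As a rough check of the 18-day figure, the numbers quoted above can be plugged into a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the upload rate stays constant and treats "37K+" as exactly 37,000, so the real figure would be somewhat higher.

```python
# Figures taken from the message above; 37000 is a lower bound for "37K+".
uploaded = 900        # packages uploaded so far
elapsed_hours = 10    # time those uploads took
remaining = 37000     # packages still waiting

rate_per_hour = uploaded / elapsed_hours        # 90 packages per hour
remaining_hours = remaining / rate_per_hour     # hours left at that rate
remaining_days = remaining_hours / 24

print("rate: %.0f packages/hour" % rate_per_hour)
print("remaining: %.0f hours, about %.1f days" % (remaining_hours, remaining_days))
```

At 90 packages per hour this comes to a little over 17 days, which is consistent with the "18 days or so" estimate.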
>> If you can implement the fix to all indexes in the next two weeks, I
>> will be able to test it when my script fails.
>> Apparently, I don't have to start from scratch. Apparently I can
>> simply upload more packages from the point it stopped.
>> Even if I had to upload everything from scratch... this is not a high
>> price to pay at this point.
>
> I am wondering then why you don't simply go for a little scraper script
> that touches all index pages of /root/pypi and the 3-or-so top-most
> archive files, so that you have an archive for each project.
> This is probably faster than 18 days and works with devpi-server today.
>
> best,
> holger
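The scraper idea could be sketched roughly as follows. This is a minimal sketch, not a tested tool: the server address is taken from the `devpi index` output earlier in the thread, the `/root/pypi/+simple/<project>/` URL pattern is an assumption about where devpi-server serves project index pages, and "top-most" is interpreted as "the first links listed on the page", which may or may not match version order on a given server. The names `index_url`, `top_archives` and `touch_project` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: server address as shown in the thread's `devpi index` output.
SERVER = "http://pypi.localdomain:8080"


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def index_url(project, server=SERVER):
    # Assumption: project index pages live under /root/pypi/+simple/<name>/.
    return "%s/root/pypi/+simple/%s/" % (server, project)


def top_archives(page_url, html, count=3):
    """Return absolute URLs for the first `count` links on an index page."""
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.links[:count]]


def http_get(url):
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def touch_project(project, fetch=http_get, count=3):
    """Request one index page and its first few archives so they get cached."""
    url = index_url(project)
    html = fetch(url).decode("utf-8", "replace")
    archives = top_archives(url, html, count)
    for archive in archives:
        fetch(archive)  # body is discarded; the request itself is the point
    return archives
```

Looping `touch_project` over the list of project names (for example the names returned by the `getnames()` helper mentioned earlier in the thread) would touch every index page. The `fetch` parameter is there so the logic can be tried against canned HTML before pointing it at a real server.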
Hi Richard,

On Sun, Jan 26, 2014 at 22:31 +0000, Richard Gomes wrote:
> Hello Holger,
>
>> Not too hard, i guess, but it would mean we need a major version
>> increase in devpi-server because the underlying storage layout
>> changes as well.
>
> Maybe the underlying storage layout could be discovered dynamically.
> This way it would be possible to keep old indexes in the current format
> whilst newly created indexes could adopt a new format.
> If you think this can be done, please gimme some pointers and I will
> change my working copy of devpi and let you know how it goes.

We could make the code more flexible to work with different storage
layouts at the same time, but i don't think it's worth either the effort
or the resulting complexity. There already is a version-upgrade
mechanism, and updating the layout for private indexes is thus
relatively straightforward. If you want to help, you could look into
how it was done for /root/pypi (one of the last commits) and do
something similar for private indexes. Requires some source code
reading, though.

cheers,
holger
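For readers wondering what a layout change that avoids a 32K-entries-per-directory limit can look like in general: the thread does not describe the layout actually used for /root/pypi, so the following is only an illustration of one common technique (spreading entries over subdirectories named after a hash prefix), not devpi's implementation. The function names and the `/srv/pypi` root are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root, project, width=2):
    """Return root/<hash prefix>/<project>.

    Hypothetical layout for illustration only; not taken from devpi.
    """
    digest = hashlib.sha256(project.lower().encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[:width] / project


def max_entries_per_dir(projects, width=2):
    """Count how many projects land in the fullest shard directory."""
    counts = {}
    for name in projects:
        shard = sharded_path("/srv/pypi", name, width).parts[-2]
        counts[shard] = counts.get(shard, 0) + 1
    return max(counts.values())


# With a two-character hex prefix there are at most 256 shard directories,
# so 38,000 project names end up far below 32K entries in any one of them.
names = ["project%d" % i for i in range(38000)]
print(max_entries_per_dir(names))
```

Any change of this kind moves files on disk, which is why an existing installation needs a migration step such as the version-upgrade mechanism mentioned above.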
Hi Holger,

Yes, I had a look at the change you made to /root/pypi.
It looked too simple to be true, to be honest! :)

I will have a look at the source code and I will try to apply a similar
change to other indexes. I will ask for help if needed... you can count
on that!

But it will take some time to happen. I'm involved in some other things,
and playing with devpi is a secondary thing that will happen only when I
get fed up with the primary things I have to address. :)

Cheers

Richard Gomes

On 27/01/14 07:30, holger krekel wrote:
> Hi Richard,
>
> On Sun, Jan 26, 2014 at 22:31 +0000, Richard Gomes wrote:
>> Hello Holger,
>>
>>> Not too hard, i guess, but it would mean we need a major version
>>> increase in devpi-server because the underlying storage layout
>>> changes as well.
>>
>> Maybe the underlying storage layout could be discovered dynamically.
>> This way it would be possible to keep old indexes in the current format
>> whilst newly created indexes could adopt a new format.
>> If you think this can be done, please gimme some pointers and I will
>> change my working copy of devpi and let you know how it goes.
>
> We could make the code more flexible to work with different storage
> layouts at the same time but i don't think it's worth the effort nor
> the resulting complexity. There already is a version-upgrade mechanism
> and updating the layout for private indexes is thus relatively straight
> forward. If you want to help you could look into how it was done for
> /root/pypi (one of the last commits) and do something similar for
> private indexes. Requires some source code reading, though.
>
> cheers,
> holger
Thanks
Richard Gomes http://rgomes.info http://www.linkedin.com/in/rgomes mobile: +44(77)9955-6813 inum <http://www.inum.net/>: +883(5100)0800-9804 sip:r...@ippi.fr
On 26/01/14 14:52, holger krekel wrote:
hi Richard,
On Sun, Jan 26, 2014 at 14:48 +0000, Richard Gomes wrote:
Hi Holger,
the >32K entries at this point only applies to /root/pypi. True. I've forgotten that. Would it be easy to apply the change to all indexes? Not too hard, i guess, but it would mean we need a major version increase in devpi-server because the underlying storage layout changes as well.
Regarding performance: keeping the current pace, I expect that the entire upload will take some more 18 days or so. Yes, it's very slow. In 10 hours it was able to upload some 900 packages. There are more 37K+ waiting! :) It checks the pypi.python.org index pages for each upload, i am afraid. Not sure if this can be optimized. 18 days seems a bit very slow, though.
If you can implement the fix to all indexes in the next two weeks, I will be able to test it when my script fails. Apparently, I don't have to start from scratch. Apparently I can simply upload more packages from the point it stopped. Even if I had to upload everything from scratch... this is not a high price to pay at this point. I am wondering then why you don't simply go for a little scraper script that touches all index /root/pypi and the 3-or-so top-most archive files so that you have an archive for each project. This is probably faster than 18 days and works with devpi-server today.
best, holger
Thanks
Richard Gomes http://rgomes.info http://www.linkedin.com/in/rgomes mobile: +44(77)9955-6813 inum <http://www.inum.net/>: +883(5100)0800-9804 sip:r...@ippi.fr
On 26/01/14 14:31, holger krekel wrote:
Hi Holger,
> I think you might run into a 32K entries per directory restriction > when uploading so many packages to a private index. I've installed your most recent changes from bitbucket in regards to this subject. Let's see how it goes!
On Sun, Jan 26, 2014 at 14:29 +0000, Richard Gomes wrote: the >32K entries at this point only applies to /root/pypi.
> Are you doing this only because you want to avoid forgetting > about your downloads? I'm doing this because I had already my fair share of troubles with PyPI and I'd like to rule out details about whether PyPI is available or not. I simply cannot depend on external factors when dealing with systems in production. As you mentioned before, I could employ bandersnatch... but I definitely don't see reason for adopting more than one tool for exactly one purpose. Besides, I like the idea of having multiple indexes whilst developing applications. So, again, devpi fits the bill very well. I just need to upload those 38K+ packages in devpi once in a lifetime and probably never to think about this again. good luck then :)
holger
Cheers :)
Richard Gomes http://rgomes.info http://www.linkedin.com/in/rgomes mobile: +44(77)9955-6813 inum <http://www.inum.net/>: +883(5100)0800-9804 sip:r...@ippi.fr
On 26/01/14 14:06, holger krekel wrote:
> hi Richard,
>
> On Sun, Jan 26, 2014 at 13:55 +0000, Richard Gomes wrote:
>> Hello Holger,
>>
>> Thanks a lot for your prompt answer.
>> Yes, I understand that we'd better keep /root/pypi as it is.
>>
>> I've created an inherited index based on /root/pypi, like shown below:
>>
>> (py276)rgomes@pypi:/srv/pypi$ devpi index /root/pypimirror
>> http://pypi.localdomain:8080/root/pypimirror:
>>   type=stage
>>   bases=root/pypi
>>   volatile=True
>>   uploadtrigger_jenkins=None
>>   acl_upload=root
>>
>> I'm currently uploading packages onto this index.
>> I will let you know later how it goes in regards to difficulties I had
>> and performance observed.
> I think you might run into a 32K entries per directory restriction
> when uploading so many packages to a private index.
>
>> Am I correct to think that, if pip retrieves a packages which was
>> updated on PyPI, it (pip) will get the updated package because
>> /root/pypimirror inherits from /root/pypi which keeps itself in sync
>> with PyPI?
> I think this should work, yes.
>
> Are you doing this only because you want to avoid forgetting
> about your downloads?
>
> best,
> holger
>
>
>> Thanks
>>
>> Richard Gomes
>> http://rgomes.info
>> http://www.linkedin.com/in/rgomes
>> mobile: +44(77)9955-6813
>> inum <http://www.inum.net/>: +883(5100)0800-9804
>> sip:r...@ippi.fr
>>
>> On 26/01/14 07:56, holger krekel wrote:
>>> Hello Richard,
>>>
>>> On Sun, Jan 26, 2014 at 02:39 +0000, Richard Gomes wrote:
>>>> Hello Holger,
>>>>
>>>> I understood the point about triggering all index pages.
>>>> But I guess that it would download everything [again] from PyPI, correct?
>>> When a fresh devpi-server starts up, nothing is done except getting
>>> the list of all projects from pypi.python.org. Only the triggering
>>> of index pages will cache them, and only accessing a specific archive
>>> file will cache that.
>>>
>>>> So, in order to avoid another wasteful full download, cos I've already
>>>> done that... I'm trying to upload packages from the a local folder.
>>> If you want to upload pypi.python.org packages you have to do that
>>> into a private index and that will not be automatically kept up to
>>> date via the syncing mechanism. You would have to do any updates
>>> by hand. There is no way around that, volatile index or not.
>>> /root/pypi is an index where you cannot change properties like volatile.
>>> The error message could be better, agreed.
>>>
>>> The only thing we could consider is adding an option to devpi-server
>>> that allows looking up archive files from a local directory (structure)
>>> before trying a remote operation. We know the md5 checksum and filename
>>> and so can match precisely with already downloaded files.
>>> But don't hold your breath on that.
>>>
>>> cheers,
>>>
>>> holger
>>>
>>>> Unless you tell me it is not going to work, I'd like to persist on this
>>>> route. You know... I can take this chance to gain some mileage with devpi :)
>>>>
>>>> First thing would be making /root/pypi volatile (if I understood properly!).
>>>>
>>>>
>>>> # let's play with /root/test first, just to see how it works
>>>> (py276)rgomes@pypi:/srv/pypi$ devpi index /root/test volatile=False
>>>> /root/test changing volatile: False
>>>> http://pypi.localdomain:8080/root/test:
>>>>   type=stage
>>>>   bases=root/pypi
>>>>   volatile=False
>>>>   uploadtrigger_jenkins=None
>>>>   acl_upload=root
>>>> [87112 refs]
>>>>
>>>> (py276)rgomes@pypi:/srv/pypi$ devpi index /root/test volatile=True
>>>> /root/test changing volatile: True
>>>> http://pypi.localdomain:8080/root/test:
>>>>   type=stage
>>>>   bases=root/pypi
>>>>   volatile=True
>>>>   uploadtrigger_jenkins=None
>>>>   acl_upload=root
>>>> [87112 refs]
>>>>
>>>> # now let's try with /root/pypi
>>>> (py276)rgomes@pypi:/srv/pypi$ devpi index /root/pypi volatile=True
>>>> /root/pypi changing volatile: True
>>>> http://pypi.localdomain:8080/root/pypi:
>>>>   type=mirror
>>>>   bases=
>>>>   volatile=False
>>>> Traceback (most recent call last):
>>>>   File "/home/rgomes/.virtualenvs/py276/bin/devpi", line 9, in <module>
>>>>     load_entry_point('devpi-common==1.2', 'console_scripts', 'devpi')()
>>>>   File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/main.py", line 29, in main
>>>>     return method(hub, hub.args)
>>>>   File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 65, in main
>>>>     return index_modify(hub, url, kvdict)
>>>>   File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 14, in index_modify
>>>>     index_show(hub, url)
>>>>   File "/home/rgomes/.virtualenvs/py276/lib/python2.7/site-packages/devpi/index.py", line 36, in index_show
>>>>     ixconfig["uploadtrigger_jenkins"],))
>>>> KeyError: 'uploadtrigger_jenkins'
>>>> [87112 refs]
>>>>
>>>> Hum... not very good :(
>>>>
>>>> I've opened issue #82
>>>> https://bitbucket.org/hpk42/devpi/issue/82/cannot-make-root-pypi-volatile
>>>>
>>>> Meanwhile, I will play with another index and let you know how it goes.
>>>>
>>>> Thanks
>>>>
>>>> Richard Gomes
>>>> http://rgomes.info
>>>> http://www.linkedin.com/in/rgomes
>>>> mobile: +44(77)9955-6813
>>>> inum <http://www.inum.net/>: +883(5100)0800-9804
>>>> sip:r...@ippi.fr
>>>>
>>>> On 25/01/14 09:29, holger krekel wrote:
>>>>> Hi Richard,
>>>>>
>>>>> On Fri, Jan 24, 2014 at 21:51 +0000, Richard Gomes wrote:
>>>>>> Hello Holger,
>>>>>>
>>>>>> No, I haven't tried bandersnatch.
>>>>>> I think devpi is perfect for my workflow and I'm not willing to try
>>>>>> other things.
>>>>> ok.
>>>>>
>>>>>> Sorry for not being very clear on my intentions.
>>>>>> My fault. I made the question more complicated than it should be.
>>>>>>
>>>>>> In a nutshell, I just wanted a full PyPI mirror, but I'm not sure how I
>>>>>> should load all those 38K packages into devpi cache.
>>>>>> I guess that I should make /root/pypi volatile and I should upload
>>>>>> package by package onto it.
>>>>>> Does it make sense?
>>>>> There are 38K projects but many more archive files. If you want a full
>>>>> mirror, that's around 40Gbytes of storage (and the according network traffic).
>>>>>
>>>>> What i think makes more sense is to go for triggering all index pages
>>>>> by iterating over all projects and maybe get and thereby cache the first
>>>>> (highest version) archive files. You can get some ideas how to do that
>>>>> in the repository at server/extra/compare_devpi_server.py e.g. through
>>>>> the getnames() function.
>>>>>
>>>>>> I'm basically confused about how devpi decides (or detects) that
>>>>>> eventually a package must be updated from PyPI since I've uploaded it by
>>>>>> hand. I'm not sure if this idea of uploading by hand would work well or
>>>>>> would eventually make devpii confused about when new updates should be
>>>>>> downloaded from PyPI.
>>>>> One you touched/retrieved all project index pages, devpi-server will
>>>>> auto-update all index pages for those index pages.
>>>>>
>>>>> HTH,
>>>>> holger
>>>>>
>>>>>
>>>>>> http://rgomes.info
>>>>>> http://www.linkedin.com/in/rgomes
>>>>>> mobile: +44(77)9955-6813
>>>>>> inum <http://www.inum.net/>: +883(5100)0800-9804
>>>>>> sip:r...@ippi.fr
>>>>>>
>>>>>> On 24/01/14 20:05, holger krekel wrote:
>>>>>>> On Fri, Jan 24, 2014 at 04:02 -0800, Richard Gomes wrote:
>>>>>>>> I've downloaded all packages from PyPI and I'd like to create a local
>>>>>>>> mirror using devpi.
>>>>>>> If you want to have a full non-lazy mirror, did you consider using
>>>>>>> bandersnatch?
>>>>>>>
>>>>>>>> I had the following idea:
>>>>>>>>
>>>>>>>> 1. create a new index say /root/pypimirror based on /root/pypi
>>>>>>>>
>>>>>>>> 2. upload the entire folder containing 38,000+ packages onto
>>>>>>>> /root/pypimirror
>>>>>>> This might fail if your system uses a 32K limit on directory entries.
>>>>>>> I've just fixed it for /root/pypi (not released yet) and i can also
>>>>>>> fix it for private indices.
>>>>>>>
>>>>>>>> 3. eventually setting /root/pypimirrot to NotVolatile (if this can be
>>>>>>>> done, somehow)
>>>>>>> You can change index volatility any time.
>>>>>>>
>>>>>>>> /myuser/myindex
>>>>>>>>   +-- /root/pypimirror
>>>>>>>>         +-- /root/pypi
>>>>>>>>
>>>>>>>> Another idea would be creating a "parallel" index to /root/pypi and
>>>>>>>> exposing a third index which derives from both.
>>>>>>>>
>>>>>>>> /myuser/myindex
>>>>>>>>   +-- /root/pypi
>>>>>>>>   +-- /root/pypimirror
>>>>>>>>
>>>>>>>>
>>>>>>>> Does this idea make sense? Is there a better way of doing it, in particular
>>>>>>>> without having to download everything again from PyPI?
>>>>>>> Could you state more clearly what you want to achieve in the first
>>>>>>> place? I can see a number of possible motivations but would like
>>>>>>> to understand your particular ones. (Did i mention that i was always
>>>>>>> very bad in school at understanding textual questions in math questions
>>>>>>> although others seemed to be able to guess the correct meaning of the
>>>>>>> question? :)
>>>>>>>
>>>>>>>> I have another concern: performance. Since 38,000+ packages implies on
>>>>>>>> large directories in the file system... do you think that devpi will have
>>>>>>>> troubles managing such amount of packages?
>>>>>>> see above. /root/pypi handles >32K fine on trunk. And private indices
>>>>>>> can also be made to do so. (So far there wasn't a use case for >32K
>>>>>>> private packages -- let's see if we have one here)
>>>>>>>
>>>>>>> best,
>>>>>>> holger
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks a lot,
>>>>>>>>
>>>>>>>> -- Richard
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google Groups "devpi-dev" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to devpi-dev+...@googlegroups.com.
>>>>>>>> To post to this group, send email to devp...@googlegroups.com.
>>>>>>>> Visit this group at http://groups.google.com/group/devpi-dev.
>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.

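
A minimal sketch of the approach holger suggests in the quoted discussion: iterate over project names and request each project's index page once, so that devpi-server caches it and keeps it updated afterwards. This is not taken from server/extra/compare_devpi_server.py. The function names, the injectable fetch parameter, and the <index>/+simple/<project>/ URL layout are assumptions made for illustration, and the code targets Python 3 rather than the Python 2.7 environment shown in the thread.

```python
from urllib.parse import quote
from urllib.request import urlopen


def simple_page_url(index_url, project):
    """Build a per-project index page URL.

    Assumes the layout <index>/+simple/<project>/ (an assumption; check
    it against your devpi-server version before relying on it).
    """
    return "%s/+simple/%s/" % (index_url.rstrip("/"), quote(project))


def trigger_index_pages(index_url, projects, fetch=None):
    """Request every project's index page once.

    Returns a dict mapping project name to an error message for each
    request that failed; an empty dict means every page was fetched.
    """
    if fetch is None:
        def fetch(url):
            # Read and discard the body; the request itself is the point.
            with urlopen(url, timeout=30) as resp:
                resp.read()

    failures = {}
    for project in projects:
        try:
            fetch(simple_page_url(index_url, project))
        except OSError as exc:  # urllib's URLError/HTTPError derive from OSError
            failures[project] = str(exc)
    return failures
```

Called as trigger_index_pages("http://pypi.localdomain:8080/root/pypi", names), this only touches index pages. As noted in the thread, an archive file is cached only when that specific file is accessed, so the sketch does not download any packages by itself.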
participants (2)
- holger krekel
- Richard Gomes