Mirroring. Proposal and questions
Hi!
We want to support external mirrors other than PyPI.
tl;dr The basics are mostly clear. Migration from a mirror could use input. Mirroring of devpi indexes needs more thought.
I'd say we only support mirrors that provide a PyPI simple view where each project has one folder. For "findlinks" like private package storages which have all packages in one folder, I recommend to use CheesePrism and mirror that. I don't think we could support that lazily, as each file needs to be inspected for metadata, so we would have to download everything.
When creating a mirror index, you provide the root url of the simple view (like https://pypi.python.org/simple/) and you can set the cache expiry time (defaults to 1800 seconds). The index settings are "mirror_url" and "mirror_cache_expiry".
Mirrors are lazy, exactly like root/pypi is at the moment.
To enable migration, we add support to "push" from a mirror into a regular index (also useful to selectively get pypi packages into an index) and look into "bulk pushing". Any thoughts on this?
We should rename "pypi_whitelist" into "mirror_whitelist" or something like that. We will do a 3.0.0 anyway and can switch this during export/import.
Should we add a setting that exempts a mirror from the whitelisting, or should we just get people to migrate to a regular index to avoid the need to whitelist packages?
What about mirroring a devpi index? I see 3 scenarios:
1. Just mirror +simple of the index.
   + works now
   - problem is, that all inherited indexes and pypi would be mirrored as well.
2. Mirror a view which excludes mirrors.
3. Mirror a view that excludes any inherited indexes.
I think both 2 and 3 have valid use cases. We need good names for the URL, I didn't come up with good ones yet (is there one word for "not inheriting").
We could later optimize devpi mirroring in various ways. For example with a serial header and with a notification protocol, though I'm not sure more than the serial is really necessary for most use cases.
Regards, Florian Schulze
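The lazy behaviour and the two proposed settings ("mirror_url", "mirror_cache_expiry") could look roughly like the sketch below. This is only an illustration under assumptions, not devpi-server code: the class name LazyMirror, the fetch callable and the clock parameter are hypothetical, and only the two setting names and the 1800 second default come from the proposal.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class LazyMirror:
    """Hypothetical lazy mirror: a project page is fetched on first access only."""

    mirror_url: str                         # root URL of the simple view
    fetch: Callable[[str], List[str]]       # hypothetical: page URL -> release links
    mirror_cache_expiry: int = 1800         # seconds, default from the proposal
    clock: Callable[[], float] = time.time  # injectable so expiry can be tested
    _cache: Dict[str, Tuple[float, List[str]]] = field(
        default_factory=dict, init=False
    )

    def get_links(self, project: str) -> List[str]:
        now = self.clock()
        cached = self._cache.get(project)
        if cached is not None and now - cached[0] < self.mirror_cache_expiry:
            return cached[1]  # still fresh: no request to the mirror
        # Nothing cached or entry expired: ask the mirror for this project only.
        links = self.fetch(f"{self.mirror_url.rstrip('/')}/{project}/")
        self._cache[project] = (now, links)
        return links
```

Nothing is downloaded when the mirror object is created; each project page is requested once and then served from the cache until the expiry time has passed.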
On 13 Nov 2015, at 12:55, Florian Schulze wrote:
We should rename "pypi_whitelist" into "mirror_whitelist" or something like that.
I was thinking a bit more about this and there are two different kinds of whitelisting that make sense IMO.
The current one is whitelisting on a regular index for packages that have custom uploads to prevent security issues for private packages. I think with general mirroring the name should be made better. Maybe "inherited_mirror_whitelist"? There might also be use cases for blocking all inherited uploads ("inherited_whitelist")?
The second one would be a whitelist on a mirror index. That way one can block all packages from being mirrored, except the whitelisted ones. The default here would be "*". This would enable preventing download of stuff you don't want.
For all the whitelists we might want to support version specifiers. That way we can support indexes that provide a "known good set" for example.
Thoughts? For now my main concern is to get the naming right, so we don't have to change it later on. The implementation for these different kinds of whitelists can come later.
Regards, Florian Schulze
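To make the difference between the two kinds of whitelisting concrete, here is a minimal sketch. The function names and arguments are hypothetical and chosen only for illustration; the actual checks in devpi-server may look different, and version specifiers are left out.

```python
from typing import Collection


def mirrored_releases_visible(
    project: str, has_private_uploads: bool, whitelist: Collection[str]
) -> bool:
    """Kind 1, on a regular index (hypothetical helper).

    Releases inherited from a mirror are hidden for a project that has
    custom uploads, unless the project is whitelisted.
    """
    if not has_private_uploads:
        return True
    return project in whitelist


def mirror_may_fetch(project: str, whitelist: Collection[str] = ("*",)) -> bool:
    """Kind 2, on a mirror index (hypothetical helper).

    Only whitelisted projects are mirrored; the default "*" allows all.
    """
    return "*" in whitelist or project in whitelist
```

With these assumptions, a private project "mypkg" would not pick up same-named releases from the mirror unless it is listed, while a mirror index with a whitelist of ["django"] would refuse to fetch anything else.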
On Sun, Nov 15, 2015 at 15:00 +0100, Florian Schulze wrote:
On 13 Nov 2015, at 12:55, Florian Schulze wrote:
We should rename "pypi_whitelist" into "mirror_whitelist" or something like that.
I was thinking a bit more about this and there are two different kinds of whitelisting that make sense IMO.
The current one is whitelisting on a regular index for packages that have custom uploads to prevent security issues for private packages. I think with general mirroring the name should be made better. Maybe "inherited_mirror_whitelist"? There might also be use cases for blocking all inherited uploads ("inherited_whitelist")?
The second one would be a whitelist on a mirror index. That way one can block all packages from being mirrored, except the whitelisted ones. The default here would be "*". This would enable preventing download of stuff you don't want.
For all the whitelists we might want to support version specifiers. That way we can support indexes that provide a "known good set" for example.
Thoughts? For now my main concern is to get the naming right, so we don't have to change it later on. The implementation for these different kinds of whitelists can come later.
In general i think we should not dive into more whitelisting mechanics but rather move towards having a good way to bulk-copy releases from one (mirror) index to another (private one).
Therefore i think we should just rename pypi_whitelist to mirror_whitelist and not introduce version-specifiers or a mirror-specific whitelist option.
holger
Regards, Florian Schulze
-- You received this message because you are subscribed to the Google Groups "devpi-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to devpi-dev+...@googlegroups.com. To post to this group, send email to devp...@googlegroups.com. Visit this group at http://groups.google.com/group/devpi-dev. For more options, visit https://groups.google.com/d/optout.
-- about me: http://holgerkrekel.net/about-me/ contracting: http://merlinux.eu
Hi Florian,
On Fri, Nov 13, 2015 at 12:55 +0100, Florian Schulze wrote:
Hi!
We want to support external mirrors other than PyPI.
tl;dr The basics are mostly clear. Migration from a mirror could use input. Mirroring of devpi indexes needs more thought.
I'd say we only support mirrors that provide a PyPI simple view where each project has one folder. For "findlinks" like private package storages which have all packages in one folder, I recommend to use CheesePrism and mirror that. I don't think we could support that lazily, as each file needs to be inspected for metadata, so we would have to download everything.
Not sure i understand why we need to inspect metadata and could not instead filter the names on a findlinks page to build a simple page. But i agree anyway to first only care about mirroring servers that have simple pages.
When creating a mirror index, you provide the root url of the simple view (like https://pypi.python.org/simple/) and you can set the cache expiry time (defaults to 1800 seconds). The index settings are "mirror_url" and "mirror_cache_expiry".
Mirrors are lazy, exactly like root/pypi is at the moment.
sounds good.
To enable migration, we add support to "push" from a mirror into a regular index (also useful to selectively get pypi packages into an index) and look into "bulk pushing". Any thoughts on this?
Let's first support a simple single-release push before considering bulk-push.
We should rename "pypi_whitelist" into "mirror_whitelist" or something like that.
yes.
We will do a 3.0.0 anyway and can switch this during export/import.
yes.
Should we add a setting that exempts a mirror from the whitelisting, or should we just get people to migrate to a regular index to avoid the need to whitelist packages?
not sure i understand this question. In any case, for now i see mirroring as a kind of "temporary" measure to help with full migration to devpi-server and would like to not think too hard about how to fine-tune mirroring.
What about mirroring a devpi index?
I see 3 scenarios:
1. Just mirror +simple of the index.
   + works now
   - problem is, that all inherited indexes and pypi would be mirrored as well.
I'd go for that only. More advanced devpi-to-devpi mirroring can be considered post-3.0 IMO. holger
2. Mirror a view which excludes mirrors.
3. Mirror a view that excludes any inherited indexes.
I think both 2 and 3 have valid use cases. We need good names for the URL, I didn't come up with good ones yet (is there one word for "not inheriting").
We could later optimize devpi mirroring in various ways. For example with a serial header and with a notification protocol, though I'm not sure more than the serial is really necessary for most use cases.
Regards, Florian Schulze
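The three scenarios differ only in which indexes contribute to the view that gets mirrored. A small sketch of that, assuming an index is described by a name, a mirror flag and a list of bases; the Index class, the view names "full", "no-mirrors" and "own" and the function are all hypothetical, since the thread has not settled on names:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Index:
    name: str
    is_mirror: bool = False
    bases: Tuple[str, ...] = ()


def contributing_indexes(indexes: Dict[str, Index], start: str, view: str) -> List[str]:
    """Return the indexes whose releases would show up in the given view.

    "full"       -> scenario 1: the index plus everything it inherits
    "no-mirrors" -> scenario 2: like "full", but mirror indexes are left out
    "own"        -> scenario 3: only the index itself, nothing inherited
    """
    if view not in ("full", "no-mirrors", "own"):
        raise ValueError(f"unknown view: {view!r}")
    if view == "own":
        return [start]
    seen, order, queue = set(), [], [start]
    while queue:
        name = queue.pop(0)
        if name in seen:
            continue
        seen.add(name)
        index = indexes[name]
        if view == "no-mirrors" and index.is_mirror:
            continue  # skip mirrors such as root/pypi
        order.append(name)
        queue.extend(index.bases)
    return order
```

For an index user/dev that inherits from user/prod, which in turn inherits from root/pypi, the "full" view covers all three, "no-mirrors" drops root/pypi, and "own" is user/dev alone.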
Hi Florian,
Yesterday i spent some time on top of the devpi-server-3.0.0 branch to move our storage to use normalized projectnames instead of "real" names (let's call them "display names" -- Donald uses this also for warehouse IIRC). One area of complications was in project existence checks and i got the impression we might even get rid of needing an up-front full simple list.
In any case i think the normalization effort should simplify (and speed up) the code for mirroring and would conflict with a parallel effort to introduce generalized mirroring. Therefore i suggest to first land normalization before tackling mirroring. I hope to finish it soon but probably not before mid next week.
cheers, holger
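As an illustration of storing projects under normalized names while keeping the display name, here is a minimal sketch. The mail does not say which normalization rule the branch uses; this assumes the rule from PEP 503 (runs of "-", "_" and "." become a single "-", then lowercase), and the ProjectStore class is hypothetical.

```python
import re
from typing import Dict, Optional


def normalize_name(name: str) -> str:
    """Normalize a project name as described in PEP 503 (assumed rule)."""
    return re.sub(r"[-_.]+", "-", name).lower()


class ProjectStore:
    """Hypothetical storage keyed by normalized name, remembering the display name."""

    def __init__(self) -> None:
        self._display_names: Dict[str, str] = {}

    def add(self, display_name: str) -> None:
        # The first spelling seen is kept as the display name.
        self._display_names.setdefault(normalize_name(display_name), display_name)

    def exists(self, name: str) -> bool:
        # The existence check needs only the normalized key, not a full name list.
        return normalize_name(name) in self._display_names

    def display_name(self, name: str) -> Optional[str]:
        return self._display_names.get(normalize_name(name))
```

With this layout, lookups for "zope.interface", "Zope_Interface" and "zope-interface" all hit the same entry.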
participants (2)
- Florian Schulze
- holger krekel