new devpi features / architecture / Pyramid migration

Hi there, (CC Donald who develops warehouse/next-pypi),
There are two major new features coming up for devpi and devpi-server, partially funded by an undisclosed company:
- a near-real-time mirroring facility that allows one devpi-server instance to mirror all changes from another devpi-server instance. One major use case is to have a failover server that you can switch in if the main server crashes. Another use case is to allow for faster nearby mirrors in geo-distributed settings.
- a new web/search interface that is currently being developed by Florian Schulze. It currently uses "whoosh" as a search backend and is generally based on the Pyramid web framework.
One bigger architectural question is how the web interface and the devpi-server core communicate and relate to each other, also in the context of the new mirroring functionality.
One possibility (A) is to run the web interface in a separate process (or even on a separate server) and have it access the json core API via http. This allows for de-coupling, and the search/web interface would be free to implement its own backend and data handling. It also means you would need to run two processes to get all benefits. And we would need to add new externally visible APIs to minimize the number of network requests between the components ("give me all metadata of private projects and of all accessed PyPI ones", etc.).
The other possibility (B) is to run both the core and web/search components in one process. This allows direct access to internal data structures without the need to define external APIs and without the implied network communication overheads.
If we can run the web interface within a mirroring devpi-server process we still achieve a decoupled deployment of the more complex web interface and the core devpi-server interface. In this setting, the devpi-server core package will only implement the json/pip/setuptools and the mirroring APIs. The devpi-web package would additionally implement a web interface by depending on the devpi-server package and modifying/enriching the http endpoints.
After a longer discussion with Florian this morning i therefore tend to go for possibility B and to also migrate the core devpi-server to Pyramid instead of bottle, resulting in more unified server code. Also, Pyramid is certainly a rich and very well tested system for building web services. In particular, it allows for explicit http request passing without relying on thread locals.
So much for the current thoughts and plans. If you have any thoughts/comments, please shoot.
best,
holger
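
To make possibility B more concrete, here is a minimal sketch in plain Python (deliberately not real Pyramid or devpi code). It assumes a hypothetical `CoreServer` that serves JSON only and a hypothetical `includeme` hook through which a separately installed web package adds HTML views to the same endpoints. The hook name borrows Pyramid's `includeme` convention; every other name is invented for illustration. Each view receives the request as an explicit argument, which is the property the message highlights over thread locals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Request:
    """Hypothetical request object, handed explicitly to every view."""
    path: str
    accept: str = "application/json"


View = Callable[[Request], str]


@dataclass
class CoreServer:
    """Stand-in for the core server: it only knows JSON views."""
    views: Dict[str, View] = field(default_factory=dict)

    def add_view(self, accept: str, view: View) -> None:
        self.views[accept] = view

    def handle(self, request: Request) -> str:
        # The request travels as an argument; nothing is read from thread locals.
        view = self.views.get(request.accept, self.views["application/json"])
        return view(request)


def json_index_view(request: Request) -> str:
    return '{"path": "%s"}' % request.path


def includeme(server: CoreServer) -> None:
    """Hypothetical hook of the web package: enrich the existing endpoints."""
    def html_index_view(request: Request) -> str:
        return "<h1>%s</h1>" % request.path

    server.add_view("text/html", html_index_view)


core = CoreServer()
core.add_view("application/json", json_index_view)

# Without the web package, an HTML request falls back to the JSON view.
json_only = core.handle(Request("/hpk/dev", accept="text/html"))

# "Installing" the web package augments the same in-process server.
includeme(core)
html = core.handle(Request("/hpk/dev", accept="text/html"))
```

In this arrangement the core stays small and the web package can reach internal data structures directly, at the price of both living in one process.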

Howdy,
I was at first more inclined towards solution A in order to keep devpi small and clean (as perhaps some people have no use for these fancier web interface/search capabilities) but thinking more and more about it, I think solution B is the one that makes most sense.
For instance, if the documentation has to be indexed, not only would the web component need to have access to internal data structures but perhaps to the documentation source as well.
Search context
--------------
I am not sure how much thought has been put into this already but I think it would be important to enable the user to specify the search context, which could be "index only", "index tree" or "server".
- index only would limit the scope of the search to the specified index (e.g. lpbrac/dev). This could be useful when we want to ignore upstream/downstream versions of projects.
- index tree would set the scope to all indices that are part of the inheritance tree (say lpbrac/prod, lpbrac/dev, etc.)
- Finally, server wide would look across all indices on the server.
I can see a case where someone wants to find out where project "X" lives, which is kind of difficult today if you have a large number of users and indices.
There is also the question of version specifiers. For project A, we may have 10 versions of the documentation (one for each release file). So I think it would be useful to be able to scope in that dimension as well (like == 1.0.2, <= 3.1, etc).
Intersphinx
-----------
Is this something intended to be supported?
That's all I can think of for now. Thanks for starting the thread.
Laurent
On 4/28/14, 5:42 AM, holger krekel wrote:
> Hi there, (CC Donald who develops warehouse/next-pypi),
> [...]

> Howdy,
Hi!
> I was at first more inclined towards solution A in order to keep devpi small and clean (as perhaps some people have no use for these fancier web interface/search capabilities) but thinking more and more about it, I think solution B is the one that makes most sense.
The server itself will stay small (even smaller than today). The web interface is a separate package which, when installed, will augment the server. The details are still a bit in flux while we are experimenting, but that is the grand goal.
> For instance, if the documentation has to be indexed, not only would the web component need to have access to internal data structures but perhaps to the documentation source as well.
That is indeed the case.
> Search context
> --------------
> I am not sure how much thought has been put into this already but I think it would be important to enable the user to specify the search context, which could be "index only", "index tree" or "server".
> - index only would limit the scope of the search to the specified index (e.g. lpbrac/dev). This could be useful when we want to ignore upstream/downstream versions of projects.
> - index tree would set the scope to all indices that are part of the inheritance tree (say lpbrac/prod, lpbrac/dev, etc.)
You mean limited to the user in this case, or do you mean the inheritance of the index (afaik there are plans to enhance the inheritance possibilities)?
> - Finally, server wide would look across all indices on the server.
> I can see a case where someone wants to find out where project "X" lives, which is kind of difficult today if you have a large number of users and indices.
This will certainly be the case. My first experiments were already so useful and fast that I started to search for pypi packages with the local interface :)
> There is also the question of version specifiers. For project A, we may have 10 versions of the documentation (one for each release file). So I think it would be useful to be able to scope in that dimension as well (like == 1.0.2, <= 3.1, etc).
This is certainly something we still have to think about and test. The operators on the version are a nice idea and should be easy to add. The biggest question is the UI. The query language is pretty powerful, but not necessarily easy to use. There will be a help page for it. At least for the more common use cases we should have a web UI for the filtering. Once the basics are usable through the query language I would really like some ideas on that.
> Intersphinx
> -----------
> Is this something intended to be supported?
I have to read up on that. Holger and I discussed this briefly but have no definite plans yet.
Regards,
Florian Schulze

On 4/30/14, 10:42 AM, Florian Schulze wrote:
Hi Florian,
>> I was at first more inclined towards solution A in order to keep devpi small and clean (as perhaps some people have no use for these fancier web interface/search capabilities) but thinking more and more about it, I think solution B is the one that makes most sense.
> The server itself will stay small (even smaller than today). The web interface is a separate package which, when installed, will augment the server. The details are still a bit in flux while we are experimenting, but that is the grand goal.
>> For instance, if the documentation has to be indexed, not only would the web component need to have access to internal data structures but perhaps to the documentation source as well.
> That is indeed the case.
>> Search context
>> --------------
>> I am not sure how much thought has been put into this already but I think it would be important to enable the user to specify the search context, which could be "index only", "index tree" or "server".
>> - index only would limit the scope of the search to the specified index (e.g. lpbrac/dev). This could be useful when we want to ignore upstream/downstream versions of projects.
>> - index tree would set the scope to all indices that are part of the inheritance tree (say lpbrac/prod, lpbrac/dev, etc.)
> You mean limited to the user in this case, or do you mean the inheritance of the index (afaik there are plans to enhance the inheritance possibilities)?
I meant within the current implementation limits but not limited to the user (say if lpbrac/dev derives from florian/prod, the search would apply to your index as well). I am kind of throwing things up in the air to see what sticks, hoping that others have opinions about this so we come up with a meaningful set.
>> - Finally, server wide would look across all indices on the server.
>> I can see a case where someone wants to find out where project "X" lives, which is kind of difficult today if you have a large number of users and indices.
> This will certainly be the case. My first experiments were already so useful and fast that I started to search for pypi packages with the local interface :)
Interesting ... I was about to say not to go there as it would be too much, but if it comes for free, then great :)
>> There is also the question of version specifiers. For project A, we may have 10 versions of the documentation (one for each release file). So I think it would be useful to be able to scope in that dimension as well (like == 1.0.2, <= 3.1, etc).
> This is certainly something we still have to think about and test. The operators on the version are a nice idea and should be easy to add.
I think this would be useful indeed.
> The biggest question is the UI. The query language is pretty powerful, but not necessarily easy to use. There will be a help page for it. At least for the more common use cases we should have a web UI for the filtering. Once the basics are usable through the query language I would really like some ideas on that.
I think people should be able to figure out the query language and use it. What about saving filters so that they can be re-used by others, as opposed to having built-in ones?
>> Intersphinx
>> -----------
>> Is this something intended to be supported?
> I have to read up on that. Holger and I discussed this briefly but have no definite plans yet.
This might get pretty complicated. Read The Docs seems to always point to the latest catalog of a project ... which is okay if this project is only in one place... I will try to educate myself on this as well to have something more meaningful to say. Again, I just wanted to raise this. It seems that this is already on the table. Thanks for getting back to me.
Best/Laurent
> Regards,
> Florian Schulze

Hey Laurent,
On Wed, Apr 30, 2014 at 09:46 -0700, Laurent Brack wrote:
> Howdy,
> I was at first more inclined towards solution A in order to keep devpi small and clean (as perhaps some people have no use for these fancier web interface/search capabilities) but thinking more and more about it, I think solution B is the one that makes most sense.
> For instance, if the documentation has to be indexed, not only would the web component need to have access to internal data structures but perhaps to the documentation source as well.
Yes, access to internal data structures was one reason i preferred B. Note, however, that in the current setting doc sources and build instructions are not available on the server side. Only the readily built documentation zip is transferred via the "upload_docs" facility of setuptools/sphinx. See more about this below.
> Search context
> --------------
> I am not sure how much thought has been put into this already but I think it would be important to enable the user to specify the search context, which could be "index only", "index tree" or "server".
> - index only would limit the scope of the search to the specified index (e.g. lpbrac/dev). This could be useful when we want to ignore upstream/downstream versions of projects.
> - index tree would set the scope to all indices that are part of the inheritance tree (say lpbrac/prod, lpbrac/dev, etc.)
> - Finally, server wide would look across all indices on the server.
index-tree and "all indexes" make sense to me. index-only maybe as well, although i'd think that an index-tree search would present results in "closest index first" order anyway.
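
The three search contexts and the "closest index first" ordering could be sketched as follows. This is only an illustration under stated assumptions: the `BASES` mapping, the mode names and the `search_scope` function are hypothetical and not part of devpi. It assumes each index lists the indexes it inherits from, and walks them breadth-first so nearer bases come before farther ones.

```python
from collections import deque
from typing import Dict, List

# Hypothetical inheritance data: index name -> indexes it inherits from.
BASES: Dict[str, List[str]] = {
    "lpbrac/dev": ["lpbrac/prod"],
    "lpbrac/prod": ["florian/prod"],
    "florian/prod": ["root/pypi"],
    "root/pypi": [],
}


def search_scope(index: str, mode: str, bases: Dict[str, List[str]]) -> List[str]:
    """Return the indexes to search, closest index first."""
    if mode == "index-only":
        return [index]
    if mode == "server":
        # The starting index first, then every other index on the server.
        return [index] + sorted(name for name in bases if name != index)
    if mode != "index-tree":
        raise ValueError("unknown search mode: %r" % mode)
    seen, order, queue = {index}, [], deque([index])
    while queue:
        # Breadth-first walk: nearer bases are emitted before farther ones.
        current = queue.popleft()
        order.append(current)
        for base in bases.get(current, []):
            if base not in seen:
                seen.add(base)
                queue.append(base)
    return order
```

With such an ordering, an index-tree search already lists hits from the starting index first, which is why a separate index-only mode may matter less.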
> I can see a case where someone wants to find out where project "X" lives, which is kind of difficult today if you have a large number of users and indices.
Yes, search should certainly help with finding indices where projects live.
> There is also the question of version specifiers. For project A, we may have 10 versions of the documentation (one for each release file). So I think it would be useful to be able to scope in that dimension as well (like == 1.0.2, <= 3.1, etc).
Yes, it probably makes sense to have that. I guess most searches will only care for the newest versions, though. Documentation always gets better and more, right? :)
I wonder if we should introduce something to the search language like:
- index:hpk/dev for searching in the "hpk/dev" index
- index:* (also default) search in all indices
- index:hpk/* all my own indices
- version:<2.5 to search only release files prior to version 2.5
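
As a rough illustration of how the `index:` and `version:` qualifiers proposed above might be interpreted, here is a small standard-library sketch. It is not the whoosh query syntax and not devpi code: `parse_query`, `index_matches` and `version_matches` are hypothetical helpers, index patterns are treated as shell-style globs, and versions are compared as plain dotted integers (a simplification that ignores pre-releases and treats "2.5" and "2.5.0" as different).

```python
import fnmatch
import operator
import re
from typing import List, Optional, Tuple

OPS = {
    "<=": operator.le, ">=": operator.ge, "==": operator.eq,
    "<": operator.lt, ">": operator.gt,
}
# Optional comparison operator followed by a dotted numeric version.
VERSION_RE = re.compile(r"^(<=|>=|==|<|>)?(\d+(?:\.\d+)*)$")

VersionFilter = Optional[Tuple[str, str]]


def version_key(version: str) -> Tuple[int, ...]:
    # Simplification: purely numeric dotted versions only.
    return tuple(int(part) for part in version.split("."))


def parse_query(query: str) -> Tuple[List[str], str, VersionFilter]:
    """Split a query into free-text terms, an index pattern and a version filter."""
    terms: List[str] = []
    index = "*"  # default from the proposal: search in all indices
    version: VersionFilter = None
    for token in query.split():
        if token.startswith("index:"):
            index = token[len("index:"):]
        elif token.startswith("version:"):
            match = VERSION_RE.match(token[len("version:"):])
            if match is None:
                raise ValueError("bad version filter: %r" % token)
            version = (match.group(1) or "==", match.group(2))
        else:
            terms.append(token)
    return terms, index, version


def index_matches(pattern: str, index: str) -> bool:
    return fnmatch.fnmatchcase(index, pattern)


def version_matches(version_filter: VersionFilter, version: str) -> bool:
    if version_filter is None:
        return True
    op, bound = version_filter
    return OPS[op](version_key(version), version_key(bound))
```

For example, `parse_query("sphinx index:hpk/* version:<2.5")` separates the free-text term from the two qualifiers, which can then be applied to each candidate index and release.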
> Intersphinx
> -----------
> Is this something intended to be supported?
With the current protocols (see above) the server does not have the sources or the build information and does not manage builds. So it cannot rebuild docs and configure intersphinx docs itself. We would need to add an "upload doc sources" facility along with build instructions (the doc building has dependencies if it documents modules that need to be imported etc.) and then make devpi-server or another component manage the building. Quite a lot of work.
But we should still look at how devpi-server could support intersphinx building: if i upload project A's docs and project B's docs to the same index and A wants to have intersphinx links to B, it could potentially load the "objects.inv" from the url where devpi-server/web serves it. If we now push A to a new index, it will still back-reference B's docs in the first index. If B's docs move to the new index, that does not change. I don't know if intersphinx could work by generating relative links, which would help in this particular case.
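
For readers unfamiliar with intersphinx, the back-reference issue described above comes from the way project A's Sphinx configuration names B's docs. The fragment below is a sketch of A's `conf.py`; `intersphinx_mapping` and `sphinx.ext.intersphinx` are the standard Sphinx names, while the URL is purely hypothetical, since the thread does not say where devpi-server/web would serve built docs.

```python
# conf.py of project A (fragment)
extensions = ["sphinx.ext.intersphinx"]

# Hypothetical location of project B's built docs on a devpi server.
# The real URL layout is not specified in this thread.
DEVPI_DOCS_B = "https://devpi.example.com/hpk/dev/B/docs/"

intersphinx_mapping = {
    # name: (base URL used for generated links, inventory location).
    # None means "fetch objects.inv from the base URL".
    "B": (DEVPI_DOCS_B, None),
}
```

Because the base URL is absolute and fixed when A's docs are built, the generated links keep pointing at the first index even after A's built docs are pushed elsewhere.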
> That's all I can think of for now. Thanks for starting the thread.
Thanks for the feedback!
holger
> Laurent
-- You received this message because you are subscribed to the Google Groups "devpi-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to devpi-dev+...@googlegroups.com. To post to this group, send email to devp...@googlegroups.com. Visit this group at http://groups.google.com/group/devpi-dev. For more options, visit https://groups.google.com/d/optout.

On 4/30/14, 10:57 AM, holger krekel wrote:
Hi Holger, I already replied to Florian but I think you are bringing additional details, so here we go...
> Hey Laurent,
> On Wed, Apr 30, 2014 at 09:46 -0700, Laurent Brack wrote:
>> Howdy,
>> I was at first more inclined towards solution A in order to keep devpi small and clean (as perhaps some people have no use for these fancier web interface/search capabilities) but thinking more and more about it, I think solution B is the one that makes most sense.
>> For instance, if the documentation has to be indexed, not only would the web component need to have access to internal data structures but perhaps to the documentation source as well.
> Yes, access to internal data structures was one reason i preferred B. Note, however, that in the current setting doc sources and build instructions are not available on the server side. Only the readily built documentation zip is transferred via the "upload_docs" facility of setuptools/sphinx. See more about this below.
Yes, I was aware of this. I think the ReST sources are also part of that build output, but I wonder if this is only the case when "show source" is true.
>> Search context
>> --------------
>> I am not sure how much thought has been put into this already but I think it would be important to enable the user to specify the search context, which could be "index only", "index tree" or "server".
>> - index only would limit the scope of the search to the specified index (e.g. lpbrac/dev). This could be useful when we want to ignore upstream/downstream versions of projects.
>> - index tree would set the scope to all indices that are part of the inheritance tree (say lpbrac/prod, lpbrac/dev, etc.)
>> - Finally, server wide would look across all indices on the server.
> index-tree and "all indexes" make sense to me. index-only maybe as well, although i'd think that an index-tree search would present results in "closest index first" order anyway.
Agreed.
>> I can see a case where someone wants to find out where project "X" lives, which is kind of difficult today if you have a large number of users and indices.
> Yes, search should certainly help with finding indices where projects live.
>> There is also the question of version specifiers. For project A, we may have 10 versions of the documentation (one for each release file). So I think it would be useful to be able to scope in that dimension as well (like == 1.0.2, <= 3.1, etc).
> Yes, it probably makes sense to have that. I guess most searches will only care for the newest versions, though. Documentation always gets better and more, right? :)
Thanks for reminding me I need to finish the devpi documentation... didn't forget about it :)
> I wonder if we should introduce something to the search language like:
> - index:hpk/dev for searching in the "hpk/dev" index
> - index:* (also default) search in all indices
> - index:hpk/* all my own indices
> - version:<2.5 to search only release files prior to version 2.5
I don't know the search syntax, but what you show above makes a lot of sense. Again, being able to save those filters, and then being able to use them with different search terms, could be useful and time saving. Hope I am clear enough here and I make sense.
>> Intersphinx
>> -----------
>> Is this something intended to be supported?
> With the current protocols (see above) the server does not have the sources or the build information and does not manage builds. So it cannot rebuild docs and configure intersphinx docs itself. We would need to add an "upload doc sources" facility along with build instructions (the doc building has dependencies if it documents modules that need to be imported etc.) and then make devpi-server or another component manage the building. Quite a lot of work.
> But we should still look at how devpi-server could support intersphinx building: if i upload project A's docs and project B's docs to the same index and A wants to have intersphinx links to B, it could potentially load the "objects.inv" from the url where devpi-server/web serves it. If we now push A to a new index, it will still back-reference B's docs in the first index. If B's docs move to the new index, that does not change. I don't know if intersphinx could work by generating relative links, which would help in this particular case.
Yes, this is a big can of worms which I need to look into.
>> That's all I can think of for now. Thanks for starting the thread.
> Thanks for the feedback!
> holger
>> Laurent
participants (3)
- Florian Schulze
- holger krekel
- Laurent Brack