Hi,
I'm playing around with devpi themes.
My goal is to create a nice theme for my company.
I'd like to add extra information to some pages, like the index page or the
main page.
I added a snippet of code which produces an "index inheritance diagram" on
each index page display.
Because the company I work for is not that small, these diagrams can get
quite large and tangled.
A sample diagram:
<https://lh3.googleusercontent.com/--9yr5WSohOc/VZ0BL6xN3YI/AAAAAAAAAAs/8wZq…>.
Each box is an index, I intentionally removed all user and index names,
except for root/pypi.
These are not my indices; please don't judge me for the design :)
I'm concerned about performance here. Currently I generate these
diagrams on every display action, which doesn't seem like the way to go.
Could devpi provide a hook so that I'm able to regenerate a diagram on an
index update event?
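To make the idea concrete, here is a sketch of the plugin side, written
in the pluggy style that devpi plugins use. The hook name
devpiserver_on_index_change is hypothetical (it does not exist today; it
is what I am asking for), and regenerate_inheritance_diagram stands in
for my existing diagram code:

    from pluggy import HookimplMarker

    hookimpl = HookimplMarker("devpiserver")

    def regenerate_inheritance_diagram(index_name):
        # placeholder for my existing graphviz-based diagram generation
        print("regenerating inheritance diagram around", index_name)

    @hookimpl
    def devpiserver_on_index_change(stage):
        # hypothetical hook: called whenever an index is created,
        # deleted, or has its configuration (e.g. its bases) modified
        regenerate_inheritance_diagram(stage.name)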
Additionally, I'd like to add more information to an index page and the
root pages. So the question is: Could functions in devpi_web.views expose
more data in get_index() and root()? Potentially other functions?
I know I can always get the data with python snippets in the templates, but
it all takes precious time.
For instance, in my company some indices are regular user indices, while
others are project indices.
I'd like to sort the list of indices on the root page: display the project
ones first, and user indices below.
The way I would see this working is via the custom_data property of an
index.
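As a sketch of what that would enable on the root page (assuming the
view handed the template a list of (name, indexconfig) pairs, and
assuming I store a JSON string like '{"kind": "project"}' in each
index's custom_data):

    import json

    def sort_indices(indices):
        # indices: list of (name, indexconfig-dict) pairs, i.e. the
        # hypothetical shape the root() view would expose
        def sort_key(item):
            name, config = item
            custom = json.loads(config.get("custom_data") or "{}")
            # project indices first, user indices after, alphabetically
            return (0 if custom.get("kind") == "project" else 1, name)
        return sorted(indices, key=sort_key)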
The bottom line: I think people could easily create fancier themes if the
devpi web views functions exposed more data to the templates, even if that
data is not used in the default theme.
It would be great to get the full indexconfig JSON structure for each
index in both the get_index() and root() functions of the
devpi_web.views module.
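By "full indexconfig" I mean roughly the per-index dict that
'devpi getjson /user/index' returns; the exact keys depend on the
devpi-server version, and the values here are made up:

    indexconfig = {
        "type": "stage",
        "bases": ["root/pypi"],
        "volatile": True,
        "acl_upload": ["someuser"],             # made-up user name
        "custom_data": '{"kind": "project"}',   # free-form string
    }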
Now I'm not sure how much more data others might want to get there.
I hope these things make sense.
Any thoughts?
Hi Florian, all,
there are at least three issues that somewhat interrelate and share the
common topic of service reliability, concurrency, and interactions with
remote pypi.python.org or devpi masters:
https://bitbucket.org/hpk42/devpi/issues/267/intermittent-assertionerror-in…
Multiple devpi-server processes write to the same (network-shared) file
system, resulting in failed transaction handling; devpi-server was not
designed for this.
https://bitbucket.org/hpk42/devpi/issues/274/recurring-consistency-issues-w…
Under high load, database/transaction handling issues arise (although
the precise scenario and how to replicate it are unclear).
https://bitbucket.org/hpk42/devpi/issues/208/pip-gets-timeout-on-large-pack…
Trying to install an uncached package that originates from
pypi.python.org can fail if devpi-server cannot download the package
fast enough.
Starting with the last issue, we probably need to re-introduce a way to
stream remote files instead of first retrieving them in full and only
then starting a client response. This should take into account that
there could be two threads (or even two processes) trying to retrieve
the same file. This means that we start a response as soon as we have an
HTTP return code and then forward-stream the content.
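A minimal sketch of that forward-streaming idea, using requests plus a
plain pyramid Response (stream_release_file is a made-up name, and real
code would need error and checksum handling):

    import requests
    from pyramid.response import Response

    def stream_release_file(remote_url):
        # stream=True: we get status and headers without reading the body
        r = requests.get(remote_url, stream=True)
        r.raise_for_status()

        def body():
            # forward chunks to the client as they arrive from remote
            for chunk in r.iter_content(chunk_size=65536):
                yield chunk

        # we can start responding as soon as we have the remote status
        return Response(
            app_iter=body(),
            content_type=r.headers.get(
                "content-type", "application/octet-stream"))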
The first two issues could be mitigated by introducing a better
read/write transaction separation. Background: GET-ting simple pages or
release files can cause write transactions in a devpi-server process
because we may need to retrieve & cache information from pypi.python.org
or a devpi-server master. Currently, during the processing of the GET
request we at some point promote a READ-transaction into a
WRITE-transaction through a call to keyfs.restart_as_write_transaction()
and persist what we have. This all happens before the response is
returned to the client. "Restarting as write" is somewhat brittle
because something might have changed since we started our long-running
request.
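A sketch of the separation I have in mind, using pyramid's
add_finished_callback so the write part runs only after request
processing is done (the helper functions are placeholders, not devpi
code):

    from pyramid.response import Response

    def fetch_remote_simple_page(request):
        # placeholder: retrieve the remote page fully into RAM (READ phase)
        return b"<html>...</html>"

    def persist_to_cache(data):
        # placeholder: open a fresh WRITE transaction, re-check that
        # nobody else cached this meanwhile, then persist
        pass

    def get_simple_page(request):
        data = fetch_remote_simple_page(request)
        response = Response(body=data, content_type="text/html")

        def write_back(request):
            # runs after the view is finished; state must be re-checked
            persist_to_cache(data)

        request.add_finished_callback(write_back)
        return response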
Strategy and notes on how to mitigate all three issues:
- release files: cache and stream chunks of what we remotely receive,
all within a READ-transaction and all within RAM. This should ideally
be done in such a way that if multiple threads stream the same file,
only one remote HTTP request is made for fetching it; otherwise we end
up retrieving large files multiple times unnecessarily (see the first
sketch after this list). After the HTTP response to the client is
complete, we (try to) write the file to sqlite/filesystem so that
subsequent requests can work from the local filesystem. Here we need to
be careful and consider that we might have multiple writers/streamers:
if we discover that someone else has already written what we were about
to write, we can simply discard our copy.
- simple pages: first retrieve the remote simple page into RAM, process
it, serve the full pyramid response, and then (try to) cache it after
the response is completed. Here we probably don't need to care whether
multiple threads are retrieving the same simple page, because simple
pages are not big.
- we cache things in RAM because even for large files this shouldn't
matter, given that servers typically have multiple gigabytes of RAM. It
also lets us avoid synchronization issues with respect to the file
system (see the first issue above, where multiple processes write to
the file system).
- we always finish the response to the client before we attempt a write
transaction. The write-transaction part should be implemented in a
separate function to make it clear what state we can rely on and what
we must re-check (currently we do the READ->WRITE switch in the middle
of a view function).
- we also need to review how exactly we open the sqlite DB for writing
and whether multiple processes correctly serialize their write attempts
(see the second sketch after this list).
- care must be taken with respect to waitress and nginx configuration
and their buffering; see for example:
http://www.4byte.cn/question/68410/pyramid-stream-response-body.html
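First sketch (for the release-files point): a single-flight fetch so
that concurrent threads asking for the same file trigger only one
remote request. This is the simplified whole-file-in-RAM variant; a
streaming variant would additionally hand out chunks as they arrive,
and a real version needs error propagation and cache eviction:

    import threading

    class SingleFlightFetcher:
        def __init__(self):
            self._lock = threading.Lock()
            self._inflight = {}   # url -> Event set when fetch completes
            self._results = {}    # url -> file content kept in RAM

        def fetch(self, url, do_fetch):
            with self._lock:
                event = self._inflight.get(url)
                if event is None:
                    # we are the first asker: we do the remote request
                    event = self._inflight[url] = threading.Event()
                    owner = True
                else:
                    owner = False
            if owner:
                try:
                    self._results[url] = do_fetch(url)
                finally:
                    with self._lock:
                        del self._inflight[url]
                    event.set()
            else:
                # someone else is already fetching: wait for their result
                event.wait()
            return self._results[url]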
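Second sketch (for the sqlite point): opening the database so that
concurrent writers, including separate processes, serialize by blocking
on the lock instead of erroring out immediately:

    import sqlite3

    def begin_write(path):
        # timeout=30: wait up to 30s for another writer's lock instead
        # of raising "database is locked" right away
        conn = sqlite3.connect(path, timeout=30.0)
        conn.isolation_level = None   # manage transactions explicitly
        # BEGIN IMMEDIATE takes the write lock up front, so two writers
        # serialize at transaction start rather than failing mid-way
        conn.execute("BEGIN IMMEDIATE")
        return conn

    # usage sketch:
    #   conn = begin_write("devpi.sqlite")
    #   conn.execute("INSERT INTO kv VALUES (?, ?)", ("key", "value"))
    #   conn.execute("COMMIT")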
Any feedback or thoughts welcome.
holger
--
about me: http://holgerkrekel.net/about-me/
contracting: http://merlinux.eu
Our corporate standard requires that all in-house-built products
(notably including .whl files) be housed in Artifactory. We can resolve
them from there via Artifactory's PyPI facade (read-only). But we don't
want to have to check external packages (from the public PyPI) into
Artifactory, and it makes no sense to do so.
Can an index in devpi have, as bases, both the public PyPI (i.e.
root/pypi) and an additional external, read-only repo, and serve as a
proxy for both?
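In indexconfig terms, this is the setup I am hoping for (a sketch;
'company/artifactory' is a made-up index name, and I don't know whether
devpi can proxy a second external repo like this):

    # the index our builds would install from
    desired_indexconfig = {
        "type": "stage",
        # inherit from the public pypi mirror *and* a read-only proxy
        # of Artifactory's PyPI facade (made-up index name):
        "bases": ["root/pypi", "company/artifactory"],
        "volatile": False,
    }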
Hi,
I ran into a bad issue yesterday:
- We have a devpi server running with some local packages uploaded, and
with normal fallback to the pypi index.
- Packages not found on the devpi server are correctly fetched from
pypi.
- As part of building a local wheel using 'pip wheel' and pushing it to
devpi, I also pushed all the generated dependency wheels to devpi.
- Now, when I try 'pip install maven', I properly get the version stored
on my devpi server (e.g. version 3.4).
- HOWEVER, if I do 'pip install maven==3.5', devpi reports no such
version and fails, as it only knows of version 3.4 - it does NOT fall
back to searching the pypi index...
Is this by design? Shouldn't devpi also fall back to searching the pypi
index in this case (the package is found, but no matching version)?
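To make the question concrete, this is the resolution behaviour I
expected, as a self-contained sketch (indices modelled as plain dicts;
real devpi internals obviously differ):

    def resolve(package, version, indices):
        # walk the index and then its bases until some index has a
        # matching version, instead of stopping at the first index
        # that knows the package name at all
        for index in indices:              # e.g. [local_index, root_pypi]
            versions = index.get(package, {})
            if version is None and versions:
                return max(versions)       # latest version found
            if version in versions:
                return versions[version]
        raise LookupError("%s==%s not found in any index" % (package, version))

    local_index = {"maven": {"3.4": "maven-3.4.whl"}}
    root_pypi = {"maven": {"3.4": "...", "3.5": "maven-3.5.whl"}}
    print(resolve("maven", "3.5", [local_index, root_pypi]))  # found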
Comments anyone?