Hi Stephan, On Fri, Oct 30, 2015 at 10:06 +0000, Erb, Stephan wrote:
what's your impression of the additional code complexity that this will introduce?
There is some increased code complexity, but I think we should be able to contain it in separately testable classes/functions.
We are currently facing another set of concurrency and performance problems in devpi. We easily have 300 package versions per +simple page of a package. A single request takes 0.2 up to 1 second. When there are multiple concurrent read requests (~10), latency goes up significantly.
Do you have profiling data for the 300-versions-per-simple-page scenario? Is most of the time spent in get_releaselinks?
Still, this problem is manageable, and we are working on a few performance patches to improve the situation. However, I fear that the large rework proposed here might make the code more complex and thus more difficult to tune.
I don't suspect the two efforts clash much. What have you done so far? That said, we are currently caching at the "list of release file links" level, and I think it's worthwhile to check whether we should rather cache at the simple-page layer. Apart from performance improvements, that also has the potential to simplify the code if we manage to cache only at the simple-page level instead of in addition to the releaselinks caching. best, holger
________________________________________ From: devp...@googlegroups.com
on behalf of holger krekel Sent: Thursday, October 29, 2015 2:23 PM To: devpi-dev Subject: [devpi-dev] improving concurrency, reliability of devpi-server
Hi Florian, all,
there are at least three issues that somewhat interrelate and share the common topic of service reliability, concurrency, and interaction with a remote pypi.python.org or devpi master:
- multiple devpi-server processes write to the same (network-shared) file system, resulting in failed transaction handling. devpi-server was not designed for this.
- under high load, database/transaction handling issues arise (although it's unclear what the precise scenario is and how to replicate it).
- trying to install an uncached package that originates from pypi.python.org can fail if devpi-server cannot download the package fast enough.
Starting with the last issue, we probably need to re-introduce a way to stream remote files instead of first retrieving them in full and only then starting the client response. This should take into account that there could be two threads (or even two processes) trying to retrieve the same file. Concretely, we start the response as soon as we get an HTTP status code from the remote side and then forward-stream the content.
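To make the idea concrete, here is a minimal sketch of such forward-streaming in a threaded server. FileStreamer, get_streamer, and the chunk-list design are illustrative names, not existing devpi code, and error handling on the fetching side is omitted:

```python
import threading
from urllib.request import urlopen

def _http_chunks(url, chunk_size=65536):
    # default fetcher: stream the remote file via stdlib urllib
    with urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                return
            yield chunk

class FileStreamer:
    """Fetch a remote file once; any number of concurrent readers
    iterate over the chunks received so far and wait for more."""
    def __init__(self, url, fetch=_http_chunks):
        self.url = url
        self.chunks = []               # everything received so far, kept in RAM
        self.done = False
        self.cond = threading.Condition()
        threading.Thread(target=self._fetch, args=(fetch,), daemon=True).start()

    def _fetch(self, fetch):
        for chunk in fetch(self.url):
            with self.cond:
                self.chunks.append(chunk)
                self.cond.notify_all()
        with self.cond:
            self.done = True
            self.cond.notify_all()

    def iter_chunks(self):
        # a client response can start as soon as the first chunk arrives
        i = 0
        while True:
            with self.cond:
                while i >= len(self.chunks) and not self.done:
                    self.cond.wait()
                if i >= len(self.chunks):
                    return
                chunk = self.chunks[i]
            yield chunk
            i += 1

_streamers = {}
_streamers_lock = threading.Lock()

def get_streamer(url, fetch=_http_chunks):
    # one remote request per URL, no matter how many clients ask
    with _streamers_lock:
        if url not in _streamers:
            _streamers[url] = FileStreamer(url, fetch=fetch)
        return _streamers[url]
```

A real version would also need to propagate fetch errors to waiting readers and evict finished entries from the registry; this only shows the single-fetch/multi-reader shape.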
The first two issues could be mitigated by introducing a better read/write transaction separation. Background: GET-ting simple pages or release files can cause write transactions in a devpi-server process because we may need to retrieve & cache information from pypi.python.org or a devpi-server master. Currently, at some point during the processing of the GET request we promote a READ transaction into a WRITE transaction through a call to keyfs.restart_as_write_transaction() and persist what we have. This all happens before the response is returned to the client. "Restarting as write" is somewhat brittle because something might have changed since we started our long-running request.
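A rough sketch of the alternative shape, where the write happens in its own function after the response is finished and re-checks state. Here keyfs is a simplified stand-in, and serve_simple_page/write_back are hypothetical names; only add_finished_callback is a real Pyramid request API:

```python
def serve_simple_page(keyfs, request, fetch_remote):
    # read path: look up the cached page inside a READ transaction only
    with keyfs.read_transaction():
        cached = keyfs.get(request.path)
    if cached is not None:
        return cached
    # fetch and serve entirely outside any write transaction
    page = fetch_remote(request.path)
    # defer the write until after the response is complete
    request.add_finished_callback(
        lambda req: write_back(keyfs, req.path, page))
    return page

def write_back(keyfs, path, page):
    # runs after the client got its response; state may have changed
    # in the meantime, so re-check before persisting
    with keyfs.write_transaction():
        if keyfs.get(path) is None:   # another thread may have won the race
            keyfs.set(path, page)
```

The point of the separate write_back function is that it makes explicit which state must be re-validated, instead of a mid-view READ->WRITE promotion.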
Strategy and notes on how to mitigate all three issues:
- release files: cache and stream chunks of what we receive remotely, all within a READ transaction and all within RAM. Ideally, if multiple threads stream the same file, only one remote HTTP request is made to fetch it; otherwise we end up retrieving large files multiple times unnecessarily. After the HTTP response to the client is complete, we (try to) write the file to sqlite/the filesystem so that subsequent requests can be served from local storage. Here we need to be careful and consider that there may be multiple writers/streamers: if we discover that someone else has already written what we were about to write, we can simply discard our copy.
- simple pages: first retrieve the remote simple page into RAM, process it, serve the full pyramid response, and then (try to) cache it after the response is completed. Here we probably don't need to care whether multiple threads retrieve the same simple page concurrently, because simple pages are not big.
- we cache things in RAM because even for large files it shouldn't matter, given that servers typically have multiple gigabytes of RAM. This also lets us avoid synchronization issues wrt the file system (see the first issue, where multiple processes write to the file system).
- we always finish the response to the client before we attempt a write transaction. The write-transaction part should be implemented in a separate function to make clear what state we can rely on and what we must re-check. (Currently we do the READ->WRITE switch in the middle of a view function.)
- we also need to review how exactly we open the sqlite DB for writing and whether multiple processes correctly serialize their write attempts, particularly in the multi-process case.
- care must be taken wrt waitress and nginx configuration and their buffering; see for example: http://www.4byte.cn/question/68410/pyramid-stream-response-body.html
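On the sqlite point above: one standard way to make write attempts serialize across processes is to take the write lock up front with BEGIN IMMEDIATE and set a busy timeout so contending writers wait instead of failing. A sketch (the kv table and function names are illustrative, not devpi's schema):

```python
import sqlite3

def open_for_write(path):
    # isolation_level=None puts sqlite3 in autocommit mode so we
    # control transactions explicitly with BEGIN/COMMIT
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute("PRAGMA busy_timeout = 30000")  # wait up to 30s for the lock
    return conn

def write_serialized(conn, key, value):
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock right away
    try:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Note that sqlite's file locking is documented as unreliable on many network file systems, which ties back to the first issue above: serializing writers this way only helps when the database lives on a local disk.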
any feedback or thoughts welcome.