Hi Stephan, On Fri, Oct 30, 2015 at 10:06 +0000, Erb, Stephan wrote:
what's your impression of the additional code complexity that this will introduce?
There is some increased code complexity, but I think we should be able to contain it in separately testable classes/functions.
We are currently facing another set of concurrency and performance problems in devpi. We easily have 300 package versions per +simple page of a package. A single request takes 0.2 up to 1 second. When there are multiple concurrent read requests (~10), latency goes up significantly.
Do you have profiling data for the 300-versions-per-simple-page scenario? Is most of the time spent in get_releaselinks?
Still, this problem is manageable, and we are working on a few performance patches to improve the situation. However, I fear that the large rework proposed here might make the code more complex and thus more difficult to tune.
I don't suspect the two efforts clash much. What have you done so far? That said, we are currently caching at the "list of release file links" level, and I think it's worthwhile to check whether we should rather cache at the simple-page layer. Apart from performance improvements, that also has the potential to simplify the code if we manage to cache only at the simple-page level instead of in addition to the releaselinks caching. best, holger
________________________________________ From: devp...@googlegroups.com
on behalf of holger krekel Sent: Thursday, October 29, 2015 2:23 PM To: devpi-dev Subject: [devpi-dev] improving concurrency, reliability of devpi-server
Hi Florian, all,
there are at least three issues that somewhat interrelate and share the common topic of service reliability, concurrency, and interaction with a remote pypi.python.org or devpi master:
- multiple devpi-server processes write to the same (network-shared) file system, resulting in failed transaction handling. devpi-server was not designed for this.
- under high load, database/transaction handling issues arise (although it's unclear what the precise scenario is and how to replicate it).
- trying to install an uncached package that originates from pypi.python.org can fail if devpi-server cannot download the package fast enough.
Starting with the last issue, we probably need to re-introduce a way to stream remote files instead of first retrieving them in full and only then starting the client response. This should take into account that there could be two threads (or even two processes) trying to retrieve the same file. Concretely, we start the response as soon as we get an HTTP status code from the remote side and then forward-stream the content.
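To make the idea concrete, here is a minimal sketch of such forward-streaming in a threaded server. FileStreamer, get_streamer, and the chunk-list design are illustrative names, not existing devpi code, and error handling on the fetching side is omitted:

```python
import threading
from urllib.request import urlopen

def _http_chunks(url, chunk_size=65536):
    # default fetcher: stream the remote file via stdlib urllib
    with urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                return
            yield chunk

class FileStreamer:
    """Fetch a remote file once; any number of concurrent readers
    iterate over the chunks received so far and wait for more."""
    def __init__(self, url, fetch=_http_chunks):
        self.url = url
        self.chunks = []               # everything received so far, kept in RAM
        self.done = False
        self.cond = threading.Condition()
        threading.Thread(target=self._fetch, args=(fetch,), daemon=True).start()

    def _fetch(self, fetch):
        for chunk in fetch(self.url):
            with self.cond:
                self.chunks.append(chunk)
                self.cond.notify_all()
        with self.cond:
            self.done = True
            self.cond.notify_all()

    def iter_chunks(self):
        # a client response can start as soon as the first chunk arrives
        i = 0
        while True:
            with self.cond:
                while i >= len(self.chunks) and not self.done:
                    self.cond.wait()
                if i >= len(self.chunks):
                    return
                chunk = self.chunks[i]
            yield chunk
            i += 1

_streamers = {}
_streamers_lock = threading.Lock()

def get_streamer(url, fetch=_http_chunks):
    # one remote request per URL, no matter how many clients ask
    with _streamers_lock:
        if url not in _streamers:
            _streamers[url] = FileStreamer(url, fetch=fetch)
        return _streamers[url]
```

A real version would also need to propagate fetch errors to waiting readers and evict finished entries from the registry; this only shows the single-fetch/multi-reader shape.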
The first two issues could be mitigated by introducing a better read/write transaction separation. Background: GET-ting simple pages or release files can cause write transactions in a devpi-server process because we may need to retrieve & cache information from pypi.python.org or a devpi-server master. Currently, at some point during the processing of the GET request we promote a READ transaction into a WRITE transaction through a call to keyfs.restart_as_write_transaction() and persist what we have. This all happens before the response is returned to the client. "Restarting as write" is somewhat brittle because something might have changed since we started our long-running request.
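A rough sketch of the alternative shape, where the write happens in its own function after the response is finished and re-checks state. Here keyfs is a simplified stand-in, and serve_simple_page/write_back are hypothetical names; only add_finished_callback is a real Pyramid request API:

```python
def serve_simple_page(keyfs, request, fetch_remote):
    # read path: look up the cached page inside a READ transaction only
    with keyfs.read_transaction():
        cached = keyfs.get(request.path)
    if cached is not None:
        return cached
    # fetch and serve entirely outside any write transaction
    page = fetch_remote(request.path)
    # defer the write until after the response is complete
    request.add_finished_callback(
        lambda req: write_back(keyfs, req.path, page))
    return page

def write_back(keyfs, path, page):
    # runs after the client got its response; state may have changed
    # in the meantime, so re-check before persisting
    with keyfs.write_transaction():
        if keyfs.get(path) is None:   # another thread may have won the race
            keyfs.set(path, page)
```

The point of the separate write_back function is that it makes explicit which state must be re-validated, instead of a mid-view READ->WRITE promotion.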
Strategy and notes on how to mitigate all three issues:
- release files: cache and stream chunks of what we receive remotely, all within a READ transaction and all within RAM. Ideally, if multiple threads stream the same file, only one remote HTTP request is made to fetch it; otherwise we end up retrieving large files multiple times unnecessarily. After the HTTP response to the client is complete, we (try to) write the file to sqlite/the filesystem so that subsequent requests can be served from local storage. Here we need to be careful and consider that there may be multiple writers/streamers: if we discover that someone else has already written what we were about to write, we can simply discard our copy.
- simple pages: first retrieve the remote simple page into RAM, process it, serve the full pyramid response, and then (try to) cache it after the response is completed. Here we probably don't need to care whether multiple threads retrieve the same simple page concurrently, because simple pages are not big.
- we cache things in RAM because even for large files it shouldn't matter, given that servers typically have multiple gigabytes of RAM. This also lets us avoid synchronization issues wrt the file system (see the first issue, where multiple processes write to the file system).
- we always finish the response to the client before we attempt a write transaction. The write-transaction part should be implemented in a separate function to make clear what state we can rely on and what we must re-check. (Currently we do the READ->WRITE switch in the middle of a view function.)
- we also need to review how exactly we open the sqlite DB for writing and whether multiple processes correctly serialize their write attempts, particularly in the multi-process case.
- care must be taken wrt waitress and nginx configuration and their buffering; see for example: http://www.4byte.cn/question/68410/pyramid-stream-response-body.html
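On the sqlite point above: one standard way to make write attempts serialize across processes is to take the write lock up front with BEGIN IMMEDIATE and set a busy timeout so contending writers wait instead of failing. A sketch (the kv table and function names are illustrative, not devpi's schema):

```python
import sqlite3

def open_for_write(path):
    # isolation_level=None puts sqlite3 in autocommit mode so we
    # control transactions explicitly with BEGIN/COMMIT
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute("PRAGMA busy_timeout = 30000")  # wait up to 30s for the lock
    return conn

def write_serialized(conn, key, value):
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock right away
    try:
        conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Note that sqlite's file locking is documented as unreliable on many network file systems, which ties back to the first issue above: serializing writers this way only helps when the database lives on a local disk.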
any feedback or thoughts welcome.