Hi,
I'm playing around with devpi themes.
My goal is to create a nice theme for my company.
I'd like to add extra information to some pages, like the index page or the
main page.
I added a snippet of code which produces an "index inheritance diagram" on
each index page display.
Because the company I work for is not that small, these diagrams can get
strange.
A sample diagram: index inheritance diagram
<https://lh3.googleusercontent.com/--9yr5WSohOc/VZ0BL6xN3YI/AAAAAAAAAAs/8wZq…>.
Each box is an index, I intentionally removed all user and index names,
except for root/pypi.
They're not my indices, so please don't judge me for the design :)
I'm concerned about performance here. First of all, I currently generate
the diagrams on every page view, which doesn't seem like the way to go.
Could devpi provide a hook so that I'm able to regenerate a diagram on an
index update event?
Additionally, I'd like to add more information to an index page and the
root pages. So the question is: Could functions in devpi_web.views expose
more data in get_index() and root()? Potentially other functions?
I know I can always get the data with python snippets in the templates, but
it all takes precious time.
For instance, in my company, some indices are regular user indices, some of
them are project indices.
I'd like to sort the list of indices on the root page: display the project
ones first, and user indices below.
The way I would see this is the custom_data property of an index.
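To illustrate the sorting I have in mind, here is a minimal sketch. The index records and the "project"/"user" values stored in custom_data are made up for this example; how the data actually reaches the template is exactly the open question.

```python
# Hypothetical index records; the custom_data values are invented for this sketch.
indexes = [
    {"name": "alice/dev", "custom_data": "user"},
    {"name": "root/pypi", "custom_data": ""},
    {"name": "proj/stable", "custom_data": "project"},
    {"name": "bob/dev", "custom_data": "user"},
]


def sort_key(index):
    is_project = index.get("custom_data") == "project"
    # False sorts before True, so negate the flag to list project indices first.
    return (not is_project, index["name"])


sorted_indexes = sorted(indexes, key=sort_key)
names = [index["name"] for index in sorted_indexes]
```

With custom_data available in root(), a theme could do this once in the view instead of in template snippets.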
The bottom line: I think people could easily create fancier themes if the
devpi web views functions exposed more data to the templates, even if that
data is not used in the default theme.
It would be great to get the full indexconfig json structure for each index
in both get_index() and root() functions from devpi_web.views module.
Now I'm not sure how much more data others might want to get there.
I hope these things make sense.
Any thoughts?
Hi again,
we just did a hotfix release, devpi-server-2.5.1, which fixes
a regression with replicas, thanks to Stephan Erb for reporting
and Florian Schulze for fixing.
holger
2.5.1 (2015-11-20)
------------------
- fix issue289: fix simple page serving on replicas
Hi!
We want to support external mirrors other than PyPI.
tl;dr
The basics are mostly clear. Migration from a mirror could use input.
Mirroring of devpi indexes needs more thought.
I'd say we only support mirrors that provide a PyPI simple view where
each project has one folder.
For "findlinks"-like private package storages which keep all packages in
one folder, I recommend using CheesePrism and mirroring that. I don't
think we could support such storages lazily, as each file needs to be inspected
for metadata, so we would have to download everything.
When creating a mirror index, you provide the root url of the simple
view (like https://pypi.python.org/simple/) and you can set the cache
expiry time (defaults to 1800 seconds).
The index settings are "mirror_url" and "mirror_cache_expiry".
Mirrors are lazy, exactly like root/pypi is at the moment.
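As a rough sketch of how the expiry setting could drive the lazy behaviour: the class and method names below are hypothetical and not devpi-server code, only the 1800 second default is taken from the proposal above.

```python
import time


class MirrorCacheEntry:
    """Hypothetical holder for one lazily fetched simple page."""

    def __init__(self, content, fetched_at):
        self.content = content
        self.fetched_at = fetched_at

    def is_expired(self, mirror_cache_expiry=1800, now=None):
        # The entry is stale once mirror_cache_expiry seconds have passed
        # since it was fetched; a stale entry triggers a new remote request.
        now = time.time() if now is None else now
        return now - self.fetched_at >= mirror_cache_expiry


entry = MirrorCacheEntry("<html>...</html>", fetched_at=1000.0)
```

A request for a project page would only contact the mirror_url if there is no entry yet or is_expired() returns True.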
To enable migration, we add support to "push" from a mirror into a
regular index (also useful to selectively get pypi packages into an
index) and look into "bulk pushing". Any thoughts on this?
We should rename "pypi_whitelist" into "mirror_whitelist" or something
like that.
We will do a 3.0.0 anyway and can switch this during export/import.
Should we add a setting that exempts a mirror from the whitelisting, or
should we just get people to migrate to a regular index to avoid the
need to whitelist packages?
What about mirroring a devpi index?
I see 3 scenarios:
1. Just mirror +simple of the index.
   + works now
   - the problem is that all inherited indexes and pypi would be mirrored as
     well.
2. Mirror a view which excludes mirrors.
3. Mirror a view that excludes any inherited indexes.
I think both 2 and 3 have valid use cases.
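To make the three scenarios concrete, here is a small sketch of which indexes would feed each view. The mode names, the "type" key and the config layout are assumptions for illustration only.

```python
def contributing_indexes(name, configs, mode="full"):
    """Return the index names that would feed a mirrored view of `name`.

    mode "full" is scenario 1, "no-mirrors" is scenario 2 and
    "own" is scenario 3 (all three names are hypothetical).
    """
    if mode == "own":
        return [name]
    result, seen, todo = [], set(), [name]
    while todo:
        current = todo.pop(0)
        if current in seen:
            continue
        seen.add(current)
        config = configs[current]
        if mode == "no-mirrors" and config.get("type") == "mirror":
            continue  # skip mirrors such as root/pypi
        result.append(current)
        todo.extend(config.get("bases", ()))
    return result


configs = {
    "root/pypi": {"type": "mirror", "bases": []},
    "team/base": {"type": "stage", "bases": ["root/pypi"]},
    "team/dev": {"type": "stage", "bases": ["team/base"]},
}
```

Scenario 2 still follows the bases but drops mirrors on the way, while scenario 3 never leaves the index itself.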
We need good names for the URLs; I haven't come up with good ones yet (is
there one word for "not inheriting"?).
We could later optimize devpi mirroring in various ways. For example
with a serial header and with a notification protocol, though I'm not
sure more than the serial is really necessary for most use cases.
Regards,
Florian Schulze
We just released devpi-{server,web}-2.5.0 to pypi, see changelogs below
for more details. While an export/import cycle is not required
for this release, it is recommended, especially if you are running with replicas.
Docs for the private pypi packaging server at: http://doc.devpi.net
Thanks to Florian Schulze, Jason R. Coombs and all issue reporters.
For your information, we are now starting work for devpi-server-3.0
which will introduce further speedups, internal code simplifications
and new features (like mirroring from arbitrary pypi-servers).
cheers,
holger krekel
server-2.5.0 (2015-11-19)
-------------------------
- fix a regression of 2.3.0 which would cause many write-transactions
for mirrored simple-page entries that didn't change. Before the fix,
accesses to mirrored simple pages would result in a new
write-transaction every 30 minutes if the page was accessed, which
is likely on a somewhat busy site. If you are running with replicas
it is recommended to do an export/import cycle to remove all
the unnecessary writes that were produced since devpi-server-2.3.0.
They delay the setup of new replicas considerably.
- add info about pypi_whitelist on simple page when root/pypi is blocked for
a project.
- replica simple-page serving will not unnecessarily wait for new
simple-page entries to arrive at the replication side if the master
does not return any changes in the initial simple-page request.
Previously a replica would wait for the replication-thread to catch
up even if no links changed.
- fix setup.py to work on py34 and with LANG="C" environments.
Thanks Jason R. Coombs.
- fix issue284: allow users who are listed in acl_upload to delete packages
web-2.5.0 (2015-11-19)
----------------------
- fix issue288: classifiers rendering wrong with read only data views
- index.pt, project.pt, version.pt: added info about pypi_whitelist. This
requires devpi-server > 2.4.0 to work.
- fix issue286: indexing of most data failed due to new read only views
Hi Stephan, Florian, all,
while the last release sped up simple-page serving, it helped more for
the root/pypi case than the private indexes. I just did some benchmarking
for private indexes with roughly 200 release file entries. I got around
34 requests per second on my machine and the attached proflog-2.4.0
profile. Most of the time is spent in "get_linkstore_perstage" and
"get_versiondata_perstage" because we store each version under its own
key, resulting in hundreds of db-accesses and dependent processing.
So I introduced a per-project db-persisted "simplelinks" cache similar
to the root/pypi one and got 133 requests per second, roughly 4 times
better, also see "proflog-simplecache" profile attached. There are
still some easy improvements that can be made on top. The diff is quite
simple but it would currently require an export-import:
https://bitbucket.org/hpk42/devpi/branch/simplecache#diff
It's likely possible to avoid the export/import if necessary.
If you'd like to repeat the measurements you can import this devpi-server state:
http://merlinux.eu/~hpk/exp1.tar.gz
either from devpi-server-2.4.0 or from the branch and after server-startup run:
ab -H "User-Agent: pip/6.0" -n 100 http://localhost:3141/hpk/dev/+simple/dddttt
The "dddttt" project also has releases on pypi.python.org.
Maybe after merging the simplecache branch we could
also cleanup how we internally treat projectnames: always use
normalized names for keys in the DB so that we can simply compute
them without any db-access (currently we need to access the database
to get from a name to the "real projectname", e.g. to "Django" from "django").
It's a long standing issue to simplify this and we would anyway require
an export-import cycle for it.
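A minimal sketch of what "computing the key without db-access" could look like. The normalization below follows the PEP 503 style rule and may differ from what devpi-server ends up using; the key layout is purely hypothetical.

```python
import re


def normalize_name(name):
    # PEP 503 style: collapse runs of "-", "_" and "." into "-" and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def project_key(user, index, name):
    # Hypothetical key layout; the real keyfs layout may look different.
    return "{}/{}/{}".format(user, index, normalize_name(name))
```

With such a function both "Django" and "django" map to the same key, so no lookup of the "real projectname" is needed to find the entry.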
best,
holger
--
about me: http://holgerkrekel.net/about-me/
contracting: http://merlinux.eu
On Sat, Nov 07, 2015 at 13:43 +0100, Florian Schulze wrote:
> I'd say before we add such high level caches we first try to improve
> the performance differently.
Identifying bottle-necks and improving overall performance also
makes sense. But simple-page serving is special because it's
the one thing that is needed by pip and must be handled by devpi-server.
Other than serving simple requests for pip i don't think performance
is a problem with devpi-server.
> As you noticed by the writeup, cache invalidation is hard and adds a
> lot of complexity. It's much easier to try to do as many things on
> write that are repeated for each read. That's also some kind of
> caching, but on a lower level at which we know exactly what to
> invalidate.
>
> Another idea. If we can quickly get the current serial of each index
> that is involved for rendering a simple page, then we can add an
> etag header that is used by a caching proxy like varnish. Then we
> would only provide a cache key, but the storage and invalidation is
> handled for us.
If we got all serials of all versiondata's for each based index
we'd have a key we could use, also as ETAG. You would still need to
special-handle mirroring as mentioned below.
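As a sketch of the key idea, assuming we can cheaply collect one serial per involved index (the function name and the input dict are hypothetical, and mirror expiry is not covered here):

```python
import hashlib


def simple_page_etag(project, index_serials):
    """Derive an ETag value from the serials of all involved indexes."""
    # Sort so that the result does not depend on dict ordering.
    parts = [project]
    parts.extend("{}={}".format(name, serial)
                 for name, serial in sorted(index_serials.items()))
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    # ETag values are quoted strings in HTTP.
    return '"{}"'.format(digest[:32])
```

Any write that bumps one of the serials changes the ETag, so a caching proxy would refetch the page without devpi-server having to invalidate anything itself.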
FWIW the below cache invalidation is somewhat complex but all of the
parts are straight-forward IMHO. And the root "caching" object can be
tested fully independently. All items below (except for mirroring)
would then just add a single line of code IISIC. It would
also be easy to make caching optional.
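To show how small the dependency-tracking piece from the quoted proposal could be, here is a sketch that needs no devpi machinery. Class and method names are hypothetical, and treating only root/pypi as a mirror mirrors the current situation.

```python
import threading


class IndexDependencies:
    """Thread-safe map from each index to the indexes inheriting from it."""

    def __init__(self, bases_by_index):
        self._lock = threading.Lock()
        self._bases = {name: tuple(bases)
                       for name, bases in bases_by_index.items()}

    def set_bases(self, name, bases):
        # To be called whenever the "bases" property of an index changes.
        with self._lock:
            self._bases[name] = tuple(bases)

    def _ancestors(self, name):
        # Caller must hold the lock; returns all direct and indirect bases.
        seen, todo = set(), list(self._bases.get(name, ()))
        while todo:
            current = todo.pop()
            if current not in seen:
                seen.add(current)
                todo.extend(self._bases.get(current, ()))
        return seen

    def dependents(self, name):
        """All indexes that directly or indirectly inherit from `name`."""
        with self._lock:
            return {other for other in self._bases
                    if name in self._ancestors(other)}

    def uses_mirror(self, name, mirrors=("root/pypi",)):
        with self._lock:
            return name in mirrors or bool(self._ancestors(name) & set(mirrors))


deps = IndexDependencies({
    "root/pypi": [],
    "team/base": ["root/pypi"],
    "team/dev": ["team/base"],
    "solo/private": [],
})
```

On an index config or project change, the caches to kill would be those of the index itself plus everything in dependents(index).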
best,
holger
> Regards,
> Florian Schulze
>
>
> On 7 Nov 2015, at 13:28, holger krekel wrote:
>
> >Hi Stephan,
> >
> >so with the recent PRs we do get rid of "copy_if_mutable" and
> >thanks mostly
> >to your PRs simple-page serving is twice as fast as before, on my
> >machine
> >it's 170 requests per second.
> >
> >I just hacked a simple-project serving cache (without any invalidation
> >which is the hard part) which gets us to ~550 requests per second.
> >I think we could get even faster if we bypassed all transaction
> >machinery which is implicitly used for each request. But first
> >steps first.
> >
> >Regarding cache invalidation i think this is the simplest approach:
> >
> >- maintain a per-index LRUcache (we already have a utility class for
> >that in keyfs) which maps projectnames to simple pages and use/fill
> >it from the simple page serving as it is now.
> >
> >- if an index config changes (rare event) kill caches of that index
> >and all inheriting indexes.
> >
> >- if a projectname changes kill caches of that index and all
> >inheriting
> >cache's entries for this projectname.
> >
> >- at startup time build a RAM data structure which tells us for
> >each index about all dependent indexes (currently we only
> >have the bases of an index,). This data structure needs
> >to be updated when a "bases" property of an index changes.
> >It also tells us if an index ultimately uses a mirroring index,
> >currently only root/pypi.
> >This data structure can be and should be fully unittested
> >without invoking any devpi machinery. It also needs to
> >be thread safe.
> >
> >- part of the "do we have a cache-hit" check is to see if
> >we depend on a mirroring index and if its timeout has been
> >reached. If so we kill the cache and thus let the normal
> >current logic run.
> >
> >- The data structure which maps index names to per-index LRUcache
> >instances can live on the XOM object.
> >
> >Any comments, further thoughts on this?
> >We could otherwise put this into an issue for anyone who wants to
> >tackle
> >it (you? :)
> >
> >holger
> >
> >
> >
> >On Wed, Nov 04, 2015 at 17:57 +0000, Erb, Stephan wrote:
> >>Hi Holger,
> >>
> >>I like the idea of getting rid of copy_if_mutable in some way or
> >>the other.
> >>
> >>Pyrsistent looks very promising. However, I am not sure if it is
> >>thread-safe (doesn't look like it). So we would have to be
> >>careful here.
> >>
> >>
> >>Best Regards,
> >>Stephan
> >>________________________________________
> >>From: holger krekel <hol...(a)merlinux.eu>
> >>Sent: Tuesday, November 3, 2015 6:29 PM
> >>To: Erb, Stephan
> >>Cc: holger krekel; devp...(a)googlegroups.com
> >>Subject: Re: [devpi-dev] improving concurrency, reliability of
> >>devpi-server
> >>
> >>Hi Stephan,
> >>
> >>On Fri, Oct 30, 2015 at 14:10 +0000, Erb, Stephan wrote:
> >>>Hi Holger,
> >>>
> >>>in order not to de-rail this discussion any further, I have
> >>>performed a brain dump in a separate ticket:
> >>>
> >>>https://bitbucket.org/hpk42/devpi/issues/280/devpi-performance-issues
> >>
> >>Thanks. FWIW I am wondering if we could avoid "copy_if_mutable"
> >>altogether.
> >>We'd need a recursive dict proxy which does what
> >>"copy_if_mutable" does but
> >>lazily, e.g.:
> >>
> >> d = {"a": [1,2,3], "b": set()}
> >> d2 = make_recursive_readonly_proxy(d)
> >>
> >> d2["b"] = 3 # would give a readonly error
> >> d2["a"].append(4) # would give a readonly error
> >> "x" in d2["b"] # reading is allowed (False here, the set is empty)
> >> ...
> >>
> >>Is anybody aware of such a proxy? I found
> >>
> >> https://pypi.python.org/pypi/dictproxyhack
> >>
> >>but it only offers a non-recursive readonly dict interface
> >>so the above readonly-errors would not occur. It's not too hard
> >>to do an implementation which suffices for devpi-server purposes
> >>but if there is a readymade solid solution we could use it.
> >>
> >>FWIW i have also been thinking of using "pyrsistent", a well thought
> >>out library to work with immutable data structures:
> >>
> >> http://pyrsistent.readthedocs.org/
> >>
> >>It would help to avoid some programming errors and allows to avoid
> >>accidental modifications. The basic idea is that any modifying
> >>operation returns a new reference:
> >>
> >> >>> map1 = pyrsistent.m(a=3, b=4)
> >> >>> map2 = map1.set("x", 5)
> >> >>> map1
> >> pmap({'a': 3, 'b': 4})
> >> >>> map2
> >> pmap({'a': 3, 'b': 4, 'x': 5})
> >>
> >>There is no way to modify the map1 reference.
> >>
> >>best,
> >>holger
> >>
> >>
> >>
> >>
> >>>Best Regards,
> >>>Stephan
> >>>________________________________________
> >>>From: holger krekel <hol...(a)merlinux.eu>
> >>>Sent: Friday, October 30, 2015 11:23 AM
> >>>To: Erb, Stephan
> >>>Cc: holger krekel; devp...(a)googlegroups.com
> >>>Subject: Re: [devpi-dev] improving concurrency, reliability of
> >>>devpi-server
> >>>
> >>>Hi Stephan,
> >>>
> >>>On Fri, Oct 30, 2015 at 10:06 +0000, Erb, Stephan wrote:
> >>>>Hi Holger,
> >>>>
> >>>>what's your impression on the additional code complexity
> >>>>that will introduce?
> >>>
> >>>There is some increased code complexity but i think we should
> >>>be able to contain
> >>>it in separately-testable classes/functions.
> >>>
> >>>>We are currently facing an other set of concurrency and
> >>>>performance problems in devpi. We easily have 300 package
> >>>>versions per +simple page of a package. A single request
> >>>>takes 0.2 up to 1 second. When there are multiple concurrent
> >>>>read requests (~10) the latency goes up significantly.
> >>>
> >>>Do you have profiling data for the 300-package per simple page
> >>>scenarios?
> >>>Most of the time spent in get_releaselinks?
> >>>
> >>>>Still, this problem is manageable and we are working on a
> >>>>few performance patches to improve the situation. However, I
> >>>>fear that the large rework proposed here might make the code
> >>>>more difficult and thus more difficult to tune.
> >>>
> >>>I don't suspect the two efforts clash much. What did you do so far?
> >>>
> >>>That said, we are currently caching at "list of release file links"
> >>>level and i think it's worthwhile to check if we should rather
> >>>cache at
> >>>the simple-page layer. Apart from performance improvements it
> >>>also has
> >>>potential to simplify the code if we manage to only have caching at
> >>>simple-page and not in addition to the releaselinks-caching.
> >>>
> >>>best,
> >>>holger
> >>>
> >>>>Regards,
> >>>>Stephan
> >>>>
> >>>>________________________________________
> >>>>From: devp...(a)googlegroups.com
> >>>><devp...(a)googlegroups.com> on behalf of holger krekel
> >>>><hol...(a)merlinux.eu>
> >>>>Sent: Thursday, October 29, 2015 2:23 PM
> >>>>To: devpi-dev
> >>>>Subject: [devpi-dev] improving concurrency, reliability of
> >>>>devpi-server
> >>>>
> >>>>Hi Florian, all,
> >>>>
> >>>>there are at least three issues that somewhat interrelate and
> >>>>share the
> >>>>common topic of service reliability, concurrency and
> >>>>interactions with
> >>>>remote pypi.python.org or devpi masters:
> >>>>
> >>>>https://bitbucket.org/hpk42/devpi/issues/267/intermittent-assertionerror-in…
> >>>>
> >>>> multiple devpi-server processes write to the same
> >>>>(networked shared) file system
> >>>> resulting in failed transaction handling. devpi-server was not
> >>>> designed for it.
> >>>>
> >>>>
> >>>>https://bitbucket.org/hpk42/devpi/issues/274/recurring-consistency-issues-w…
> >>>>
> >>>> under high load database/transaction handling issues arise.
> >>>> (although it's unclear what the precise scenario is, how to
> >>>>replicate)
> >>>>
> >>>>
> >>>>https://bitbucket.org/hpk42/devpi/issues/208/pip-gets-timeout-on-large-pack…
> >>>>
> >>>> trying to install an uncached package that originates from
> >>>>pypi.python.org
> >>>> can fail if devpi-server cannot download the package fast enough.
> >>>>
> >>>>
> >>>>Starting with the last issue, we probably need to
> >>>>re-introduce a way to
> >>>>stream remote files instead of first retrieving it in full
> >>>>and only then
> >>>>starting a client response . This should take into account
> >>>>that there
> >>>>could be two threads (or even two processes) which try to
> >>>>retrieve the
> >>>>same file. This means that we start a response as soon as
> >>>>we got a http
> >>>>return code and then forward-stream the content.
> >>>>
> >>>>The first two issues could be mitigated by introducing a better
> >>>>read/write transaction separation. background: GET-ting
> >>>>simple pages or
> >>>>release files can cause write transactions in a devpi-server
> >>>>process
> >>>>because we may need to retrieve & cache information from
> >>>>pypi.python.org
> >>>>or a devpi-server master. Currently, during the processing
> >>>>of the GET
> >>>>request we at some point promote a READ-transaction into a
> >>>>WRITE-transaction through a call to
> >>>>keyfs.restart_as_write_transaction()
> >>>>and persist what we have. This all happens before the
> >>>>response to the
> >>>>client is returned. "Restarting as write" is somewhat
> >>>>brittle because
> >>>>something might have changed since we started our
> >>>>long-running request.
> >>>>
> >>>>Strategy and notes on how to mitigate all three issues:
> >>>>
> >>>>- release files: cache and stream chunks of what we remotely
> >>>>receive,
> >>>>all within a READ-transaction and all within RAM. This
> >>>>should ideally
> >>>>be done in such a way that if multiple threads stream the
> >>>>same file,
> >>>>only one remote http request is done for fetching the file.
> >>>>Otherwise
> >>>>we end up retrieving large files multiple times
> >>>>unnecessarily. After
> >>>>the http response to the client is complete we (try to) write it to
> >>>>sqlite/filesystem so that subsequent requests can work from
> >>>>the local
> >>>>filesystem. Here we need to be careful and consider that we might
> >>>>have multiple writers/streamers. If we discover that where
> >>>>we want to
> >>>>write someone else already has we can simply forget about it.
> >>>>
> >>>>- simple pages: first retrieve the remote simple page in
> >>>>RAM, process
> >>>>it, serve the full pyramid response and then (try to) cache
> >>>>it after
> >>>>the response is completed. Here we probably don't need to care if
> >>>>multiple threads are trying to retrieve the same simple page
> >>>>because
> >>>>simple pages are not big.
> >>>>
> >>>>- we cache things in RAM because even for large files it shouldn't
> >>>>matter given that servers typically have multiple gigabytes of RAM.
> >>>>And we can avoid synchronization issues wrt to the file system (see
> >>>>also the first issues where multiple processes write to the file
> >>>>system).
> >>>>
> >>>>- we always finish response to the client before we attempt to do a
> >>>>write transaction. The write transaction part should be
> >>>>implemented
> >>>>in a separate function to make it clear what kind of state
> >>>>we can rely
> >>>>on and what we must re-check. (currently we do the
> >>>>READ->Write switch
> >>>>in the middle of a view function).
> >>>>
> >>>>- we also need to review how exactly we open the sqlite DB
> >>>>for writing
> >>>>and if multiple processes correctly serialize on their write
> >>>>attempts,
> >>>>particularly in the multi-process case.
> >>>>
> >>>>- care must be taken wrt to waitress and nginx configuration
> >>>>and their
> >>>>buffering, see for example:
> >>>>http://www.4byte.cn/question/68410/pyramid-stream-response-body.html
> >>>>
> >>>>any feedback or thoughts welcome.
> >>>>
> >>>>holger
> >>>>
> >>>>
> >>>>--
> >>>>You received this message because you are subscribed to the
> >>>>Google Groups "devpi-dev" group.
> >>>>To unsubscribe from this group and stop receiving emails
> >>>>from it, send an email to
> >>>>devpi-dev+...(a)googlegroups.com.
> >>>>To post to this group, send email to devp...(a)googlegroups.com.
> >>>>Visit this group at http://groups.google.com/group/devpi-dev.
> >>>>For more options, visit https://groups.google.com/d/optout.
> >>>>
> >>>
>
We just pushed devpi-{server,web,client,common} release files out to pypi.
Most notably, the private pypi package server allows much faster installs
due to much improved simple-page serving speed. See the changelog
below for a host of other changes and fixes as well as for compatibility
warnings.
Docs about the devpi system are to be found here:
http://doc.devpi.net
Many thanks to my co-maintainer Florian Schulze and particularly
to Stephan Erb and Chad Wagner for their contributions.
cheers,
holger
devpi-server 2.4.0 (2015-11-11)
-------------------------------
- NOTE: devpi-server-2.4 is compatible with data from devpi-server-2.3 but
not the other way round. Once you run devpi-server-2.4 you can not go
back. It's always a good idea to make a backup before trying a new version :)
- NOTE: if you use "--logger-cfg" with .yaml files you will need to
install pyyaml yourself as devpi-server-2.4 dropped it as a direct
dependency as it does not install for win32/python3.5 and is
not needed for devpi-server operations except for logging configuration.
Specifying a *.json file always works.
- add timeout to replica requests
- fix issue275: improve error message when a serverdir exists but has no
version
- improve testing mechanics and name normalization related to storing doczips
- refine keyfs to provide lazy deep readonly-views for
dict/set/list/tuple types by default. This introduces safety because
users (including plugins) of keyfs-values can only write/modify a value
by explicitly getting it with readonly=False (thereby deep copying it)
and setting it with the transaction. It also avoids unnecessary
copy-operations when just reading values.
- fix issue283: pypi cache didn't work for replicas.
- performance improvements for simple pages with lots of releases.
this also changed the db layout of the caching from pypi.python.org mirrors
but will seamlessly work on older data, see NOTE at top.
- add "--profile-requests=NUM" option which turns on per-request
profiling and will print out after NUM requests are executed
and then restart profiling.
- fix tests for pypy. We officially support pypy now.
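For readers wondering what the lazy deep readonly-views mentioned in the keyfs entry above amount to, here is a simplified sketch. It is not the keyfs implementation: all names are hypothetical, only dicts and lists are wrapped lazily, and sets are simply copied into a frozenset.

```python
class ReadonlyError(TypeError):
    """Raised when code tries to modify a read-only view."""


class ReadonlyDictView:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        # Nested values are wrapped only when they are actually accessed.
        return readonly_view(self._data[key])

    def __contains__(self, key):
        return key in self._data

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __setitem__(self, key, value):
        raise ReadonlyError("cannot modify a read-only view")


class ReadonlyListView:
    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        return readonly_view(self._data[index])

    def __len__(self):
        return len(self._data)

    def append(self, item):
        raise ReadonlyError("cannot modify a read-only view")


def readonly_view(value):
    """Wrap mutable containers; return everything else unchanged."""
    if isinstance(value, dict):
        return ReadonlyDictView(value)
    if isinstance(value, list):
        return ReadonlyListView(value)
    if isinstance(value, set):
        return frozenset(value)  # shortcut: a copy, not a lazy view
    return value
```

Reading through such a view costs no deep copy, while any attempt to write raises an error instead of silently changing shared data.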
devpi-client-2.3.2 (2015-11-11)
-------------------------------
- fix git submodules for devpi upload. ``.git`` is a file not a folder for
submodules. Before this fix the repository which contains the submodule was
found instead, which caused a failure, because the files aren't tracked there.
- new option "devpi upload --setupdir-only" which will only
vcs-export the directory containing setup.py. You can also
set "setupdirs-only = 1" in the "[devpi:upload]" section
of setup.cfg for the same effect. Thanks Chad Wagner for the PR.
devpi-web 2.4.2 (2015-11-11)
----------------------------
- log exceptions during search index updates.
- adapted tests/code to work with devpi-server-2.4
devpi-common 2.0.8 (2015-11-11)
-------------------------------
- fix URL.joinpath to not add double slashes
Hi Florian, all,
there are at least three issues that somewhat interrelate and share the
common topic of service reliability, concurrency and interactions with
remote pypi.python.org or devpi masters:
https://bitbucket.org/hpk42/devpi/issues/267/intermittent-assertionerror-in…
multiple devpi-server processes write to the same (networked shared) file system
resulting in failed transaction handling. devpi-server was not
designed for it.
https://bitbucket.org/hpk42/devpi/issues/274/recurring-consistency-issues-w…
under high load database/transaction handling issues arise.
(although it's unclear what the precise scenario is and how to reproduce it)
https://bitbucket.org/hpk42/devpi/issues/208/pip-gets-timeout-on-large-pack…
trying to install an uncached package that originates from pypi.python.org
can fail if devpi-server cannot download the package fast enough.
Starting with the last issue, we probably need to re-introduce a way to
stream remote files instead of first retrieving them in full and only then
starting a client response. This should take into account that there
could be two threads (or even two processes) which try to retrieve the
same file. This means that we start a response as soon as we get an HTTP
return code and then forward-stream the content.
The first two issues could be mitigated by introducing a better
read/write transaction separation. background: GET-ting simple pages or
release files can cause write transactions in a devpi-server process
because we may need to retrieve & cache information from pypi.python.org
or a devpi-server master. Currently, during the processing of the GET
request we at some point promote a READ-transaction into a
WRITE-transaction through a call to keyfs.restart_as_write_transaction()
and persist what we have. This all happens before the response to the
client is returned. "Restarting as write" is somewhat brittle because
something might have changed since we started our long-running request.
Strategy and notes on how to mitigate all three issues:
- release files: cache and stream chunks of what we remotely receive,
all within a READ-transaction and all within RAM. This should ideally
be done in such a way that if multiple threads stream the same file,
only one remote http request is done for fetching the file. Otherwise
we end up retrieving large files multiple times unnecessarily. After
the http response to the client is complete we (try to) write it to
sqlite/filesystem so that subsequent requests can work from the local
filesystem. Here we need to be careful and consider that we might
have multiple writers/streamers. If we discover that someone else has
already written the file we wanted to write, we can simply discard our copy.
- simple pages: first retrieve the remote simple page in RAM, process
it, serve the full pyramid response and then (try to) cache it after
the response is completed. Here we probably don't need to care if
multiple threads are trying to retrieve the same simple page because
simple pages are not big.
- we cache things in RAM because even for large files it shouldn't
matter given that servers typically have multiple gigabytes of RAM.
And we can avoid synchronization issues wrt the file system (see
also the first issue, where multiple processes write to the file
system).
- we always finish response to the client before we attempt to do a
write transaction. The write transaction part should be implemented
in a separate function to make it clear what kind of state we can rely
on and what we must re-check. (currently we do the READ->Write switch
in the middle of a view function).
- we also need to review how exactly we open the sqlite DB for writing
and if multiple processes correctly serialize on their write attempts,
particularly in the multi-process case.
- care must be taken wrt waitress and nginx configuration and their
buffering, see for example:
http://www.4byte.cn/question/68410/pyramid-stream-response-body.html
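The "only one remote http request per file" part of the release-file item above could be sketched roughly as follows. This is a simplification: it hands out the complete body rather than streaming chunks, it only covers threads within one process, and the class name and fetch callable are hypothetical.

```python
import threading


class SingleFlightFetcher:
    """Let concurrent requests for the same URL share one remote fetch."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a URL and returning bytes
        self._lock = threading.Lock()
        self._inflight = {}  # url -> shared state of the running fetch

    def get(self, url):
        with self._lock:
            entry = self._inflight.get(url)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "data": None, "error": None}
                self._inflight[url] = entry
        if leader:
            # The first thread does the remote request, all in RAM.
            try:
                entry["data"] = self._fetch(url)
            except Exception as exc:
                entry["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[url]
                entry["done"].set()
        else:
            # Later threads wait for the leader's result instead of refetching.
            entry["done"].wait()
        if entry["error"] is not None:
            raise entry["error"]
        return entry["data"]
```

Persisting the file would happen separately, after the response is finished, in its own write transaction.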
any feedback or thoughts welcome.
holger