Monitoring/Prometheus
Hi!

I'm currently reading up on monitoring, specifically Prometheus. I'm thinking of adding instrumentation to devpi and creating a devpi-prometheus plugin for exposition. The instrumentation is pretty lightweight and shouldn't affect anyone who doesn't use it.

Has anyone here used Prometheus and can share some insights on what kinds of metrics would be useful? Or does anyone have pointers to other Python/Pyramid software that has already added this?

Regards,
Florian Schulze
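One way to keep instrumentation from affecting anyone who doesn't use it is to have the core code call a metrics hook that is a no-op by default, and let a plugin such as the proposed devpi-prometheus install a real collector. The following is a minimal sketch of that pattern only; `NullMetrics`, `CountingMetrics`, `set_metrics`, `get_metrics`, `fetch_from_mirror` and the metric name are hypothetical and not existing devpi APIs.

```python
from collections import defaultdict
from typing import Dict


class NullMetrics:
    """Default collector: every call is a cheap no-op."""

    def inc(self, name: str, amount: int = 1) -> None:
        pass

    def set(self, name: str, value: float) -> None:
        pass


class CountingMetrics(NullMetrics):
    """Collector a plugin could install to actually record values."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = defaultdict(int)
        self.gauges: Dict[str, float] = {}

    def inc(self, name: str, amount: int = 1) -> None:
        self.counters[name] += amount

    def set(self, name: str, value: float) -> None:
        self.gauges[name] = value


# Module-level hook used by instrumented code; a no-op unless replaced.
_metrics: NullMetrics = NullMetrics()


def set_metrics(collector: NullMetrics) -> None:
    """Hypothetical entry point a plugin would call at startup."""
    global _metrics
    _metrics = collector


def get_metrics() -> NullMetrics:
    return _metrics


def fetch_from_mirror(project: str) -> str:
    # Hypothetical instrumented code path in the server core.
    get_metrics().inc("mirror_fetches")
    return f"simple page for {project}"
```

Usage: after `collector = CountingMetrics(); set_metrics(collector)`, each call to `fetch_from_mirror(...)` increments `collector.counters["mirror_fetches"]`. Without a plugin, the cost per call is one attribute lookup plus an empty method call.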
Hi Florian,

this is a great idea. We actually thought about implementing something like this but never had the chance yet.

We are running Nginx in front of our Devpi instances and therefore already have sufficient Prometheus metrics coverage of HTTP requests, request latencies, etc. What would still be helpful:

* The master serial, current serial, and processed event serial. This would allow us to easily alert on lagging replicas.
* The number of keyfs cache hits and cache misses, so that we know when to tune the keyfs-cache-size.
* Some internal counters to figure out when and how often we are running into expired mirror caches.

Best regards,
Stephan

On 05.04.19, 09:34, "Florian Schulze" <mail@florian-schulze.net> wrote:

> Has anyone here used Prometheus and can provide some insights on what kind of metrics would be useful?

_______________________________________________
devpi-dev mailing list -- devpi-dev@python.org
To unsubscribe send an email to devpi-dev-leave@python.org
https://mail.python.org/mailman3/lists/devpi-dev.python.org/
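To illustrate how the serials could drive an alert on lagging replicas: the lag is the master serial minus the replica's current serial, and the alert should fire only if the lag stays above a threshold for a while. In Prometheus this would typically be an alerting rule along the lines of `devpi_master_serial - devpi_serial > 100` with a `for:` duration; those metric names, the threshold and the `LagMonitor` class below are assumptions for illustration. The same check could be applied to the processed event serial.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LagMonitor:
    """Fire only if replica lag exceeds max_lag continuously for hold_seconds."""

    max_lag: int
    hold_seconds: float
    # Time at which the lag first exceeded max_lag (None while healthy).
    _breach_since: Optional[float] = field(default=None, init=False)

    def observe(self, now: float, master_serial: int, replica_serial: int) -> bool:
        lag = master_serial - replica_serial
        if lag <= self.max_lag:
            # Back within bounds: reset the pending state.
            self._breach_since = None
            return False
        if self._breach_since is None:
            self._breach_since = now
        return now - self._breach_since >= self.hold_seconds
```

Usage with made-up numbers: `LagMonitor(max_lag=100, hold_seconds=300)` returns `False` at `observe(0, 1500, 1000)` and `observe(299, 1600, 1000)`, `True` at `observe(300, 1600, 1000)`, and `False` again once the replica catches up. Requiring the breach to persist avoids paging on short bursts of uploads.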
On 5 Apr 2019, at 10:22, Stephan Erb wrote:
> Hi Florian,
>
> this is a great idea. We actually thought about implementing something like this but never had the chance yet.
>
> We are running Nginx in front of our Devpi instances and therefore already have sufficient Prometheus metrics coverage of HTTP requests, request latencies, etc. What would still be helpful:
>
> * The master serial, current serial, and processed event serial. This would allow us to easily alert on lagging replicas.
> * The number of keyfs cache hits and cache misses so that we know when to tune the keyfs-cache-size

I thought of the above myself.

> * Some internal counters to figure out when and how often we are running into expired mirror caches.
Could you elaborate on this?

All the above would be possible solely from a plugin, so I think I will go that route first.

Regards,
Florian Schulze
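For the keyfs cache hit/miss counters, the reasoning is that a low hit ratio while the cache is full suggests raising keyfs-cache-size. The sketch below uses a small LRU cache that counts hits and misses to show the idea; it is a stand-in, not devpi's actual keyfs cache, and `CountingLRU` and its `load` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class CountingLRU(Generic[V]):
    """LRU cache that counts hits and misses for monitoring."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self.hits = 0
        self.misses = 0
        self._data: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable, load: Callable[[Hashable], V]) -> V:
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        self.misses += 1
        value = load(key)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With `maxsize=2` and the access pattern `a, b, a, c, b`, only the second `a` is a hit (1 hit, 4 misses, ratio 0.2), because `b` was evicted before it was requested again. Exported as two counters, Prometheus can compute the ratio over time with `rate()`.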
We have a sidecar that scrapes the +status page and exports the values as metrics using prometheus_client. Having the "status" information available as metrics would be helpful too. Apart from the contents of the status page, we also collect disk usage metrics, but that is not really a devpi function.

Regards,
Venkatesh

On Fri, Apr 5, 2019 at 6:01 AM Florian Schulze <mail@florian-schulze.net> wrote:

> All the above would be possible solely from a plugin, so I think I will go that route first.
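A sidecar like the one described could look roughly like this: fetch the JSON from the +status page, keep the numeric fields, and turn them into gauge lines with Prometheus-safe names (hyphens are not allowed in metric names). The `{"result": {...}}` wrapper and key names such as `serial` and `event-serial` are assumptions about the +status response and should be checked against a real instance; the `devpi_status_` prefix is hypothetical. A production sidecar would more likely publish these through prometheus_client instead of building the text by hand.

```python
import json
import re
import urllib.request
from typing import Any, Dict, List


def fetch_status(base_url: str) -> Dict[str, Any]:
    """Fetch the +status page as JSON (response layout assumed)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/+status",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def metric_name(key: str, prefix: str = "devpi_status_") -> str:
    # Prometheus metric names allow only [a-zA-Z0-9_:], so map others to "_".
    return prefix + re.sub(r"[^a-zA-Z0-9_:]", "_", key)


def status_to_gauges(status: Dict[str, Any]) -> List[str]:
    """Turn numeric fields of a +status payload into exposition lines."""
    result = status.get("result", status)
    lines: List[str] = []
    for key, value in sorted(result.items()):
        # bool is a subclass of int; skip it along with non-numeric values.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = metric_name(key)
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return lines
```

For a made-up payload `{"result": {"serial": 1200, "event-serial": 1195, "role": "replica"}}`, `status_to_gauges` yields `devpi_status_event_serial 1195` and `devpi_status_serial 1200` (each preceded by its `# TYPE` line) and skips the non-numeric `role`.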
I have attached a screenshot that shows our existing Prometheus/Grafana dashboard for the Devpi Nginx instances. The interesting part is the spiky latency for both devpi replicas. It is hard to judge from the outside what is going on internally that might cause those latency variations. For example, additional metrics to detect when a replica is waiting for data from the master (or when a master is fetching data from a mirror) might help to pinpoint what is going on.

This is not a pressing issue for us right now. I am mostly sharing this so that you get an idea of what could be of interest for users like us.

Have a nice weekend!
Stephan

On 05.04.19, 12:01, "Florian Schulze" <mail@florian-schulze.net> wrote:

> > * Some internal counters to figure out when and how often we are
> > running into expired mirror caches.
>
> Could you elaborate on this?
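To see when a replica is waiting for the master (or a master is waiting on a mirror), one option is to time those code paths and record the durations in a histogram that Grafana can plot next to the Nginx latencies. The sketch below implements cumulative buckets the way Prometheus histograms expose them; this `Histogram` class is a hand-rolled stand-in (prometheus_client ships its own `Histogram` with a `time()` helper), and which devpi code paths to wrap is left open.

```python
from bisect import bisect_left
from contextlib import contextmanager
from time import perf_counter
from typing import Iterator, List, Sequence


class Histogram:
    """Minimal cumulative histogram in the style of Prometheus."""

    def __init__(self, buckets: Sequence[float]) -> None:
        self.bounds: List[float] = sorted(buckets)
        # One counter per upper bound, plus a final +Inf bucket.
        self.counts: List[int] = [0] * (len(self.bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, seconds: float) -> None:
        # A value lands in the first bucket whose upper bound is >= value.
        self.counts[bisect_left(self.bounds, seconds)] += 1
        self.total += seconds
        self.count += 1

    def cumulative(self) -> List[int]:
        """Counts as Prometheus exposes them: each bucket includes smaller ones."""
        out: List[int] = []
        running = 0
        for c in self.counts:
            running += c
            out.append(running)
        return out

    @contextmanager
    def time(self) -> Iterator[None]:
        """Record the duration of the enclosed block."""
        start = perf_counter()
        try:
            yield
        finally:
            self.observe(perf_counter() - start)
```

Usage: wrap a hypothetical wait, e.g. `with wait_for_master.time(): ...`, where `wait_for_master = Histogram([0.1, 0.5, 1.0, 5.0])`. The last cumulative bucket always equals the total count, so spikes in the upper buckets that line up with the Nginx latency spikes would point at replication waits.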
participants (3)
- Florian Schulze
- Stephan Erb
- Venkatesh