I have attached a screenshot that shows our existing Prometheus/Grafana dashboard for the Devpi Nginx instances. The interesting part is the spiky latency on both devpi replicas. From the outside it is hard to judge what is going on internally that might cause those latency variations. For example, additional metrics to detect when a replica is waiting for data from the master (or when the master is fetching data from a mirror) might help to pinpoint what is going on.
This is not a pressing issue for us right now. I am mostly sharing this so that you get an idea of what could be of interest to users like us.
Have a nice weekend!
Stephan
On 05.04.19, 12:01, "Florian Schulze" wrote:
On 5 Apr 2019, at 10:22, Stephan Erb wrote:
> Hi Florian,
>
> this is a great idea. We actually thought about implementing something
> like this but never had the chance yet.
>
> We are running Nginx in front of our Devpi instances and therefore
> already have sufficient Prometheus metrics coverage of HTTP requests,
> request latencies, etc. What would still be helpful:
>
> * The master serial, current serial, and processed event serial. This
> would allow us to easily alert on lagging replicas.
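If those serials were exposed as gauges, a minimal Prometheus alerting rule could look roughly like this (the metric names and threshold are assumptions, not anything devpi exposes today):

```yaml
groups:
  - name: devpi
    rules:
      - alert: DevpiReplicaLagging
        # devpi_master_serial / devpi_current_serial are placeholder names
        expr: devpi_master_serial - devpi_current_serial > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "devpi replica is lagging behind the master"
```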
> * The number of keyfs cache hits and cache misses, so that we know when
> to tune the keyfs-cache-size.
I thought of the above myself.
> * Some internal counters to figure out when and how often we are
> running into expired mirror caches.
Could you elaborate on this?
All the above would be possible solely from a plugin, so I think I will
go that route first.
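To make the idea concrete, here is a rough standard-library-only sketch of what such a plugin could expose on a /metrics endpoint in the Prometheus text format. The metric names follow the list above, but everything else (the update sites, the port) is a placeholder; a real plugin would update these values from devpi-server's replication and keyfs internals.

```python
# Hypothetical sketch: exposing the serials and keyfs cache counters
# discussed above in the Prometheus text exposition format.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder metric store; a real plugin would update these values
# from devpi-server's replication and keyfs code.
METRICS = {
    "devpi_master_serial": 0,
    "devpi_current_serial": 0,
    "devpi_processed_event_serial": 0,
    "devpi_keyfs_cache_hits_total": 0,
    "devpi_keyfs_cache_misses_total": 0,
}
_lock = threading.Lock()


def set_metric(name, value):
    """Set a metric value (thread-safe)."""
    with _lock:
        METRICS[name] = value


def render_metrics():
    """Render all metrics as 'name value' lines (Prometheus text format)."""
    with _lock:
        return "".join(f"{name} {value}\n" for name, value in METRICS.items())


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve for scraping (blocks), e.g.:
#   HTTPServer(("127.0.0.1", 9100), MetricsHandler).serve_forever()
```

Prometheus could then scrape that endpoint directly, and the replica-lag alerting mentioned above becomes a simple subtraction over the exported gauges.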
Regards,
Florian Schulze