Hi Florian,
This is a great idea. We have actually thought about implementing something like this but have not had the chance yet.
We are running Nginx in front of our Devpi instances and therefore already have sufficient Prometheus metrics coverage of HTTP requests, request latencies, etc. What would still be helpful:
* The master serial, current serial, and processed event serial. This would allow us to easily alert on lagging replicas.
* The number of keyfs cache hits and cache misses, so that we know when to tune the keyfs-cache-size option.
* Some internal counters to figure out when and how often we are running into expired mirror caches.
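To make the first two points concrete, here is a rough sketch of the kind of output we have in mind. It is only an illustration: the metric names and both helper functions are hypothetical and not part of Devpi, and the serial values would have to come from wherever Devpi tracks them internally. It derives a lag value from the two serials and renders samples in the Prometheus text exposition format.

```python
def replica_lag(master_serial, replica_serial):
    """Return how many serials the replica is behind the master (never negative)."""
    return max(0, master_serial - replica_serial)


def render_metrics(samples):
    """Render (name, type, value) tuples in the Prometheus text exposition format.

    The metric names passed in are made up for this example.
    """
    lines = []
    for name, metric_type, value in samples:
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; a real exporter would read these from Devpi.
    master, current = 120, 117
    print(render_metrics([
        ("devpi_master_serial", "gauge", master),
        ("devpi_current_serial", "gauge", current),
        ("devpi_replica_lag_serials", "gauge", replica_lag(master, current)),
        ("devpi_keyfs_cache_hits_total", "counter", 4200),
        ("devpi_keyfs_cache_misses_total", "counter", 310),
    ]))
```

With something like this we could alert whenever the lag value stays above a threshold for a few minutes, and watch the ratio of cache misses to hits to decide when to raise the cache size.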
Best regards,
Stephan
On 05.04.19, 09:34, "Florian Schulze"