[devpi-client] plugin for measuring replica latency
Hello everyone,
We have a devpi deployment with several replicas distributed around the world. Sometimes, a developer, say in Australia (with his index set to the local replica), triggers a build which is most likely to occur in North America. We have seen checksum errors (due to our so-so network infrastructure) when pulling things across the WAN.
Most people use a homegrown bootstrap script which, among other things, measures the response time between the host and known replicas and locks onto the fastest one. This has been working really well for us and has mitigated our network-related issues.
However, some people use the devpi client directly, which uses whatever server it’s been configured with. So I was wondering if, perhaps through some client plugin hooks, we could integrate that feature, that is, perform latency measurements and switch to the best replica on the fly.
I am not sure if we could make a generic plugin, as it would need to be aware of the replicas available in your deployment, but if we can, then we would release it (if there is interest, of course).
Anyhow, I am just fishing here, but any input/suggestions would be greatly appreciated.
Thanks in advance.
/Laurent
Hi!
I wrote a mail to this list at the end of 2016 about that. Unfortunately the mailman UI is currently down, so I can't link it. I'll copy it here instead:
Hi!
I was thinking about a way to let "devpi use" select a replica automatically.
This is more of a brain dump for now. My current idea would be this:
The primary server would provide a JSON file with a list of available replicas. When you invoke "devpi use" on the primary server, devpi-client would look for that list and then somehow select a good replica.
The hard part is the selection of the replica.
A simple solution would be to request the +api route on each replica, which is quick, and measure the time it took. When we have tried all replicas, we use the fastest reply. This has some obvious problems, like having to try all replicas, handling timeouts and momentary slowness of replies.
I still think this would be a nice addition. One can still always explicitly "devpi use" a certain replica. IMO "devpi use" can take up to 2-3 seconds for the replica selection without making it painful for normal use.
The mirror selection could also be done server side, by providing a dynamic replica list based on request IP or whatever.
Initially it would be easiest to provide the replica list statically via nginx, because atm the primary only knows the IP address of replicas. This is because most installations use the X-Outside-Url header instead of the --outside-url option to provide more flexibility.
We might also want to provide a way in devpi-client to know which replica belongs to a primary to share the login info. I guess the UUID would be useful for that.
Regards, Florian Schulze
On 19 May 2018, at 0:39, Brack, Laurent P. wrote:
Hello everyone,
We have a devpi deployment with several replicas distributed around the world. Sometimes, a developer, say in Australia (with his index set to the local replica), triggers a build which is most likely to occur in North America. We have seen checksum errors (due to our so-so network infrastructure) when pulling things across the WAN.
Most people use a homegrown bootstrap script which, among other things, measures the response time between the host and known replicas and locks onto the fastest one. This has been working really well for us and has mitigated our network-related issues.
However, some people use the devpi client directly, which uses whatever server it’s been configured with. So I was wondering if, perhaps through some client plugin hooks, we could integrate that feature, that is, perform latency measurements and switch to the best replica on the fly.
I am not sure if we could make a generic plugin, as it would need to be aware of the replicas available in your deployment, but if we can, then we would release it (if there is interest, of course).
Anyhow, I am just fishing here, but any input/suggestions would be greatly appreciated.
Thanks in advance.
/Laurent
_______________________________________________ devpi-dev mailing list devpi-dev@python.org https://mail.python.org/mm3/mailman3/lists/devpi-dev.python.org/
I apologize for the spam, but Holger made me realize that my email client somehow didn’t quote things properly (even though it looked fine on my side; a “works on my machine” kind of thing). So here it is again, and hopefully the formatting makes sense this time :)
On May 19, 2018 at 1:32:11 AM, Florian Schulze (mail@florian-schulze.net) wrote:
Hi!
I wrote a mail to this list at the end of 2016 about that. Unfortunately the mailman UI is currently down, so I can't link it. I'll copy it here instead:
We must have missed that one :). Glad you remember though. BTW, are you suggesting that this becomes part of the devpi client core features (it seems so) or an add-on?
Hi!
I was thinking about a way to let "devpi use" select a replica automatically.
This is more of a brain dump for now. My current idea would be this:
The primary server would provide a JSON file with a list of available replicas. When you invoke "devpi use" on the primary server, devpi-client would look for that list and then somehow select a good replica.
There is always the danger that the primary server goes down. So perhaps the replica information could be … replicated, so no matter which server a user has their client pointed to (let's call it the primary server), it can get access to that list (without having to reach out to the master).
Now there is also the case where the replica (your primary) goes down. I was thinking that every time the client gets the replica information from its “primary server”, it caches it (perhaps even with some access time statistics computed over time). If new replicas appear (or disappear), this data gets updated.
So say the client tries to access its primary server and doesn’t get a response within the average access time plus a certain tolerance: it then reverts to the information found in the cached data (excluding the server that failed, of course).
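The caching and fallback idea above could be sketched roughly as follows. This is a hypothetical illustration, not devpi-client code: the class name `ReplicaCache`, the JSON file layout, and the default timeout and tolerance values are all assumptions, and an exponential moving average (weight `alpha`) stands in for the "access time statistics computed over time".

```python
import json
from typing import Dict, List, Optional


class ReplicaCache:
    """Hypothetical on-disk cache: replica URL -> smoothed response time (None = never measured)."""

    def __init__(self, path: str, alpha: float = 0.3) -> None:
        self.path = path
        self.alpha = alpha  # weight of the newest measurement in the moving average
        self.times: Dict[str, Optional[float]] = {}
        try:
            with open(path) as f:
                self.times = json.load(f)
        except (OSError, ValueError):
            pass  # no cache yet, or unreadable: start empty

    def update_replicas(self, urls: List[str]) -> None:
        """Sync with a freshly fetched replica list: keep stats, add new, drop vanished."""
        self.times = {url: self.times.get(url) for url in urls}

    def record(self, url: str, elapsed: float) -> None:
        """Fold one measured response time into the moving average."""
        old = self.times.get(url)
        self.times[url] = elapsed if old is None else (1 - self.alpha) * old + self.alpha * elapsed

    def timeout_for(self, url: str, tolerance: float = 1.0, default: float = 3.0) -> float:
        """Average access time plus a tolerance; `default` for never-measured servers."""
        avg = self.times.get(url)
        return default if avg is None else avg + tolerance

    def fallbacks(self, failed: str) -> List[str]:
        """Other known replicas, fastest first; never-measured ones go last."""
        others = [url for url in self.times if url != failed]
        return sorted(others, key=lambda url: (self.times[url] is None, self.times[url] or 0.0))

    def save(self) -> None:
        with open(self.path, "w") as f:
            json.dump(self.times, f)
```

A client would use `timeout_for(primary)` as its request timeout and, if that expires, try the entries of `fallbacks(primary)` in order.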
The hard part is the selection of the replica.
A simple solution would be to request the +api route on each replica, which is quick, and measure the time it took. When we have tried all replicas, we use the fastest reply. This has some obvious problems, like having to try all replicas, handling timeouts and momentary slowness of replies.
I still think this would be a nice addition. One can still always explicitly "devpi use" a certain replica. IMO "devpi use" can take up to 2-3 seconds for the replica selection without making it painful for normal use.
I think we have this part pretty much nailed down in our homegrown script, and I have to say that it has been working very nicely and reliably over the last couple of years. I am pretty sure we can propose our implementation as a starting point and optimize it if need be. We get something like:
    Auto-detecting fastest devpi server by contacting servers and measuring response time.
    Server https://devpi-us.dolby.net responded within 0.362 seconds.
    devpi server: https://devpi-us.dolby.net
    devpi index: gti/prod
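For illustration, the "probe every replica's +api route and keep the fastest" approach could look roughly like this. It is a sketch under stated assumptions, not the homegrown script: the function names are hypothetical, a GET on `<server>/+api` is assumed to be cheap (as suggested above), and any network or HTTP error counts as "unreachable". The probes run in parallel so the total wait is about one timeout rather than the sum. Note that `urlopen`'s `timeout` applies per socket operation, not to the whole request.

```python
import concurrent.futures
import http.client
import time
import urllib.request
from typing import Callable, Iterable, Optional, Tuple

# A probe takes (server_url, timeout_seconds) and returns elapsed seconds or None.
Probe = Callable[[str, float], Optional[float]]


def probe_api(server: str, timeout: float) -> Optional[float]:
    """Seconds needed to fetch <server>/+api, or None if the request fails."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(server.rstrip("/") + "/+api", timeout=timeout) as resp:
            resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError and socket timeouts;
        # ValueError covers malformed URLs.
        return None
    return time.monotonic() - start


def fastest_replica(servers: Iterable[str], probe: Probe = probe_api,
                    timeout: float = 2.0) -> Optional[Tuple[str, float]]:
    """Probe all servers in parallel; return (url, seconds) of the fastest responder."""
    urls = list(servers)
    if not urls:
        return None
    # One thread per server, so the total wait is roughly the slowest probe, not the sum.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = {url: pool.submit(probe, url, timeout) for url in urls}
        results = [(url, fut.result()) for url, fut in futures.items()]
    answered = [(url, secs) for url, secs in results if secs is not None]
    return min(answered, key=lambda item: item[1]) if answered else None
```

Calling `fastest_replica(["https://devpi-us.example.com", "https://devpi-au.example.com"])` (placeholder hosts) returns a `(url, seconds)` tuple, or `None` if nothing answered, which can then be reported in the same style as the output above.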
The mirror selection could also be done server side, by providing a dynamic replica list based on request IP or whatever.
IMHO I would leave the decision up to the client to figure this out.
Initially it would be easiest to provide the replica list statically via nginx, because atm the primary only knows the IP address of replicas. This is because most installations use the X-Outside-Url header instead of the --outside-url option to provide more flexibility.
We might also want to provide a way in devpi-client to know which replica belongs to a primary to share the login info. I guess the UUID would be useful for that.
Yup. I mean, in our case we are far more concerned about robust read access, but I can see this being important for people who do perform uploads.
Cheers/Laurent
Regards, Florian Schulze
On 22 May 2018, at 22:43, Brack, Laurent P. wrote:
On May 19, 2018 at 1:32:11 AM, Florian Schulze (mail@florian-schulze.net) wrote:
Hi!
I wrote a mail to this list at the end of 2016 about that. Unfortunately the mailman UI is currently down, so I can't link it. I'll copy it here instead:
We must have missed that one :). Glad you remember though. BTW, are you suggesting that this becomes part of the devpi client core features (it seems so) or an add-on?
At some point it can go into the core, but it makes sense to explore as a plugin first.
I was thinking about a way to let "devpi use" select a replica automatically.
This is more of a brain dump for now. My current idea would be this:
The primary server would provide a JSON file with a list of available replicas. When you invoke "devpi use" on the primary server, devpi-client would look for that list and then somehow select a good replica.
There is always the danger that the primary server goes down. So perhaps the replica information could be … replicated, so no matter which server a user has their client pointed to (let's call it the primary server), it can get access to that list (without having to reach out to the master).
Now there is also the case where the replica (your primary) goes down. I was thinking that every time the client gets the replica information from its “primary server”, it caches it (perhaps even with some access time statistics computed over time). If new replicas appear (or disappear), this data gets updated.
So say the client tries to access its primary server and doesn’t get a response within the average access time plus a certain tolerance: it then reverts to the information found in the cached data (excluding the server that failed, of course).
All of this should be explored in the plugin. I think we need a hook in the get_index_url method in devpi/use.py, and maybe additional hooks, or at least some API in devpi-client for the cache etc., to support this from a plugin. As for the server side, since we would use a new endpoint like +mirrors, this can be done statically via the webserver or dynamically via a devpi-server plugin.
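To make the hook idea concrete, here is a stdlib-only sketch of the intended "first non-None answer wins" semantics (what pluggy calls a `firstresult` hook). devpi-client's actual plugin machinery and its real `get_index_url` are not reproduced here. The hook signature, `register`, `parse_mirrors` and the `+mirrors` response shape `{"result": [...]}` are all hypothetical.

```python
import json
from typing import Callable, List, Optional

# Hypothetical hook: (configured_url, mirror_urls) -> replacement URL, or None to abstain.
SelectHook = Callable[[str, List[str]], Optional[str]]
_select_hooks: List[SelectHook] = []


def register(hook: SelectHook) -> SelectHook:
    """Register a replica-selection hook (usable as a decorator)."""
    _select_hooks.append(hook)
    return hook


def parse_mirrors(body: str) -> List[str]:
    """Parse an assumed +mirrors reply of the form {"result": ["https://...", ...]}."""
    data = json.loads(body)
    return [url for url in data.get("result", []) if isinstance(url, str)]


def get_index_url(configured_url: str, mirrors: List[str]) -> str:
    """Ask hooks, newest first; the first non-None answer wins, else keep the configured URL."""
    for hook in reversed(_select_hooks):
        choice = hook(configured_url, mirrors)
        if choice is not None:
            return choice
    return configured_url
```

A latency plugin would register a hook that probes `mirrors` and returns the fastest one, or returns `None` to leave the configured server alone.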
The hard part is the selection of the replica.
A simple solution would be to request the +api route on each replica, which is quick, and measure the time it took. When we have tried all replicas, we use the fastest reply. This has some obvious problems, like having to try all replicas, handling timeouts and momentary slowness of replies.
I still think this would be a nice addition. One can still always explicitly "devpi use" a certain replica. IMO "devpi use" can take up to 2-3 seconds for the replica selection without making it painful for normal use.
I think we have this part pretty much nailed down in our homegrown script, and I have to say that it has been working very nicely and reliably over the last couple of years. I am pretty sure we can propose our implementation as a starting point and optimize it if need be.
We get something like:
    Auto-detecting fastest devpi server by contacting servers and measuring response time.
    Server https://devpi-us.dolby.net responded within 0.362 seconds.
    devpi server: https://devpi-us.dolby.net
    devpi index: gti/prod
Sounds good.
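The caveats raised above (momentary slowness, timeouts, a 2-3 second budget) could be handled by taking several samples per replica within a fixed overall budget and ranking by the median, so a single slow reply does not disqualify a server. This is an illustrative alternative, not what either poster described. `rank_replicas`, the sample count and the 2.5 s budget are assumptions, and `probe` is any callable that takes a URL and a timeout and returns seconds or `None`.

```python
import statistics
import time
from typing import Callable, Dict, List, Optional

Probe = Callable[[str, float], Optional[float]]


def _by_median(timings: Dict[str, List[float]]) -> List[str]:
    """Replicas that answered at least once, fastest median first."""
    medians = {url: statistics.median(t) for url, t in timings.items() if t}
    return sorted(medians, key=medians.__getitem__)


def rank_replicas(urls: List[str], probe: Probe, samples: int = 3, budget: float = 2.5,
                  clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Probe replicas round-robin, up to `samples` times each, within `budget` seconds."""
    deadline = clock() + budget
    timings: Dict[str, List[float]] = {url: [] for url in urls}
    for _ in range(samples):
        for url in urls:
            remaining = deadline - clock()
            if remaining <= 0:
                return _by_median(timings)  # budget spent: rank what we have
            elapsed = probe(url, remaining)  # never wait longer than the budget left
            if elapsed is not None:
                timings[url].append(elapsed)
    return _by_median(timings)
```

Probing round-robin rather than all samples of one replica at once spreads a transient network hiccup across servers instead of penalizing whichever replica happened to be probed during it.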
The mirror selection could also be done server side, by providing a dynamic replica list based on request IP or whatever.
IMHO I would leave the decision up to the client to figure this out.
The server can still influence it via the returned list of mirrors: if it only returns one, the client has no choice.
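On the client side, respecting the server's list might look like the following sketch (the function name and the `rank` callable are hypothetical). An empty list keeps the configured server, a single entry is used as-is without any probing, and only longer lists are ranked by measured latency.

```python
from typing import Callable, List


def choose_server(configured: str, mirrors: List[str],
                  rank: Callable[[List[str]], List[str]]) -> str:
    """Pick a server, letting the server-provided mirror list constrain the client."""
    if not mirrors:
        return configured  # server offers no alternatives
    if len(mirrors) == 1:
        return mirrors[0]  # server made the decision; no need to probe
    ranked = rank(mirrors)  # e.g. fastest first, unreachable ones omitted
    return ranked[0] if ranked else configured
```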
Initially it would be easiest to provide the replica list statically via nginx, because atm the primary only knows the IP address of replicas. This is because most installations use the X-Outside-Url header instead of the --outside-url option to provide more flexibility.
We might also want to provide a way in devpi-client to know which replica belongs to a primary to share the login info. I guess the UUID would be useful for that.
Yup. I mean, in our case we are far more concerned about robust read access, but I can see this being important for people who do perform uploads.
This is certainly something which can be explored later on.
Cheers/Laurent
Regards, Florian Schulze
participants (2)
- Brack, Laurent P.
- Florian Schulze