[Twisted-Python] Writing a low-level network debugging tool
Hi all,

I'm trying to troubleshoot network connectivity issues we have in one of our offices, and I would like to monitor some metrics which seem to be relevant for us, especially when opening TCP connections towards external endpoints.

In particular, I'm looking for a way to get the following information (let's say I want to monitor the connectivity towards swisscom.com, port 80, using TCP):

* how long does it take to resolve the domain name to (at least) one of its IP addresses
  - against a specified name server or using the system-configured servers
  - how many tries did it require
    * if there were several tries, the timing of each one
* how long does it take to get the first bytes from the endpoint
  - how long does it take to complete the TCP connection handshake
  - the status of the packets exchanged (how many retries, how many packets lost, etc.)

It's not exactly the same, but the curl option --write-out gives access to this kind of value (especially time_namelookup, time_connect, time_pretransfer, time_starttransfer and time_total). I would like more flexibility and more in-depth information, though (like the state of the packets exchanged, etc.)

How far can I go with this kind of thing in Twisted? I know I can fairly easily get the timings of the name resolution, the TCP connection handshake and the time to first byte(s), but what about the packets? I haven't looked at the code of Twisted Names yet, but if it performs the DNS request by itself, I may be able to plug in somewhere and add my request counter and the associated timers. However, I'm not sure whether the underlying details of the TCP protocol are exposed to an upper layer such as Twisted.

Thanks for the help!

Jonathan
On 27/11/15 14:05, Jonathan Ballet wrote:
* how long does it take to resolve the domain name to (at least) one of its IP addresses - against a specified name server or using the system-configured servers
That is relatively straightforward.
- how many tries did it require * if there were several tries, the timing of each one
Typically, application-layer code doesn't retry a DNS lookup; rather, the C library or other runtime handles this (for example getaddrinfo() in glibc), according to settings read from /etc/resolv.conf or compiled-in defaults. So it depends on whether you want to emulate "typical" application code, a specific application stack that may or may not do its own resolution (e.g. modern browsers), or something else.
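To measure what a "typical" application sees, one option is to time the same getaddrinfo() call the C library would make. The following is a minimal Python sketch under that assumption; the helper name time_getaddrinfo is made up for illustration, and the measured time includes whatever retries the system resolver performs internally, without exposing their count.

```python
import socket
import time


def time_getaddrinfo(host, port):
    """Time one system-level name lookup, as a typical application would do it.

    Returns (elapsed_seconds, list_of_addresses). Any retries made by the
    C library resolver are hidden inside the elapsed time.
    """
    started = time.monotonic()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    elapsed = time.monotonic() - started
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address for both IPv4 and IPv6.
    addresses = [info[4][0] for info in infos]
    return elapsed, addresses
```

For example, time_getaddrinfo("swisscom.com", 80) would give the lookup duration against the system-configured servers; local caches (nscd, systemd-resolved, etc.) can make repeated calls look much faster than the first one.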
* how long does it take to get the first bytes of the endpoint - how long does it take to complete the TCP connection handshake - the status of the packets exchanged (how many retries, how many packets lost, etc.)
Some of this is available through platform-specific APIs, e.g. the SIOCGSTAMP ioctl and the TCP_INFO socket option on Linux. In general, any timings you take based on the return of control from the kernel will include errors due to system and scheduling effects. If you're concerned about getting raw, on-the-wire timings, this is extremely difficult without being in-kernel, and even then various issues (TCP offload, for example) can end up hiding data from you.
How far can I go with this kind of thing in Twisted? I know I can fairly easily get the timings of the name resolution, the TCP connection handshake and the time to first byte(s), but what about the packets? I haven't looked at the code of Twisted Names yet, but if it performs the DNS request by itself, I may be able to plug in somewhere and add my request counter and the associated timers. However, I'm not sure whether the underlying details of the TCP protocol are exposed to an upper layer such as Twisted.
Only via platform-specific options. To do this kind of thing "reliably", you'd need to reimplement TCP in user-space. But the info above may be a helpful start.
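For the user-space part (connection time and time to first byte), a plain-socket sketch in Python could look like the following. It is only an approximation: the timings are taken when control returns from the kernel, so they carry the scheduling error mentioned earlier. The function name and the returned keys are made up for illustration.

```python
import socket
import time


def time_connect_and_first_byte(host, port, request, timeout=5.0):
    """Measure connect time and time to first byte from user space.

    Pass an IP address as `host` to keep name resolution out of the
    measurement; with a host name, create_connection() resolves it first.
    """
    started = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # connect() has returned: the three-way handshake is complete.
        connected = time.monotonic()
        sock.sendall(request)
        # Block until the peer sends at least one byte (or closes).
        first = sock.recv(1)
        first_byte = time.monotonic()
    return {
        "connect": connected - started,
        "first_byte": first_byte - started,
        "got_data": bool(first),
    }
```

For instance, time_connect_and_first_byte(ip, 80, b"GET / HTTP/1.0\r\n\r\n") roughly corresponds to curl's time_connect and time_starttransfer, minus the name lookup.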
On 27/11/15 14:39, Phil Mayers wrote:
But the info above may be a helpful start.
You may also want to look at TCP_CC_INFO. See the kernel source: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/ipv... ...for an idea of what else is available. Be aware some of these are relatively new.
On 11/27/2015 03:39 PM, Phil Mayers wrote:
On 27/11/15 14:05, Jonathan Ballet wrote:
- how many tries did it require * if there were several tries, the timing of each one
Typically, application-layer code doesn't retry a DNS lookup; rather, the C library or other runtime handles this (for example getaddrinfo() in glibc), according to settings read from /etc/resolv.conf or compiled-in defaults.
So it depends on whether you want to emulate "typical" application code, a specific application stack that may or may not do its own resolution (e.g. modern browsers), or something else.
That's a fair point, and I would like to stay as close as possible to a "typical" application; the goal really is to measure the network conditions the applications are facing.

Although I understand I won't be able to get the number of retries if the underlying code uses getaddrinfo() or something similar, I was thinking that twisted.names might offer a "hand-made" resolver which produces the UDP packets itself and offers a way to plug in some code to measure these retries; I haven't checked yet. But in any case, I guess that goes a little against my previous point, which was to measure things "as used by a 'typical' application" (which isn't going to use the twisted.names custom resolver).

(Actually, I was surprised to discover that, although the number of tries can be configured from the command line, `dig` doesn't report the number of tries/retries it makes; you can only deduce it from the overall command execution time.)
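To make the "hand-made resolver" idea concrete without depending on what twisted.names exposes, here is a stdlib-only Python sketch that builds a DNS A query itself, sends it over UDP to a given server, and records the duration and outcome of every try. It is not how twisted.names or glibc work internally; the function names and the default timeouts are assumptions chosen for illustration, and the reply is returned unparsed.

```python
import os
import socket
import struct
import time


def build_query(name, query_id):
    """Build a minimal DNS query packet for an A record (class IN)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part
        for part in name.rstrip(".").encode("ascii").split(b".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def timed_udp_query(name, server, timeouts=(1.0, 3.0)):
    """Send the query once per timeout until a matching reply arrives.

    `server` is an (ip, port) tuple. Returns (attempts, reply), where
    attempts is a list of (elapsed_seconds, answered) tuples, one per try,
    and reply is the raw response bytes or None.
    """
    query_id = int.from_bytes(os.urandom(2), "big")
    packet = build_query(name, query_id)
    attempts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for timeout in timeouts:
            sock.settimeout(timeout)
            started = time.monotonic()
            sock.sendto(packet, server)
            try:
                reply, _ = sock.recvfrom(4096)
            except socket.timeout:
                attempts.append((time.monotonic() - started, False))
                continue
            elapsed = time.monotonic() - started
            # Accept the reply only if its transaction id matches ours.
            if reply[:2] == packet[:2]:
                attempts.append((elapsed, True))
                return attempts, reply
            attempts.append((elapsed, False))
    return attempts, None
```

len(attempts) then gives the number of tries and each entry its timing. One simplification to keep in mind: a late answer to an earlier try is attributed to the try that was in flight when it arrived.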
* how long does it take to get the first bytes of the endpoint - how long does it take to complete the TCP connection handshake - the status of the packets exchanged (how many retries, how many packets lost, etc.)
Some of this is available through platform-specific APIs, e.g. the SIOCGSTAMP ioctl and the TCP_INFO socket option on Linux.
In general, any timings you take based on the return of control from the kernel will include errors due to system and scheduling effects. If you're concerned about getting raw, on-the-wire timings, this is extremely difficult without being in-kernel, and even then various issues (TCP offload, for example) can end up hiding data from you.
I will have a look at these options, as I will be running my tests under Linux anyway. It's probably out of scope for Twisted, but I could also capture the packets sent on the wire by listening on the relevant network interface in promiscuous mode and correlating the packets with each other. It's ... "slightly" more work though...
How far can I go with this kind of thing in Twisted? I know I can fairly easily get the timings of the name resolution, the TCP connection handshake and the time to first byte(s), but what about the packets? I haven't looked at the code of Twisted Names yet, but if it performs the DNS request by itself, I may be able to plug in somewhere and add my request counter and the associated timers. However, I'm not sure whether the underlying details of the TCP protocol are exposed to an upper layer such as Twisted.
Only via platform-specific options.
To do this kind of thing "reliably", you'd need to reimplement TCP in user-space.
But the info above may be a helpful start.
Thanks for your answer Phil, I'll see what I can come up with! Jonathan
participants (2): Jonathan Ballet, Phil Mayers