Extending __format__ method in ipaddress

Folks-

The ipaddress library returns an IP address object which can represent itself in a number of ways:

    In [1]: import ipaddress
    In [2]: v4 = ipaddress.IPv4Address('1.2.3.4')
    In [3]: print(v4)
    1.2.3.4
    In [4]: v4
    Out[4]: IPv4Address('1.2.3.4')
    In [6]: v4.packed
    Out[6]: b'\x01\x02\x03\x04'
    In [9]: str(v4)
    Out[9]: '1.2.3.4'
    In [10]: int(v4)
    Out[10]: 16909060
    In [13]: bin(int(v4))
    Out[13]: '0b1000000100000001100000100'
    In [14]: hex(int(v4))
    Out[14]: '0x1020304'
    In [15]: oct(int(v4))
    Out[15]: '0o100401404'

There are IPv6 objects as well:

    In [6]: v6 = ipaddress.IPv6Address('2001:0db8:85a3:0000:0000:8a2e:0370:7334')
    In [7]: int(v6)
    Out[7]: 42540766452641154071740215577757643572

and what I'm proposing will work for both address families.

In either case, bin/hex/oct don't work on them directly, but on the integer representation. This is a little annoying but not such a big deal. What is a big deal (at least to me) is that the binary representation isn't zero-padded. This makes it harder to compare two IP addresses by eye to see what the differences are, i.e.:

    In [16]: a = ipaddress.IPv4Address('0.2.3.4')
    In [30]: bin(int(a))
    Out[30]: '0b100000001100000100'
    In [31]: bin(int(v4))
    Out[31]: '0b1000000100000001100000100'

It would be nice if there was a way to have an IP address always present itself in fully zero-padded binary (32 bits for IPv4, 128 bits for IPv6). I find this particularly convenient when putting together training material, as it's easier to show subnetting and aggregation if you point at the binary than if you give people dotted-quad addresses and ask them to do the binary conversion in their head. Hex is also handy when you're comparing a dotted-quad IP address to a hex sniffer trace.

It's possible to do this in a one-liner (thanks to Eric Smith): f'{int(v4):#0{34}b}'. But this is a little cryptic.

I opened bpo-32820 (https://github.com/python/cpython/pull/5627) to contribute a way to do this.
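For reference, here is a minimal sketch of the one-liner above, generalized so the field width follows the address family instead of being hard-coded to 34. The helper name padded_bits is hypothetical and is not part of ipaddress or of the pull request; it relies only on the existing max_prefixlen attribute (32 for IPv4, 128 for IPv6).

```python
import ipaddress


def padded_bits(addr):
    """Return the address as a zero-padded binary string with a '0b' prefix.

    Hypothetical helper for illustration; not part of the ipaddress module.
    """
    width = addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    # The '#' flag adds the '0b' prefix, which counts toward the field
    # width, hence the "+ 2".
    return f"{int(addr):#0{width + 2}b}"


a = ipaddress.IPv4Address("0.2.3.4")
v4 = ipaddress.IPv4Address("1.2.3.4")
print(padded_bits(a))   # 0b00000000000000100000001100000100
print(padded_bits(v4))  # 0b00000001000000100000001100000100
```

With both strings the same length, the two addresses line up column by column, so the bits that differ are easy to spot.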
I started with an __index__ method but Issue 15559 (https://github.com/python/cpython/commit/e0c3f5edc0f20cc28363258df501758c1bd...) rules this out. I instead added a bits() class method so that v4.bits would return the fully padded string. This was not terribly pretty, but it mirrored packed(), at least.

Nick Coghlan suggested I instead extend __format__, which is what the diffs in the current pull request do. This allows a great deal more flexibility: the current code takes 'b', 'n', or 'x' types, as well as the '#' option and support for the '_' separator. I realize now I didn't add 'o' but I certainly can for completeness. I debated adding rfc1924 encoding for IPv6 addresses but decided it was entirely too silly.

This is just a convenience function, but IMO fills a need. Is this worth pursuing?

eric

On 15 February 2018 at 08:29, Eric Osborne <eric@notcom.com> wrote:
+1 from me (unsurprisingly). We added __format__ specifically to give types more control over how they're printed, and this approach is amenable to the simple explanation that the custom IP address formatting works via:

- conversion to int
- printing in a fixed width field (based on the address size)
- in binary or hex based on either the given format character, or the address size ("n", where IPv4=b and IPv6=x)
- with a suitable prefix if "#" is given
- with four-digit separators if "_" is given

> I realize now I didn't add 'o' but I certainly can for completeness.

I'd suggest leaving it out, as octal characters are 3 bits each, so they don't have a natural association with IP address representations any more than decimal representation does (neither 32 nor 128 are divisible by 3).

> I debated adding rfc1924 encoding for IPv6 addresses but decided it was entirely too silly.

Yeah, if we decided to support that, we likely *would* add a separate method for it. __format__ works well for "print an IP address as an integer with zero-padding and an automatically calculated field width" though, since we can borrow the notation from regular integer formatting to select the digit base and tweak the display details.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
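
The steps listed in this reply can be sketched as a standalone function. This is only an illustration under stated assumptions, not the code from the pull request: the name format_ip, the simple way the spec string is parsed, and the choice to return str(addr) for an empty spec are all hypothetical.

```python
import ipaddress


def format_ip(addr, spec):
    """Illustrative sketch of the formatting steps described above.

    Hypothetical helper; spec may contain '#', '_' and one of 'b', 'x', 'n'.
    """
    if spec == "":
        return str(addr)  # assumption: an empty spec means the usual string form

    want_prefix = "#" in spec
    want_groups = "_" in spec
    kind = spec.replace("#", "").replace("_", "")
    if kind not in ("b", "x", "n"):
        raise ValueError(f"unsupported format spec: {spec!r}")

    # 'n' picks the base from the address size: binary for IPv4, hex for IPv6.
    if kind == "n":
        kind = "b" if addr.version == 4 else "x"

    # Fixed field width derived from the address size (32 or 128 bits).
    bits_per_digit = 1 if kind == "b" else 4
    width = addr.max_prefixlen // bits_per_digit

    # Conversion to int, then zero-padded digits in the chosen base.
    digits = format(int(addr), f"0{width}{kind}")

    # Four-digit separators, applied by hand to keep the sketch explicit.
    if want_groups:
        digits = "_".join(digits[i:i + 4] for i in range(0, len(digits), 4))

    return ("0" + kind if want_prefix else "") + digits


v4 = ipaddress.IPv4Address("1.2.3.4")
v6 = ipaddress.IPv6Address("2001:db8::1000")
print(format_ip(v4, "#b"))   # 0b00000001000000100000001100000100
print(format_ip(v4, "_x"))   # 0102_0304
print(format_ip(v6, "#_n"))  # 0x2001_0db8_0000_0000_0000_0000_0000_1000
```

Because every supported width (32 or 128 binary digits, 8 or 32 hex digits) is a multiple of four, the groups always come out even, which is the same reason given above for leaving octal out.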

participants (3): Eric Osborne, Ethan Furman, Nick Coghlan