Give ipaddresses an __index__ method

This idea is inspired by Eric Osborne's post "Extending __format__ method in ipaddress", but I wanted to avoid derailing that thread.

I notice what seems to be an inconsistency in the ipaddress objects:

py> v4 = ipaddress.IPv4Address('1.2.3.4')
py> bin(v4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'IPv4Address' object cannot be interpreted as an integer

But that's surely not right: we just need to explicitly do so:

py> bin(int(v4))
'0b1000000100000001100000100'

IP addresses are, in a strong sense, integers: either 32 or 128 bits. And they can be explicitly converted losslessly to and from integers:

py> v4 == ipaddress.IPv4Address(int(v4))
True

Is there a good reason not to give them an __index__ method so that bin(), oct() and hex() will work directly?

py> class X(ipaddress.IPv4Address):
...     def __index__(self):
...         return int(self)
...
py> a = X('1.2.3.4')
py> bin(a)
'0b1000000100000001100000100'

I acknowledge one potentially undesirable side-effect: this would allow using IP addresses as indexes into sequences:

py> 'abcdef'[X('0.0.0.2')]
'c'

but while it's weird to do this, I don't think it's logically wrong.

-- Steve

On Thu, Feb 15, 2018 at 11:18 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Except that this computer's IPv4 is not 3232235539, and I never want to enter it that way. I enter it as 192.168.0.19 - as four separate integers. The __index__ method means "this thing really is an integer, and can be used as an index". With IPv6, similar: you think about them as eight separate blocks of digits. IP addresses can be losslessly converted to and from strings, too, and that's a lot more useful. But they still don't have string methods, because they're not strings.
That's not a side effect. That is the *primary* effect of __index__. If you call int() on something, you are *converting* it to an integer (eg int(2.3) ==> 2), and IMO that is the appropriate way to turn 192.168.0.19 into 3232235539 if ever you want that. Unless you have a use-case for using IP addresses as integers, distinct from Eric's ideas? ChrisA
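The difference between converting to an integer and being usable as one can be seen with operator.index(), which only accepts objects that define __index__. A minimal sketch; the helper accepts_as_index is a made-up name for illustration:

```python
import ipaddress
import operator

addr = ipaddress.IPv4Address('192.168.0.19')

# Explicit conversion goes through __int__ and works for addresses.
as_int = int(addr)

# int() also converts a float by truncating it, although a float is not an integer.
truncated = int(2.3)

def accepts_as_index(obj):
    """Return True if obj can be used implicitly as an integer (defines __index__)."""
    try:
        operator.index(obj)
    except TypeError:
        return False
    return True

float_ok = accepts_as_index(2.3)   # floats convert, but are not indexes
addr_ok = accepts_as_index(addr)   # same situation for stdlib IP addresses
int_ok = accepts_as_index(7)       # real ints are accepted
```

Both the float and the address convert with int(), yet neither is accepted where an index is required.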

On Thu, Feb 15, 2018 at 11:45:46AM +1100, Chris Angelico wrote:
That's partly convention (and a useful convention: it is less error-prone than 3232235539) and partly because you're a sys admin who can read the individual subfields of an IP address. I'm not suggesting you ought to change your habit. But to civilians, 192.168.0.19 is as opaque as 3232235539 or 0xC0A80013 would be.

We allow creating IP address objects from a single int, we don't require four separate int arguments (one for each subfield), and unless I've missed something, IP addresses are not treated as a collection of four separate integers (or more for v6). I can't even find a method to split an address into four ints. (Nor am I sure that there is good reason to want to do so.) So calling a single address "four separate integers" is not really accurate. [...]
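For reference, there is no dedicated split method, but the documented packed attribute comes close. A small sketch, with variable names chosen only for illustration:

```python
import ipaddress

addr = ipaddress.IPv4Address('192.168.0.19')

# .packed is the 4-byte big-endian form; iterating over bytes yields ints.
octets = list(addr.packed)

# The dotted-quad string can be split into the same four values.
octets_from_str = [int(part) for part in str(addr).split('.')]
```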
I agree they're not strings, I never suggested they were. Python only allows IP addresses to be entered as strings because we don't have a "dotted-quad" syntax for 32-bit integers. (Nor am I suggesting we should.)

It is meaningless to perform string operations on IP addresses. What would it mean to call addr.replace('.', 'Z') or addr.split('2')? But doing *at least some* int operations on addresses isn't meaningless:

py> a = ipaddress.ip_address('192.168.0.19')
py> a + 1
IPv4Address('192.168.0.20')

-- Steve

On Thu, Feb 15, 2018 at 3:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
To some people, any form of address is as opaque as any other, true. (That's part of why we have DNS.) Also true, however, is that the conventional notation has value and meaning. That's partly historical (before CIDR, all networks were sized as either class A (10.x.y.z for any x, y, z), class B (172.16.x.y), or class C (192.168.0.x)), partly self-perpetuating (we use a lot of /24 addresses in local networks, not because we HAVE to, but because a /24 lets you lock three parts of the address and have the last part vary), but also definitely practical.
The most common way to create an IPv4Address object is to construct it from a string, which has the four separate integers in it. The dots delimit those integers. It's not an arbitrary string; it is most definitely a tuple of four integers, represented in its standard string notation. Simply because it's not actually the Python type Tuple[Int] doesn't mean it isn't functionally and logically a sequence of numbers. And if ever you actually do have the four integers, you can use a one-liner anyway:
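The one-liner itself did not survive in this copy of the message. As a sketch only, and not necessarily what was originally posted, here are two ways to build an address from four integers; the tuple parts is an assumed input:

```python
import ipaddress

parts = (192, 168, 0, 19)

# Via the dotted-quad string form.
from_string = ipaddress.IPv4Address('.'.join(map(str, parts)))

# Via the 4-byte packed form, which the constructor also accepts.
from_bytes = ipaddress.IPv4Address(bytes(parts))
```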
How meaningful is that, when you don't have the netmask?
If that's a /24, one of those is a broadcast address, one is an unrelated network address, and one is an unrelated host address. "Adding 1" to an IP address is meaningless. And it definitely does NOT mean that IP addresses should have __index__, because that implies that they truly are integers, which would mean you could do something like this:
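The example that followed is missing from this copy. As an illustration of the kind of thing __index__ would permit, here is a sketch that reuses the hypothetical subclass X from the first message (X is not part of the stdlib):

```python
import ipaddress

class X(ipaddress.IPv4Address):
    """Hypothetical subclass from the original post, adding __index__."""
    def __index__(self):
        return int(self)

addr = X('0.0.0.3')
letters = 'abcdef'

picked = letters[addr]         # address used as a sequence index
sliced = letters[:addr]        # address used as a slice endpoint
counted = list(range(addr))    # address used as a range endpoint
as_hex = hex(addr)             # hex() accepts anything with __index__
```

Every one of these treats the address as a plain number without any explicit conversion.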
The __int__ method *converts* something to an integer. Nobody is disagreeing that you can convert an IP address into an integer. But they are NOT integers. It doesn't make sense to treat one as an integer implicitly. ChrisA

On Thu, 15 Feb 2018 15:14:03 +1100, Steven D'Aprano wrote:
There was a lengthy discussion (or more than one) about supporting decimal unicode code point literals. Is U+03B1 (GREEK SMALL LETTER ALPHA) somehow less clear than X+945? 192.168.0.19 speaks volumes, but 3232235539 is not only opaque, but also obtuse.
py> a = ipaddress.ip_address('192.168.1.255')
py> a + 1
IPv4Address('192.168.1.256')

Uh, oh.

py> a = ipaddress.ip_address('255.255.255.255')
py> a + 1

Mu?

Yes, if I were writing a DHCP server, the notion of "the next IP address that meets certain constraints, or an exception if no such address exists" has meaning. But it's not as simple as "ip + 1."

Dan
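A quick check of what the stdlib does in these two cases, assuming current CPython behaviour: the addition is carried out on the integer value, so the last octet rolls over into the next one, and going past the top of the IPv4 range raises an exception instead of wrapping around.

```python
import ipaddress

a = ipaddress.ip_address('192.168.1.255')
rolled = a + 1   # the carry propagates into the third octet

top = ipaddress.ip_address('255.255.255.255')
try:
    top + 1
except ipaddress.AddressValueError as exc:
    # The result would need more than 32 bits, so construction fails.
    overflow_error = exc
else:
    overflow_error = None
```

Neither result knows anything about a netmask, so whether the "next" address is a meaningful host is still up to the caller.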

On 15 February 2018 at 10:18, Steven D'Aprano <steve@pearwood.info> wrote:
That error message should probably either have an "implicitly" in it, or else use the word "handled" rather than "interpreted".

There are tests that ensure IP addresses don't implement __index__, and the pragmatic reason for that is the downside you mentioned: to ensure they can't be used as indices, slice endpoints, or range endpoints. While IP addresses can be converted to an integer, they are *not* integers in any mathematical sense, and it doesn't make sense to treat them that way.

A useful heuristic for answering the question "Should this type implement __index__?" is "Does this type conform to the numbers.Integral ABC?" (IP addresses definitely don't, as there's no concept of addition, subtraction, multiplication, division, etc - they're discrete entities with a numeric representation, not numbers)

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
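The heuristic can be tried out directly. A minimal sketch; the helper supports is a made-up name that reports whether an operation raises TypeError:

```python
import ipaddress
import numbers

addr = ipaddress.IPv4Address('192.168.0.19')

# The ABC check from the heuristic.
addr_is_integral = isinstance(addr, numbers.Integral)
int_is_integral = isinstance(3, numbers.Integral)

def supports(operation):
    """Return True if calling operation() does not raise TypeError."""
    try:
        operation()
    except TypeError:
        return False
    return True

can_multiply = supports(lambda: addr * 2)       # no multiplication defined
can_add_address = supports(lambda: addr + addr) # two addresses cannot be added
can_add_int = supports(lambda: addr + 1)        # offsetting by an int is allowed
```

Only offsetting by a plain int is supported, which is well short of the arithmetic that numbers.Integral requires.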

On Thu, Feb 15, 2018 at 01:39:13PM +1000, Nick Coghlan wrote:
If it is an intentional decision to disallow treating IP addresses as integers implicitly, I guess that is definite then. No change. I can see that this is a reasonable decision for pragmatic reasons. However, for the record (and under no illusion that I'll change your mind *wink*) ...
I really don't think this is strictly correct. IP addresses already support adding to regular ints, and conceptually they are indexes into a 32-bit or 128-bit space. They define "successor" and "predecessor" relations via addition and subtraction, which is pretty much all you need to build all other int operations from, mathematically speaking. (Actually, you don't even need predecessor.)

I think there's a good case to make that they are ordinal numbers (each IP address uniquely specifies a logical position in a sequence from 0 to 2**32-1). Python ints already do quadruple duty as:

- ordinal numbers, e.g. indexing, "string".find("r");
- cardinal numbers, e.g. counting, len("string");
- nominal numbers, e.g. id(obj);
- subset of the Reals in the numeric tower.

But anyway, at this point the discussion is getting rather esoteric. I accept the argument from pragmatism that the benefit of supporting __index__ is less than the disadvantage, so I think we're done here :-)

-- Steve

participants (4): Chris Angelico, Dan Sommers, Nick Coghlan, Steven D'Aprano