
There are a bunch of FutureWarnings, e.g. about 0xffffffff<<1, that promise they will disappear in Python 2.4. If anyone has time to fix these, I'd appreciate it. (It's not just a matter of removing the FutureWarnings -- you actually have to implement the promised future behavior. :-) I may get to these myself, but they're not exactly rocket science, so they might be a good thing for a beginning developer (use SF please if you'd like someone to review the changes first). Another -- much bigger -- TODO is to implement generator expressions (PEP 289). Raymond asked for help but I don't think he got any, unless it was offered through private email. Anyone interested? (Of course, I don't want any of this to interfere with the work to get 2.3.3 out in December.) --Guido van Rossum (home page: http://www.python.org/~guido/)
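To make "the promised future behavior" concrete, here is a small model written in modern Python, where ints are already unbounded. It assumes a 32-bit C long and treats the old result as the shifted value truncated to 32 bits and reinterpreted as signed. The helper names are hypothetical, and this is a sketch of the semantics, not the CPython code path.

```python
BITS = 32  # assumed width of a C long on the platform being modelled


def to_signed(value, bits=BITS):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def old_lshift(x, n, bits=BITS):
    """Model of a truncating, C-style left shift on a fixed-width signed int."""
    return to_signed(x << n, bits)


def new_lshift(x, n):
    """The promised semantics: a left shift never loses bits or flips the sign."""
    return x << n
```

Under this model old_lshift(0xffffffff, 1) gives -2, and to_signed(0xffffffff) gives -1, which is how such a hex literal used to come out on a 32-bit box. new_lshift(0xffffffff, 1) gives 8589934590, the value the new semantics promise.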

[Guido van Rossum]
I've submitted a patch (http://python.org/sf/849227). And yes, somebody should probably take a good look at it before applying. The (modified) test suite does pass on my machine, but that's all. I may well have forgotten to add tests for new special cases, and I'm not the most experienced C programmer on the block either. As a side note, I think that line 233 in Lib/test/test_format.py, "if sys.maxint == 2**32-1:", should be "if sys.maxint == 2**31-1:", but I didn't include that in the patch or submit a bug report. Should I? Peace, Kalle -- Kalle Svensson, http://www.juckapan.org/~kalle/ Student, root and saint in the Church of Emacs.

Thanks!
This definitely smells like a bug (I've never seen a machine with 33-bit ints :-) so feel free to submit a separate patch to SF. --Guido van Rossum (home page: http://www.python.org/~guido/)
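Why the test condition was wrong: for an n-bit two's-complement type the maximum is 2**(n-1)-1, so 2**32-1 is the maximum only of a 33-bit signed int. The helpers below are hypothetical; Python 3 no longer has sys.maxint, and sys.maxsize is its nearest analogue.

```python
def max_signed(bits):
    """Largest value of a two's-complement signed integer `bits` wide."""
    return (1 << (bits - 1)) - 1


def width_for_max(max_value):
    """Return the signed width whose maximum is max_value, or None."""
    bits = max_value.bit_length() + 1
    return bits if max_signed(bits) == max_value else None
```

width_for_max(2**31 - 1) returns 32, while width_for_max(2**32 - 1) returns 33: the original condition could only be true on the machine with 33-bit ints that the reply jokes about.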

[Kalle Svensson]
Well, it looks like you got everything right. Congratulations! I've checked your code into CVS. There are now two pieces of PEP 237 unimplemented (apart from the complete and total eradication of long literals, which won't happen until 3.0). (1) PEP 237 promises that after the new semantics are introduced for hex/oct literals and conversions, and left shifts, operations that cause a different result than before will produce a warning that is on by default. Given the pain we've suffered through the warnings in 2.3 about this stuff, I propose to forget about these warnings. The new semantics are clear and consistent, warnings would just cause more distress, and code first ported to 2.3 will already have silenced the warnings. (2) PEP 237 promises that repr() of a long should no longer show a trailing 'L'. This is not yet implemented (i.e., repr() of a long still has a trailing 'L'). First, past experience suggests that quite a bit of end user code will break, and it may easily break silently: there used to be code that did str(x)[:-1] (knowing x was a long) to strip the 'L', which broke when str() of a long no longer returned a trailing 'L'. Apparently some of this code was "fixed" by changing str() into repr(), and this code will now break again. Second, I *like* seeing a trailing L on longs, especially when there's no reason for it to be a long: if some expression returns 1L, I know something fishy may have gone on. Any comments on these? Should I update PEP 237 to reflect this?
Fixed that too. But somebody might want to backport it to 2.3.3. --Guido van Rossum (home page: http://www.python.org/~guido/)
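The str(x)[:-1] breakage described above is easy to reproduce. The function names below are hypothetical; the first reproduces the fragile idiom, and the second is one defensive alternative that only strips an 'L' that is actually present.

```python
def strip_long_suffix_fragile(text):
    # The idiom described above: assumes the last character is an 'L'.
    return text[:-1]


def strip_long_suffix(text):
    # Only drop a trailing 'L' when one is actually there, so the result is
    # right whether or not repr() of a long keeps its suffix.
    return text[:-1] if text.endswith("L") else text
```

The fragile version raises nothing: on "12345" it silently returns "1234", which is exactly the silent breakage Guido is worried about. Where only the digits are wanted, str() of the number (which already lost its 'L') sidesteps the question.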

+1, The warnings cause more pain than they save. Part of the purpose of a warning is to leave you feeling unsettled -- I don't think that is a worthy goal when the code is going to work fine anyway. Let PyChecker or some such warn about prior version compatibility issues like that.
-0. The reasons are good, but this one has been promised for several years. It's time for an L-free Python -- one less thing to have to learn. If there is transition difficulty, let it be a prompt to consider applying the forthcoming Decimal module. If necessary, we could add a debug-mode switch to turn the L's on or off. By putting it in the debug build, we keep people from using it in production code. The purpose is to allow code to be run twice to see if different results are obtained. Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep for indicators like [:-1] on the same line as long() or repr().
Should I update PEP 237 to reflect this?
Yes, that's better than surprising people later. Raymond
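Raymond's migration hint can be turned into a rough line-by-line scanner. The regular expressions and the function name are assumptions; the heuristic misses expressions split across lines and will flag some harmless code, so hits are candidates for review, not bugs.

```python
import re

# [:-1] with optional whitespace, e.g. "[:-1]" or "[ : -1 ]".
STRIP_LAST_CHAR = re.compile(r"\[\s*:\s*-1\s*\]")
# A call to long(...) or repr(...), but not e.g. "along(" or "myrepr(".
LONG_OR_REPR_CALL = re.compile(r"\b(?:long|repr)\s*\(")


def find_suspect_lines(source):
    """Return (line_number, line) pairs that both slice off the last
    character and call long() or repr(): candidates for manual review."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if STRIP_LAST_CHAR.search(line) and LONG_OR_REPR_CALL.search(line):
            hits.append((lineno, line))
    return hits
```

Note that str(x)[:-1] is not flagged, because the hint only mentions long() and repr(); widening LONG_OR_REPR_CALL to include str is a one-word change if that idiom also needs auditing.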

Yes, but people using type() or isinstance() or __class__ will still have to remember that there are two types of integers: int and long. And both built-ins will be with us for years, and they aren't quite aliases for each other (long('12') returns a long, but int('12') an int).
If there is transition difficulty, let it be a prompt to consider applying the forthcoming Decimal module.
This I don't understand.
But making a debug build is far from trivial (especially on Windows). Perhaps it should be a switch on the regular build but also produce a warning, to annoy. :-)
Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep for indicators like [:-1] on the same line as long() or repr().
Can you take care of that?
Should I update PEP 237 to reflect this?
Yes, that's better than surprising people later.
I'll do that (in due time). --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
+1, and especially since it looks like 2.3 is going to become the next 1.5.2 (i.e., the version everyone flocks to, and then badgers you about for the next 20 years <wink>).
+1. Changing string representations is always traumatic (lots of programs rely on parsing them), and I have a hard time imagining what positive good could come from stripping the 'L'. Making that change for str(long) seemed like pure loss from my POV (broke stuff and helped nothing).
Any comments on these? Should I update PEP 237 to reflect this?
The PEP should reflect The Plan, sure.

On Sat, Nov 29, 2003, Guido van Rossum wrote:
That makes sense to me; there should be an easy way from Python to detect what kind of object you've got (as a string representation), and repr() is precisely the place for it. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.

[me]
[Aahz]
OK. I've got one weak (-0 from Raymond) opposition to this idea, and two strong agreements (+1 from Tim and Aahz), so I'm going to go ahead and make this change to the PEP. This pretty much makes PEP 237 finished except for the complete and utter eradication of the trailing 'L' (and probably of the 'long' type altogether) in Python 3.0. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Monday 01 December 2003 08:10 pm, Guido van Rossum wrote: This pretty much makes PEP 237 finished except for the complete and utter eradication of the trailing 'L' (and probably of the 'long' type altogether) in Python 3.0.
Would there still be an int type and a long type in Python 3.0, or would the notion of a long be dropped? If it were dropped, then an int would transparently switch to the long representation whenever the number could not fit in a machine int. If long is dropped, then the long function could be added to the list of builtins that will disappear in Python 3.0.

I'm not sure about that yet. I'd *like* to find a hack that lets the int type change representations, but the fact is that it's much easier to use different types to indicate different representations. But you're right, even if there are two types, there's probably no reason to expose the 'long' type as a builtin. So long should go on the list as *likely* to disappear in 3.0. --Guido van Rossum (home page: http://www.python.org/~guido/)

Aren't integers immutable? If so, I would think it doesn't make sense for them to change representation, as they don't change value. Anyway, if you want to use type to encode representation, I would think that the various integer types should be related by inheritance. As a long can always substitute for an int, at least in theory, I would think that long should be derived from int. Then isinstance(42L, int) would yield True. If integers are related this way, LSP implies that converting a long to a string should not put an L at the end.

I was using shorthand-speak meaning that different instances of the same class would use a different representation (which the class can somehow recognize by looking at the instance, of course).
Or should int be a subclass of long? I believe that OO theorists consider the base class the set with the largest number of elements (since it is the least constrained). Now, all ints are longs, but not all longs are ints, since ints can only represent values in [-sys.maxint-1, sys.maxint]. According to this reasoning, long should be the base class. But the naming suggests otherwise: 'int' suggests no particular size (except to C/C++ users :-) so should be the more general class, and that pleads for your version. I don't particularly like either approach, because the 'long' type is an implementation detail over which the user has no control (this is a change of heart since Python's original design!). I guess that means I have to work harder and make the single int type support both representations. I'm sure it can be done.
If integers are related this way, LSP implies that converting a long to a string should not put an L at the end.
Well, they aren't in Python 2.x, which is why the L stays until 3.0. --Guido van Rossum (home page: http://www.python.org/~guido/)

Aren't integers immutable? If so, I would think it doesn't make sense for them to change representation, as they don't change value.
Got it.
I think that int should be the base class, because I can imagine long supporting operations that int does not support, but not vice versa.

Andrew Koenig <ark-mlist@att.net>
I think that int should be the base class, because I can imagine long supporting operations that int does not support, but not vice versa.
Perhaps 'int' should be an abstract class, with concrete subclasses 'long' and 'short'. And 'bool' would fit into this family as a sort of "extremely short int". -- Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand, greg@cosc.canterbury.ac.nz ("A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.")
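Greg's abstract-int idea, roughly an Objective-C-style class cluster, might look like this in Python. Integer, Short and Long are hypothetical names, the 32-bit "short" range is an assumption, and Long stores base-2**15 digits only so that the two representations visibly differ.

```python
from abc import ABC, abstractmethod

# Assumed range of the machine-word ("short") representation.
SHORT_MIN, SHORT_MAX = -2**31, 2**31 - 1


class Integer(ABC):
    """Hypothetical abstract integer: the only type users are meant to name."""

    @staticmethod
    def make(value):
        # Factory: the magnitude of the value picks the concrete subclass.
        if SHORT_MIN <= value <= SHORT_MAX:
            return Short(value)
        return Long(value)

    @abstractmethod
    def value(self):
        """Return the mathematical value as a Python int."""

    def __add__(self, other):
        # Results go back through the factory, so a Short + Short that
        # overflows comes back as a Long (and vice versa).
        return Integer.make(self.value() + other.value())

    def __eq__(self, other):
        if not isinstance(other, Integer):
            return NotImplemented
        return self.value() == other.value()

    def __hash__(self):
        return hash(self.value())


class Short(Integer):
    """Stand-in for a value held in a single C long."""

    def __init__(self, v):
        self._v = v

    def value(self):
        return self._v


class Long(Integer):
    """Stand-in for a bignum: sign plus base-2**15 digits, least significant first."""

    DIGIT_BITS = 15

    def __init__(self, v):
        self._sign = -1 if v < 0 else 1
        magnitude = abs(v)
        self._digits = []
        while magnitude:
            self._digits.append(magnitude & ((1 << self.DIGIT_BITS) - 1))
            magnitude >>= self.DIGIT_BITS

    def value(self):
        magnitude = 0
        for digit in reversed(self._digits):
            magnitude = (magnitude << self.DIGIT_BITS) | digit
        return self._sign * magnitude
```

The downside raised in the next reply shows up directly: a user class that subclasses Integer inherits make() and __add__ but no storage and no value(), so it cannot even be instantiated until it commits to a representation.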

MWH:
That's quite a nice idea -- a bit like an Objective C class cluster.
I've been pondering this one too. The only downside I can think of is that code that inherits from class int to add some method (e.g. a class hexint whose repr() and str() call hex() instead) will no longer inherit any implementation, and thus won't be very useful. Inheriting from short or long doesn't quite solve the problem either. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Isn't the 'long' format the more general representation, with the 'int' format then being a performance hack to take advantage of the speed of C integer arithmetic? I'm just wondering if a different way of thinking about it might help with figuring out how to handle a combined implementation. Cheers, Nick.

Nick Coghlan:
I proposed that too, but Andrew Koenig didn't like it. --Guido van Rossum (home page: http://www.python.org/~guido/)

It's not me, it's Barbara Liskov at MIT.
:-) (For non-OO wizards, this is called "Liskov substitutability".)
So the question is, does long have operations that int doesn't have? And if so, why can't those operations be added to int? And if there's a reason, is it good enough? If the sets of operations are identical, is there a way to break the tie? --Guido van Rossum (home page: http://www.python.org/~guido/)

On Wednesday 03 December 2003 11:10 am, Guido van Rossum wrote:
Taking into account their difference in representation, a long can support 1<<32, but an int can't. I'm not saying that the "proper" (long) behavior can't be unified into a single type, just giving an example of an operation long supports but int doesn't. Jeremy

At 12:15 PM 12/3/03 -0500, Jeremy Fincher wrote:
Not so. Both '1' and '32' can be represented by int, so only the operation *result* needs to be a long. Further, if the idea here is that 'int' will be a subclass of 'long', then it's perfectly valid to return an int from any operation declared as returning a long. Further, since it's acceptable to *pass* an int for any argument declared 'long', it should suffice to use 'long' for all integer inputs and outputs of methods on 'long'. Anyway, if the idea is that 'long' will be the base class, IMO the name 'long' is confusing. It should probably be called 'integer', with the subclass being either 'int' or 'short'.

We're talking about a hypothetical int here where that operation returns 4294967296L. (Not so hypothetical, it's implemented in Python 2.4 in CVS.)
Sorry, your counterexample is rejected. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
Guido van Rossum wrote:
The 'performance hack' point of view I was trying to suggest was along the lines of: "Python integers are capable of storing values of arbitrary magnitude, subject only to the memory capacity of the machine. As a matter of performance, Python will use native C integers (and native arithmetic) when the stored value is small enough to fit." That is, I was agreeing with Guido's first point above.

Right. The implementation uses the type as the vtable of a C++ class, and the functions in the vtable "know" the representation, which is very different for short and long ints. This is all nice and efficient, except that the identity of the vtable is returned as the type of the object, so two different vtables means two different types. The alternative implementation technique is to have a single type/vtable, embed a flag in each instance that tells the implementation which representation is used, and have an "if" at the top of each implementation function switching between the two. The downside should be obvious: not only an extra test+branch for each operation, but also extra space in the object (the short representation has no spare bits). --Guido van Rossum (home page: http://www.python.org/~guido/)
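The single-type alternative can be sketched as follows. TaggedInt is a hypothetical name, the 32-bit small range is an assumption, and a decimal string stands in for the digit array purely so that the two payloads are visibly different representations.

```python
class TaggedInt:
    """Hypothetical single integer type: the representation is recorded in a
    per-instance flag rather than in the object's type."""

    __slots__ = ("is_small", "payload")

    # Assumed range of the machine-word representation.
    SMALL_MIN, SMALL_MAX = -2**31, 2**31 - 1

    def __init__(self, value):
        if self.SMALL_MIN <= value <= self.SMALL_MAX:
            # Stand-in for a C long.
            self.is_small, self.payload = True, value
        else:
            # Stand-in for a digit array.
            self.is_small, self.payload = False, str(value)

    def to_python(self):
        # Even reading the value needs a branch on the flag.
        return self.payload if self.is_small else int(self.payload)

    def __add__(self, other):
        # The extra test+branch at the top of every operation.
        if self.is_small and other.is_small:
            # Fast path: machine-word arithmetic; the constructor promotes
            # the result if it no longer fits.
            return TaggedInt(self.payload + other.payload)
        # Slow path: at least one operand uses the big representation.
        return TaggedInt(self.to_python() + other.to_python())
```

The sketch shows the branch cost but hides the space cost: in C the flag needs room of its own, because a full-width short int leaves no spare bits to hold it.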

"Guido van Rossum" <guido@python.org> wrote in message news:200312031610.hB3GA1204029@c-24-5-183-134.client.comcast.net...
In this case, one could argue that no tie breaker is needed. If one accepts that int and long have the same operations but offer different implementations, then the obvious solution is: an abstract base class (Int) and two sub-classes (Short and Long?). Of course, this obvious solution has a problem: a programmer can't easily sub-class Int. They must choose a sub-class, Short or Long (unless they want to provide a full implementation themselves), and that's probably not a decision they want to make. Perhaps, because of this, the GoF "Bridge Pattern" might be suitable here. (This pattern can be framed as: "adapt multiple implementations to an interface using delegation" -- which, after all, is pretty much what the vtable in the previous solution gives you.) If the existence of Short and Long is an implementation detail best hidden from the Python programmer, then a Bridge Pattern implementation has a lot going for it. Use of the Bridge pattern might even allow for three different implementations: PreDefined, Short, Long ... where PreDefined is one of the numbers between 0 and 256 which I'm given to understand are pre-allocated by the runtime environment. -- Mike

Mike Thompson <mike.thompson@day8.com.au>:
Of course, this obvious solution has a problem: a programmer can't easily sub-class Int.
I can't see that subclassing int is a particularly useful thing to do, even as matters stand today. As soon as you do any operation on your int subclass, you get a result which is not an instance of your subclass any more, which makes using it rather fragile. Does anyone have a real-life use case for subclassing int? -- Greg Ewing
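Greg's fragility point, together with Guido's hexint from earlier in the thread, is easy to see in a modern Python session. hexint follows the thread's description; stickyhexint is a hypothetical workaround added for illustration.

```python
class hexint(int):
    """Guido's example: an int whose str() and repr() use hex()."""

    def __repr__(self):
        return hex(self)

    __str__ = __repr__


class stickyhexint(hexint):
    """Keeps the subclass through addition by re-wrapping the result.
    Every other operator (-, *, //, <<, ...) would need the same treatment."""

    def __add__(self, other):
        result = int.__add__(self, other)
        if result is NotImplemented:
            return NotImplemented
        return stickyhexint(result)

    __radd__ = __add__
```

hexint(255) prints as 0xff, but hexint(255) + 1 is a plain int that prints as 256. stickyhexint survives addition in both operand orders, yet stickyhexint(255) * 2 is again a plain int, which is the fragility Greg describes.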

On Thu, Dec 04, 2003 at 03:54:21PM +1300, Greg Ewing wrote:
Does anyone have a real-life use case for subclassing int?
Not me. Dylan does not allow its concrete numeric classes to be subclassed. That allows efficient method dispatch given restrictive enough declarations. Here are the standard numeric classes:

    class number(object):        # Open Abstract Class
    class complex(number):       # Sealed Abstract Class
    class real(complex):         # Sealed Abstract Class
    class float(real):           # Sealed Abstract Class
    class single_float(float):   # Sealed Class
    class double_float(float):   # Sealed Class
    class rational(real):        # Sealed Abstract Class
    class integer(rational):     # Sealed Class

complex is sealed, which basically means that it cannot be subclassed by users. Sealing is described here: http://www.gwydiondylan.org/drm/drm_70.htm. The language reference does not describe a "long" integer type, but the implementation is free to provide one. Neil

Right, this seems to be one of the preferred solutions.
(It has been pointed out that subclassing int isn't very useful, so maybe this is a moot point. Does anybody have a real use case?)
Hmm... I'm not very familiar with the Bridge pattern (and my GoF book is on one of 65 boxes still in the garage waiting until we move into a larger house :-). Can you give a little more detail about how this would be done? Is it something that a user wishing to subclass Int should do, or something that the Int class (or its subclasses) should provide?
I don't think so -- the beauty is that (most of) the implementation doesn't know that certain int values have eternal life; their representation is exactly the same as that of regular ints. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Mike Thompson]
[Guido van Rossum]
(It has been pointed out that subclassing int isn't very useful, so maybe this is a moot point. Does anybody have a real use case?)
I don't. It just rubs my intuition up the wrong way for it not to be subclassable, but that'll be because I'm not a fully reformed OO Bigot ;-) [Mike Thompson]
[Guido van Rossum]
To me, Bridge Pattern is most useful when either: 1. There is a need to sub-class the abstraction (Int) other than for implementational purposes. 2. The implementation of the abstraction (Int) needs to change at runtime. So, given you doubt the need for 1. and that 2. was never needed because of Int-immutability, I should withdraw my suggestion in favour of the simpler abstract-Int-base-with-Long-and-Short-implementation-in-sub-classes approach previously mentioned. BTW, a summary of most patterns can be found on-line at http://patterndigest.com/ [Guido van Rossum]
It's something that the writer of the Int class would use iff they wanted to ensure that programmers were oblivious to the two or three possible implementations of Int AND one of the two conditions I mentioned above (needed subclassability or runtime change in implementation) held true. [Mike Thompson]
[Guido van Rossum]
That beauty could remain. At the risk of lapsing completely into pattern vocabulary ... an AbstractFactory would create Ints and bind them to a subsequently hidden implementation. When the int involved was tiny (-1 to 256?) this AbstractFactory would use one of the pre-allocated, shared (see FlyWeight) implementations. Outside of this range distinct Short, Long or DamnHuge implementations would be created. -- Mike
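Mike's AbstractFactory plus Flyweight arrangement can be sketched as below. All class and function names are hypothetical, the range(-1, 257) cache follows his tentative "-1 to 256?", and the 32-bit short range is an assumption. (CPython does keep a preallocated cache of small int objects, but its exact range is an implementation detail.) As Guido notes, the cached values here use exactly the same ShortImpl representation as any other short value.

```python
class ShortImpl:
    """Hypothetical implementation object for machine-word values."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


class LongImpl:
    """Hypothetical implementation object for values beyond a machine word."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


SMALL_RANGE = range(-1, 257)              # Mike's tentative "-1 to 256"
SHORT_MIN, SHORT_MAX = -2**31, 2**31 - 1  # assumed machine-word range

# Flyweights: one shared, pre-built implementation object per small value.
_FLYWEIGHTS = {v: ShortImpl(v) for v in SMALL_RANGE}


def make_impl(value):
    """Factory that hides which implementation a value ends up with."""
    if value in SMALL_RANGE:
        return _FLYWEIGHTS[value]
    if SHORT_MIN <= value <= SHORT_MAX:
        return ShortImpl(value)
    return LongImpl(value)


class Int:
    """The public abstraction (Bridge): it only ever talks to its _impl."""
    __slots__ = ("_impl",)

    def __init__(self, value):
        self._impl = make_impl(value)

    def __add__(self, other):
        return Int(self._impl.value + other._impl.value)

    def __int__(self):
        return self._impl.value
```

Because Int itself carries all the behaviour and only delegates storage, a user subclass of Int inherits a working implementation, which is the property the Bridge suggestion was after.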

Or, since we're talking implementation, we could use the following hack, which I borrowed from Standard ML of New Jersey: There is just one type, namely int. An int is a 31-bit integer, INCLUDING sign. One extra bit indicates whether the integer is really a number or, alternatively, a pointer to the rest of the representation. Now, you may object that implementing this strategy in C will require lots of shifting and masking. I would have thought so, too. However, every C implementation of which I am aware puts all but the most trivial data structures on boundaries of two or more bytes. This fact frees the LOW-order bit of the integer to indicate whether it is a pointer. So here's the strategy: If the low-order bit of an integer is *off*, it's really a pointer to the rest of the implementation. If the low-order bit is *on*, then it represents an integral value that can be obtained by doing a one-bit arithmetic right shift. Yes, it's sleazy. But I imagine it would be much faster than using inheritance.
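Koenig's low-bit tagging scheme can be modelled on plain Python ints treated as 32-bit words. The word size and every function name are assumptions; the sketch also assumes byte-addressed memory with at least 2-byte alignment, which, as the replies below point out, is not true on every platform Python runs on.

```python
WORD_BITS = 32                        # assumed machine word size
WORD_MASK = (1 << WORD_BITS) - 1
# A 31-bit payload INCLUDING sign, as in the message.
PAYLOAD_MIN = -(1 << (WORD_BITS - 2))
PAYLOAD_MAX = (1 << (WORD_BITS - 2)) - 1


def tag_int(n):
    """Encode a small int as an odd word: shift left one bit, set bit 0."""
    if not PAYLOAD_MIN <= n <= PAYLOAD_MAX:
        raise OverflowError("value needs the boxed (pointer) representation")
    return ((n << 1) | 1) & WORD_MASK


def tag_pointer(address):
    """A pointer is stored as-is; alignment guarantees bit 0 is clear."""
    if address & 1:
        raise ValueError("assumes at least 2-byte alignment")
    return address & WORD_MASK


def is_tagged_int(word):
    return (word & 1) == 1


def untag_int(word):
    """Recover the value with a one-bit *arithmetic* right shift."""
    if not is_tagged_int(word):
        raise ValueError("word holds a pointer, not an integer")
    # Reinterpret the unsigned word as a signed C value first.
    signed = word - (1 << WORD_BITS) if word >> (WORD_BITS - 1) else word
    return signed >> 1  # Python's >> on ints sign-extends
```

untag_int refuses to decode a pointer, but in C nothing forces that check on every code path, which is the source of the core dumps Guido later reports from ABC.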

"Phillip J. Eby" <pje@telecommunity.com> writes:
It wouldn't have to be that bad if you put the pointer/int thingy in the ob_ival slot. Cheers, mwh -- The only problem with Microsoft is they just have no taste. -- Steve Jobs, (From _Triumph of the Nerds_ PBS special) and quoted by Aahz on comp.lang.python

[Andrew Koenig]
[Phillip J. Eby]
I imagine it wouldn't, because it'd add an extra test to not only every Py_INCREF and Py_DECREF, but every PyObject_something call.
[Michael Hudson]
It wouldn't have to be that bad if you put the pointer/int thingy in the ob_ival slot.
Not all HW Python runs on is byte-addressed, so the base idea that "the last bit" of a pointer-to-non-trivial-structure is always 0 doesn't get off the ground. C doesn't have an arithmetic right-shift operator either, but that one is easier to worm around (pyport.h already has a Py_ARITHMETIC_RIGHT_SHIFT macro to "do the right thing"). OTOH, the only current Python platform I know of that's word-addressed is the Cray T3E, which also happens to be the only one I know of where C's signed ">>" zero-fills (and is also the only one I know of that has no 16-bit integral type -- the T3E porters had an interesting <wink> time).
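One way to get a sign-extending shift out of a compiler whose signed ">>" zero-fills is to shift the complement instead: for negative i, -1 - i is non-negative, so zero-filling is harmless, and complementing the result restores the sign bits. The sketch models this on 32-bit words; the names are hypothetical, and it illustrates the identity rather than quoting the Py_ARITHMETIC_RIGHT_SHIFT macro.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(word):
    """Reinterpret a BITS-wide bit pattern as two's complement."""
    return word - (1 << BITS) if word >> (BITS - 1) else word


def zero_fill_shift(i, j):
    """Model of a C compiler whose signed '>>' shifts in zeros."""
    return to_signed((i & MASK) >> j)


def arithmetic_shift(i, j):
    """Sign-extending right shift built only from zero_fill_shift."""
    if i < 0:
        # -1 - i is ~i, which is non-negative, so zero-filling is safe on it.
        return -1 - zero_fill_shift(-1 - i, j)
    return zero_fill_shift(i, j)
```

On the modelled compiler zero_fill_shift(-8, 1) yields 2147483644 instead of -4, while arithmetic_shift agrees with Python's own >> for every 32-bit input and shift count from 0 to 31.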

Andrew Koenig writes:
I recently mentioned this approach to coaxing an extra bit from an existing structure to Tim, and he pointed out (rightly!) that this only works for machines with byte-addressable memories. On word-addressable memories, there's no chance to coax the extra bit from a pointer at all, and Python still runs on at least one such platform (some Cray thing, IIRC). Otherwise I like the approach. ;-) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation

"Andrew Koenig" <ark-mlist@att.net> writes:
This is exactly how it's done in Ruby as well. Search for FIXNUM in http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/ruby/ruby.h?rev=1.95&content-type=text/x-cvsweb-markup Regards, Gisle

This is a well-known scheme. We used it in ABC 20 years ago. My experience with it was negative though: there were too many core dumps because some code was dereferencing a pointer without checking the low bit. So I'd rather not try this. I also somehow doubt that the speed gains are that significant. --Guido van Rossum (home page: http://www.python.org/~guido/)

(It has been pointed out that subclassing int isn't very useful, so maybe this is a moot point. Does anybody have a real use case?)
isinstance(x, int) That doesn't require an inheritance relationship between the integral types, but it does require that long inherit from int. So int could be a base class with long and short inheriting from it, or long could inherit from int.
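For the isinstance(x, int) use case, a common way to write a check that works whether or not long is a separate built-in is to test against a tuple of types. The names below are hypothetical. In Python 3 long was folded into int, and the numbers.Integral abstract base class (added in Python 2.6) is the other standard option.

```python
import numbers

try:
    INTEGER_TYPES = (int, long)  # Python 2.x: int and long are distinct types
except NameError:
    INTEGER_TYPES = (int,)       # Python 3: long was folded into int


def is_integer(x):
    """True for any built-in integer, whatever its internal representation."""
    return isinstance(x, INTEGER_TYPES)
```

Note that bool is an int subclass, so is_integer(True) is also true; Greg's "extremely short int" is already part of the family.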

Andrew Koenig <ark-mlist@att.net>:
Having int be an abstract class doesn't prevent it from being subclassed - it just means the subclass needs to do more work in order to behave like other concrete int subclasses. Along with short and long, there could be a UserInt subclass which delegates all the operations to an implementation object, for people who really want an int subclass for some reason. -- Greg Ewing
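The thread does not spell out UserInt, so the following is one hypothetical reading in the spirit of the standard library's collections.UserDict and collections.UserString: the real integer lives in self.data, and results are re-wrapped in type(self) so that subclasses survive arithmetic. Only + and * are shown.

```python
import operator


class UserInt:
    """Hypothetical delegating wrapper around a real int held in self.data."""

    def __init__(self, value=0):
        self.data = operator.index(value)  # accepts int or another UserInt

    @staticmethod
    def _coerce(other):
        # Anything usable as an integer index is accepted; else None.
        try:
            return operator.index(other)
        except TypeError:
            return None

    def _wrap(self, result):
        return type(self)(result)  # keep the caller's subclass

    def __add__(self, other):
        n = self._coerce(other)
        return NotImplemented if n is None else self._wrap(self.data + n)

    __radd__ = __add__

    def __mul__(self, other):
        n = self._coerce(other)
        return NotImplemented if n is None else self._wrap(self.data * n)

    __rmul__ = __mul__

    def __index__(self):
        return self.data

    __int__ = __index__

    def __eq__(self, other):
        n = self._coerce(other)
        return NotImplemented if n is None else self.data == n

    def __hash__(self):
        return hash(self.data)

    def __repr__(self):
        return f"{type(self).__name__}({self.data!r})"


class HexInt(UserInt):
    """The hexint use case, now surviving arithmetic."""

    def __repr__(self):
        return hex(self.data)
```

The trade-off is the one Greg names: UserInt has to forward every operator itself, but once it does, a subclass such as HexInt keeps its type through 1 + h and h * 2.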

[Guido van Rossum]
I've submitted a patch (http://python.org/sf/849227). And yes, somebody should probably take a good look at it before applying. The (modified) test suite does pass on my machine, but that's all. I may well have forgotten to add tests for new special cases, and I'm not the most experienced C programmer on the block either. As a side note, I think that line 233 in Lib/test/test_format.py if sys.maxint == 2**32-1: should be if sys.maxint == 2**31-1: but I didn't include that in the patch or submit a bug report. Should I? Peace, Kalle -- Kalle Svensson, http://www.juckapan.org/~kalle/ Student, root and saint in the Church of Emacs.

Thanks!
This definitely smells like a bug (I've never seen a machine with 33-bit ints :-) so feel free to submit a separate patch to SF. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Kalle Svensson]
Well, it looks like you got everything right. Congratulations! I've checked your code into CVS. There are now two pieces of PEP 237 unimplemented (apart from the complete and total eradication of long literals, which won't happen until 3.0). (1) PEP 237 promises that after the new semantics are introduced for hex/oct literals and conversions, and left shifts, operations that cause a different result than before will produce a warning that is on by default. Given the pain we've suffered through the warnings in 2.3 about this stuff, I propose to forget about these warnings. The new semantics are clear and consistent, warnings would just cause more distress, and code first ported to 2.3 will already have silenced the warnings. (2) PEP 237 promises that repr() of a long should no longer show a trailing 'L'. This is not yet implemented (i.e., repr() of a long still has a trailing 'L'). First, past experience suggests that quite a bit of end user code will break, and it may easily break silently: there used to be code that did str(x)[:-1] (knowing x was a long) to strip the 'L', which broke when str() of a long no longer returned a trailing 'L'. Apparently some of this code was "fixed" by changing str() into repr(), and this code will now break again. Second, I *like* seeing a trailing L on longs, especially when there's no reason for it to be a long: if some expression returns 1L, I know something fishy may have gone on. Any comments on these? Should I update PEP 237 to reflect this?
Fixed that too. But somebody might want to backport it to 2.3.3. --Guido van Rossum (home page: http://www.python.org/~guido/)

+1, The warnings cause more pain than they save. Part of the purpose of a warning is to leave you feeling unsettled -- I don't think that is a worthy goal when the code is going to work fine anyway. Let PyChecker or some such warn about prior version compatibility issues like that.
-0, The reasons are good but this one has been promised for several years. It's time for an L free python -- one less thing to have to learn. If there is transition difficultly, let it be a prompt to consider applying the forthcoming Decimal module. If necessary, we could add a debug mode switch for L's to be on or off. By putting it the debug build, we keep people from using it in production code. The purpose is to allow code to be run twice to see if different results are obtained. Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep for indicators like [:-1] on the same line as long() or repr().
Should I update PEP 237 to reflect this?
Yes, that's better than surprising people later. Raymond

Yes, but people using type() or isinstance() or __class__ will still have to remember that there are two types of integers: int and long. And both built-ins will be with us for years, and they aren't quite aliases for each other (long('12') returns a long, but int('12') an int).
If there is transition difficultly, let it be a prompt to consider applying the forthcoming Decimal module.
This I don't understand.
But making a debug build is far from trivial (especially on Windows). Perhaps it should be a switch on the regular build but also produce a warning, to annoy. :-)
Also, we can put migration advice in PEP 290 and whatsnew24.tex to grep for indicators like [:-1] on the same line as long() or repr().
Can you take care of that?
Should I update PEP 237 to reflect this?
Yes, that's better than surprising people later.
I'll do that (in due time). --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
+1, and especially since it looks like 2.3 is going to become the next 1.5.2 (i.e., the version everyone flocks to, and then badgers you about for the next 20 years <wink>).
+1. Changing string representations is always traumatic (lots of programs rely on parsing them), and I have a hard time imagining what positive good could come from stripping the 'L'. Making that change for str(long) seemed like pure loss from my POV (broke stuff and helped nothing).
Any comments on these? Should I update PEP 237 to reflect this?
The PEP should reflect The Plan, sure.

On Sat, Nov 29, 2003, Guido van Rossum wrote:
That makes sense to me; there should be an easy way from Python to detect what kind of object you've got (as a string representation), and repr() is precisely the place for it. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.

[me]
[Aahz]
OK. I've got one weak (-0 from Raymond) opposition to this idea, and two strong agreements (+1 from Tim and Aahz), so I'm going to go ahead and make this change to the PEP. This pretty much makes PEP 237 finished except for the complete and utter eradication of the trailing 'L' (and probably of the 'long' type altogether) in Python 3.0. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Monday 01 December 2003 08:10 pm, Guido van Rossum wrote: This pretty much makes PEP 237
finished except for the complete and utter eradication of the trailing 'L' (and probably of the 'long' type altogether) in Python 3.0.
Would there still be an int type and a long type in Python 3.0, or would the notion of a long be be dropped. If it were dropped, then the int representation would be transparently represented as a long if the size of the number could not fit in an int. If long is dropped then the long function could be added to the list of builtins that will disappear in Python 3.0.

I'm not sure about that yet. I'd *like* to find a hack that lets the int type change representations, but the fact is that it's much easier to use different types to indicate different representations. But you're right, even if there are two types, there's probably no reason to expose the 'long' type as a builtin. So long should go on the list as *likely* to disappear in 3.0. --Guido van Rossum (home page: http://www.python.org/~guido/)

Aren't integers immutable? If so, I would think it doesn't make sense for them to change representation, as they don't change value. Anyway, if you want to use type to encode representation, I would think that the various integer types should be related by inheritance. As a long can always substitute for an int, at least in theory, I would think that long should be derived from int. Then isinstance(42L, int) would yield True. If integers are related this way, LSP implies that converting a long to a string should not put an L at the end.

I was using shorthand-speak meaning that different instances of the same class would use a different representation (which the class can somehow recognize by looking at the instance, of course).
Or should int be a subclass of long? I believe that OO theorists consider the base class the set with the largest number of elements (since it is the least constrained). Now, all ints are longs, but all longs are not ints, since ints can only represent values in [-sys.maxint-1, sys.maxint]. According to this reasoning, long should be the base class. But the naming suggests different: 'int' suggests no particular size (except to C/C++ users :-) so should be the more general class, and that pleads for your version. I don't particularly like either approach, because the 'long' type is an implementation detail over which the user has no control (this is a change of heart since Python's original design!). I guess that means I have to work harder and make the single int type support both representations. I'm sure it can be done.
If integers are related this way, LSP implies that converting a long to a string should not put an L at the end.
Well, they aren't in Python 2.x, which is why the L stays until 3.0. --Guido van Rossum (home page: http://www.python.org/~guido/)

Aren't integers immutable? If so, I would think it doesn't make sense for them to change representation, as they don't change value.
Got it.
I think that int should be the base class, because I can imagine long supporting operations that int does not support, but not vice versa.

Andrew Koenig <ark-mlist@att.net>
I think that int should be the base class, because I can imagine long supporting operations that int does not support, but not vice versa.
Perhaps 'int' should be an abstract class, with concrete subclasses 'long' and 'short'. And 'bool' would fit into this family as a sort of "extremely short int". Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand, greg@cosc.canterbury.ac.nz (A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.)
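A minimal sketch of Greg's suggestion, assuming Python 3 and the standard abc module: an abstract Int defines shared behaviour once, and two hypothetical concrete subclasses, Short and Long, differ only in how they store the value. The names Int, Short, Long and make_int, and the 32-bit boundary, are illustrative assumptions, not anything CPython provides.

```python
from abc import ABC, abstractmethod

# Assumed boundary of the hypothetical "short" (machine-word) representation.
SHORT_MIN, SHORT_MAX = -(2 ** 31), 2 ** 31 - 1


class Int(ABC):
    """Abstract integer: one interface, several possible representations."""

    @abstractmethod
    def value(self) -> int:
        """Return the mathematical value, whatever the representation."""

    def __add__(self, other: "Int") -> "Int":
        # Written once against the abstract interface; make_int picks
        # whichever concrete subclass can hold the result.
        return make_int(self.value() + other.value())

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.value()})"


class Short(Int):
    """Stores the value directly (stands in for a C long)."""

    def __init__(self, n: int) -> None:
        if not SHORT_MIN <= n <= SHORT_MAX:
            raise OverflowError("value does not fit the short representation")
        self._n = n

    def value(self) -> int:
        return self._n


class Long(Int):
    """Stores the magnitude as base-2**15 digits, least significant first."""

    def __init__(self, n: int) -> None:
        self._negative = n < 0
        n = abs(n)
        self._digits = []
        while n:
            self._digits.append(n & 0x7FFF)
            n >>= 15

    def value(self) -> int:
        total = 0
        for digit in reversed(self._digits):
            total = (total << 15) | digit
        return -total if self._negative else total


def make_int(n: int) -> Int:
    """Choose the representation; callers never name Short or Long."""
    return Short(n) if SHORT_MIN <= n <= SHORT_MAX else Long(n)


print(make_int(2 ** 31 - 1) + make_int(1))  # Long(2147483648)
```

Calling Int() directly raises TypeError because value() is abstract; a user subclass of Int likewise inherits no storage, which is the drawback raised in the next reply.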

MWH:
That's quite a nice idea -- a bit like an Objective C class cluster.
I've been pondering this one too. The only downside I can think of is that code that inherits from class int to add some method (e.g. a class hexint whose repr() and str() call hex() instead) will no longer inherit any implementation, and thus won't be very useful. Inheriting from short or long doesn't quite solve the problem either. --Guido van Rossum (home page: http://www.python.org/~guido/)
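For contrast, here is roughly what the hexint example looks like while int is a concrete class, sketched in Python 3 (where int and long are already unified). Because hexint inherits int's implementation, arithmetic works with no extra code; under an abstract int none of that would be inherited. The class name comes from the message above; the rest is illustrative.

```python
class hexint(int):
    """An int whose repr() and str() show hexadecimal."""

    def __repr__(self) -> str:
        # hex() accepts any int subclass and does not call repr(), so no recursion.
        return hex(self)

    __str__ = __repr__


h = hexint(255)
print(repr(h), str(h))  # 0xff 0xff
print(h + 1)            # 256: arithmetic is inherited, but the result is a plain int
```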

Guido van Rossum wrote:
Isn't the 'long' format the more general representation, with the 'int' format then being a performance hack to take advantage of the speed of C integer arithmetic? I'm just wondering if a different way of thinking about it might help with figuring out how to handle a combined implementation. Cheers, Nick.

Nick Coghlan:
I proposed that too, but Andrew Koenig didn't like it. --Guido van Rossum (home page: http://www.python.org/~guido/)

It's not me, it's Barbara Liskov at MIT.
:-) (For non-OO wizards, this is called "Liskov substitutability".)
So the question is, does long have operations that int doesn't have? And if so, why can't those operations be added to int? And if there's a reason, is it good enough? If the sets of operations are identical, is there a way to break the tie? --Guido van Rossum (home page: http://www.python.org/~guido/)

On Wednesday 03 December 2003 11:10 am, Guido van Rossum wrote:
Taking into account their difference in representation, a long can support 1<<32, but an int can't. I'm not saying that the "proper" (long) behavior can't be unified into a single type, just giving an example of an operation long supports but int doesn't. Jeremy

At 12:15 PM 12/3/03 -0500, Jeremy Fincher wrote:
Not so. Both '1' and '32' can be represented by int, so only the operation *result* needs to be a long. Further, if the idea here is that 'int' will be a subclass of 'long', then it's perfectly valid to return an int from any operation declared as returning a long. Further, since it's acceptable to *pass* an int for any argument declared 'long', it should suffice to use 'long' for all integer inputs and outputs of methods on 'long'. Anyway, if the idea is that 'long' will be the base class, IMO the name 'long' is confusing. It should probably be called 'integer', with the subclass being either 'int' or 'short'.

We're talking about a hypothetical int here where that operation returns 4294967296L. (Not so hypothetical, it's implemented in Python 2.4 in CVS.)
Sorry, your counterexample is rejected. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
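To make the point concrete: both operands of 1 << 32 fit comfortably in a 32-bit int; only the result does not. A quick check in Python 3, where the unified int simply returns the larger value, next to a hypothetical helper that mimics a shift truncated to a signed 32-bit word. The helper is only an illustration; real C leaves a shift of a 32-bit int by 32 or more undefined.

```python
def shift_left_int32(x: int, n: int) -> int:
    """Hypothetical helper: x << n truncated to a signed 32-bit two's-complement word."""
    r = (x << n) & 0xFFFFFFFF  # keep only the low 32 bits
    return r - (1 << 32) if r & 0x80000000 else r  # reinterpret the top bit as the sign


print(1 << 32)                          # 4294967296: the unified int just grows
print(shift_left_int32(1, 32))          # 0: every set bit was shifted out
print(shift_left_int32(0xFFFFFFFF, 1))  # -2: bits lost and the sign flipped
```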

[Guido]
Guido van Rossum wrote:
The 'performance hack' point of view I was trying to suggest was along the lines of: "Python integers are capable of storing values of arbitrary magnitude, subject only to the memory capacity of the machine. As a matter of performance, Python will use native C integers (and native arithmetic) when the stored value is small enough to fit." That is, I was agreeing with Guido's first point above.
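The "small enough to fit" test in this description can be sketched with int.bit_length() in Python 3. The function names and the default 64-bit width are assumptions for illustration; the sketch only decides which representation a unified type might use and implements neither.

```python
def fits_in_machine_word(n: int, bits: int = 64) -> bool:
    """True if n fits in a signed two's-complement integer of the given width."""
    # For n >= 0 we need n <= 2**(bits-1) - 1; for n < 0, ~n == -n - 1
    # must satisfy the same bound, so both cases reduce to one bit_length test.
    return (n if n >= 0 else ~n).bit_length() < bits


def representation(n: int, bits: int = 64) -> str:
    """Which storage a unified integer type might pick (labels are illustrative)."""
    return "native" if fits_in_machine_word(n, bits) else "arbitrary-precision"


print(representation(2 ** 62))     # native
print(representation(2 ** 63))     # arbitrary-precision
print(representation(-(2 ** 63)))  # native
```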

Right. The implementation uses the type as the vtable of a C++ class, and the functions in the vtable "know" the representation, which is very different for short and long ints. This is all nice and efficient, except that the identity of the vtable is returned as the type of the object, so two different vtables means two different types. The alternative implementation technique is to have a single type/vtable, embed a flag in each instance that tells the implementation which representation is used, and have an "if" at the top of each implementation function switching between the two. The downside should be obvious: not only an extra test+branch for each operation, but also extra space in the object (the short representation has no spare bits). --Guido van Rossum (home page: http://www.python.org/~guido/)
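A rough sketch of the single-type alternative described above (one type, a per-instance flag, and a test at the top of each operation), written in Python 3 for readability. TaggedInt, the 32-bit boundary and the base-2**15 digit tuple are hypothetical choices; the point is only where the extra branch and the extra field show up.

```python
SHORT_MIN, SHORT_MAX = -(2 ** 31), 2 ** 31 - 1  # assumed 32-bit "short" range


def _to_digits(n: int) -> tuple:
    """Encode n as (sign, base-2**15 digits of |n|, least significant first)."""
    sign, n = (-1, -n) if n < 0 else (1, n)
    digits = []
    while n:
        digits.append(n & 0x7FFF)
        n >>= 15
    return sign, tuple(digits)


def _from_digits(sign: int, digits: tuple) -> int:
    total = 0
    for d in reversed(digits):
        total = (total << 15) | d
    return sign * total


class TaggedInt:
    __slots__ = ("is_long", "payload")

    def __init__(self, n: int) -> None:
        # The flag is the extra per-instance space.
        self.is_long = not (SHORT_MIN <= n <= SHORT_MAX)
        self.payload = _to_digits(n) if self.is_long else n

    def value(self) -> int:
        # The extra test+branch: every operation starts by checking the flag.
        if self.is_long:
            return _from_digits(*self.payload)
        return self.payload

    def __add__(self, other: "TaggedInt") -> "TaggedInt":
        if not self.is_long and not other.is_long:
            # Both short: one "native" addition; __init__ re-tags on overflow.
            return TaggedInt(self.payload + other.payload)
        return TaggedInt(self.value() + other.value())


a = TaggedInt(2 ** 31 - 1)
b = a + TaggedInt(1)
print(a.is_long, b.is_long, b.value())  # False True 2147483648
```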

"Guido van Rossum" <guido@python.org> wrote in message news:200312031610.hB3GA1204029@c-24-5-183-134.client.comcast.net...
In this case, one could argue that no tie breaker is needed. If one accepts that int and long have the same operations but offer different implementations, then the obvious solution is: an abstract base class (Int) and two sub-classes (Short and Long?). Of course, this obvious solution has a problem: a programmer can't easily sub-class Int. They must choose a sub-class, Short or Long (unless they want to provide a full implementation themselves), and that's probably not a decision they want to make. Perhaps, because of this, the GoF "Bridge Pattern" might be suitable here. (This pattern can be framed as: "adapt multiple implementations to an interface using delegation" -- which, after all, is pretty much what the vtable in the previous solution gives you.) If the existence of Short and Long is an implementation detail best hidden from the Python programmer, then a Bridge Pattern implementation has a lot going for it. Use of the Bridge pattern might even allow for three different implementations: PreDefined, Short, Long ... where PreDefined is one of the numbers between 0 and 256 which I'm given to understand are pre-allocated by the runtime environment. -- Mike

Mike Thompson <mike.thompson@day8.com.au>:
Of course, this obvious solution has a problem: a programmer can't easily sub-class Int.
I can't see that subclassing int is a particularly useful thing to do, even as matters stand today. As soon as you do any operation on your int subclass, you get a result which is not an instance of your subclass any more, which makes using it rather fragile. Does anyone have a real-life use case for subclassing int? Greg Ewing
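This fragility is easy to reproduce in Python 3 (the class names here are hypothetical). A plain subclass loses its type after any arithmetic; keeping it means overriding every operator by hand, and this sketch only does + to show the pattern.

```python
class Plain(int):
    """An int subclass that overrides nothing."""


class Meters(int):
    """An int subclass that re-wraps results of + (and only +)."""

    def __add__(self, other):
        result = int.__add__(self, other)
        # int.__add__ returns NotImplemented for non-int operands (e.g. floats);
        # pass it through so Python can try the other operand's method.
        return result if result is NotImplemented else Meters(result)

    __radd__ = __add__


print(type(Plain(5) + 1).__name__)   # int
print(type(Meters(5) + 1).__name__)  # Meters
print(type(Meters(5) * 2).__name__)  # int: * was not overridden
```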

On Thu, Dec 04, 2003 at 03:54:21PM +1300, Greg Ewing wrote:
Does anyone have a real-life use case for subclassing int?
Not me. Dylan does not allow its concrete numeric classes to be subclassed. That allows efficient method dispatch given restrictive enough declarations. Here are the standard numeric classes:

    class number(object):        # Open Abstract Class
    class complex(number):       # Sealed Abstract Class
    class real(complex):         # Sealed Abstract Class
    class float(real):           # Sealed Abstract Class
    class single_float(float):   # Sealed Class
    class double_float(float):   # Sealed Class
    class rational(real):        # Sealed Abstract Class
    class integer(rational):     # Sealed Class

complex is sealed, which basically means that it cannot be subclassed by users. That allows ... Sealing is described here: http://www.gwydiondylan.org/drm/drm_70.htm. The language reference does not describe a "long" integer type, but the implementation is free to provide one. Neil
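Python has no "sealed" keyword, but a rough analogue of Dylan's sealing can be sketched with __init_subclass__ (Python 3.6+). This is stricter than Dylan, which only forbids subclassing outside the defining library; Python has no such boundary, so the sketch rejects every subclass. The class names mirror a slice of the hierarchy above and are hypothetical; CPython's own bool already refuses to be subclassed.

```python
class Number:
    """Open abstract class: anyone may subclass it."""


class Integer(Number):
    """Sealed: any attempt to subclass it fails when the class is created."""

    def __init_subclass__(cls, **kwargs):
        raise TypeError(f"Integer is sealed; cannot create subclass {cls.__name__}")


def can_subclass(base: type) -> bool:
    """Try to create an empty subclass of base and report whether it worked."""
    try:
        type("Probe", (base,), {})
    except TypeError:
        return False
    return True


print(can_subclass(Number), can_subclass(Integer))  # True False
print(can_subclass(int), can_subclass(bool))        # True False
```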

Right, this seems to be one of the preferred solutions.
(It has been pointed out that subclassing int isn't very useful, so maybe this is a moot point. Does anybody have a real use case?)
Hmm... I'm not very familiar with the Bridge pattern (and my GoF book is on one of 65 boxes still in the garage waiting until we move into a larger house :-). Can you give a little more detail about how this would be done? Is it something that a user wishing to subclass Int should do, or something that the Int class (or its subclasses) should provide?
I don't think so -- the beauty is that (most of) the implementation doesn't know that certain int values have eternal life; their representation is exactly the same as that of regular ints. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Mike Thompson]
[Guido van Rossum]
(It has been pointed out that subclassing int isn't very useful, so maybe this is a moot point. Does anybody have a real use case?)
I don't. It just rubs my intuition up the wrong way for it not to be subclassable, but that'll be because I'm not a fully reformed OO Bigot ;-) [Mike Thompson]
[Guido van Rossum]
To me, Bridge Pattern is most useful when either: 1. There is a need to sub-class the abstraction (Int) other than for implementational purposes. 2. The implementation of the abstraction (Int) needs to change at runtime. So, given you doubt the need for 1. and that 2. was never needed because of Int-immutability, I should withdraw my suggestion in favour of the simpler abstract-Int-base-with-Long-and-Short-implementation-in-sub-classes approach previously mentioned. BTW, a summary of most patterns can be found on-line at http://patterndigest.com/ [Guido van Rossum]
It's something that the writer of the Int class would use iff they wanted to ensure that programmers were oblivious to the two or three possible implementations of Int AND one of the two conditions I mentioned above (needed subclassability or runtime-change in implementation) held true. [Mike Thompson]
[Guido van Rossum]
That beauty could remain. At the risk of lapsing completely into pattern vocabulary ... an AbstractFactory would create Ints and bind them to a subsequently hidden implementation. When the int involved was tiny (-1 to 256?) this AbstractFactory would use one of the pre-allocated, shared (see Flyweight) implementations. Outside of this range, distinct Short, Long or DamnHuge implementations would be created. -- Mike
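A small sketch of the Abstract Factory plus Flyweight combination described above, in Python 3. The range -1 to 256 is the tentative guess from the message, and Short, Long and int_factory are hypothetical names. Note that the shared small values use exactly the same Short representation as other short values; only the factory knows they are pre-allocated.

```python
SMALL_MIN, SMALL_MAX = -1, 256                   # tentative range (an assumption)
SHORT_MIN, SHORT_MAX = -(2 ** 31), 2 ** 31 - 1   # assumed machine-word range


class Short:
    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value


class Long:
    """Illustrative "big" representation: the value kept as decimal text."""

    __slots__ = ("text",)

    def __init__(self, value: int) -> None:
        self.text = str(value)


# Flyweight pool: one shared, pre-allocated object per small value.
_FLYWEIGHTS = {n: Short(n) for n in range(SMALL_MIN, SMALL_MAX + 1)}


def int_factory(n: int):
    """Factory: callers ask for a value and never name the representation."""
    if SMALL_MIN <= n <= SMALL_MAX:
        return _FLYWEIGHTS[n]  # shared instance, lives forever
    if SHORT_MIN <= n <= SHORT_MAX:
        return Short(n)        # fresh machine-word object
    return Long(n)             # fresh arbitrary-precision object


print(int_factory(7) is int_factory(7))        # True: one shared object
print(int_factory(1000) is int_factory(1000))  # False: fresh object each time
print(type(int_factory(2 ** 40)).__name__)     # Long
```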

Or, since we're talking implementation, we could use the following hack, which I borrowed from Standard ML of New Jersey: There is just one type, namely int. An int is a 31-bit integer, INCLUDING sign. One extra bit indicates whether the integer is really a number or, alternatively, a pointer to the rest of the representation. Now, you may object that implementing this strategy in C will require lots of shifting and masking. I would have thought so, too. However, every C implementation of which I am aware puts all but the most trivial data structures on boundaries of two or more bytes. This fact frees the LOW-order bit of the integer to indicate whether it is a pointer. So here's the strategy: If the low-order bit of an integer is *off*, it's really a pointer to the rest of the implementation. If the low-order bit is *on*, then it represents an integral value that can be obtained by doing a one-bit arithmetic right shift. Yes, it's sleazy. But I imagine it would be much faster than using inheritance.
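A simulation of the low-bit tagging scheme described above, using Python 3 integers to stand in for 32-bit machine words. WORD_BITS, box, unbox and is_small_int are illustrative names; a real implementation would operate on C words and depends on every pointer having a clear low bit, which later replies question.

```python
WORD_BITS = 32  # assumed machine word size
WORD_MASK = (1 << WORD_BITS) - 1
# A 31-bit integer INCLUDING sign.
SMALL_MIN, SMALL_MAX = -(1 << (WORD_BITS - 2)), (1 << (WORD_BITS - 2)) - 1


def box(n: int) -> int:
    """Encode n as a tagged word: value shifted left one bit, low bit set."""
    if not SMALL_MIN <= n <= SMALL_MAX:
        raise OverflowError("too large: would need the pointer representation")
    return ((n << 1) | 1) & WORD_MASK


def is_small_int(word: int) -> bool:
    """Low bit on: immediate integer. Low bit off: aligned pointer."""
    return word & 1 == 1


def unbox(word: int) -> int:
    """Recover the integer with a one-bit arithmetic right shift."""
    if not is_small_int(word):
        raise ValueError("low bit is clear: this word is a pointer")
    signed = word - (1 << WORD_BITS) if word >> (WORD_BITS - 1) else word
    return signed >> 1  # Python's >> on negative ints propagates the sign


print(hex(box(5)), unbox(box(5)))    # 0xb 5
print(hex(box(-1)), unbox(box(-1)))  # 0xffffffff -1
print(is_small_int(0x1000))          # False: an even, aligned address
```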

"Phillip J. Eby" <pje@telecommunity.com> writes:
It wouldn't have to be that bad if you put the pointer/int thingy in the ob_ival slot. Cheers, mwh -- The only problem with Microsoft is they just have no taste. -- Steve Jobs, (From _Triumph of the Nerds_ PBS special) and quoted by Aahz on comp.lang.python

[Andrew Koenig]
[Phillip J. Eby]
I imagine it wouldn't, because it'd add an extra test to not only every Py_INCREF and Py_DECREF, but every PyObject_something call.
[Michael Hudson]
It wouldn't have to be that bad if you put the pointer/int thingy in the ob_ival slot.
Not all HW Python runs on is byte-addressed, so the base idea that "the last bit" of a pointer-to-non-trivial-structure is always 0 doesn't get off the ground. C doesn't have an arithmetic right-shift operator either, but that one is easier to worm around (pyport.h already has a Py_ARITHMETIC_RIGHT_SHIFT macro to "do the right thing"). OTOH, the only current Python platform I know of that's word-addressed is the Cray T3E, which also happens to be the only one I know of where C's signed ">>" zero-fills (and is also the only one I know of that has no 16-bit integral type -- the T3E porters had an interesting <wink> time).
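The point about zero-filling ">>" can be illustrated with a small simulation in Python 3. logical_shift_right mimics a compiler whose signed ">>" zero-fills on a 32-bit word; arithmetic_shift_right then gets sign propagation by shifting only non-negative values, using the identity x >> j == -1 - ((-1 - x) >> j) for negative x. The names are hypothetical, and this is not a claim about how Py_ARITHMETIC_RIGHT_SHIFT is actually defined.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit word


def logical_shift_right(x: int, j: int) -> int:
    """A zero-filling >> on x's 32-bit two's-complement pattern, read back as signed."""
    r = (x & WORD_MASK) >> j
    return r - (1 << 32) if r & 0x80000000 else r


def arithmetic_shift_right(x: int, j: int) -> int:
    """Sign-propagating shift that only ever shifts non-negative values."""
    if x < 0:
        # -1 - x is non-negative, so a zero-filling shift is already correct for it.
        return -1 - logical_shift_right(-1 - x, j)
    return logical_shift_right(x, j)


print(logical_shift_right(-8, 1))     # 2147483644: sign bit lost
print(arithmetic_shift_right(-8, 1))  # -4
```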

Andrew Koenig writes:
I recently mentioned this approach to coaxing an extra bit from an existing structure to Tim, and he pointed out (rightly!) that this only works for machines with byte-addressable memories. On word-addressable memories, there's no chance to coax the extra bit from a pointer at all, and Python still runs on at least one such platform (some Cray thing, IIRC). Otherwise I like the approach. ;-) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation

"Andrew Koenig" <ark-mlist@att.net> writes:
This is exactly how it's done in Ruby as well. Search for FIXNUM in http://www.ruby-lang.org/cgi-bin/cvsweb.cgi/ruby/ruby.h?rev=1.95&content-type=text/x-cvsweb-markup Regards, Gisle

This is a well-known scheme. We used it in ABC 20 years ago. My experience with it was negative though: there were too many core dumps because some code was dereferencing a pointer without checking the low bit. So I'd rather not try this. I also somehow doubt that the speed gains are that significant. --Guido van Rossum (home page: http://www.python.org/~guido/)

(It has been pointed out that subclassing int isn't very useful, so maybe this is a moot point. Does anybody have a real use case?)
isinstance(x, int). That doesn't require an inheritance relationship between the integral types, but it does require that long inherit from int. So int could be a base class with long and short inheriting from it, or long could inherit from int.

Andrew Koenig <ark-mlist@att.net>:
Having int be an abstract class doesn't prevent it from being subclassed; it just means the subclass needs to do more work in order to behave like other concrete int subclasses. Along with short and long, there could be a UserInt subclass which delegates all the operations to an implementation object, for people who really want an int subclass for some reason. Greg Ewing
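A sketch of the UserInt idea in Python 3, modelled loosely on collections.UserDict, which keeps its delegate in a .data attribute. UserInt and HexInt are hypothetical, only a handful of operators are forwarded, and in this sketch the delegate is simply a built-in int.

```python
class UserInt:
    """Forwards integer operations to a wrapped delegate stored in .data."""

    def __init__(self, value=0) -> None:
        self.data = int(value)

    def __int__(self) -> int:
        return self.data

    __index__ = __int__  # lets hex(), range(), slicing etc. accept instances

    def __eq__(self, other) -> bool:
        return self.data == (other.data if isinstance(other, UserInt) else other)

    def __hash__(self) -> int:
        return hash(self.data)

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.data!r})"


def _forward(name: str):
    """Build a binary operator that delegates to the same method on .data."""

    def method(self, other):
        if isinstance(other, UserInt):
            other = other.data
        result = getattr(self.data, name)(other)
        if result is NotImplemented or not isinstance(result, int):
            return result
        return type(self)(result)  # re-wrap so subclasses survive arithmetic

    method.__name__ = name
    return method


for _name in ("__add__", "__radd__", "__sub__", "__rsub__", "__mul__", "__rmul__"):
    setattr(UserInt, _name, _forward(_name))


class HexInt(UserInt):
    def __repr__(self) -> str:
        return hex(self.data)


print(HexInt(255) + 1, 1 + HexInt(2), HexInt(7) - HexInt(2))  # 0x100 0x3 0x5
```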
participants (17):
- Aahz
- Andrew Koenig
- Fred L. Drake, Jr.
- Gisle Aas
- Greg Ewing
- Guido van Rossum
- Jeremy Fincher
- John J Lee
- Kalle Svensson
- Michael Hudson
- Michael McLay
- Mike Thompson
- Neil Schemenauer
- Nick Coghlan
- Phillip J. Eby
- Raymond Hettinger
- Tim Peters