Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-). Alex
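For concreteness, the behavior Alex is proposing is easy to sketch in pure Python; the function name to_base and its digit alphabet are illustrative, not an existing stdlib API:

```python
def to_base(n, base):
    """Return n as a string in the given base (2..36), the converse
    of int(s, base), so that int(to_base(n, b), b) == n."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in 2..36")
    if n < 0:
        return '-' + to_base(-n, base)
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    out = ""
    while True:
        n, r = divmod(n, base)
        out = digits[r] + out
        if n == 0:
            return out

print(to_base(5, 2))  # '101'
```

The round trip int(to_base(n, b), b) == n is exactly the symmetry the proposal is about.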
On 1/16/06, Alex Martelli
Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1)
I think you mean ``int('101', 2)``. =)
gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-).
I'm +0. Not a big thing for me, but having the symmetry seems reasonable. -Brett
It never occurred to me that str() would behave like int() for this
case. It makes complete sense to me that a factory for numbers would
ask about the base of the number. What would the base of a string be,
except in a few limited cases? str([1, 2], 4) doesn't make any sense.
You might argue that I wasn't all that bright as a beginner <0.5
wink>.
I think it shouldn't be changed, because the second positional
argument only works for a small number of the panoply types that can
be passed to str(). It would be fine to have a function for this
hiding somewhere, perhaps even as a method on numbers, but str() is
too generic.
Jeremy
On 1/16/06, Alex Martelli
Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-).
Alex
On 1/16/06, Alex Martelli
Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) [I'm sure you meant int('101', 2) here] gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-).
I wish you had an argument better than "every bright beginner assumes it exists". :-) But (unlike for some other things that bright beginners might assume) I don't think there's a deep reason why this couldn't exist. The only reasons I can come up with is "because input and output are notoriously asymmetric in Python" and "because nobody submitted a patch". :-)

There are some corner cases to consider though.

- Should repr() support the same convention? I think not.
- Should str(3.1, n) be allowed? I think not.
- Should str(x, n) call x.__str__(n)? Neither.
- Should bases other than 2..36 be considered? I don't think so.

Unfortunately this doesn't obsolete oct() and hex() -- oct(10) returns '012', while str(10, 8) would return '12'; hex(10) returns '0xa' while str(10, 16) would return 'a'.

I do think that before we add this end-user nicety, it's more important to implement __index__() in Python 2.5. This behaves like __int__() for integral types, but is not defined for float or Decimal. Operations that intrinsically require an integral argument, like indexing and slicing, should call __index__() on their argument rather than __int__(), so as to support non-built-in integral arguments while still complaining about float arguments. This is currently implemented by explicitly checking for float in a few places, which I find repulsive. __index__() won't be requested by bright beginners, but it is important e.g. to the Numeric Python folks, who like to implement their own integral types but are suffering from the fact that their integers aren't usable everywhere.

-- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, Jan 16, 2006 at 07:44:44PM -0800, Alex Martelli wrote:
Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101',
My reaction having read this far was "huh?". It took some time (several seconds) before it occurred to me what you wanted str(5,2) to mean, and why it should give '101'. If you'd proposed, say (5).as_binary() == '101', or "5".encode("base2"), I wouldn't have been as baffled. Or perhaps even str(5, base=2), but frankly the idea of the string type doing numeric base conversions seems weird to me, rather than symmetric. I wouldn't mind seeing arbitrary base encoding of integers included somewhere, but as a method of str -- let alone the constructor! -- it feels quite wrong. -Andrew.
I wish you had an argument better than "every bright beginner assumes it exists". :-)
But (unlike for some other things that bright beginners might assume) I don't think there's a deep reason why this couldn't exist.
The only reasons I can come up with is "because input and output are notoriously asymmetric in Python" and "because nobody submitted a patch". :-)
My reason is that I've rolled-my-own more times than I can count but infrequently enough to where it was easier to re-write than to search for the previous use. Another quick thought: I presume that only the str() builtin would change and that the underlying __str__ slot would continue to be hard-wired to the (reprfunc) signature. Raymond
On Mon, 16 Jan 2006 19:44:44 -0800, Alex Martelli
Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-).
-1. Confusing and non-obvious. The functionality may be valuable but it is mis-placed as a feature of str() or a method of the str type. I work with a lot of Python beginners too, and while they occasionally ask for this functionality, I've never heard anyone wonder why str() didn't provide it or suggest that it should. Jean-Paul
On Tue, 2006-01-17 at 15:08 +1100, Andrew Bennetts wrote:
My reaction having read this far was "huh?". It took some time (several seconds) before it occurred to me what you wanted str(5,2) to mean, and why it should give '101'.
If you'd proposed, say (5).as_binary() == '101', or "5".encode("base2"), I wouldn't have been as baffled. Or perhaps even str(5, base=2), but frankly the idea of the string type doing numeric base conversions seems weird to me, rather than symmetric.
I wouldn't mind seeing arbitrary base encoding of integers included somewhere, but as a method of str -- let alone the constructor! -- it feels quite wrong.
Hear, hear. I was similarly perplexed when I first read that! -Barry
On Jan 16, 2006, at 8:03 PM, Jeremy Hylton wrote:
It never occurred to me that str() would behave like int() for this case. It makes complete sense to me that a factory for numbers would ask about the base of the number. What would the base of a string be, except in a few limited cases? str([1, 2], 4) doesn't make any sense. You might argue that I wasn't all that bright as a beginner <0.5 wink>.
I think it shouldn't be changed, because the second positional argument only works for a small number of the panoply types that can be passed to str().
Identically the same situation as for int: the base argument is only accepted if the first argument is a str (not a float, etc). Just the same way, the base argument to str will only be accepted if the first argument is an int (not a float, etc). Alex
On 1/16/06, Alex Martelli
On Jan 16, 2006, at 8:03 PM, Jeremy Hylton wrote:
I think it shouldn't be changed, because the second positional argument only works for a small number of the panoply types that can be passed to str().
Identically the same situation as for int: the base argument is only accepted if the first argument is a str (not a float, etc). Just the same way, the base argument to str will only be accepted if the first argument is an int (not a float, etc).
The concept of base is closely related to ints, and the base argument is useful for a large percentage of the types that int accepts. It is not related to strings, in general, and applies to only one of the types it accepts. In one case "not a float, etc." applies to a very limited set of types, in the other case it applies to every conceivable type (except int). If str() were to take two arguments, the analogy with int suggests it should be an encoding, where the second argument describes how to interpret the representation of the first (it's base 7 or it's utf-8). Jeremy
On Jan 16, 2006, at 8:18 PM, Barry Warsaw wrote:
On Tue, 2006-01-17 at 15:08 +1100, Andrew Bennetts wrote:
My reaction having read this far was "huh?". It took some time (several seconds) before it occurred to me what you wanted str(5,2) to mean, and why it should give '101'.
If you'd proposed, say (5).as_binary() == '101', or "5".encode ("base2"), I wouldn't have been as baffled. Or perhaps even str(5, base=2), but frankly the idea of the string type doing numeric base conversions seems weird to me, rather than symmetric.
I wouldn't mind seeing arbitrary base encoding of integers included somewhere, but as a method of str -- let alone the constructor! -- it feels quite wrong.
Hear, hear. I was similarly perplexed when I first read that!
The only bases I've ever really had a good use for are 2, 8, 10, and 16. There are currently formatting codes for 8 (o), 10 (d, u), and 16 (x, X). Why not just add a string format code for unsigned binary? The obvious choice is probably "b". For example:
>>> '%08b' % (12)
'00001100'
>>> '%b' % (12)
'1100'
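(For comparison: binary formatting along these lines did eventually land in Python 2.6, via format() and a bin() builtin rather than a '%b' format code.)

```python
# Binary formatting as it eventually appeared (Python 2.6+),
# shown for comparison with the proposed '%b' format code.
print(format(12, '08b'))  # zero-padded to 8 digits: '00001100'
print(format(12, 'b'))    # '1100'
print(bin(12))            # with prefix: '0b1100'
```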
I'd probably expect "5".encode("base2") to return '00110101', because "5".encode("hex") returns '35' -bob
[Jeremy Hylton]
The concept of base is closely related to ints, and the base argument is useful for a large percentage of the types that int accepts. It is not related to strings, in general, and applies to only one of the types it accepts. In one case "not a float, etc." applies to a very limited set of types, in the other case it applies to every conceivable type (except int).
That suggests that it would be better to simply add an int method: x.convert_to_base(7) Raymond
On Mon, Jan 16, 2006 at 11:54:05PM -0500, Raymond Hettinger wrote: [...]
That suggests that it would be better to simply add an int method:
x.convert_to_base(7)
This seems clear and simple to me. I like it. I strongly suspect the "bright beginners" Alex is interested in would have no trouble using it or finding it. -Andrew.
On Jan 16, 2006, at 9:12 PM, Andrew Bennetts wrote:
On Mon, Jan 16, 2006 at 11:54:05PM -0500, Raymond Hettinger wrote: [...]
That suggests that it would be better to simply add an int method:
x.convert_to_base(7)
This seems clear and simple to me. I like it. I strongly suspect the "bright beginners" Alex is interested in would have no trouble using it or finding it.
I don't know about that, all of the methods that int and long currently have are __special__. They'd really need to start with Python 2.5 (assuming int/long grow "public methods" in 2.5) to even think to look there. A format code or a built-in would be more likely to be found, since that's how you convert integers to hex and oct string representations with current Python.
>>> [name for name in dir(0)+dir(0L) if not name.startswith('__')]
[]
-bob
On Mon, Jan 16, 2006 at 09:28:10PM -0800, Bob Ippolito wrote:
On Jan 16, 2006, at 9:12 PM, Andrew Bennetts wrote: [...]
x.convert_to_base(7)
This seems clear and simple to me. I like it. I strongly suspect the "bright beginners" Alex is interested in would have no trouble using it or finding it.
I don't know about that, all of the methods that int and long currently have are __special__. They'd really need to start with Python 2.5 (assuming int/long grow "public methods" in 2.5) to even think to look there. A format code or a built-in would be more likely to be found, since that's how you convert integers to hex and oct string representations with current Python.
>>> [name for name in dir(0)+dir(0L) if not name.startswith('__')]
[]
I should have said, I'm equally happy with the format code as well (although it doesn't allow arbitrary base conversions, I've never had need for that, so I'm not too worried about that case). Either option is better than making the str constructor do relatively rarely used mathematics! -Andrew.
On Mon, 2006-01-16 at 20:49 -0800, Bob Ippolito wrote:
The only bases I've ever really had a good use for are 2, 8, 10, and 16. There are currently formatting codes for 8 (o), 10 (d, u), and 16 (x, X). Why not just add a string format code for unsigned binary? The obvious choice is probably "b".
For example:
>>> '%08b' % (12)
'00001100'
>>> '%b' % (12)
'1100'
+1 -Barry
On 1/16/06, Guido van Rossum
On 1/16/06, Alex Martelli
wrote: Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) [I'm sure you meant int('101', 2) here] gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-).
I wish you had an argument better than "every bright beginner assumes it exists". :-)
But (unlike for some other things that bright beginners might assume) I don't think there's a deep reason why this couldn't exist.
The only reasons I can come up with is "because input and output are notoriously asymmetric in Python" and "because nobody submitted a patch". :-)
There are some corner cases to consider though.
- Should repr() support the same convention? I think not. - Should str(3.1, n) be allowed? I think not. - Should str(x, n) call x.__str__(n)? Neither. - Should bases other than 2..36 be considered? I don't think so.
Unfortunately this doesn't obsolete oct() and hex() -- oct(10) returns '012', while str(10, 8) would return '12'; hex(10) returns '0xa' while str(10, 16) would return 'a'.
I do think that before we add this end-user nicety, it's more important to implement __index__() in Python 2.5. This behaves like __int__() for integral types, but is not defined for float or Decimal. Operations that intrinsically require an integral argument, like indexing and slicing, should call __index__() on their argument rather than __int__(), so as to support non-built-in integral arguments while still complaining about float arguments. This is currently implemented by explicitly checking for float in a few places, which I find repulsive. __index__() won't be requested by bright beginners, but it is important e.g. to the Numeric Python folks, who like to implement their own integral types but are suffering from the fact that their integers aren't usable everywhere.
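Guido's __index__() protocol can be illustrated with a minimal integral-like type (the class name here is made up):

```python
class MyInt:
    """A toy integral type. Defining __index__ lets instances be
    used wherever a true integer is required -- indexing, slicing --
    while float-like types, which lack __index__, are rejected."""
    def __init__(self, value):
        self.value = value
    def __index__(self):
        return self.value

data = ['a', 'b', 'c', 'd']
print(data[MyInt(2)])            # indexing calls __index__()
print(data[MyInt(1):MyInt(3)])   # so does slicing
```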
+1 from me (feel like this has come up before, but not totally sure). Be nice to add an abstraction for indexing. Added to the PyCon wiki as a possible sprint topic. -Brett
On 1/16/06, Bob Ippolito
On Jan 16, 2006, at 9:12 PM, Andrew Bennetts wrote:
On Mon, Jan 16, 2006 at 11:54:05PM -0500, Raymond Hettinger wrote: [...]
That suggests that it would be better to simply add an int method:
x.convert_to_base(7)
This seems clear and simple to me. I like it. I strongly suspect the "bright beginners" Alex is interested in would have no trouble using it or finding it.
I don't know about that, all of the methods that int and long currently have are __special__. They'd really need to start with Python 2.5 (assuming int/long grow "public methods" in 2.5) to even think to look there. A format code or a built-in would be more likely to be found, since that's how you convert integers to hex and oct string representations with current Python.
>>> [name for name in dir(0)+dir(0L) if not name.startswith('__')]
[]
If a method is the best solution, then fine, 2.5 is the beginning of methods on int/long. We could do a static method like int.from_str("101", 2) and str.from_int(5, 2) if people don't like the overloading of the constructors. Otherwise add methods like '101'.to_int(2) or 5 .to_str(2) . -Brett
Alex Martelli wrote:
Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-).
My main concern is what the impact on __str__ would be. It seems "obvious" that

    def str(obj, *args):
        return obj.__str__(*args)

because it is ultimately int's responsibility to interpret the base argument, not str's. People would then come up with use cases like

    class Color:
        msg = {'en': ['red', 'green', 'blue'],
               'de': ['rot', 'grün', 'blau']}
        def __str__(self, language='en'):
            return self.msg[language][self.value]

    red = Color(0)

so you could say

    print str(red, 'de')

I don't think I like this direction. Regards, Martin
On Tue, 2006-01-17 at 01:03 -0500, Barry Warsaw wrote:
On Mon, 2006-01-16 at 20:49 -0800, Bob Ippolito wrote:
The only bases I've ever really had a good use for are 2, 8, 10, and 16. There are currently formatting codes for 8 (o), 10 (d, u), and 16 (x, X). Why not just add a string format code for unsigned binary? The obvious choice is probably "b".
For example:
>>> '%08b' % (12)
'00001100'
>>> '%b' % (12)
'1100'
+1
+1 me too.
--
Donovan Baarda
On Mon, Jan 16, 2006 at 11:13:27PM -0500, Raymond Hettinger wrote:
My reason is that I've rolled-my-own more times than I can count but infrequently enough to where it was easier to re-write than to search for the previous use.
Me too! The asymmetry is annoying. It's easy to consume base 2..36
integers, but it's hard to generate them.
However, str() seems far too important to monkey with to me.
A method on int would be great. That keeps all the base
conversions in int (either in __init__ or in as_yet_unnamed_method()).
Another suggestion would be to give hex() and oct() another parameter,
base, so you'd do hex(123123123, 2). Perhaps a little
counter-intuitive, but if you were looking for base conversion
functions you'd find hex() pretty quickly and the documentation would
mention the other parameter.
--
Nick Craig-Wood
Alex Martelli wrote:
Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-).
Hmm, how about this:

    str(obj,
        ifisunicode_decode_using_encoding='ascii',
        ifisinteger_use_base=10,
        ifisfile_open_and_read_it=False,
        isdecimal_use_precision=10,
        ismoney_use_currency='EUR',
        isdatetime_use_format='%c')

and so on ?! Or even better: str(obj, **kws) and then call obj.__str__(**kws) instead of just obj.__str__() ?! Seriously, shouldn't these more specific "convert to a string" functions be left to specific object methods or helper functions ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 17 2006)
Bob Ippolito wrote:
On Jan 16, 2006, at 9:12 PM, Andrew Bennetts wrote:
On Mon, Jan 16, 2006 at 11:54:05PM -0500, Raymond Hettinger wrote: [...]
That suggests that it would be better to simply add an int method:
x.convert_to_base(7)
This seems clear and simple to me. I like it. I strongly suspect the "bright beginners" Alex is interested in would have no trouble using it or finding it.
I don't know about that, all of the methods that int and long currently have are __special__. They'd really need to start with Python 2.5 (assuming int/long grow "public methods" in 2.5) to even think to look there. A format code or a built-in would be more likely to be found, since that's how you convert integers to hex and oct string representations with current Python.
How about just stuffing some function in the math module? Everything in that module works on floats, but it seems incidental to me; I'm pretty sure I've even looked in that module for such a function before. But it's such an obscure need that only comes up in the kind of algorithms students write, that it just seems odd and unnecessary to put it in str() which is *so* much more general than int() is. -- Ian Bicking | ianb@colorstudy.com | http://blog.ianbicking.org
On Jan 17, 2006, at 2:36 AM, Ian Bicking wrote:
Bob Ippolito wrote:
On Jan 16, 2006, at 9:12 PM, Andrew Bennetts wrote:
On Mon, Jan 16, 2006 at 11:54:05PM -0500, Raymond Hettinger wrote: [...]
That suggests that it would be better to simply add an int method:
x.convert_to_base(7)
This seems clear and simple to me. I like it. I strongly suspect the "bright beginners" Alex is interested in would have no trouble using it or finding it. I don't know about that, all of the methods that int and long currently have are __special__. They'd really need to start with Python 2.5 (assuming int/long grow "public methods" in 2.5) to even think to look there. A format code or a built-in would be more likely to be found, since that's how you convert integers to hex and oct string representations with current Python.
How about just stuffing some function in the math module? Everything in that module works on floats, but it seems incidental to me; I'm pretty sure I've even looked in that module for such a function before. But it's such an obscure need that only comes up in the kind of algorithms students write, that it just seems odd and unnecessary to put it in str() which is *so* much more general than int() is.
I want binary all the time when I'm dealing with bitflags and such. Of course, I'm trained to be able to read bits in hex format, but it would be nicer to see the flags as-is. Even worse when you have to deal with some kind of file format where fields are N bits long, where N is not a multiple of 8. -bob
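The bitflag/bitfield case Bob describes might look like this; the field widths are invented for illustration, and format(n, 'b') is the spelling later Pythons provide:

```python
# Unpack a 12-bit word into a 4-bit opcode and an 8-bit operand,
# then display both fields in binary for inspection.
word = 0b101100001111          # 12 bits; widths chosen arbitrarily
opcode = word >> 8             # top 4 bits
operand = word & 0xFF          # low 8 bits
print(format(opcode, '04b'), format(operand, '08b'))
```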
Bob Ippolito wrote:
I want binary all the time when I'm dealing with bitflags and such. Of course, I'm trained to be able to read bits in hex format, but it would be nicer to see the flags as-is. Even worse when you have to deal with some kind of file format where fields are N bits long, where N is not a multiple of 8.
so you want flags for bit order and fill order too, I assume ? </F>
On Jan 17, 2006, at 2:48 AM, Fredrik Lundh wrote:
Bob Ippolito wrote:
I want binary all the time when I'm dealing with bitflags and such. Of course, I'm trained to be able to read bits in hex format, but it would be nicer to see the flags as-is. Even worse when you have to deal with some kind of file format where fields are N bits long, where N is not a multiple of 8.
so you want flags for bit order and fill order too, I assume ?
Not really, big endian covers almost everything I need.. and when it doesn't, then I can just flip and/or pad the string accordingly. -bob
On Tue, 2006-01-17 at 10:05 +0000, Nick Craig-Wood wrote:
On Mon, Jan 16, 2006 at 11:13:27PM -0500, Raymond Hettinger wrote: [...] Another suggestion would be to give hex() and oct() another parameter, base, so you'd do hex(123123123, 2). Perhaps a little counter-intuitive, but if you were looking for base conversion functions you'd find hex() pretty quickly and the documentation would mention the other parameter.
Ugh!
I still favour extending % format strings. I really like '%b' for
binary, but if arbitrary bases are really wanted, then perhaps also
leverage off the "precision" value for %d to indicate base, such that
'%3.3d' % 5 == " 12".

If people think that "." is for "precision" and is too ambiguous for
"base", you could do something like extend the whole conversion
specifier to (in EBNF)

    conversion = %[mapping][flags][width][.precision][@base][modifier]type

which would allow for weird things like "%8.4@3f" % 5.5 == " 12.1111".

Note: it is possible for floats to be represented in non-decimal number
systems, it's just extremely rare for anyone to do it. I have in my
distant past used base 16 float notation for fixed-point numbers.

I personally think %b would be adding enough. The other suggestions are
just me being silly :-)
--
Donovan Baarda
Donovan Baarda
I personally think %b would be adding enough. The other suggestions are just me being silly :-)
Yeah, the whole area is just crying out for the simplicity and restraint that is common lisp's #'format function :) Cheers, mwh -- <exarkun> INEFFICIENT CAPITALIST YOUR OPULENT TOILET WILL BE YOUR UNDOING -- from Twisted.Quotes
It seems dumb to support *parsing* integers in weird bases, but not *formatting* them in weird bases. Not a big deal, but if you're going to give me a toy, at least give me the whole toy! The %b idea is a little disappointing in two ways. Even with %b, Python is still dumb by the above criterion. And, I suspect users that don't know about %b are unlikely to find it when they want it. I know I've never looked for it there. I think a method 5664400.to_base(13) sounds nice. -j
On Tue, Jan 17, 2006 at 09:23:29AM -0500, Jason Orendorff wrote:
It seems dumb to support *parsing* integers in weird bases, but not *formatting* them in weird bases. Not a big deal, but if you're going to give me a toy, at least give me the whole toy!
The %b idea is a little disappointing in two ways. Even with %b, Python is still dumb by the above criterion. And, I suspect users that don't know about %b are unlikely to find it when they want it. I know I've never looked for it there.
I think a method 5664400.to_base(13) sounds nice.
It's also a SyntaxError. With the current syntax, it would need to be "(5664400).to_base(13)" or "5664400 .to_base(13)". Why not just make the %b take a base, e.g.: '%13b' % 5664400 <wink> -Andrew.
Raymond> My reason is that I've rolled-my-own more times than I can
Raymond> count but infrequently enough to where it was easier to
Raymond> re-write than to search for the previous use.

Maybe a bin() builtin would be better. Even better it seems to me would be to add a method to ints and longs that returns a string formatted in a base between 2 and 36 (then deprecate hex and oct). Like Jeremy, I wonder what str([1,2], 4) means. Skip
Alex> Identically the same situation as for int: the base argument is
Alex> only accepted if the first argument is a str (not a float, etc).
Alex> Just the same way, the base argument to str will only be accepted
Alex> if the first argument is an int (not a float, etc).

A shortcoming in int() hardly seems like a good reason to mess with str(). Skip
On Tuesday 2006-01-17 15:19, skip@pobox.com wrote:
Alex> Identically the same situation as for int: the base argument is
Alex> only accepted if the first argument is a str (not a float, etc).
Alex> Just the same way, the base argument to str will only be accepted
Alex> if the first argument is an int (not a float, etc).
A shortcoming in int() hardly seems like a good reason to mess with str().
How's it a shortcoming in int() that it doesn't do anything with, say, int(2.345,19)? (What would you like it to do?) Or are you saying that the fact that int(<a string>) lets you specify a base to interpret the string in is itself a shortcoming, and if so why? -- g
Alex> Identically the same situation as for int: the base argument is
Alex> only accepted if the first argument is a str (not a float, etc).
Alex> Just the same way, the base argument to str will only be accepted
Alex> if the first argument is an int (not a float, etc).

Skip> A shortcoming in int() hardly seems like a good reason to mess
Skip> with str().

Gareth> How's it a shortcoming in int() that it doesn't do anything
Gareth> with, say, int(2.345,19)?

My reasoning was that just because int() was written to ignore the second arg depending on type (the "shortcoming") doesn't mean that str() should as well. Skip
skip@pobox.com wrote:
Skip> A shortcoming in int() hardly seems like a good reason to mess Skip> with str().
Gareth> How's it a shortcoming in int() that it doesn't do anything Gareth> with, say, int(2.345,19)?
My reasoning was that just because int() was written to ignore the second arg depending on type (the "shortcoming") doesn't mean that str() should as well.
"ignore" is perhaps the wrong word:
>>> int(1.0, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: int() can't convert non-string with explicit base
</F>
On 1/17/06, "Martin v. Löwis"
class Color: msg = {'en':['red', 'green', 'blue'], 'de':['rot','grün','blau']} def __str__(self, language='en'): return self.msg[language][self.value]
red = Color(0)
so you could say
print str(red, 'de')
I don't think I like this direction.
I agree that *args makes the code non-obvious. However, if **kwargs is used instead:

    def str(obj, **kwargs):
        return obj.__str__(**kwargs)

    class Color:
        msg = {'en': ['red', 'green', 'blue'],
               'de': ['rot', 'grün', 'blau']}
        def __str__(self, language='en'):
            return self.msg[language][self.value]

    red = Color(0)
    print str(red, language='de')

I find that quite readable. -- Adam Olsen, aka Rhamphoryncus
On Mon, Jan 16, 2006, Alex Martelli wrote:
Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-).
-1 I agree with all the other comments about the functional asymmetry between int() and str() in the Python universe, and that therefore str() shouldn't necessarily mimic int()'s API. Propose some other mechanism; I so far haven't seen a good reason to prefer any of the ones already proposed. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "19. A language that doesn't affect the way you think about programming, is not worth knowing." --Alan Perlis
On 1/16/06, Guido van Rossum
On 1/16/06, Alex Martelli
wrote: Is it finally time in Python 2.5 to allow the "obvious" use of, say, str(5,2) to give '101', just the converse of the way int('101',1) [I'm sure you meant int('101', 2) here]
Yep.
gives 5? I'm not sure why str has never allowed this obvious use -- any bright beginner assumes it's there and it's awkward to explain why it's not!-). I'll be happy to propose a patch if the BDFL blesses this, but I don't even think it's worth a PEP... it's an inexplicable though long-standing omission (given the argumentative nature of this crowd I know I'll get pushback, but I still hope the BDFL can Pronounce about it anyway;-).
I wish you had an argument better than "every bright beginner assumes it exists". :-)
What about "it should obviously be there"?-)
But (unlike for some other things that bright beginners might assume) I don't think there's a deep reason why this couldn't exist.
The only reasons I can come up with is "because input and output are notoriously asymmetric in Python" and "because nobody submitted a patch". :-)
OK, so, should I just submit a patch?
There are some corner cases to consider though.
- Should repr() support the same convention? I think not.
- Should str(3.1, n) be allowed? I think not.
- Should str(x, n) call x.__str__(n)? Neither.
- Should bases other than 2..36 be considered? I don't think so.
Agreed on all scores.
Unfortunately this doesn't obsolete oct() and hex() -- oct(10) returns '012', while str(10, 8) would return '12'; hex(10) returns '0xa' while str(10, 16) would return 'a'.
Sure. hex(x) is like '0x'+str(x,16) and oct(x) is like '0'+str(x,8) but hex and oct are minutely more concise.
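Alex's equivalences can be checked mechanically. A sketch using the format() builtin's base codes (a later spelling that did not exist in 2.5; note also that the Python of this thread gave oct() a bare '0' prefix, where modern Python uses '0o'):

```python
# hex()/oct() are just a fixed prefix plus a bare base conversion;
# 'x' and 'o' are the format() codes for base 16 and base 8.
assert hex(10) == '0x' + format(10, 'x')   # '0xa'
assert oct(10) == '0o' + format(10, 'o')   # '0o12' on modern Python

# int() inverts the bare conversion, as in Alex's original example:
assert int(format(5, 'b'), 2) == 5
```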
I do think that before we add this end-user nicety, it's more important to implement __index__() in Python 2.5. This behaves like
More important, sure, but also proportionally more work, so I don't see the two issues as "competing" against each other.
__int__() for integral types, but is not defined for float or Decimal. Operations that intrinsically require an integral argument, like indexing and slicing, should call __index__() on their argument rather than __int__(), so as to support non-built-in integral arguments while still complaining about float arguments.
Hear, hear. Multiplication of sequences, too.
This is currently implemented by explicitly checking for float in a few places, which I find repulsive. __index__() won't be requested by bright beginners, but it
You might be surprised by just how bright some (Python) beginners with deep maths background can be, though they might prefer to spell it differently (__int_and_i_really_mean_it__ for example'-) because it may not be obvious (if they're not Dutch) that multiplying a sequence has to do with 'indexing';-).
is important e.g. to the Numeric Python folks, who like to implement their own integral types but are suffering from that their integers aren't usable everywhere.
As the author and maintainer of gmpy I entirely agree -- I never liked the fact that instances of gmpy.mpq are "second class citizens" and I need to plaster int(...) around them to use them as list indices or to multiply sequences (I vaguely mentioned having in mind a 'baseinteger' check, but a special method returning the integer value, such as the __index__ you're talking about, is obviously better). Alex
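The __index__() protocol did go into Python 2.5 as PEP 357. A minimal sketch of how it ended up working (the class name here is a hypothetical stand-in for a third-party integral type like gmpy.mpq):

```python
class MpqLike:
    """Toy stand-in for a third-party integral type (hypothetical name)."""
    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called wherever Python intrinsically needs an integer:
        # indexing, slicing, sequence multiplication, ...
        return self.value

seq = ['a', 'b', 'c', 'd']
assert seq[MpqLike(2)] == 'c'        # usable as a list index
assert 'ab' * MpqLike(2) == 'abab'   # and for sequence multiplication
```

A float in the same positions still raises TypeError, since float deliberately does not define __index__.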
On 1/17/06, Alex Martelli
OK, so, should I just submit a patch?
Hmm, there are quite a few people who strongly dislike the particular API you're proposing. The problem is, bright newbies might be led to wanting str(i, base) as an analogy to int(s, base) only because they see str() and int() as each other's opposites, not having seen other uses of either (especially str()). Given the amount of disagreement on this issue and my own lackluster interest I don't want to pronounce str(i, base) to be the right solution. Sorry! -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Tue, Jan 17, 2006 at 09:23:29AM -0500, Jason Orendorff wrote:
I think a method 5664400.to_base(13) sounds nice. [And others suggested int-methods too]
I would like to point out that this is almost, but not quite, entirely as
inappropriate as using str(). Integers don't have a base. String
representations of integers -- and indeed, numbers in general, as the Python
tutorial explains in Appendix B -- have a base. Adding such a method to
integers (and, I presume, longs) would beg the question why floats, Decimals
and complex numbers don't have them.
In-favour-of-%2b-ly y'rs,
--
Thomas Wouters
On 1/17/06, Thomas Wouters
On Tue, Jan 17, 2006 at 09:23:29AM -0500, Jason Orendorff wrote:
I think a method 5664400.to_base(13) sounds nice. [And others suggested int-methods too]
I would like to point out that this is almost, but not quite, entirely as inappropriate as using str(). Integers don't have a base. String representations of integers -- and indeed, numbers in general, as the Python tutorial explains in Appendix B -- have a base. Adding such a method to integers (and, I presume, longs) would beg the question why floats, Decimals and complex numbers don't have them.
I dream of a day when str(3.25, base=2) == '11.01'. That is the number a float really represents. It would be so much easier to understand why floats behave the way they do if it were possible to print them in binary. To be fair, it's not str(x, base=n) I'm after here (although it seems like a clean way to do it.) Rather, I just want SOME way of printing ints and floats in binary.
In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away. -- Adam Olsen, aka Rhamphoryncus
On 1/17/06, Adam Olsen
In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On 1/17/06, Guido van Rossum
On 1/17/06, Adam Olsen
wrote: In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print?
I don't believe it's been proposed and I don't know what it'd print. Perhaps it indicates the bytes should be passed through without conversion. In any case I only advocate waiting until it's clear that bytes have no need for it before we use it for binary conversions. -- Adam Olsen, aka Rhamphoryncus
On Tue, Jan 17, 2006 at 04:02:43PM -0800, Guido van Rossum wrote:
On 1/17/06, Adam Olsen
wrote: In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print?
It was proposed in this or another thread about the same in the last few days (gmane search doesn't like the % in '%b'). The suggestion is to add 'b' as a sprintf-like format string %[<base>][.<pad>]b Where the optional <base> is the base to print in and <pad> is the optional minimum length of chars to print (as I recall). Default is base 2. Me? I like it. -Jack
On Jan 17, 2006, at 4:09 PM, Adam Olsen wrote:
On 1/17/06, Guido van Rossum
wrote: On 1/17/06, Adam Olsen
wrote: In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print?
I don't believe it's been proposed and I don't know what it'd print. Perhaps it indicates the bytes should be passed through without conversion.
That doesn't make any sense. What is "without conversion"? Does that mean UTF-8, UCS-2, UCS-4, latin-1, Shift-JIS? You can't have unicode without some kind of conversion.
In any case I only advocate waiting until it's clear that bytes have no need for it before we use it for binary conversions.
I don't see what business a byte type has mingling with string formatters other than the normal str and repr coercions via %s and %r respectively. -bob
On Jan 17, 2006, at 3:38 PM, Adam Olsen wrote:
On 1/17/06, Thomas Wouters
wrote: On Tue, Jan 17, 2006 at 09:23:29AM -0500, Jason Orendorff wrote:
I think a method 5664400.to_base(13) sounds nice. [And others suggested int-methods too]
I would like to point out that this is almost, but not quite, entirely as inapropriate as using str(). Integers don't have a base. String representations of integers -- and indeed, numbers in general, as the Python tutorial explains in Appendix B -- have a base. Adding such a method to integers (and, I presume, longs) would beg the question why floats, Decimals and complex numbers don't have them.
I dream of a day when str(3.25, base=2) == '11.01'. That is the number a float really represents. It would be so much easier to understand why floats behave the way they do if it were possible to print them in binary.
Actually if you wanted something that closely represents what a floating point number is then you would want to see this:

>>> str(3.25, base=2)
'1.101e1'
>>> str(0.25, base=2)
'1.0e-10'

Printing the bits without an exponent is nearly as misleading as printing them in decimal. -bob
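Later Pythons grew a lossless spelling of essentially this idea: float.hex() (added in 2.6) shows the significand in hex digits with a power-of-two exponent, and round-trips exactly:

```python
# 3.25 is 1.101 (binary) * 2**1; the significand's fractional part
# 0.101b is 0.625 = 0xa/16, hence the leading 'a' digit below.
assert (3.25).hex() == '0x1.a000000000000p+1'
assert (0.25).hex() == '0x1.0000000000000p-2'

# The round-trip is exact, unlike a short decimal rendering:
assert float.fromhex((3.25).hex()) == 3.25
```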
On Jan 17, 2006, at 5:01 PM, Jack Diederich wrote:
On Tue, Jan 17, 2006 at 04:02:43PM -0800, Guido van Rossum wrote:
On 1/17/06, Adam Olsen
wrote: In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print?
It was proposed in this or another thread about the same in the last few days (gmane search doesn't like the % in '%b').
The suggestion is to add 'b' as a sprintf-like format string %[<base>][.<pad>]b
Where the optional <base> is the base to print in and <pad> is the optional minimum length of chars to print (as I recall). Default is base 2.
Me? I like it.
Personally I would prefer the "b" format code to behave similarly to "o", "d", and "x", except for binary instead of octal, decimal, and hexadecimal. Something that needs to account for three factors (zero pad, space pad, base) should probably be a function (maybe a builtin). Hell, maybe it could take a fourth argument to specify how a negative number should be printed (e.g. a number of bits to use for the 2's complement). However... if %b were to represent arbitrary bases, I think that's backwards. It should be %[<pad>][.<base>]b, which would do this:

>>> '%08b %08o %08d %08x' % (12, 12, 12, 12)
'00001100 00000014 00000012 0000000c'

Where your suggestion would have this behavior (or something close to it):

>>> '%08b %08o %08d %08x' % (12, 12, 12, 12)
'14 00000014 00000012 0000000c'

-bob
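For comparison, the behaviour Bob prefers is what Python's later format() mini-language adopted: 'b' is a plain binary code alongside 'o', 'd', and 'x', with no arbitrary-base field (a sketch in the modern spelling, which post-dates this thread):

```python
# 'b' behaves exactly like 'o'/'d'/'x', just in base 2:
assert format(12, '08b') == '00001100'
assert format(12, '08o') == '00000014'
assert format(12, '08d') == '00000012'
assert format(12, '08x') == '0000000c'   # '08X' would give '0000000C'
```

A %b code was never added to %-style formatting itself.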
On 1/17/06, Bob Ippolito
On Jan 17, 2006, at 3:38 PM, Adam Olsen wrote:
I dream of a day when str(3.25, base=2) == '11.01'. That is the number a float really represents. It would be so much easier to understand why floats behave the way they do if it were possible to print them in binary.
Actually if you wanted something that closely represents what a floating point number is then you would want to see this::
>>> str(3.25, base=2)
'1.101e1'
>>> str(0.25, base=2)
'1.0e-10'
Printing the bits without an exponent is nearly as misleading as printing them in decimal.
I disagree. The exponent is involved in rounding to fit in compact storage but once that is complete the value can be represented exactly without it. -- Adam Olsen, aka Rhamphoryncus
On 1/17/06, Bob Ippolito
On Jan 17, 2006, at 4:09 PM, Adam Olsen wrote:
On 1/17/06, Guido van Rossum
wrote: On 1/17/06, Adam Olsen
wrote: In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print?
I don't believe it's been proposed and I don't know what it'd print. Perhaps it indicates the bytes should be passed through without conversion.
That doesn't make any sense. What is "without conversion"? Does that mean UTF-8, UCS-2, UCS-4, latin-1, Shift-JIS? You can't have unicode without some kind of conversion.
In any case I only advocate waiting until it's clear that bytes have no need for it before we use it for binary conversions.
I don't see what business a byte type has mingling with string formatters other than the normal str and repr coercions via %s and %r respectively.
Is the byte type intended to be involved in string formatters at all? Does byte("%i") % 3 have the obvious effect, or is it an error? Although upon further consideration I don't see any case where %s and %b would have different effects.. *shrug* I never said it did have a purpose, just that it *might* be given a purpose when byte was spec'd out. -- Adam Olsen, aka Rhamphoryncus
On Tue, Jan 17, 2006 at 06:11:36PM -0800, Bob Ippolito wrote:
On Jan 17, 2006, at 5:01 PM, Jack Diederich wrote:
On Tue, Jan 17, 2006 at 04:02:43PM -0800, Guido van Rossum wrote:
On 1/17/06, Adam Olsen
wrote: In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print?
It was proposed in this or another thread about the same in the last few days (gmane search doesn't like the % in '%b').
The suggestion is to add 'b' as a sprintf-like format string %[<base>][.<pad>]b
Where the optional <base> is the base to print in and <pad> is the optional minimum length of chars to print (as I recall). Default is base 2.
Me? I like it.
Personally I would prefer the "b" format code to behave similarly to "o", "d", and "x", except for binary instead of octal, decimal, and hexadecimal. Something that needs to account for three factors (zero pad, space pad, base) should probably be a function (maybe a builtin). Hell, maybe it could take a fourth argument to specify how a negative number should be printed (e.g. a number of bits to use for the 2's complement).
However... if %b were to represent arbitrary bases, I think that's backwards. It should be %[<pad>][.<base>]b, which would do this:
'%08b %08o %08d %08x' % (12, 12, 12, 12)
'00001100 00000014 00000012 0000000c'
Were I BDFAD (not to be confused with BD-FOAD) I'd add %b, %B and, binary() to match %x, %X, and hex(). The arbitrary base case isn't even academic or we would see homework questions about it on c.l.py. No one asks about hex or octal because they are there. No one asks about base seven formatting because everyone knows numerologists prefer Perl. -Jack nb, that's "For A Day."
On Jan 17, 2006, at 7:12 PM, Jack Diederich wrote:
On Tue, Jan 17, 2006 at 06:11:36PM -0800, Bob Ippolito wrote:
On Jan 17, 2006, at 5:01 PM, Jack Diederich wrote:
On Tue, Jan 17, 2006 at 04:02:43PM -0800, Guido van Rossum wrote:
On 1/17/06, Adam Olsen
wrote: In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print?
It was proposed in this or another thread about the same in the last few days (gmane search doesn't like the % in '%b').
The suggestion is to add 'b' as a sprintf-like format string %[<base>][.<pad>]b
Where the optional <base> is the base to print in and <pad> is the optional minimum length of chars to print (as I recall). Default is base 2.
Me? I like it.
Personally I would prefer the "b" format code to behave similarly to "o", "d", and "x", except for binary instead of octal, decimal, and hexadecimal. Something that needs to account for three factors (zero pad, space pad, base) should probably be a function (maybe a builtin). Hell, maybe it could take a fourth argument to specify how a negative number should be printed (e.g. a number of bits to use for the 2's complement).
However... if %b were to represent arbitrary bases, I think that's backwards. It should be %[<pad>][.<base>]b, which would do this:
'%08b %08o %08d %08x' % (12, 12, 12, 12)
'00001100 00000014 00000012 0000000c'
Were I BDFAD (not to be confused with BD-FOAD) I'd add %b, %B and, binary() to match %x, %X, and hex(). The arbitrary base case isn't even academic or we would see homework questions about it on c.l.py. No one asks about hex or octal because they are there. No one asks about base seven formatting because everyone knows numerologists prefer Perl.
There shouldn't be a %B for the same reason there isn't an %O or %D -- they're all just digits, so there's not a need for an uppercase variant. The difference between hex() and oct() and the proposed binary() is that hex() and oct() return valid Python expressions in that base. In order for it to make sense, Python would need to grow some syntax. If Python were to have syntax for binary literals, I'd propose a trailing b: "1100b". It would be convenient at times to represent bit flags, but I'm not sure it's worth the syntax change.

binarydigit   ::= ("0" | "1")
binaryinteger ::= binarydigit+ "b"
integer       ::= decimalinteger | octinteger | hexinteger | binaryinteger

-bob
On 1/17/06, Bob Ippolito
There shouldn't be a %B for the same reason there isn't an %O or %D -- they're all just digits, so there's not a need for an uppercase variant.
Right.
The difference between hex() and oct() and the proposed binary() is
I'd propose bin() to stay in line with the short abbreviated names.
that hex() and oct() return valid Python expressions in that base. In order for it to make sense, Python would need to grow some syntax.
Fair enough. So let's define it.
If Python were to have syntax for binary literals, I'd propose a trailing b: "1100b". It would be convenient at times to represent bit flags, but I'm not sure it's worth the syntax change.
Typically, suffixes are used to indicate *types*: 12L, 12j, and even 12e0 in some sense. The binary type should have a 0b prefix. Perhaps this could be implemented at the PyCon sprint? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
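Guido's suggested spelling is the one eventually adopted (PEP 3127, Python 2.6): a 0b prefix mirroring 0x, plus a matching bin() builtin. A quick sketch of how the pieces fit together:

```python
assert 0b1100 == 12              # binary literal with a 0b prefix
assert bin(12) == '0b1100'       # bin() returns a valid Python expression
assert int('0b1100', 2) == 12    # int() accepts the prefix when base is 2
assert int(bin(12), 2) == 12     # so bin() and int() are exact inverses
```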
On 1/17/06, Guido van Rossum
On 1/17/06, Bob Ippolito
wrote: There shouldn't be a %B for the same reason there isn't an %O or %D -- they're all just digits, so there's not a need for an uppercase variant.
Right.
The difference between hex() and oct() and the proposed binary() is
I'd propose bin() to stay in line with the short abbreviated names.
that hex() and oct() return valid Python expressions in that base. In order for it to make sense, Python would need to grow some syntax.
Fair enough. So let's define it.
If Python were to have syntax for binary literals, I'd propose a trailing b: "1100b". It would be convenient at times to represent bit flags, but I'm not sure it's worth the syntax change.
Typically, suffixes are used to indicate *types*: 12L, 12j, and even 12e0 in some sense.
The binary type should have a 0b prefix.
0b101 for 5?
Perhaps this could be implemented at the PyCon sprint?
Added to the wiki along with possibly hashing out the bytes type. -Brett
On 1/17/06, Guido van Rossum
The difference between hex() and oct() and the proposed binary() is
I'd propose bin() to stay in line with the short abbreviated names.
Are these features used enough to have 3 builtins? Would format(number, base) suffice?

format(5, base=2) == '101'
format(5, base=8) == '5'
format(5, base=8, prefix=True) == '05'
format(5, base=16) == '5'
format(5, base=16, prefix=True) == '0x5'

Or something like that. Then there can be symmetry with int() (arbitrary bases) and we get rid of 2 other builtins eventually. Not sure if there are/should be uses other than number formatting.
that hex() and oct() return valid Python expressions in that base. In order for it to make sense, Python would need to grow some syntax.
Fair enough. So let's define it.
If Python were to have syntax for binary literals, I'd propose a trailing b: "1100b". It would be convenient at times to represent bit flags, but I'm not sure it's worth the syntax change.
Typically, suffixes are used to indicated *types*: 12L, 12j, and even 12e0 in some sense.
The binary type should have a 0b prefix.
-0. Is this common enough to add (even in 3k)? For the instances I could have used this, it would have been completely impractical since the hex strings were generally over 80 characters. n
Guido van Rossum wrote: [...]
I'd propose bin() to stay in line with the short abbreviated names.
[...]
The binary type should have a 0b prefix.
It seems odd to me to add both a builtin *and* new syntax for something that's occasionally handy, but only occasionally. If we're going to clutter a module with this function, why not e.g. the math module instead of builtins? I thought the consensus was that we had too many builtins already. Similarly, the need for 0b101 syntax seems pretty low to me when you can already do int("101", 2). -Andrew.
Jack Diederich
However... if %b were to represent arbitrary bases, I think that's backwards. It should be %[<pad>][.<base>]b, which would do this:
'%08b %08o %08d %08x' % (12, 12, 12, 12)
'00001100 00000014 00000012 0000000c'
Were I BDFAD (not to be confused with BD-FOAD) I'd add %b, %B and, binary() to match %x, %X, and hex(). The arbitrary base case isn't even academic or we would see homework questions about it on c.l.py. No one asks about hex or octal because they are there. No one asks about base seven formatting because everyone knows numerologists prefer Perl.
BTW, Perl already does binary literals and %b formatting so there is some precedent for it:

$ perl -e '$a = 0b1100; printf "%08b %08o %08d %08x\n", $a, $a, $a, $a'
00001100 00000014 00000012 0000000c

--Gisle
Adam Olsen wrote:
On 1/17/06, Bob Ippolito
wrote: On Jan 17, 2006, at 4:09 PM, Adam Olsen wrote:
On 1/17/06, Guido van Rossum
wrote: On 1/17/06, Adam Olsen
wrote: In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
Has this been proposed? What would %b print?
I don't believe it's been proposed and I don't know what it'd print. Perhaps it indicates the bytes should be passed through without conversion.
That doesn't make any sense. What is "without conversion"? Does that mean UTF-8, UCS-2, UCS-4, latin-1, Shift-JIS? You can't have unicode without some kind of conversion.
In any case I only advocate waiting until it's clear that bytes have no need for it before we use it for binary conversions.
I don't see what business a byte type has mingling with string formatters other than the normal str and repr coercions via %s and %r respectively.
Is the byte type intended to be involved in string formatters at all? Does byte("%i") % 3 have the obvious effect, or is it an error?
Although upon further consideration I don't see any case where %s and %b would have different effects.. *shrug* I never said it did have a purpose, just that it *might* be given a purpose when byte was spec'd out.
I suppose we'd better reserve "%q" for 'quirky types we just invented', too? ;-) regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC www.holdenweb.com PyCon TX 2006 www.python.org/pycon/
Adam Olsen wrote:
On 1/17/06, Bob Ippolito
wrote: On Jan 17, 2006, at 3:38 PM, Adam Olsen wrote:
I dream of a day when str(3.25, base=2) == '11.01'. That is the number a float really represents. It would be so much easier to understand why floats behave the way they do if it were possible to print them in binary.
Actually if you wanted something that closely represents what a floating point number is then you would want to see this::
>>> str(3.25, base=2)
'1.101e1'
>>> str(0.25, base=2)
'1.0e-10'
Printing the bits without an exponent is nearly as misleading as printing them in decimal.
I disagree. The exponent is involved in rounding to fit in compact storage but once that is complete the value can be represented exactly without it.
Albeit with excessively long representations for the larger values one sometimes sees represented in float form. Personally I wouldn't even be interested in seeing 1.3407807929942597e+154 written in fixed point form *in decimal*, let alone in binary where the representation, though unambiguous, would have over 500 bits, most of them zeros. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC www.holdenweb.com PyCon TX 2006 www.python.org/pycon/
Steve Holden wrote: [...]
Personally I wouldn't even be interested in seeing 1.3407807929942597e+154 written in fixed point form *in decimal*, let alone in binary where the representation, though unambiguous, would have over 500 bits, most of them zeros.
Well, shot myself in the foot there of course, since the number I meant was actually 2.0 ** 512 (or 13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084096.0) rather than the decimal approximation above. But I'm sure you get the point that fixed-point representations aren't always appropriate. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC www.holdenweb.com PyCon TX 2006 www.python.org/pycon/
On Tue, 2006-01-17 at 16:38 -0700, Adam Olsen wrote:
On Tue, Jan 17, 2006 at 09:23:29AM -0500, Jason Orendorff wrote: [...] I dream of a day when str(3.25, base=2) == '11.01'. That is the number a float really represents. It would be so much easier to understand why floats behave the way they do if it were possible to
On 1/17/06, Thomas Wouters
wrote: print them in binary. [...]
Heh... that's pretty much why I used base16 float notation when doing fixed point stuff in assembler... uses fewer digits than binary, but easily visualised as bits. However, I do think that we could go overboard here... I don't know that we really need arbitrary base string formatting for all numeric types. I think this is a case of "very little gained for too much added complexity". If we really do, and someone is prepared to implement it, then I think adding "@base" is the best way to do it (see my half joking post earlier). If we only want arbitrary bases for integer types, the best way would be to leverage off the existing ".precision" so that it means ".base" for "%d".
In-favour-of-%2b-ly y'rs,
My only opposition to this is that the byte type may want to use it. I'd rather wait until byte is fully defined, implemented, and released in a python version before that option is taken away.
There's always "B" for bytes and "b" for bits... though I can't imagine
why byte would need its own conversion type.
I'm not entirely sure everyone is on the same page for "%b" here... it
would only be a shorthand for "binary" in the same way that "%x" is for
"hexadecimal". It would not support arbitrary bases, and thus "%2b"
would mean a binary string with minimum length of 2 characters.
--
Donovan Baarda
On Tue, 2006-01-17 at 20:25 -0800, Guido van Rossum wrote:
On 1/17/06, Bob Ippolito
wrote: There shouldn't be a %B for the same reason there isn't an %O or %D -- they're all just digits, so there's not a need for an uppercase [...]
so %b is "binary", +1
The difference between hex() and oct() and the proposed binary() is
I'd propose bin() to stay in line with the short abbreviated names. [...]
+1
The binary type should have a 0b prefix. [...]
+1
For those who argue "who would ever use it?", I would :-)
Note that this does not support and is independent of supporting
arbitrary bases. I don't think we need to support arbitrary bases, but
if we did I would vote for ".precision" to mean ".base" for "%d"... ie;
"%3.3d" % 5 == " 12"
I think supporting arbitrary bases for floats is way overkill and not
worth considering.
--
Donovan Baarda
On Tue, Jan 17, 2006, Guido van Rossum wrote:
On 1/17/06, Bob Ippolito
wrote: The difference between hex() and oct() and the proposed binary() is
I'd propose bin() to stay in line with the short abbreviated names.
There has been some previous discussion about removing hex()/oct() from builtins for Python 3.0, IIRC. I sure don't think bin() belongs there.
The binary type should have a 0b prefix.
-0 on adding a new prefix; +1 on this syntax if we do. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "19. A language that doesn't affect the way you think about programming, is not worth knowing." --Alan Perlis
I'd propose bin() to stay in line with the short abbreviated names.
There has been some previous discussion about removing hex()/oct() from builtins for Python 3.0, IIRC. I sure don't think bin() belongs there.
Perhaps introduce a single function, base(val, radix=10, prefix=''), as a universal base converter that could replace bin(), hex(), oct(), etc. That would give us fewer builtins and provide an inverse for all the int() conversions (i.e. arbitrary bases). Also, it would allow an unprefixed output which is what I usually need. Raymond
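Raymond's base() can be sketched in a few lines (a hypothetical function that was never added to builtins; the name and signature follow his proposal):

```python
def base(val, radix=10, prefix=''):
    """Universal integer-to-string converter; inverse of int(s, radix)."""
    if not 2 <= radix <= 36:
        raise ValueError('radix must be in 2..36')
    digits = '0123456789abcdefghijklmnopqrstuvwxyz'
    sign, val = ('-', -val) if val < 0 else ('', val)
    out = ''
    while val:
        val, r = divmod(val, radix)
        out = digits[r] + out
    return sign + prefix + (out or '0')

assert base(5, 2) == '101'                  # the str(5, 2) of the original post
assert base(5, 16, prefix='0x') == '0x5'    # unprefixed unless asked for
assert int(base(5664400, 13), 13) == 5664400
```

Because the prefix is an explicit argument, it subsumes bin()/hex()/oct() while still giving the bare, unprefixed output by default.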
Raymond> Perhaps introduce a single function, base(val, radix=10,
Raymond> prefix=''), as a universal base converter that could replace
Raymond> bin(), hex(), oct(), etc.

Would it (should it) work with floats, decimals, complexes? I presume it would work with ints and longs. Skip
On 1/18/06, Donovan Baarda
I think supporting arbitrary bases for floats is way overkill and not worth considering.
If you mean actual base-3 floating-point arithmetic, I agree. That's outlandish. But if there were a stdlib function to format floats losslessly in hex or binary, Tim Peters would use it at least once every six weeks to illustrate the finer points of floating point arithmetic. <0.00390625 wink> +1.0 -j
[Raymond]
Perhaps introduce a single function, base(val, radix=10, prefix=''), as a universal base converter that could replace bin(), hex(), oct(), etc.
+1 on introducing base() [Skip]
Would it (should it) work with floats, decimals, complexes? I presume it would work with ints and longs.
While support for floats, decimals, etc. might be nice (and I certainly wouldn't complain if someone wanted to supply the patch) I don't think those features should be necessary for base()'s initial introduction. If they're there, great, but if not, I don't think that should hold up the patch... STeVe -- You can wordify anything if you just verb it. --- Bucky Katt, Get Fuzzy
Jason Orendorff wrote:
On 1/18/06, Donovan Baarda
wrote: I think supporting arbitrary bases for floats is way overkill and not worth considering.
If you mean actual base-3 floating-point arithmetic, I agree. That's outlandish.
But if there were a stdlib function to format floats losslessly in hex or binary, Tim Peters would use it at least once every six weeks to illustrate the finer points of floating point arithmetic. <0.00390625 wink>
+1.0
Nah, Tim's got the chops to use the struct model to get his point across. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC www.holdenweb.com PyCon TX 2006 www.python.org/pycon/
On 1/18/06, Raymond Hettinger
I'd propose bin() to stay in line with the short abbreviated names.
There has been some previous discussion about removing hex()/oct() from builtins for Python 3.0, IIRC. I sure don't think bin() belongs there.
Perhaps introduce a single function, base(val, radix=10, prefix=''), as a universal base converter that could replace bin(), hex(), oct(), etc.
That would give us fewer builtins and provide an inverse for all the int() conversions (i.e. arbitrary bases). Also, it would allow an unprefixed output which is what I usually need.
+1. Differs from Neal's format() function by not magically determining the prefix from the radix which I like. -Brett
Brett Cannon wrote:
On 1/18/06, Raymond Hettinger
wrote: I'd propose bin() to stay in line with the short abbreviated names. There has been some previous discussion about removing hex()/oct() from builtins for Python 3.0, IIRC. I sure don't think bin() belongs there.
Perhaps introduce a single function, base(val, radix=10, prefix=''), as a universal base converter that could replace bin(), hex(), oct(), etc.
That would give us fewer builtins and provide an inverse for all the int() conversions (i.e. arbitrary bases). Also, it would allow an unprefixed output which is what I usually need.
+1. Differs from Neal's format() function by not magically determining the prefix from the radix which I like.
+1 here, too, particularly if hex/oct acquire Deprecation (or even just PendingDeprecation) warnings at the same time. I have my own reason for wanting to avoid the name format() - I'd still like to see it used one day to provide a builtin way to use string.Template syntax for arbitrary string formatting. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
On Jan 18, 2006, at 11:09 AM, Brett Cannon wrote:
On 1/18/06, Raymond Hettinger
wrote: I'd propose bin() to stay in line with the short abbreviated names.
There has been some previous discussion about removing hex()/oct() from builtins for Python 3.0, IIRC. I sure don't think bin() belongs there.
Perhaps introduce a single function, base(val, radix=10, prefix=''), as a universal base converter that could replace bin(), hex(), oct(), etc.
That would give us fewer builtins and provide an inverse for all the int() conversions (i.e. arbitrary bases). Also, it would allow an unprefixed output which is what I usually need.
+1. Differs from Neal's format() function by not magically determining the prefix from the radix which I like.
I'm not sure I see the advantage of, say, print base(x, radix=2, prefix='0b') versus print '0b'+base(x, radix=2) IOW, if the prefix needs to be explicitly specified anyway, what's the advantage of specifying it as an argument to base, rather than just string-concatenating it? Apart from that quibble, the base function appears to cover all the use cases for my proposed str-with-base, so, since it appears to attract fewer arguments, I'm definitely +1 on it. Alex
On 1/18/06, Alex Martelli
On Jan 18, 2006, at 11:09 AM, Brett Cannon wrote:
On 1/18/06, Raymond Hettinger
wrote: I'd propose bin() to stay in line with the short abbreviated names.
There has been some previous discussion about removing hex()/oct() from builtins for Python 3.0, IIRC. I sure don't think bin() belongs there.
Perhaps introduce a single function, base(val, radix=10, prefix=''), as a universal base converter that could replace bin(), hex(), oct(), etc.
That would give us fewer builtins and provide an inverse for all the int() conversions (i.e. arbitrary bases). Also, it would allow an unprefixed output which is what I usually need.
+1. Differs from Neal's format() function by not magically determining the prefix from the radix which I like.
I'm not sure I see the advantage of, say,
print base(x, radix=2, prefix='0b')
versus
print '0b'+base(x, radix=2)
IOW, if the prefix needs to be explicitly specified anyway, what's the advantage of specifying it as an argument to base, rather than just string-concatenating it?
It collects the data that is expected to be used in the common case in a single location/operation. This would allow you to do something like ``base(x, **radix_and_prefix_dict)`` and have everything in a nice, neat package. Plus the operation would be faster if base() is written in C. =) The other option is to go with Neal's solution for automatically including the prefix for known prefix types, but instead of it being a boolean, let it be a string argument. That means if you want no prefix you would just set the argument to the empty string. Not setting it will just use the most sensible prefix or none if one is not known for the specified radix. Could have something somewhere, like string or math, where more radix/prefix pairs can be added by the user and have base() reference that for its prefix values. IOW I am +0 on prefix in one of these forms. -Brett
Guido, we may be converging on a consensus for my proposal: base(value, radix=2) So far no one has shot at it, and it has gathered +1's from Steven, Alex, Brett, and Nick. To keep it simple, the proposal is for the value to be any int or long. With an underlying __base__ method call, it wouldn't be hard for someone to build it out to support other numeric types if the need arises. The output would have no prefixes. As Alex pointed out, it is easier and more direct to add those after the fact if needed. Care to pronounce on it? Raymond
On 1/18/06, Raymond Hettinger
Guido, we may be converging on a consensus for my proposal:
base(value, radix=2)
So far no one has shot at it, and it has gathered +1's from Steven, Alex, Brett, and Nick.
+1 for me too, but I'd also like to deprecate hex() and oct() and slate them for removal in 3k. To expand, valid radix values would be 2..36 (ie, same as for int). It was discussed putting base() in some module. Was there consensus about builtin vs a module? I'd prefer a module, but builtin is ok with me.
To keep it simple, the proposal is for the value to be any int or long. With an underlying __base__ method call, it wouldn't be hard for someone to build it out to support other numeric types if the need arises.
The output would have no prefixes. As Alex pointed out, it is easier and more direct to add those after the fact if needed.
+1
Care to pronounce on it?
Raymond
On Jan 18, 2006, at 11:37 PM, Neal Norwitz wrote:
On 1/18/06, Raymond Hettinger
wrote: Guido, we may be converging on a consensus for my proposal:
base(value, radix=2)
So far no one has shot at it, and it has gathered +1's from Steven, Alex, Brett, and Nick.
+1 for me too, but I'd also like to deprecate hex() and oct() and slate them for removal in 3k.
To expand, valid radix values would be 2..36 (ie, same as for int). It was discussed putting base() in some module. Was there consensus about builtin vs a module? I'd prefer a module, but builtin is ok with me.
I'd drop the default radix, or make it something common like 16... especially if hex and oct are to be py3k deprecated. +1 for: base(value, radix) +1 for: "%b" % (integer,) +0 for binary literals: 0b01101 -bob
On Wednesday 2006-01-18 16:55, Steven Bethard wrote:
[Raymond]
Perhaps introduce a single function, base(val, radix=10, prefix=''), as a universal base converter that could replace bin(), hex(), oct(), etc.
+1 on introducing base()
Introducing a new builtin with a name that's a common, short English word is a bit disagreeable. The other thing about the name "base" is that it's not entirely obvious which way it converts: do you say base(123,5) to get a string representing 123 in base 5, or base("123",5) to get the integer whose base 5 representation is "123"? Well, one option would be to have both of those work :-). (Some people may need to do some deep breathing while reciting the mantra "practicality beats purity" in order to contemplate that with equanimity.) Alternatively, a name like "to_base" that clarifies the intent and is less likely to clash with variable names might be an improvement. Or there's always %b, whether that ends up standing for "binary" or "base". Or %b for binary and %r for radix, not forgetting the modifiers to get numbers formatted as Roman numerals. -- Gareth McCaughan
On Thu, Jan 19, 2006 at 10:23:30AM +0000, Gareth McCaughan wrote:
+1 on introducing base()
Introducing a new builtin with a name that's a common, short English word is a bit disagreeable.
While I don't particularly mind the new function in either the builtin module or another, like math, I don't understand the problem with the name. Most builtin names are short and english-ish words. I like that, I'm glad they are that way, and the two names I dislike most are 'isinstance' and 'issubclass'.
The other thing about the name "base" is that it's not entirely obvious which way it converts: do you say
base(123,5)
to get a string representing 123 in base 5, or
base("123",5)
to get the integer whose base 5 representation is "123"?
This is an argument for 'str(123, 5)', but I don't agree. Not _everything_ has to be obvious at first glance. The very same could be said about hex(), oct(), dir(), even names like list() (what does it list?), str() (stripping something?), etc. Having int() do it one way and base() the other makes fine sense to me, and I don't see it as any harder to explain than, say, why hex("123") doesn't return 291. I've personally never had to explain hex/oct's behaviour. While I think 'str' would be a slightly better name than 'base' (despite the specialcasing of numbers,) I don't mind either. I do mind names like 'to_base' or 'as_str' or 'shouldbestr' or other names that make me turn on autocompletion in my editor.
Alternatively, a name like "to_base" that clarifies the intent and is less likely to clash with variable names might be an improvement.
Builtins aren't reserved names, so the clash is minimal.
--
Thomas Wouters
On Thursday 2006-01-19 11:15, Thomas Wouters wrote:
On Thu, Jan 19, 2006 at 10:23:30AM +0000, Gareth McCaughan wrote: ...
Introducing a new builtin with a name that's a common, short English word is a bit disagreeable.
While I don't particularly mind the new function in either the builtin module or another, like math, I don't understand the problem with the name. Most builtin names are short and english-ish words. I like that, I'm glad they are that way, and the two names I dislike most are 'isinstance' and 'issubclass'.
"issubclass" is horrible because wordsjammedtogetherlikethisarehardtoread, especially when they're misleading as to pronunciation ("iss-ubclass"). Short English words are nice because they're easy to type and (at least sometimes) their meanings are immediately obvious. For the same reason, they're useful as variable names. Of course the world doesn't end if a builtin name is the same as a variable name you'd like to use, but it's ... well, "a bit disagreeable" probably expresses about the right degree of nuisance.
The other thing about the name "base" is that it's not entirely obvious which way it converts: do you say
base(123,5)
to get a string representing 123 in base 5, or
base("123",5)
to get the integer whose base 5 representation is "123"?
This is an argument for 'str(123, 5)', but I don't agree.
It's not (intended as) an argument *for* any particular form.
Not _everything_ has to be obvious at first glance. The very same could be said about hex(), oct(), dir(), even names like list() (what does it list?), str() (stripping something?), etc.
Actually, I happen to dislike hex() slightly -- I never use or see oct(), so don't much care about that -- for exactly that reason.
Having int() do it one way and base() the other makes fine sense to me, and I don't see it as any harder to explain than, say, why hex("123") doesn't return 291. I've personally never had to explain hex/oct's behaviour.
To me, base() is less obvious than hex(), which itself is just ambiguous enough to cost me maybe one second per month. Not a big deal at all, but not zero.
While I think 'str' would be a slightly better name than 'base' (despite the specialcasing of numbers,) I don't mind either. I do mind names like 'to_base' or 'as_str' or 'shouldbestr' or other names that make me turn on autocompletion in my editor.
You need to turn off the Python Mind Control feature. :-) I think math.base (apart from sounding like it ought to be a variable that controls the base in which numbers are represented, or something of the sort) is about as much typing as to_base, so I'm not sure how the latter can be much worse in this respect.
Alternatively, a name like "to_base" that clarifies the intent and is less likely to clash with variable names might be an improvement.
Builtins aren't reserved names, so the clash is minimal.
Sure, it's not disabling. But in practice it's nice to be able to avoid builtin names, and "base" is a word I'd rather not have to take measures to avoid: too many meanings, some of them quite common. (I don't care much about this, and if base() gets introduced I shan't complain.) -- g
On 1/18/06, Raymond Hettinger
Guido, we may be converging on a consensus for my proposal:
base(value, radix=2)
So far no one has shot at it, and it has gathered +1's from Steven, Alex, Brett, and Nick.
I think we ought to let this sit for a while and come back to it in a few weeks' time. Is 'base' really the right name? It could just as well be considered a conversion in the other direction. In common usage, 'base' and 'radix' are about synonymous (except no-one uses radix). Perhaps the 2nd argument should not be a keyword argument? Also, this discussion made me wonder if conversions using other bases than 10 should even be built-ins. A library module seems a more appropriate place. The prevalence here of people who actually use hex numbers on a regular basis is probably easily explained by a combination of old-timers, language implementers, and super-geeks; hardly the typical Python user. The desire of (bright) beginners to do any kind of non-decimal conversion probably stems from a misguided meme (dating back to the invention of computers) that in order to learn about computers you ought to begin by learning about Boolean algebra and binary numbers. That might have been true long ago, but today, binary, octal and hexadecimal numbers are mostly a curiosity used in obscure low-level APIs like ioctl().
To keep it simple, the proposal is for the value to be any int or long. With an underlying __base__ method call, it wouldn't be hard for someone to build it out to support other numeric types if the need arises.
Let's not. What would 3.14 be expressed in base 3?
The output would have no prefixes. As Alex pointed out, it is easier and more direct to add those after the fact if needed.
Agreed.
Care to pronounce on it?
Rather not yet. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
I'm not sure I believe this should be a builtin. I think the threshold for new builtins ought to be nearly as high as the threshold for new keywords. Or the proposer ought to make an argument about why the function should not go in a module. Jeremy
On 1/19/06, Guido van Rossum
On 1/18/06, Raymond Hettinger
wrote: Guido, we may be converging on a consensus for my proposal:
base(value, radix=2)
So far no one has shot at it, and it has gathered +1's from Steven, Alex, Brett, and Nick.
I think we ought to let this sit for a while and come back to it in a few weeks' time. Is 'base' really the right name? It could just as well be considered a conversion in the other direction. In common usage, 'base' and 'radix' are about synonymous (except no-one uses radix). Perhaps the 2nd argument should not be a keyword argument?
Also, this discussion made me wonder if conversions using other bases than 10 should even be built-ins. A library module seems a more appropriate place. The prevalence here of people who actually use hex numbers on a regular basis is probably easily explained by a combination of old-timers, language implementers, and super-geeks; hardly the typical Python user. The desire of (bright) beginners to do any kind of non-decimal conversion probably stems from a misguided meme (dating back to the invention of computers) that in order to learn about computers you ought to begin by learning about Boolean algebra and binary numbers. That might have been true long ago, but today, binary, octal and hexadecimal numbers are mostly a curiosity used in obscure low-level APIs like ioctl().
To keep it simple, the proposal is for the value to be any int or long. With an underlying __base__ method call, it wouldn't be hard for someone to build it out to support other numeric types if the need arises.
Let's not. What would 3.14 be expressed in base 3?
The output would have no prefixes. As Alex pointed out, it is easier and more direct to add those after the fact if needed.
Agreed.
Care to pronounce on it?
Rather not yet.
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
I think we ought to let this sit for a while and come back to it in a few week's time. Is 'base' really the right name? It could just as well be considered a conversion in the other direction.
the same applies to hex and oct, of course. as for base itself, I'm more concerned about the google product place- ment here. what's next? a froogle builtin? </F>
On Jan 19, 2006, at 10:31, Guido van Rossum wrote:
To keep it simple, the proposal is for the value to be any int or long. With an underlying __base__ method call, it wouldn't be hard for someone to build it out to support other numeric types if the need arises.
Let's not. What would 3.14 be expressed in base 3?
10.010210001 It turned out to be a fun aside, and I've attached my quick and dirty script as a strawman. For what it's worth, I don't like the name base() because it sounds like something I would call on a class to get its bases. Perhaps nbase? And maybe fbase for the floating point one... Thanks, -Shane Holloway
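A strawman along the lines of Shane's attached script (the attachment isn't shown, so this is my own reconstruction, including the hypothetical fbase name): convert the integer part by repeated division and the fractional part by repeated multiplication, truncating after a fixed number of places.

```python
def fbase(x, radix, places=9):
    """Format a float in an arbitrary radix, truncated to `places` fractional digits."""
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    sign, x = ("-", -x) if x < 0 else ("", x)
    whole = int(x)
    frac = x - whole
    # Integer part, least-significant digit first.
    int_digits = []
    while True:
        whole, r = divmod(whole, radix)
        int_digits.append(digits[r])
        if whole == 0:
            break
    # Fractional part by repeated multiplication.
    frac_digits = []
    for _ in range(places):
        frac *= radix
        d = int(frac)
        frac_digits.append(digits[d])
        frac -= d
    return sign + "".join(reversed(int_digits)) + "." + "".join(frac_digits)

print(fbase(3.14, 3))  # → 10.010210001, matching Shane's answer above
```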
On 1/19/06, Fredrik Lundh
Guido van Rossum wrote:
I think we ought to let this sit for a while and come back to it in a few week's time. Is 'base' really the right name? It could just as well be considered a conversion in the other direction.
the same applies to hex and oct, of course.
Right. And this is not a hypothetical issue either -- in Perl, hex and oct *do* work the other way I believe. More reasons to get rid of these in Python 3000. Perhaps we should also get rid of hex/oct literals?
as for base itself, I'm more concerned about the google product place- ment here. what's next? a froogle builtin?
The default __import__ will use Google Code to locate an appropriate module to import instead of restricting itself to the boring and predictable sys.path. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Thu, Jan 19, 2006, Jeremy Hylton wrote:
I'm not sure I believe this should be a builtin. I think the threshold for new builtins ought to be nearly as high as the threshold for new keywords. Or the proposer ought to make an argument about what the function should not go in a module.
The way I'd put it, any function that wants to go in builtins should require a formal PEP. And in case it isn't clear, I'm +1 on deprecating oct()/hex() (or moving them into another module as convenience functions for base() -- just to make conversion easier). -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "19. A language that doesn't affect the way you think about programming, is not worth knowing." --Alan Perlis
Raymond Hettinger wrote:
That suggests that it would be better to simply add an int method: x.convert_to_base(7)
I'd suggest allowing: x.convert_to_base('0123456') where the (str or unicode) argument is the list of digits in order. This would allow easy converting to base-64 and other weird formats, as well as providing decimal conversion into some unicode number ranges outside the ASCII group. --Scott David Daniels Scott.Daniels@Acm.Org
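Scott's digits-string variant could be sketched as follows (illustrative only; convert_to_base is his hypothetical method name, shown here as a plain function). The radix is implied by the length of the digit alphabet:

```python
def convert_to_base(n, digits):
    """Render a non-negative integer using an arbitrary ordered digit alphabet."""
    radix = len(digits)
    if radix < 2:
        raise ValueError("need at least two digits")
    if n < 0:
        raise ValueError("negative values not handled in this sketch")
    out = []
    while True:
        n, r = divmod(n, radix)
        out.append(digits[r])
        if n == 0:
            break
    return "".join(reversed(out))

print(convert_to_base(23, '0123456'))  # base 7 → '32'
```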
On 1/19/06, Aahz
On Thu, Jan 19, 2006, Jeremy Hylton wrote:
I'm not sure I believe this should be a builtin. I think the threshold for new builtins ought to be nearly as high as the threshold for new keywords. Or the proposer ought to make an argument about what the function should not go in a module.
The way I'd put it, any function that wants to go in builtins should require a formal PEP.
I'm suggesting a criterion for evaluating the choice of builtin vs. module, not making a suggestion about the process. Jeremy
On Jan 19, 2006, at 11:12 AM, Guido van Rossum wrote:
On 1/19/06, Fredrik Lundh
wrote: Guido van Rossum wrote:
I think we ought to let this sit for a while and come back to it in a few week's time. Is 'base' really the right name? It could just as well be considered a conversion in the other direction.
the same applies to hex and oct, of course.
Right. And this is not a hypothetical issue either -- in Perl, hex and oct *do* work the other way I believe. More reasons to get rid of these in Python 3000. Perhaps we should also get rid of hex/oct literals?
In Perl, hex(n) is like int(n, 16) and oct(n) is like int(n, 8) -- but they "try very hard" to make sense out of the given scalar (e.g. more like int(n, 0) with a suggestion for base). $ perl -e 'print hex("12") . " " . oct("12") . " " . oct("0x12") . " " . hex("fubar")' 18 10 18 15 If you notice, oct("0x12") gives the hexadecimal result, and the functions will try and make a value even out of garbage data. In Ruby, you have the optional radix argument to_s and to_i, where to_i will just return 0 for invalid values. They will take any radix from 2..36. $ irb irb(main):001:0> 12.to_s => "12" irb(main):002:0> 12.to_s(16) => "c" irb(main):003:0> 12.to_s(8) => "14" irb(main):004:0> "12".to_i(8) => 10 irb(main):005:0> "12".to_i(16) => 18 irb(main):006:0> "0x12".to_i(16) => 18 irb(main):007:0> "0x12".to_i(8) => 0 irb(main):008:0> "0x12".to_i => 0 irb(main):009:0> "fubar".to_i => 0 irb(main):010:0> "fubar".to_i(36) => 26608563 -bob
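For comparison, the closest Python equivalents of Bob's Perl and Ruby calls (this is standard int() behaviour, nothing proposed in the thread): an explicit base works like Ruby's to_i(radix), except that garbage raises instead of silently returning 0, and base 0 infers the radix from the prefix, somewhat like Perl's "try very hard" mode.

```python
print(int('12', 16))    # 18
print(int('12', 8))     # 10
print(int('0x12', 16))  # 18 -- a matching prefix is tolerated
print(int('0x12', 0))   # 18 -- base 0 infers the radix from the prefix
print(int('fubar', 36)) # 26608563, same as Ruby's "fubar".to_i(36)
```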
Guido van Rossum wrote:
On 1/19/06, Fredrik Lundh
wrote: Guido van Rossum wrote:
I think we ought to let this sit for a while and come back to it in a few week's time. Is 'base' really the right name? It could just as well be considered a conversion in the other direction. the same applies to hex and oct, of course.
Right. And this is not a hypothetical issue either -- in Perl, hex and oct *do* work the other way I believe. More reasons to get rid of these in Python 3000. Perhaps we should also get rid of hex/oct literals?
I'm not aware of anyone that would miss octal literals, but there are plenty of hardware weenies like me that would find "int("DEAD", 16)" less convenient than "0xDEAD". Python is a bit too heavyweight for a lot of embedded work, but it's *great* for writing host-based test harnesses. I quite like the suggestion of using 'math.base' rather than a builtin, but there are still issues to be figured out there: - the math module is currently a thin wrapper around C's "math.h". Do we really want to change that by adding more methods? - is 'base' the right name? - should we allow a "digits" argument, or just the radix argument? Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
Nick Coghlan wrote:
Guido van Rossum wrote:
On 1/19/06, Fredrik Lundh
wrote: Guido van Rossum wrote:
I think we ought to let this sit for a while and come back to it in a few week's time. Is 'base' really the right name? It could just as well be considered a conversion in the other direction.
the same applies to hex and oct, of course.
Right. And this is not a hypothetical issue either -- in Perl, hex and oct *do* work the other way I believe. More reasons to get rid of these in Python 3000. Perhaps we should also get rid of hex/oct literals?
I'm not aware of anyone that would miss octal literals, but there are plenty of hardware weenies like me that would find "int("DEAD", 16)" less convenient than "0xDEAD". Python is a bit too heavyweight for a lot of embedded work, but it's *great* for writing host-based test harnesses.
I quite like the suggestion of using 'math.base' rather than a builtin, but there are still issues to be figured out there: - the math module is currently a thin wrapper around C's "math.h". Do we really want to change that by adding more methods? - is 'base' the right name? - should we allow a "digits" argument, or just the radix argument?
Another possibility, since Python 3 can break backward compatibility: we could take a page out of Icon's book and use an "rN" suffix for non-decimal literals. 23 == 27r8 == 17r16 regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC www.holdenweb.com PyCon TX 2006 www.python.org/pycon/
[Nick Coghlan]
... I quite like the suggestion of using 'math.base' rather than a builtin, but there are still issues to be figured out there: - the math module is currently a thin wrapper around C's "math.h". Do we really want to change that by adding more methods?
That's not an issue. Some math functions go beyond C's (like 2-argument log(), and all flavors of log() returning sensible results for inputs larger than the largest C double), and some functions aren't part of C at all (like math.radians()). A stronger reason to keep it out of `math` is that all functions there operate on, and return, floats: it's a bizarre place to put an integer->string function.
- is 'base' the right name? - should we allow a "digits" argument, or just the radix argument?
Add to_base(self, radix=16) as a new numeric method. End of problem ;-)
On Fri, Jan 20, 2006 at 06:56:23AM +1000, Nick Coghlan wrote:
I'm not aware of anyone that would miss octal literals,
Except anyone who uses os.chmod. I would be mighty sad if we removed octal
and hexadecimal literals for 'cleanliness' reasons alone.
--
Thomas Wouters
On Jan 19, 2006, at 4:17 PM, Thomas Wouters wrote:
On Fri, Jan 20, 2006 at 06:56:23AM +1000, Nick Coghlan wrote:
I'm not aware of anyone that would miss octal literals,
Except anyone who uses os.chmod. I would be mighty sad if we removed octal and hexadecimal literals for 'cleanliness' reasons alone.
I have a LOT of code that has hex literals, and a little code with oct literals (as you say, just os.chmod). I'm -1 on removing hex and oct, and +0 on adding binary. As a point of reference, both Perl and Ruby support 0b110 binary literal syntax $ ruby -e 'print 0b110, "\n"' 6 $ perl -e 'print 0b110 . "\n"' 6 -bob
On Fri, 2006-01-20 at 06:56 +1000, Nick Coghlan wrote:
I'm not aware of anyone that would miss octal literals, but there are plenty of hardware weenies like me that would find "int("DEAD", 16)" less convenient than "0xDEAD".
Although octal literals is handy for things like os.chmod(). Unix weenies shouldn't be totally forgotten in P3K. I'm also for keeping hex() and oct() although if you want to move them out of builtins, that's fine. +0b1 for binary literals and %b. -Barry
"BAW" == Barry Warsaw
writes:
BAW> Unix weenies shouldn't be totally forgotten in P3K. Great idea! Put all this stuff in a "weenie" module. You can have weenie.unix and weenie.vms and weenie.unicode, besides the weenie.math that got all this started. -- School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN Ask not how you can "do" free software business; ask what your business can "do for" free software.
Guido van Rossum
Right. And this is not a hypothetical issue either -- in Perl, hex and oct *do* work the other way I believe. More reasons to get rid of these in Python 3000. Perhaps we should also get rid of hex/oct literals?
I would like to argue for removing octal literals. This feature has a very bad property: it can cause obscure problems for people who do not know or care about it. I have seen people try to use leading zeroes to make integer literals line up in a table. If they are lucky, they will get a syntax error. If they are unlucky, their program will silently do the wrong thing. It would be rather offputting to have to warn about this in the tutorial. But at present, a learner who isn't familiar with another language using this convention would have no reason to suspect it exists. As far as I can tell, it's documented only in the BNF. I think the safe thing in Python 3000 would be for literals with leading 0 to be syntax errors. Possibly os.chmod and os.umask could be extended to take a string argument so we could write chmod(path, "0640"). -M-
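The hazard described above, and the resolution being asked for, can be illustrated with int()'s base-0 mode (a sketch of today's runtime behaviour, which can only approximate the literal syntax under discussion):

```python
# With an explicit base of 10, leading zeroes are harmless padding.
print(int('0640', 10))  # 640 -- what the table-aligner expected
# With base 8 (the old literal meaning), the same digits mean something else.
print(int('0640', 8))   # 416 -- the silent wrong answer
# Base 0 mimics literal syntax: a bare leading zero is rejected outright,
# the behaviour proposed here for Python 3000 literals.
try:
    int('0640', 0)
except ValueError as e:
    print('rejected:', e)
```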
Andrew Koenig
Possibly os.chmod and os.umask could be extended to take a string argument so we could write chmod(path, "0640").
-1.
Would you really want chmod(path, 0640) and chmod(path, "0640") to have different meanings?
I want the former to be a syntax error, as I said in the preceding paragraph. -M-
On Tuesday 31 January 2006 14:55, Andrew Koenig wrote:
Would you really want chmod(path, 0640) and chmod(path, "0640") to have different meanings?
Actually, the proposal that suggested this also proposed that 0640 would raise a SyntaxError, since it was all about getting rid of octal literals. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
On 1/31/06, Andrew Koenig
Possibly os.chmod and os.umask could be extended to take a string argument so we could write chmod(path, "0640").
-1.
Would you really want chmod(path, 0640) and chmod(path, "0640") to have different meanings?
Apart from making 0640 a syntax error (which I think is wrong too), could this be solved by *requiring* the argument to be a string? (Or some other data type, but that's probably overkill.) -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Apart from making 0640 a syntax error (which I think is wrong too), could this be solved by *requiring* the argument to be a string? (Or some other data type, but that's probably overkill.)
That solves the problem only in that particular context. I would think that if it is deemed undesirable for a leading 0 to imply octal, then it would be best to decide on a different syntax for octal literals and use that syntax consistently everywhere. I am personally partial to allowing an optional radix (in decimal) followed by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would all represent the same value.
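The radix-first notation is easy to prototype as a string parser (my own sketch; the proposal is about literal syntax, which a function can only approximate):

```python
def parse_radix_literal(s):
    """Parse '8r23'-style strings: optional decimal radix, 'r', then digits."""
    radix_text, sep, digits = s.partition('r')
    if not sep:
        return int(s)  # plain decimal literal
    radix = int(radix_text)
    if not 2 <= radix <= 36:
        raise ValueError("radix must be in 2..36")
    return int(digits, radix)

# 19, 8r23 and 16r13 all denote the same value.
print(parse_radix_literal('19'), parse_radix_literal('8r23'), parse_radix_literal('16r13'))
```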
Apart from making 0640 a syntax error (which I think is wrong too), could this be solved by *requiring* the argument to be a string? (Or some other data type, but that's probably overkill.)
That solves the problem only in that particular context.
I would think that if it is deemed undesirable for a leading 0 to imply octal, then it would be best to decide on a different syntax for octal literals and use that syntax consistently everywhere.
I am personally partial to allowing an optional radix (in decimal) followed by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would all represent the same value. In that case, could I also make a pitch for the letter c which would similarly follow a radix (in decimal) but would introduce the rest of the number as a radix-complement signed number, e.g., -2, 16cfe, 8c76, 2c110, 10c98 would all have the same value, and the sign-digit could be arbitrarily repeated to the left without changing the value.
On Tue, 31 Jan 2006 17:17:22 -0500, "Andrew Koenig"
2006/1/31, Bengt Richter
In that case, could I also make a pitch for the letter c which would similarly follow a radix (in decimal) but would introduce the rest of the number as a radix-complement signed number, e.g., -2, 16cfe, 8c76, 2c110, 10c98 would all have the same value, and the sign-digit could be arbitrarily repeated to the left without changing the value, e.g., -2, 16cfffe, 8c776, 2c1110, 10c99998 would all have the same value. Likewise the positive values, where the "sign-digit" would be 0 instead of radix-1 (in the particular digit set for the radix). E.g., 2, 16c02, 16c0002, 8c02, 8c0002, 2c010, 2c0010, 10c02, 10c00002, etc. Of course you can put a unary minus in front of any of those, so -16cf7 == 16c09, and -2c0110 == -6 == 2c1010 etc.
This is getting too complicated. I don't want to read code and pause myself 5 minutes while doing math to understand a number. I think that the whole point of modifying something is to simplify it. I'm +0 on removing 0-leading literals. But only if we create "d", "h" and "o" suffixes to represent decimal, hex and octal literals (2.35d, 3Fh, 660o). And +0 on keeping the "0x" prefix for hexa (c'mon, it seems so natural....). Regards, . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/
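For completeness, the radix-complement notation proposed above can be decoded in a few lines (my own sketch, treating a leading sign digit of radix-1 as negative and 0 as positive, per the examples in the thread):

```python
def parse_complement(s):
    """Decode '16cfe'-style radix-complement strings: radix, 'c', digits."""
    radix_text, _, digits = s.partition('c')
    radix = int(radix_text)
    raw = int(digits, radix)
    # A leading sign digit of radix-1 marks a negative value.
    if int(digits[0], radix) == radix - 1:
        return raw - radix ** len(digits)
    return raw

# -2 in several radix-complement spellings:
for s in ('16cfe', '8c76', '2c110', '10c98'):
    print(s, '->', parse_complement(s))
```

Repeating the sign digit indeed leaves the value unchanged: 16cfffe also decodes to -2.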
On Tue, 2006-01-31 at 17:17 -0500, Andrew Koenig wrote:
Apart from making 0640 a syntax error (which I think is wrong too), could this be solved by *requiring* the argument to be a string? (Or some other data type, but that's probably overkill.)
That solves the problem only in that particular context.
I would think that if it is deemed undesirable for a leading 0 to imply octal, then it would be best to decide on a different syntax for octal literals and use that syntax consistently everywhere.
+1, and then issue a warning every time the parser sees leading 0 octal constant instead of the new syntax, although the old syntax would continue to work for compatibility reasons.
I am personally partial to allowing an optional radix (in decimal) followed by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would all represent the same value.
For me, adding the radix to the right instead of left looks nicer:
23r8, 13r16, etc., since a radix is almost like a unit, and units are
always to the right. Plus, we already use suffix characters to the
right, like 10L. And I seem to recall an old assembler (a z80
assembler, IIRC :P) that used a syntax like 10h and 11b for hex and bin
radix.
Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the
casual observer; perhaps a suffix letter is more readable, since we
don't need arbitrary radix support anyway.
/me thinks of some examples:
644o # I _think_ the small 'o' cannot be easily confused with 0 or O, but..
10h # hex.. hm.. but we already have 0x10
101b # binary
Another possility is to extend the 0x syntax to non-hex,
0xff # hex
0o644 # octal
0b1101 # binary
I'm unsure which one I like better.
Regards,
--
Gustavo J. A. M. Carneiro
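As a footnote to the extended-0x idea above: here is my own sketch of what such a prefix parser would do (the helper name is invented). The 0o/0b spelling is in fact the one Python eventually adopted, in 2.6/3.0 via PEP 3127, where int(s, 0) accepts all three prefixes.

```python
# Hand-rolled parser for the proposed 0x/0o/0b prefixes (illustrative
# helper, not a real API).
def parse_prefixed(s):
    prefixes = {'0x': 16, '0o': 8, '0b': 2}
    base = prefixes.get(s[:2].lower())
    if base is None:
        return int(s)  # no prefix: decimal
    return int(s[2:], base)

print(parse_prefixed('0xff'))    # 255
print(parse_prefixed('0o644'))   # 420
print(parse_prefixed('0b1101'))  # 13
```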
On 2/1/06, Gustavo J. A. M. Carneiro
On Tue, 2006-01-31 at 17:17 -0500, Andrew Koenig wrote:
I am personally partial to allowing an optional radix (in decimal) followed by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would all represent the same value.
For me, adding the radix to the right instead of left looks nicer: 23r8, 13r16, etc., since a radix is almost like a unit, and units are always to the right. Plus, we already use suffix characters to the right, like 10L. And I seem to recall an old assembler (a z80 assembler, IIRC :P) that used a syntax like 10h and 11b for hex an bin radix.
ffr16             # 16rff or 255
Iamadeadparrotr36 # 36rIamadeadparrot or 3120788520272999375597

Suffix syntax for bases higher than 10 is ambiguous with variable names. Prefix syntax is not.

--
Adam Olsen, aka Rhamphoryncus
On Wed, 01 Feb 2006 12:33:36 +0000, "Gustavo J. A. M. Carneiro"
Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the casual observer; perhaps a suffix letter is more readable, since we don't need arbitrary radix support anyway.
/me thinks of some examples:
644o # I _think_ the small 'o' cannot be easily confused with 0 or O, but.. 10h # hex.. hm.. but we already have 0x10 101b # binary
Another possility is to extend the 0x syntax to non-hex,
0xff # hex 0o644 # octal 0b1101 # binary
I'm unsure which one I like better.
Sorry if I seem to be picking nits, but IMO there's more than a nit here: The trouble with all of these is that they are all literals for integers, but integers are signed, and there is no way to represent the sign bit (wherever it is for a particular platform) along with the others, without triggering a promotion to positive long. So you get stuff like
>>> def i32(i): return int(-(i&0x80000000))+int(i&0x7fffffff)
...
>>> MYCONST = i32(0x87654321)
>>> MYCONST
-2023406815
>>> type(MYCONST)
<type 'int'>
>>> hex(MYCONST)
'-0x789abcdf'

Oops ;-/

>>> hex(MYCONST&0xffffffff)
'0x87654321L'
instead of

    MYCONST = 16cf87654321

Hm... maybe an explicit ordinary sign _after_ the prefix would be more mnemonic instead of indicating it with the radix-complement (f or 0 for hex). E.g.,

    MYCONST = 16r-87654321  # all bits above the 8 are ones

and

    MYCONST = 16r+87654321  # explicitly positive, all bits above 8 (none for 32 bits) are zeroes
    MYCONST = 16r87654321   # implicitly positive, ditto

or the above in binary

    MYCONST = 2r-10000111011001010100001100100001
    # leading bits are ones (here all are specified for 32-bit int, but
    # effect would be noticeable for smaller numbers or wider ints)
    MYCONST = 2r+10000111011001010100001100100001  # leading bits are zeroes (ditto)
    MYCONST = 2r10000111011001010100001100100001   # ditto

This could also be done as alternative 0x syntax, e.g. using 0h, 0o, and 0b, but I sure don't like that '0o' ;-)

BTW, for non-power-of-two radices, it should be remembered that the '-' is mnemonic for the symbol for (radix-1), and '+' or no sign is mnemonic for a prefixed 0 (which is 0 in any allowable radix), in order to have this notation have general radix expressivity for free ;-)

Regards,
Bengt Richter
bokr@oz.net (Bengt Richter) wrote:
On Wed, 01 Feb 2006 12:33:36 +0000, "Gustavo J. A. M. Carneiro"
wrote: [...] Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the casual observer; perhaps a suffix letter is more readable, since we don't need arbitrary radix support anyway.
[snip discussion of radix and complements]

I hope I'm not the only one who thinks that "simple is better than complex", at least when it comes to numeric constants. Certainly it would be _convenient_ to express constants in a radix other than decimal, hexadecimal, or octal, but to me, it all looks like noise.

Personally, I was on board for the removal of octal literals, if only because I find _seeing_ a leading zero without something else (like the 'x' for hexadecimal) to be difficult, and because I've found little use for them in my work (decimals and hex are usually all I need). Should it change for me? Of course not, but I think that adding different ways to spell integer values will tend to confuse new and seasoned python users. Some will like the flexibility that adding new options offers, but I believe such a change will be a net loss for the understandability of those pieces of code which use it.

- Josiah
On Wed, 01 Feb 2006 09:47:34 -0800, Josiah Carlson
bokr@oz.net (Bengt Richter) wrote:
On Wed, 01 Feb 2006 12:33:36 +0000, "Gustavo J. A. M. Carneiro"
wrote: [...] Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the casual observer; perhaps a suffix letter is more readable, since we don't need arbitrary radix support anyway.
[snip discussion of radix and complements]

I hope I'm not the only one who thinks that "simple is better than complex", at least when it comes to numeric constants. Certainly it would be _convenient_ to express constants in a radix other than decimal, hexadecimal, or octal, but to me, it all looks like noise.
You don't have to use any other radix, any more than you have to use all forms of float literals if you are happy with xx.yy. The others just become available through a consistent methodology.
Personally, I was on board for the removal of octal literals, if only because I find _seeing_ a leading zero without something else (like the 'x' for hexadecimal) to be difficult, and because I've found little use for them in my work (decimals and hex are usually all I need).
I agree that 8r641 is more easily disambiguated than 0641 ;-)

But how do you represent a negative int in hex? Or have you never encountered the need? The failure of current formats with respect to negative values whose values you want to specify in a bit-specifying format was my main point.

Regards,
Bengt Richter
On Wed, 2006-02-01 at 09:47 -0800, Josiah Carlson wrote:
I hope I'm not the only one who thinks that "simple is better than complex", at least when it comes to numeric constants. Certainly it would be _convenient_ to express constants in a radix other than decimal, hexidecimal, or octal, but to me, it all looks like noise.
As a Unix weenie and occasional bit twiddler, I've had needs for octal, hex, and binary literals. +1 for coming up with a common syntax for these. -1 on removing any way to write octal literals. The proposal for something like 0xff, 0o664, and 0b1001001 seems like the right direction, although 'o' for octal literal looks kind of funky. Maybe 'c' for oCtal? (remember it's 'x' for heXadecimal). -Barry
On Wed, 1 Feb 2006, Barry Warsaw wrote:
The proposal for something like 0xff, 0o664, and 0b1001001 seems like the right direction, although 'o' for octal literal looks kind of funky. Maybe 'c' for oCtal? (remember it's 'x' for heXadecimal).
Shouldn't it be 0t644 then, and 0n1001001 for binary ? That would sidestep the issue of 'b' and 'c' being valid hexadecimal digits as well. Regarding negative numbers, I think they're a red herring. If there is any need for a new literal format, it would be to express ~0x0f, not -0x10. 1xf0 has been proposed before, but I think YAGNI. /Paul
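Paul's point that ~0x0f, not -0x10, is what a new literal format would need to express is easy to check interactively; this quick illustration is mine, not from the message:

```python
# Python integers behave as infinite two's complement, so ~n == -n - 1
# for every integer n; hence ~0x0f and -0x10 are literally the same value.
assert ~0x0f == -0x10
assert all(~n == -n - 1 for n in range(-100, 100))

# Masking recovers the low-byte bit pattern that the proposed (and
# rejected) 1xf0 spelling would have denoted.
print(hex(~0x0f & 0xff))  # '0xf0'
```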
bokr@oz.net (Bengt Richter) wrote:
On Wed, 01 Feb 2006 09:47:34 -0800, Josiah Carlson
wrote: bokr@oz.net (Bengt Richter) wrote:
On Wed, 01 Feb 2006 12:33:36 +0000, "Gustavo J. A. M. Carneiro"
wrote: [...] Hmm.. I'm beginning to think 13r16 or 16r13 look too cryptic to the casual observer; perhaps a suffix letter is more readable, since we don't need arbitrary radix support anyway.
[snip discussion of radix and complements]

I hope I'm not the only one who thinks that "simple is better than complex", at least when it comes to numeric constants. Certainly it would be _convenient_ to express constants in a radix other than decimal, hexadecimal, or octal, but to me, it all looks like noise.
You don't have to use any other radix, any more than you have to use all forms of float literals if you are happy with xx.yy. The others just become available through a consistent methodology.
Personally, I was on board for the removal of octal literals, if only because I find _seeing_ a leading zero without something else (like the 'x' for hexadecimal) to be difficult, and because I've found little use for them in my work (decimals and hex are usually all I need).
I agree that 8r641 is more easily disambiguated than 0641 ;-)
But how do you represent a negative int in hex? Or have you never encountered the need? The failure of current formats with respect to negative values whose values you want to specify in a bit-specifying format was my main point.
In my experience, I've rarely had the opportunity (or misfortune?) to deal with negative constants whose exact bit representation I needed to get "just right". For my uses, I find that specifying "-0x..." or "-..." to be sufficient. Certainly it may or may not be the case in what you are doing (hence your exposition on signs, radixes, etc.).

Would the i32() function you previously defined, as well as a utility h32() function which does the reverse, be a reasonable start? Are there any radixes beyond binary, octal, decimal, and hexadecimal that people want to use? Does it make sense to create YYrXXXXX syntax for integer literals for basically 4 representations, all of which can be handled by int('XXXXXX', YY) (ignoring the runtime overhead)? Does the suffix idea for different types (long, decimal, ...) necessarily suggest that radix suffixes for one type (int/long) (1011b, 2000o, ...) are a good idea?

I'll expand what I said before; there are many things that would make integer literals more convenient for heavy (or experienced) users of non-decimal or non-decimal-non-positive literals, but it wouldn't necessarily increase the understandability of code which uses them.

- Josiah
On Wed, 2006-02-01 at 11:07 -0800, Josiah Carlson wrote:
In my experience, I've rarely had the opportunity (or misfortune?) to deal with negative constants, whose exact bit representation I needed to get "just right". For my uses, I find that specifying "-0x..." or "-..." to be sufficient.
I can't remember a time when signed hex, oct, or binary representation wasn't a major inconvenience, let alone something desirable. Don't get me started about hex(id(object()))! I typically use hex for addresses and bit fields, binary for bit flags and other bit twiddling, and oct for OS/file system interfaces. In none of those cases do you actually need or want signed values. IME. -Barry
On Wed, 1 Feb 2006 13:54:49 -0500 (EST), Paul Svensson
On Wed, 1 Feb 2006, Barry Warsaw wrote:
The proposal for something like 0xff, 0o664, and 0b1001001 seems like the right direction, although 'o' for octal literal looks kind of funky. Maybe 'c' for oCtal? (remember it's 'x' for heXadecimal).
Shouldn't it be 0t644 then, and 0n1001001 for binary ? That would sidestep the issue of 'b' and 'c' being valid hexadecimal digits as well.
Regarding negative numbers, I think they're a red herring. If there is any need for a new literal format, it would be to express ~0x0f, not -0x10. 1xf0 has been proposed before, but I think YAGNI.
YMMV re YAGNI, but you have an excellent point re negative numbers vs ~. If you look at examples, the representation digits _are_ actually "~" ;-) I.e., I first proposed 'c' in place of 'r' for 16cf0, where "c" stands for radix _complement_, and 0 and 1 are complements wrt 2, as are hex 0 and f wrt radix 16. So the actual notation has digits that are radix-complement, and are evaluated as such to get the integer value.

So ~0x0f is represented 16r-f0, which does produce a negative number (but whose integer value BTW is -0x10, not 0x0f). I.e., -16r-f0 == 16r+10, and the sign after the 'r' is a complement-notation indicator, not an algebraic sign. (Perhaps '^' would be a better indicator, as -16r^f0 == 0x10.)

Thank you for making the point that the negative value per se is a red herring. Still, that is where the problem shows up: e.g. when we want to define a hex bit mask as an int and the sign bit happens to be set. IMO it's a wart that if you want to define bit masks as integer data, you have to invoke computation for the sign bit, e.g.,

    BIT_0 = 0x1
    BIT_1 = 0x02
    ...
    BIT_30 = 0x40000000
    BIT_31 = int(-0x80000000)

instead of defining true literals all the way, e.g.,

    BIT_0 = 16r1
    BIT_1 = 16r2  # or 16r00000002 obviously
    ...
    BIT_30 = 16r+40000000
    BIT_31 = 16r-80000000

and if you wanted to define the bit-wise complement masks as literals, you could, though radix-2 is certainly easier to see (introducing '_' as transparent elision):

    CBIT_0 = 16r-f  # or 16r-fffffffe or 2r-0 or 2r-11111111_11111111_11111111_11111110
    CBIT_1 = 16r-d  # or 16r-fffffffd or 2r-01 or 2r-11111111_11111111_11111111_11111101
    ...
    CBIT_30 = 16r-b0000000 or 2r-10111111_11111111_11111111_11111111
    CBIT_31 = 16r+7fffffff or 2r+01111111_11111111_11111111_11111111

With constant-folding optimization and some kind of inference-guiding for expressions like -sys.maxint-1, perhaps computation vs true literals will become moot.
And practically it already is, since a one-time computation is normally insignificant in time or space. But aren't we also targeting platforms where space is at a premium, and being able to define constants as literal data without resorting to workaround pre-processing would be nice?

BTW, base-complement decoding works by generalized analogy to twos-complement decoding, by assuming that the most significant digit is a signed coefficient value for base**digitpos in radix-complement form, where the upper half of the range of digits represents negative values as digit-radix, and the rest positive as digit. The rest of the digits are all positive coefficients for base powers. E.g., to decode our simple example[1] represented as a literal in base-complement form (very little tested):
>>> def bclitval(s, digits='0123456789abcdefghijklmnopqrstuvwxyz'):
...     """
...     decode base complement literal of form <base>r<sign><digits>
...     where
...       <base> is in range(2,37) or more if digits supplied
...       <sign> is a mnemonic + for digits[0] and - for digits[<base>-1] or absent
...       <digits> are decoded as base-complement notation after <sign>, if
...       present, is changed to the appropriate digit.
...     The first digit is taken as a signed coefficient with value
...     digit-<base> (negative) if digit*2 >= <base>, and digit (positive) otherwise.
...     """
...     B, s = s.split('r', 1)
...     B = int(B)
...     if s[0] == '+': s = digits[0] + s[1:]
...     elif s[0] == '-': s = digits[B-1] + s[1:]
...     ds = digits.index(s[0])
...     if ds*2 >= B: acc = ds - B
...     else: acc = ds
...     for c in s[1:]: acc = acc*B + digits.index(c)
...     return acc
...
>>> bclitval('16r80000004')
-2147483644
>>> bclitval('2r10000000000000000000000000000100')
-2147483644
BTW, because of the decoding method, extended "sign" bits don't force promotion to a long value:
>>> bclitval('16rffffffff80000004')
-2147483644
[1] To reduce all this eye-glazing discussion to a simple example, how do people now use hex notation to define an integer bit-mask constant with bits 31 and 2 set? (assume 32-bit int for target platform, counting bit 0 as LSB and bit 31 as sign). Regards, Bengt Richter
On Feb 2, 2006, at 7:11 PM, Bengt Richter wrote:
[1] To reduce all this eye-glazing discussion to a simple example, how do people now use hex notation to define an integer bit-mask constant with bits 31 and 2 set?
That's easy:

    0x80000004

That was broken in python < 2.4, though, so there you need to do:

    MASK = 2**32 - 1
    0x80000004 & MASK
(assume 32-bit int for target platform, counting bit 0 as LSB and bit 31 as sign).
The 31st bit _isn't_ the sign bit in python and the bit-ness of the target platform doesn't matter. Python's integers are arbitrarily long. I'm not sure why you're trying to pretend as if python was C. James
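For readers following along, Bengt's earlier i32() helper can be written without relying on int overflow behavior at all; this rewrite is my own sketch, not code from the thread:

```python
# Reinterpret the low 32 bits of a value as a signed 32-bit quantity,
# which is what the thread's i32() helper accomplishes.
def to_signed32(u):
    u &= 0xffffffff                 # keep only the low 32 bits
    if u >= 0x80000000:             # sign bit set: wrap into negative range
        u -= 0x100000000
    return u

print(to_signed32(0x87654321))  # -2023406815
print(to_signed32(0x80000004))  # -2147483644
```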
On Thu, 2 Feb 2006 15:26:24 -0500, James Y Knight
On Feb 2, 2006, at 7:11 PM, Bengt Richter wrote:
[1] To reduce all this eye-glazing discussion to a simple example, how do people now use hex notation to define an integer bit-mask constant with bits 31 and 2 set?

That's easy:
0x80000004

>>> 0x80000004
2147483652L
That didn't meet specs ;-)
That was broken in python < 2.4, though, so there you need to do:
I agree it was broken, but
    MASK = 2**32 - 1
    0x80000004 & MASK

does not solve the problem of doing correctly what it was doing (creating a mask in a signed type int variable, which happened to have the sign bit set). So long as there is a fixed-width int different from long, the problem will reappear.
(assume 32-bit int for target platform, counting bit 0 as LSB and bit 31 as sign).
The 31st bit _isn't_ the sign bit in python and the bit-ness of the target platform doesn't matter. Python's integers are arbitrarily long. I'm not sure why you're trying to pretend as if python was C.

Evidently I haven't made myself clear to you, and your mind reading wrt what I am trying to pretend is definitely flawed (and further speculations along that line are likely to be OT ;-)
So long as we have a distinction between int and long, IWT int will be fixed width for any given implementation, and for interfacing with foreign functions it will continue to be useful at times to limit the type of arguments being passed. To do this arms-length C argument type control, it may be important to have constants of int type, knowing what that means on a given platform, and therefore _nice_ to be able to define them directly, understanding full well all the issues, and that there are workarounds ;-) Whatever the fixed width of int, ISTM we'll have predictable type promotion effects such as
>>> width=32
>>> -1*2**(width-2)*2
-2147483648

vs

>>> -1*2**(width-1)
-2147483648L
and
>>> hex(-sys.maxint-1)
'-0x80000000'
>>> (-int(hex(-sys.maxint-1)[1:],16)) == (-sys.maxint-1)
True
>>> (-int(hex(-sys.maxint-1)[1:],16)) , (-sys.maxint-1)
(-2147483648L, -2147483648)
>>> type(-int(hex(-sys.maxint-1)[1:],16)) == type(-sys.maxint-1)
False
>>> type(-int(hex(-sys.maxint-1)[1:],16)) , type(-sys.maxint-1)
(<type 'long'>, <type 'int'>)
[1] Even though BTW you could well define a sign bit position abstractly for any integer value. E.g., the LSB of the arbitrarily repeated sign bits to the left of any integer in a twos complement representation (which can be well defined abstractly too). Code left as exercise ;-)

Bottom line: You haven't shown me an existing way to do "16r80000004" and produce the int ;-)

Regards,
Bengt Richter
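The "exercise" in Bengt's footnote has a short answer; here is one hedged sketch (the function name and the width assumption are mine):

```python
# Abstractly, every Python integer has an infinite run of identical
# "sign bits" to the left in two's complement: 0s for non-negative
# values, 1s for negative ones. Shifting right past any finite width
# leaves only those sign bits.
def sign_bit(n, width=64):
    """Return the abstract sign bit of n; assumes abs(n) < 2**width."""
    return (n >> width) & 1

print(sign_bit(5))       # 0
print(sign_bit(-1))      # 1
print(sign_bit(-2**31))  # 1
```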
Bengt Richter wrote:
[1] To reduce all this eye-glazing discussion to a simple example, how do people now use hex notation to define an integer bit-mask constant with bits 31 and 2 set?

That's easy:
0x80000004

>>> 0x80000004
2147483652L

That didn't meet specs ;-)
It sure does: 2147483652L is an integer (a long one); it isn't an int. Regards, Martin
On Thu, 02 Feb 2006 23:46:00 +0100, "Martin v. Löwis"
Bengt Richter wrote:
[1] To reduce all this eye-glazing discussion to a simple example, how do people now use hex notation to define an integer bit-mask constant with bits 31 and 2 set?

That's easy:
0x80000004

>>> 0x80000004
2147483652L

That didn't meet specs ;-)
It sure does: 2147483652L is an integer (a long one); it isn't an int.

Aw, shux, dang. I didn't say what I meant ;-/ Apologies to James & all 'round. s/integer/int/ in the above.
Regards, Bengt Richter
On Feb 2, 2006, at 10:36 PM, Bengt Richter wrote:
So long as we have a distinction between int and long, IWT int will be fixed width for any given implementation, and for interfacing with foreign functions it will continue to be useful at times to limit the type of arguments being passed.
We _don't_ have a distinction in any meaningful way, anymore. ints and longs are almost always treated exactly the same, other than the "L" suffix. I expect that suffix will soon go away as well. If there is code that _doesn't_ treat them the same, there is the bug. We don't need strange new syntax to work around buggy code. Note that 10**14/10**13 is also a long, yet any interface that did not accept that as an argument but did accept "10" is simply buggy. Same goes for code that says it takes a 32-bit bitfield argument but won't accept 0x80000000. James
On Thu, 2 Feb 2006 20:39:01 -0500, James Y Knight
On Feb 2, 2006, at 10:36 PM, Bengt Richter wrote:
So long as we have a distinction between int and long, IWT int will be fixed width for any given implementation, and for interfacing with foreign functions it will continue to be useful at times to limit the type of arguments being passed.
We _don't_ have a distinction in any meaningful way, anymore. ints

Which will disappear, "int" or "long"? Or both in favor of "integer"? What un-"meaningful" distinction(s) are you hedging your statement about? ;-)
and longs are almost always treated exactly the same, other than the "L" suffix. I expect that suffix will soon go away as well. If there is code that _doesn't_ treat them the same, there is the bug. We

If you are looking at them in C code receiving them as args in a call, "treat them the same" would have to mean provide code to coerce long->int or reject it with an exception, IWT. This could be a performance issue that one might like to control by calling strictly with int args, or even an implementation restriction due to lack of space on some microprocessor for unnecessary general coercion code.
don't need strange new syntax to work around buggy code.

It's not a matter of "buggy" if you are trying to optimize. (I am aware of premature optimization issues, and IMO "strange" is in the eye of the beholder.) What syntax would you suggest? I am not married to any particular syntax, just looking for expressive control over what my programs will do ;-)
Note that 10**14/10**13 is also a long, yet any interface that did not accept that as an argument but did accept "10" is simply buggy.
def foo(i): assert isinstance(i, int); ... # when this becomes illegal, yes.
Same goes for code that says it takes a 32-bit bitfield argument but won't accept 0x80000000.

If the bitfield is signed, it can't, unless you are glossing over an assumed coercion rule.
>>> int(0x80000000)
2147483648L
>>> int(-0x80000000)
-2147483648
BTW, I am usually on the pure-abstraction-view side of discussions ;-) Noticing-kindling-is-wet-and-about-out-of-matches-ly, Regards, Bengt Richter
Bengt Richter wrote:
If you are looking at them in C code receiving them as args in a call, "treat them the same" would have to mean provide code to coerce long->int or reject it with an exception, IWT.
The typical way of processing incoming ints in C is through PyArg_ParseTuple, which already has the code to coerce long->int (which in turn may raise an exception for a range violation). So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
It's not a matter of "buggy" if you are trying to optimize. (I am aware of premature optimization issues, and IMO "strange" is in the eye of the beholder. What syntax would you suggest?
The question is: what is the problem you are trying to solve? If it is "bit masks", then consider the problem solved already.
Same goes for code that says it takes a 32-bit bitfield argument but won't accept 0x80000000.
If the bitfield is signed, it can't, unless you are glossing over an assumed coercion rule.
Just have a look at the 'k' specifier in PyArg_ParseTuple. Regards, Martin
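Martin's point about width-truncating coercion at the C boundary can be seen from pure Python via ctypes; this demonstration is my own, not from the message:

```python
import ctypes

# ctypes does no overflow checking for its fixed-width integer types:
# values are truncated to the declared C width. So the 32-bit pattern
# 0x80000004 works as a mask whether viewed unsigned or signed, which
# is the same behavior the 'k' format specifier gives a C extension.
print(ctypes.c_uint32(0x80000004).value)  # 2147483652
print(ctypes.c_int32(0x80000004).value)   # -2147483644
```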
On Fri, 03 Feb 2006 19:56:20 +0100, "Martin v. Löwis"
Bengt Richter wrote:
If you are looking at them in C code receiving them as args in a call, "treat them the same" would have to mean provide code to coerce long->int or reject it with an exception, IWT.
The typical way of processing incoming ints in C is through PyArg_ParseTuple, which already has the code to coerce long->int (which in turn may raise an exception for a range violation).
So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.

Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints. I thought there might be a case in a hot loop where it could make a difference. I confess not having done a C extension since I wrote one to access RDTSC quite some time ago.
It's not a matter of "buggy" if you are trying to optimize. (I am aware of premature optimization issues, and IMO "strange" is in the eye of the beholder. What syntax would you suggest?
The question is: what is the problem you are trying to solve? If it is "bit masks", then consider the problem solved already.
Well, I was visualizing having a homogeneous bunch of bit mask definitions all as int type if they could fit. I can't express them all in hex as literals without some processing. That got me started ;-) Not that some one-time processing at module import time is a big deal. Just that it struck me as a wart not to be able to do it without processing, even if constant folding is on the way.
Same goes for code that says it takes a 32-bit bitfield argument but won't accept 0x80000000.
If the bitfield is signed, it can't, unless you are glossing over an assumed coercion rule.
Just have a look at the 'k' specifier in PyArg_ParseTuple.
Ok, well that's the provision for the coercion then. BTW, is long mandatory for all implementations? Is there a doc that defines minimum features for a conforming Python implementation? E.g., IIRC Scheme has a list naming what's optional and not. Regards, Bengt Richter
Bengt Richter wrote:
The typical way of processing incoming ints in C is through PyArg_ParseTuple, which already has the code to coerce long->int (which in turn may raise an exception for a range violation).
So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.
I didn't say that 'k' takes no significant time for longs vs ints. In fact, I did not make any performance claims. I don't know what the relative performance is.
Well, I was visualizing having a homogeneous bunch of bit mask definitions all as int type if they could fit. I can't express them all in hex as literals without some processing. That got me started ;-)
I still can't see *why* you want to do that. Just write them as hex literals the way you expect it to work, and it typically will work just fine. Some of these literals are longs, some are ints, but there is no need to worry about this. It will all work just fine.
BTW, is long mandatory for all implementations? Is there a doc that defines minimum features for a conforming Python implementation?
The Python language reference is typically considered as a specification of what Python is. There is no "minimal Python" specification: you have to do all of it. Regards, Martin
Bengt Richter wrote:
The typical way of processing incoming ints in C is through PyArg_ParseTuple, which already has the code to coerce long->int (which in turn may raise an exception for a range violation).
So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.
I didn't say that 'k' takes no significant time for longs vs ints. In fact, I did not make any performance claims. I don't know what the relative performance is.

Sorry, I apologize for putting words in your mouth.
Well, I was visualizing having a homogeneous bunch of bit mask definitions all as int type if they could fit. I can't express them all in hex as literals without some processing. That got me started ;-)
I still can't see *why* you want to do that. Just write them as hex literals the way you expect it to work, and it typically will work just fine. Some of these literals are longs, some are ints, but there is no need to worry about this. It will all work just fine.

Perhaps it's mostly aesthetics. Imagine that I was a tile-setter and my supplier had an order form where I could order square glazed tiles in various colors with dimensions in multiples of 4cm, and I said that I was very happy with the product, except why does the supplier have to send stretchable plastic tiles whenever I order the 32cm size, when I know
On Sat, 04 Feb 2006 11:11:08 +0100, "Martin v. Löwis"
>>> -2147483648
-2147483648

but

>>> -0x80000000
-2147483648L
>>> int(-0x80000000)
-2147483648

;-) That minus seems to bind differently in different literal dialects, e.g. to make the point clearer, compare with above:

>>> -2147483648
-2147483648
>>> -(2147483648)
-2147483648L
BTW, is long mandatory for all implementations? Is there a doc that defines minimum features for a conforming Python implementation?
The Python language reference is typically considered as a specification of what Python is. There is no "minimal Python" specification: you have to do all of it.
Good to know, thanks. Sorry to go OT. If someone wants to add something about supersetting and pypy's facilitation of same, I guess that belongs in another thread ;-) Regards, Bengt Richter
bokr@oz.net (Bengt Richter) wrote:
Martin v. Löwis
wrote: Bengt Richter wrote:
The typical way of processing incoming ints in C is through PyArg_ParseTuple, which already has the code to coerce long->int (which in turn may raise an exception for a range violation).
So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.
I didn't say that 'k' takes no significant time for longs vs ints. In fact, I did not make any performance claims. I don't know what the relative performance is.
Sorry, I apologize for putting words in your mouth.
In regards to the aesthetics and/or inconsistencies of:

>>> -0x80000000
-2147483648L
>>> -2147483648
-2147483648
>>> -(2147483648)
-2147483648L
1. If your Python code distinguishes between ints and longs, it has a bug.

2. If your C extension to Python isn't using the 'k' format specifier as Martin is telling you to, then your C extension has a bug.

3. If you are concerned about *potential* performance degradation due to a use of 'k' rather than 'i' or 'I', then you've forgotten the fact that Python function calling is orders of magnitude slower than the minimal bit twiddling that PyInt_AsUnsignedLongMask() or PyLong_AsUnsignedLongMask() has to do.

Please, just use 'k' and let the list get past this.

- Josiah
On Sun, 05 Feb 2006 09:38:35 -0800, Josiah Carlson
bokr@oz.net (Bengt Richter) wrote:
Martin v. Lowis
wrote: Bengt Richter wrote:
The typical way of processing incoming ints in C is through PyArg_ParseTuple, which already has the code to coerce long->int (which in turn may raise an exception for a range violation).
So for typical C code, 0x80000004 is a perfect bit mask in Python 2.4.
Ok, I'll take your word that 'k' coercion takes no significant time for longs vs ints.
I didn't say that 'k' takes no significant time for longs vs ints. In fact, I did not make any performance claims. I don't know what the relative performance is.
Sorry, I apologize for putting words in your mouth.
In regards to the aesthetics and/or inconsistencies of:
>>> -0x80000000
-2147483648L
>>> -2147483648
-2147483648
>>> -(2147483648)
-2147483648L
1. If your Python code distinguishes between ints and longs, it has a bug.
Are you just lecturing me personally (in which case off list would be more appropriate), or do you include the authors of the 17 files I count under <some prefix>/Lib that have isinstance(<something>, int) in them? Or would you like to rephrase that with suitable qualifications? ;-)
2. If your C extension to Python isn't using the 'k' format specifier as Martin is telling you to, then your C extension has a bug.
I respect Martin's expert knowledge and manner of communication. He said, "Just have a look at the 'k' specifier in PyArg_ParseTuple." Regards, Bengt Richter
On 2/5/06, Bengt Richter
On Sun, 05 Feb 2006 09:38:35 -0800, Josiah Carlson
wrote: 1. If your Python code distinguishes between ints and longs, it has a bug. Are you just lecturing me personally (in which case off list would be more appropriate), or do you include the authors of the 17 files I count under <some prefix>/Lib that have isinstance(<something>, int) in them?
Josiah is correct, and those modules all have bugs. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
bokr@oz.net (Bengt Richter) wrote:
Are you just lecturing me personally (in which case off list would be more appropriate), or do you include the authors of the 17 files I count under <some prefix>/Lib that have isinstance(<something>, int) in them? Or would you like to rephrase that with suitable qualifications? ;-)
I did not mean to sound like I was lecturing you personally. Without taking a peek at the source, I would guess that the various uses of isinstance(<something>, int) are bugs, possibly replacing previous uses of type(<something>) is int, shortly after int subclassing was allowed. But that's just a guess. - Josiah
On Sun, 5 Feb 2006 18:08:58 -0800, Guido van Rossum
On 2/5/06, Bengt Richter
wrote: On Sun, 05 Feb 2006 09:38:35 -0800, Josiah Carlson
wrote: 1. If your Python code distinguishes between ints and longs, it has a bug. Are you just lecturing me personally (in which case off list would be more appropriate), or do you include the authors of the 17 files I count under <some prefix>/Lib that have isinstance(<something>, int) in them?
Josiah is correct, and those modules all have bugs.
It seems I stand incontestably corrected. Sorry, both ways ;-/ Perhaps I missed a py3k assumption in this thread (where I see in the PEP that "Remove distinction between int and long types" is core item number one)? I googled, but could not find that isinstance(<something>,int) was slated for deprecation, so I assumed that Josiah's absolute statement "1. ..." (above) could not be absolutely true, at least in the "has" (present) tense that he used. Is PEP 237 phase C to be implemented sooner than py3k, making isinstance(<something>, int) a transparently distinction-hiding alias for isinstance(<something>, integer), or outright illegal? IOW, will isinstance(<something>, int) be _guaranteed_ to be a bug, thus requiring code change? If so, when? Regards, Bengt Richter
On Sun, 05 Feb 2006 18:47:13 -0800, Josiah Carlson
bokr@oz.net (Bengt Richter) wrote:
Are you just lecturing me personally (in which case off list would be more appropriate), or do you include the authors of the 17 files I count under <some prefix>/Lib that have isinstance(<something>, int) in them? Or would you like to rephrase that with suitable qualifications? ;-)
I did not mean to sound like I was lecturing you personally.
Without taking a peek at the source, I would guess that the various uses of isinstance(<something>, int) are bugs, possibly replacing previous uses of type(<something>) is int, shortly after int subclassing was allowed. But that's just a guess.
Thank you. I didn't look either, but I did notice that most (but not all) of them were under <some prefix>/Lib/test/. Maybe it's excusable for test code ;-) Regards, Bengt Richter
On Mon, Feb 06, 2006 at 05:33:57AM +0000, Bengt Richter wrote:
Perhaps I missed a py3k assumption in this thread (where I see in the PEP that "Remove distinction between int and long types" is core item number one)?
http://www.python.org/peps/pep-0237.html -- an ongoing process, not a Py3K-eventual one.
--
Thomas Wouters
On Mon, 6 Feb 2006 09:05:01 +0100, Thomas Wouters
On Mon, Feb 06, 2006 at 05:33:57AM +0000, Bengt Richter wrote:
Perhaps I missed a py3k assumption in this thread (where I see in the PEP that "Remove distinction between int and long types" is core item number one)?
http://www.python.org/peps/pep-0237.html -- an ongoing process, not a Py3K-eventual one.
Thanks, I noticed. Hence my question following what you quote: """ Is PEP 237 phase C to be implemented sooner than py3k, making isinstance(<something>, int) a transparently distinction-hiding alias for isinstance(<something>, integer), or outright illegal? IOW, will isinstance(<something>, int) be _guaranteed_ to be a bug, thus requiring code change? If so, when? """ Sorry that my paragraph-packing habit tends to bury things. I'll have to work on that ;-/ Regards, Bengt Richter
On 2/6/06, Bengt Richter
Is PEP 237 phase C to be implemented sooner than py3k, making isinstance(<something>, int) a transparently distinction-hiding alias for isinstance(<something>, integer), or outright illegal? IOW, will isinstance(<something>, int) be _guaranteed_ to be a bug, thus requiring code change? If so, when?
Probably not before Python 3.0. Until then, int and long will be distinct types for backwards compatibility reasons. But we want as much code as possible to treat longs the same as ints, hence the party line that (barring extenuating circumstances :-) isinstance(x, int) is a bug if the code doesn't also have a similar case for long. If you find standard library code (in Python *or* C!) that treats int preferentially, please submit a patch or bug.

What we should do in 3.0 is not entirely clear to me. It would be nice if there was only a single type (named 'int', of course) with two run-time representations, one similar to the current int and one similar to the current long. But that's not so easy, and somewhat contrary to the philosophy that differences in (C-level) representation are best distinguished by looking at the type of an object. The next most likely solution is to make long a subclass of int, or perhaps to make int an abstract base class with two subclasses, short and long. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
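As it turned out, Python 3 took the single-type route: there is one arbitrary-precision `int` (the old long's implementation, under the int name), so the isinstance question evaporates. A quick illustration in today's Python 3:

```python
# In Python 3 the int/long split is gone: one arbitrary-precision type
# covers both machine-word-sized and arbitrarily large values.
small = 42
big = 2 ** 100

print(type(small) is type(big))   # True
print(isinstance(big, int))       # True
```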
On 2/6/06, Guido van Rossum
What we should do in 3.0 is not entirely clear to me. It would be nice if there was only a single type (named 'int', of course) with two run-time representations, one similar to the current int and one similar to the current long. But that's not so easy, and somewhat contrary to the philosophy that differences in (C-level) representation are best distinguished by looking at the type of an object. The next most likely solution is to make long a subclass of int, or perhaps to make int an abstract base class with two subclasses, short and long.
Essentially, you need to decide: does type(x) mostly refer to the protocol that x respects ("interface" plus semantics and pragmatics), or to the underlying implementation? If the latter, as your observation about "the philosophy" suggests, then it would NOT be nice if int was an exception wrt other types. If int is to be a concrete type, then I'd MUCH rather it didn't get subclassed, for all sorts of both practical and principled reasons. So, to me, the best solution would be the abstract base class with concrete implementation subclasses. Besides being usable for isinstance checks, like basestring, it should also work as a factory when called, returning an instance of the appropriate concrete subclass. AND it would let me have (part of) what I was pining for a while ago -- an abstract base class that type gmpy.mpz can subclass to assert "I _am_ an integer type!", so lists will accept mpz instances as indices, etc etc.

Now consider how nice it would be, on occasion, to be able to operate on an integer that's guaranteed to be 8, 16, 32, or 64 bits, to ensure the desired shifting/masking behavior for certain kinds of low-level programming; and also on one that's unsigned, in each of these sizes. Python could have a module offering signed8, unsigned16, and so forth (all combinations of size and signedness supported by the underlying C compiler), all subclassing the abstract int, and guarantee much happiness to people who are, for example, writing a Python prototype of code that's going to become C or assembly...

Similarly, it would help a slightly different kind of prototyping a lot if another Python module could offer 32-bit, 64-bit, 80-bit and 128-bit floating point types (if supported by the underlying C compiler) -- all subclassing an ABSTRACT 'float'; the concrete implementation that one gets by calling float or using a float literal would also subclass it... and so would the decimal type (why not? it's floating point -- 'float' doesn't mean 'BINARY fp';-).
And I'd be happy, because gmpy.mpf could also subclass the abstract float! And then finally we could have an abstract superclass 'number', whose subclasses are the abstract int and the abstract float (dunno 'bout complex, I'd be happy either way), and Python's typesystem would finally start being nice and cleanly organized instead of grand-prairie-level flat ...!-) Alex
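Python later grew something quite close to this in the `numbers` module: abstract `Number`, `Integral`, `Real` (and `Complex`) base classes that builtins subclass and third-party types can register against. A sketch of the isinstance side of Alex's proposal using those ABCs (with `Fraction` standing in for a third-party type like gmpy.mpz/mpf):

```python
import numbers
from fractions import Fraction

# Concrete builtins already sit under the abstract hierarchy:
print(isinstance(5, numbers.Integral))     # True
print(isinstance(5.0, numbers.Real))       # True
print(isinstance(5.0, numbers.Integral))   # False

# A non-builtin numeric type participates by subclassing
# or registering with the appropriate ABC:
print(isinstance(Fraction(1, 2), numbers.Number))   # True
```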
On 2/6/06, Alex Martelli
Now consider how nice it would be, on occasion, to be able to operate on an integer that's guaranteed to be 8, 16, 32, or 64 bits, to ensure the desired shifting/masking behavior for certain kinds of low-level programming; and also on one that's unsigned, in each of these sizes. Python could have a module offering signed8, unsigned16, and so forth (all combinations of size and signedness supported by the underlying C compiler), all subclassing the abstract int, and guarantee much happiness to people who are, for example, writing a Python prototype of code that's going to become C or assembly...
I dearly hope such types do NOT subclass abstract int. The reason is that although they can represent an integral value they do not behave like one. Approximately half of all possible float values are integral, but would you want it to subclass abstract int when possible? Of course not, the behavior is vastly different, and any function doing more than just comparing to it would have to convert it to the true int type before use it.

I see little point for more than one integer type. long behaves properly like an integer in all cases I can think of, with the long exception of performance. And given that python tends to be orders of magnitudes slower than C code there is little desire to trade off functionality for performance. That we have two integer types is more of a historical artifact than a conscious decision. We may not be willing to trade off functionality for performance, but once we've already made the tradeoff we're reluctant to go back. So it seems the challenge is this: can anybody patch long to have performance sufficiently close to int for small numbers? -- Adam Olsen, aka Rhamphoryncus
(I'm shedding load; cleaning up my inbox in preparation for moving on
to Py3K. I'll try to respond to some old mail in the process.)
On 2/6/06, Alex Martelli
Essentially, you need to decide: does type(x) mostly refer to the protocol that x respects ("interface" plus semantics and pragmatics), or to the underlying implementation? If the latter, as your observation about "the philosophy" suggests, then it would NOT be nice if int was an exception wrt other types.
If int is to be a concrete type, then I'd MUCH rather it didn't get subclassed, for all sorts of both pratical and principled reasons. So, to me, the best solution would be the abstract base class with concrete implementation subclasses. Besides being usable for isinstance checks, like basestring, it should also work as a factory when called, returning an instance of the appropriate concrete subclass.
I like this approach, and I'd like to make it happen. (Not tomorrow. :-)
AND it would let me have (part of) what I was pining for a while ago -- an abstract base class that type gmpy.mpz can subclass to assert "I _am_ an integer type!", so lists will accept mpz instances as indices, etc etc.
I'm still dead set against this. Using type checks instead of interface checks is too big a deviation from the language's philosophy. It would be the end of duck typing as we know it! Using __index__ makes much more sense to me.
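The `__index__` protocol Guido refers to here (PEP 357, added in Python 2.5) lets any type declare "I can serve as an exact integer index" without subclassing int at all. A minimal sketch with a hypothetical wrapper type (the class name is illustrative; think gmpy.mpz):

```python
class MyIndex:
    """Hypothetical integer-like type that is not an int subclass."""
    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Called wherever Python needs an exact integer:
        # indexing, slicing, hex(), bin(), etc.
        return self.value

letters = ['a', 'b', 'c', 'd']
print(letters[MyIndex(2)])              # c
print(letters[MyIndex(1):MyIndex(3)])   # ['b', 'c']
```

This is the duck-typed alternative to an "I am an integer" base class: the list never asks what MyIndex *is*, only whether it can produce an index.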
Now consider how nice it would be, on occasion, to be able to operate on an integer that's guaranteed to be 8, 16, 32, or 64 bits, to ensure the desired shifting/masking behavior for certain kinds of low-level programming; and also on one that's unsigned, in each of these sizes. Python could have a module offering signed8, unsigned16, and so forth (all combinations of size and signedness supported by the underlying C compiler), all subclassing the abstract int, and guarantee much happiness to people who are, for example, writing a Python prototype of code that's going to become C or assembly...
Why should these have to subclass int? They behave quite differently! I still don't see the incredible value of such types compared to simply doing standard arithmetic and adding "& 0xFF" or "& 0xFFFF" at the end, etc. (Slightly more complicated for signed arithmetic, but who really wants signed clipped arithmetic except if you're simulating a microprocessor?) You can write these things in Python 2.5, and as long as they implement __index__ and do their own mixed-mode arithmetic when combined with regular int or long, all should be well. (BTW a difficult design choice may be: if an int8 and an int meet, should the result be an int8 or an int?)
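Guido's "& 0xFF at the end" suggestion, plus the slightly trickier signed variant he alludes to, can be sketched as two small helpers (names are illustrative):

```python
def wrap_u8(n):
    """Unsigned 8-bit wraparound: just mask to the low byte."""
    return n & 0xFF

def wrap_s8(n):
    """Signed 8-bit wraparound: mask, then shift the range to -128..127."""
    return ((n + 128) & 0xFF) - 128

print(wrap_u8(250 + 10))   # 4
print(wrap_s8(127 + 1))    # -128
print(wrap_s8(-129))       # 127
```

Applied after each arithmetic step, these reproduce the overflow behavior of C's uint8_t/int8_t without any new integer types.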
Similarly, it would help a slightly different kind of prototyping a lot if another Python module could offer 32-bit, 64-bit, 80-bit and 128-bit floating point types (if supported by the underlying C compiler) -- all subclassing an ABSTRACT 'float'; the concrete implementation that one gets by calling float or using a float literal would also subclass it... and so would the decimal type (why not? it's floating point -- 'float' doesn't mean 'BINARY fp';-). And I'd be happy, because gmpy.mpf could also subclass the abstract float!
I'd like concrete indications that the implementation of such a module runs into serious obstacles with the current approach. I'm not aware of any, apart from the occasional isinstance(x, float) check in the standard library. If that's all you're fighting, perhaps those occurrences should be fixed? They violate duck typing.
And then finally we could have an abstract superclass 'number', whose subclasses are the abstract int and the abstract float (dunno 'bout complex, I'd be happy either way), and Python's typesystem would finally start being nice and cleanly organized instead of grand-prairie-level flat ...!-)
I think you can have families of numbers separate from subclassing relationships. I'm not at all sure that subclassing doesn't create more problems than it solves here. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (42)
- "Martin v. Löwis"
- Aahz
- Adam Olsen
- Alex Martelli
- Andrew Bennetts
- Andrew Koenig
- Barry Warsaw
- Bob Ippolito
- bokr@oz.net
- Brett Cannon
- Donovan Baarda
- Facundo Batista
- Fred L. Drake, Jr.
- Fredrik Lundh
- Gareth McCaughan
- Gisle Aas
- Guido van Rossum
- Gustavo J. A. M. Carneiro
- Ian Bicking
- Jack Diederich
- James Y Knight
- Jason Orendorff
- Jean-Paul Calderone
- Jeremy Hylton
- Josiah Carlson
- M.-A. Lemburg
- mattheww@chiark.greenend.org.uk
- Michael Hudson
- Neal Norwitz
- Nick Coghlan
- Nick Craig-Wood
- Paul Svensson
- Raymond Hettinger
- Raymond Hettinger
- Scott David Daniels
- Shane Holloway (IEEE)
- skip@pobox.com
- Stephen J. Turnbull
- Steve Holden
- Steven Bethard
- Thomas Wouters
- Tim Peters