
Fredrik's bug report made me dive a little deeper into compares and contains tests. Here is a snapshot of what my current version does:
>>> '1' == None
0
>>> u'1' == None
0
>>> '1' == 'aäöü'
0
>>> u'1' == 'aäöü'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
>>> '1' in ('a', None, 1)
0
>>> u'1' in ('a', None, 1)
0
>>> '1' in (u'aäöü', None, 1)
0
>>> u'1' in ('aäöü', None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
The decoding errors occur because 'aäöü' is not a valid UTF-8 string (Unicode comparisons coerce both arguments to Unicode by interpreting normal strings as UTF-8 encodings of Unicode).
Question: is this behaviour acceptable or should I go even further and mask decoding errors during compares and contains tests too?
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
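A minimal sketch of why the coercion fails, written for a present-day Python 3 interpreter rather than the 2000-era one in the session above. It assumes the 8-bit string 'aäöü' held Latin-1 bytes (the message does not say which encoding was in use) and emulates the coercion step with an explicit decode. The names `raw` and `coerces_to_unicode` are illustrative only.

```python
# Assumption: the 8-bit string held Latin-1 bytes; the original message
# does not state the encoding.
raw = "aäöü".encode("latin-1")   # b'a\xe4\xf6\xfc'


def coerces_to_unicode(data):
    """Return True if data can be read as UTF-8, mimicking the coercion step."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Plain ASCII is always valid UTF-8.
print(coerces_to_unicode(b"1"))
# 0xE4 opens a three-byte sequence, but 0xF6 is not a continuation byte.
print(coerces_to_unicode(raw))
```

Under these assumptions the first call succeeds and the second fails, which matches the pattern in the session: only the comparisons that force the accented 8-bit string through the UTF-8 decoder raise.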

Fredrik's bug report made me dive a little deeper into compares and contains tests.
Here is a snapshot of what my current version does:
>>> '1' == None
0
>>> u'1' == None
0
>>> '1' == 'aäöü'
0
>>> u'1' == 'aäöü'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
>>> '1' in ('a', None, 1)
0
>>> u'1' in ('a', None, 1)
0
>>> '1' in (u'aäöü', None, 1)
0
>>> u'1' in ('aäöü', None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
The decoding errors occur because 'aäöü' is not a valid UTF-8 string (Unicode comparisons coerce both arguments to Unicode by interpreting normal strings as UTF-8 encodings of Unicode).
Question: is this behaviour acceptable or should I go even further and mask decoding errors during compares and contains tests too?
I think this is right -- I expect it will catch more errors than it will cause.
This made me go out and see what happens if you compare a numeric class instance (one that defines __int__) to another int -- it doesn't even call the __int__ method! This should be fixed in 1.7 when we do the smart comparisons and rich coercions (or was it the other way around? :-).
--Guido van Rossum (home page: http://www.python.org/~guido/)
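The observation about __int__ can be reproduced with a small sketch. This runs on Python 3, where an equality test against an int still does not consult __int__. The class `Numeric` is a hypothetical stand-in for "a numeric class instance", not code from the thread.

```python
class Numeric:
    """Toy numeric class that defines only __int__ (hypothetical example)."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value


n = Numeric(5)

# An explicit conversion goes through __int__ ...
print(int(n) == 5)
# ... but the comparison itself never calls it, so the objects compare unequal.
print(n == 5)
```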

Random thought (hopefully more sensible than my last one):
Would it make sense in P3K to keep using '/' for CS-style division (int/int -> rounded-down-int), and to introduce '÷' for math-style division (int÷int -> float-when-necessary)?
Greg

Random thought (hopefully more sensible than my last one):
Would it make sense in P3K to keep using '/' for CS-style division (int/int -> rounded-down-int), and to introduce 'ö' for math-style division (intöint -> float-when-necessary)?
Careful with your character sets there... The symbol you typed looks like a lowercase o with dieresis to me. :-( Assuming you're proposing something like this:
  .
 ---
  .
I'm not so sure that choosing a non-ASCII symbol is going to work. For starters, it's on very few keyboards, and that won't change soon!
In the past we've talked about using // for integer division and / for regular (int/int->float) division. This would mean that we have to introduce // now as an alias for /, and encourage people to use it for int division (only); then in 1.7 using / between ints will issue a compatibility warning, and in Py3K int/int will yield a float.
It's still going to be painful, though.
--Guido van Rossum (home page: http://www.python.org/~guido/)
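For reference, the // and / split described above is how Python 3 behaves today. A short sketch on a current interpreter (the variable names are illustrative):

```python
# Floor division: the result is rounded down, toward negative infinity.
floor_pos = 7 // 2
floor_neg = -7 // 2

# True division: int / int yields a float, even when the result is whole.
true_frac = 7 / 2
true_whole = 6 / 3

print(floor_pos, floor_neg)
print(true_frac, true_whole)
```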

Guido van Rossum wrote:
Fredrik's bug report made me dive a little deeper into compares and contains tests.
Here is a snapshot of what my current version does:
>>> '1' == None
0
>>> u'1' == None
0
>>> '1' == 'aäöü'
0
>>> u'1' == 'aäöü'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
>>> '1' in ('a', None, 1)
0
>>> u'1' in ('a', None, 1)
0
>>> '1' in (u'aäöü', None, 1)
0
>>> u'1' in ('aäöü', None, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: invalid data
The decoding errors occur because 'aäöü' is not a valid UTF-8 string (Unicode comparisons coerce both arguments to Unicode by interpreting normal strings as UTF-8 encodings of Unicode).
Question: is this behaviour acceptable or should I go even further and mask decoding errors during compares and contains tests too?
I think this is right -- I expect it will catch more errors than it will cause.
Ok, I'll only mask the TypeErrors then. (UnicodeErrors are subclasses of ValueErrors and thus do not get masked.)
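A rough sketch of that masking policy, written for Python 3: TypeError raised during coercion is swallowed and treated as "not equal", while UnicodeError, being a ValueError subclass, propagates. The helpers `contains` and `to_text` are hypothetical and only illustrate the rule. They are not the interpreter's actual implementation.

```python
def to_text(obj):
    """Hypothetical coercion: bytes are read as UTF-8, str passes through."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, bytes):
        return obj.decode("utf-8")   # may raise UnicodeDecodeError
    raise TypeError("cannot coerce %s" % type(obj).__name__)


def contains(needle, haystack, coerce):
    """Membership test that masks TypeError from coerce and nothing else."""
    for item in haystack:
        try:
            if coerce(needle) == coerce(item):
                return True
        except TypeError:
            continue   # incomparable types count as "not equal"
    return False


print(issubclass(UnicodeError, ValueError))
print(contains("1", ("a", None, 1), to_text))

try:
    # Assumption: the accented 8-bit string is Latin-1 encoded.
    contains("1", ("aäöü".encode("latin-1"), None, 1), to_text)
except UnicodeError as exc:
    print("propagated:", type(exc).__name__)
```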
This made me go out and see what happens if you compare a numeric class instance (one that defines __int__) to another int -- it doesn't even call the __int__ method! This should be fixed in 1.7 when we do the smart comparisons and rich coercions (or was it the other way around? :-).
Not sure ;-) I think both go hand in hand.
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

gvwilson@nevex.com wrote:
Random thought (hopefully more sensible than my last one):
Would it make sense in P3K to keep using '/' for CS-style division (int/int -> rounded-down-int), and to introduce '÷' for math-style division (int÷int -> float-when-necessary)?
where's the ÷ key? (oh, look, my PC keyboard has one. but if I press it, I get a /. hmm...) </F>

Assuming you're proposing something like this:
. --- .
I'm not so sure that choosing a non-ASCII symbol is going to work. For starters, it's on very few keyboards, and that won't change soon!
I realize that, but neither are many of the accented characters used in non-English names (said the Canadian). If we assume 18-24 months until P3K, will it be safe to assume support for non-7-bit characters, or will we continue to be constrained by what was available on PDP-11's in 1975?
(BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', but harder to track down, since you'll have to scrutinize values very carefully to spot the difference. Haven't done any field tests, though...)
Greg

On Tue, 4 Apr 2000 gvwilson@nevex.com wrote:
(BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', but harder to track down, since you'll have to scrutinize values very carefully to spot the difference. Haven't done any field tests, though...)
My favourite symbol for integer division is _/ (read it as "floor-divide"). It makes visually apparent what is going on.
-- ?!ng
"There's no point in being grown up if you can't be childish sometimes." -- Dr. Who

"gvwilson" == <gvwilson@nevex.com> writes:
gvwilson> If we assume 18-24 months until P3K, will it be safe to
gvwilson> assume support for non-7-bit characters, or will we
gvwilson> continue to be constrained by what was available on
gvwilson> PDP-11's in 1975?
Undoubtedly.

On 04 April 2000, Ka-Ping Yee said:
On Tue, 4 Apr 2000 gvwilson@nevex.com wrote:
(BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', but harder to track down, since you'll have to scrutinize values very carefully to spot the difference. Haven't done any field tests, though...)
My favourite symbol for integer division is _/ (read it as "floor-divide"). It makes visually apparent what is going on.
Gaackk! Why is this even an issue? As I recall, Pascal got it right 30 years ago: / is what you learned in grade school (1/2 = 0.5), div is what you learn in first-year undergrad CS (1/2 = 0). Either add a "div" operator or a "div()" builtin to Python and you take care of the spelling issue. (The fixing-old-code issue is another problem entirely.)
I think that means I favour keeping operator.div and the __div__() method as-is, and adding operator.fdiv (?) and __fdiv__ for "floating-point" division. In other words:
    5 div 3 = 5.__div__(3) = operator.div(5,3) = 1
    5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667
(where I have used artistic license in applying __div__ to actual numbers -- you know what I mean).
-1 on adding any non-7-bit-ASCII characters to the character set required to express Python; +0 on allowing any (alphanumeric) Unicode character in identifiers (all for Py3k). Not sure what "alphanumeric" means in Unicode, but I'm sure someone has worried about this.
Greg
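The div / fdiv pairing can be approximated on a current interpreter. This is only a sketch: `div` and `fdiv` are the proposal's hypothetical names, bound here to `operator.floordiv` and `operator.truediv`, which are the functions Python 3 actually provides (`operator.div` and `__fdiv__` do not exist in Python 3).

```python
import operator

# Hypothetical names from the proposal, mapped onto functions that exist
# in Python 3's operator module.
div = operator.floordiv    # "5 div 3"
fdiv = operator.truediv    # "5 / 3"

print(div(5, 3))
print(fdiv(5, 3))

# The matching special methods on int, called on a literal via parentheses.
print((5).__floordiv__(3), (5).__truediv__(3))
```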

On Wed, 5 Apr 2000, Greg Ward wrote:
Gaackk! Why is this even an issue? As I recall, Pascal got it right 30 years ago: / is what you learned in grade school (1/2 = 0.5)
Greg, here's an easy way for you to make money: sue your grade school <wink>. I learned that 1/2 is 1/2. Rationals are much more natural entities than decimals (just think 1/3).
FWIW, I think Python should support Rationals, and have integer division return a rational. I'm still working on the details of my great Python numeric tower change.
Not sure what "alphanumeric" means in Unicode, but I'm sure someone has worried about this.
I think Unicode has a clear definition of a letter and a number. How do you feel about letting arbitrary Unicode whitespace into Python? (Other than the indentation of non-empty lines <wink>)
--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com
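As a rough illustration of the point about Unicode defining letters and numbers, the sketch below uses Python 3's standard `unicodedata` module and `str` methods to look up a few characters. The sample characters are arbitrary choices, not taken from the thread.

```python
import unicodedata

samples = {
    "latin letter": "a",
    "accented letter": "\u00e9",      # e with acute accent
    "arabic-indic digit": "\u0663",   # digit three
    "underscore": "_",
    "em space": "\u2003",
}

for label, ch in samples.items():
    # The general category starts with L for letters, N for numbers,
    # Z for separators such as spaces.
    print(label, unicodedata.category(ch), ch.isalnum(), ch.isspace())
```

The general category gives a mechanical answer to "is this a letter or a number", and the same lookup shows that whitespace beyond ASCII space is classified as a separator.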

FWIW, I think Python should support Rationals, and have integer division return a rational. I'm still working on the details of my great Python numeric tower change.
Forget it. ABC did this, and the problem is that where you *think* you are doing something simple like calculating interest rates, you are actually manipulating rational numbers with 1000s of digits in their numerator and denominator.
If you want to change it, consider emulating what kids currently use in school: a decimal floating point calculator with N digits of precision.
--Guido van Rossum (home page: http://www.python.org/~guido/)
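The growth described here is easy to observe with Python 3's `fractions.Fraction` type. This is a sketch under made-up assumptions: the starting balance, the 3.7% yearly rate, and the ten-year horizon are hypothetical figures chosen only to show the effect.

```python
from fractions import Fraction

balance = Fraction(100)
monthly_rate = Fraction(37, 12000)   # hypothetical 3.7% yearly, compounded monthly

for _ in range(120):                 # ten years of monthly compounding
    balance *= 1 + monthly_rate

# The exact result drags along every factor of every multiplication.
print(len(str(balance.numerator)), "digits in the numerator")
print(len(str(balance.denominator)), "digits in the denominator")
print(float(balance))
```

Even this small loop leaves numerator and denominator several hundred digits long, while the value a user cares about fits comfortably in a float.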

[Moshe]
FWIW, I think Python should support Rationals, and have integer division return a rational. I'm still working on the details of my great Python numeric tower change.
[Guido]
Forget it. ABC did this, and the problem is that where you *think* you are doing something simple like calculating interest rates, you are actually manipulating rational numbers with 1000s of digits in their numerator and denominator.
Let's not be too hasty about this, cuz I doubt we'll get to change it twice <wink>. You (Guido) & I agreed that ABC's rationals didn't work out well way back when, but
a) That has not been my experience in other languages -- ABC was unique.
b) Presumably ABC did usability studies that concluded rationals were least surprising.
c) TeachScheme! seems delighted with their use of rationals (indeed, one of TeachScheme!'s primary authors beat up on me in email for Python not doing this).
d) I'd much rather saddle newbies with time & space surprises than correctness surprises.
Last week I took some time to stare at the ABC manual again, & suspect I hit on the cause: ABC was *aggressively* rational. That is, ABC had no notation for floating point (ABC "approximate") literals; even 6.02e23 was taken to mean "exact rational". In my experience ABC was unique this way, and uniquely surprising for it: it's hard to be surprised by 2/3 returning a rational, but hard not to be surprised by 6.02e23/1.0001e-18 doing so.
Give it some thought.
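To make the 6.02e23/1.0001e-18 example concrete, here is a sketch that emulates "every literal is an exact rational" using Python 3's `fractions.Fraction`, which accepts decimal strings with exponents. It is an emulation for illustration, not ABC's actual behaviour.

```python
from fractions import Fraction

# Treat the literals as exact rationals, as an aggressively rational
# language would.
exact = Fraction("6.02e23") / Fraction("1.0001e-18")

# The same expression with ordinary binary floating point.
approx = 6.02e23 / 1.0001e-18

print(exact.denominator)
print(len(str(exact.numerator)), "digits in the numerator")
print(approx)
```

The exact quotient cannot be reduced to anything short, because 10001 shares no factor with the numerator, so a user who typed scientific notation gets back a long fraction rather than an approximate number.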
If you want to change it, consider emulating what kids currently use in school: a decimal floating point calculator with N digits of precision.
This is what REXX does, and is very powerful even for experts (assuming the user can, as in REXX, specify N; but that means writing a whole slew of arbitrary-precision math libraries too -- btw, that is doable! e.g., I worked w/ Dave Gillespie on some of the algorithms for his amazing Emacs calc). It will run at best 10x slower than native fp of comparable precision, though, so experts will hate it in the cases they don't love it <0.5 wink>.
one-case-where-one-size-doesn't-fit-anyone-ly y'rs - tim
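A sketch of "decimal floating point with N digits of precision, where the user picks N", using Python's standard `decimal` module (which postdates this thread). The helper name `divide` is illustrative.

```python
from decimal import Decimal, localcontext


def divide(a, b, digits):
    """Divide two integers using `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return Decimal(a) / Decimal(b)


print(divide(1, 3, 10))
print(divide(1, 3, 30))

# Decimal fractions such as one tenth are represented exactly.
print(divide(1, 10, 10) * 3 == Decimal("0.3"))
```

The precision is a property of the arithmetic context rather than of the numbers, which is what lets the caller choose N per computation.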

[Greg Ward]
... In other words:
    5 div 3 = 5.__div__(3) = operator.div(5,3) = 1
    5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667
(where I have used artistic license in applying __div__ to actual numbers -- you know what I mean).
+1 from me provided you can sneak the new keyword past Guido <1/3 wink>.

Tim Peters [tim_one@email.msn.com] wrote:
[Greg Ward]
... In other words:
    5 div 3 = 5.__div__(3) = operator.div(5,3) = 1
    5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667
(where I have used artistic license in applying __div__ to actual numbers -- you know what I mean).
+1 from me provided you can sneak the new keyword past Guido <1/3 wink>.
+1 from me as well. I spent a little time going through all my code, and looking through Zope as well, and I couldn't find any place I used 'div' as a variable, much less any place I depended on this behaviour, so I don't think my code would break in any odd ways. The only thing I can imagine is some printed text formatting issues.
Chris
--
| Christopher Petrilli
| petrilli@amber.org

On Wed, 5 Apr 2000, Tim Peters wrote:
Last week I took some time to stare at the ABC manual again, & suspect I hit on the cause: ABC was *aggressively* rational. That is, ABC had no notation for floating point (ABC "approximate") literals; even 6.02e23 was taken to mean "exact rational". In my experience ABC was unique this way, and uniquely surprising for it: it's hard to be surprised by 2/3 returning a rational, but hard not to be surprised by 6.02e23/1.0001e-18 doing so.
Ouch. There is definitely place for floats in the numeric tower. It's just that those shouldn't be reached accidentally <0.3 wink>
one-case-where-one-size-doesn't-fit-anyone-ly y'rs - tim
but-in-this-case-two-sizes-do-seem-enough-ly y'rs, Z.
--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com
participants (10)
- Barry A. Warsaw
- Christopher Petrilli
- Fredrik Lundh
- Greg Ward
- Guido van Rossum
- gvwilson@nevex.com
- Ka-Ping Yee
- M.-A. Lemburg
- Moshe Zadka
- Tim Peters