stupid floating point question...
A *good* compiler won't collapse *any* fp expressions at compile-time, because doing so can change the 754 semantics at runtime (for example, the evaluation of 1./6 triggers the 754 "inexact" signal, and the compiler has no way to know whether the user is expecting that to happen at runtime, so a good compiler will leave it alone).
Of course, that doesn't say anything about what *most* compilers do. For example, gcc, on i586-pc-linux-gnu, compiles

    double foo(){ return (double)1/6; }

into

    .LC0:
            .long 0x55555555,0x3fc55555
    .text
            .align 4
    .globl foo
            .type foo,@function
    foo:
            fldl .LC0
            ret

when compiling with -fomit-frame-pointer -O2. That still doesn't say anything about what most compilers do - if there is interest, we could perform a comparative study on the subject :-)

The "would break 754" argument is pretty weak, IMO - gcc, for example, doesn't claim to comply with that standard.

Regards,
Martin
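For readers who want to check the constant gcc emitted: a minimal sketch in modern Python 3 (the thread itself predates it), assuming the host uses IEEE 754 doubles. It unpacks 1/6 into the same two 32-bit words, low word first, that appear in the .long directive above. The variable names are illustrative only.

```python
import struct

# 1/6 evaluated at run time in double precision (round-to-nearest).
value = 1 / 6

# Pack as a little-endian double, then read it back as two 32-bit words,
# low word first, which is the order the .long directive lists them in.
low, high = struct.unpack("<II", struct.pack("<d", value))

print(hex(low), hex(high))  # 0x55555555 0x3fc55555
```

So the folded constant is the correctly rounded double for 1/6 under the default rounding mode.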
[Tim]
A *good* compiler won't collapse *any* fp expressions at compile-time ...
[Martin von Loewis]
Of course, that doesn't say anything about what *most* compilers do.
Doesn't matter in this case; I told /F not to worry about it, having taken all of that into account. Almost all C compilers do a piss-poor job of taking floating-point seriously, but it doesn't really matter for the purpose /F has in mind.

[an example of gcc precomputing the best possible result]
return (double)1/6; ... .long 0x55555555,0x3fc55555
No problem. If you set the HW rounding mode to +infinity during compilation, the first chunk there would end with a 6 instead. Would affect the tail end of the repr(), but not the str().
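A rough illustration of that point, assuming Python 3.9+ for math.nextafter and IEEE 754 doubles. It does not change the hardware rounding mode; instead it takes the next double above the round-to-nearest result, which is what rounding toward +infinity would produce here because the nearest double lies below 1/6. The 12- and 17-digit formats stand in for what str() and repr() of that era printed, as far as I recall.

```python
import math
import struct

nearest = 1 / 6
# The nearest double is slightly below 1/6, so rounding toward +infinity
# would select the next representable double above it.
rounded_up = math.nextafter(nearest, math.inf)

def low_word(x):
    """Return the low 32 bits of the double x."""
    return struct.unpack("<II", struct.pack("<d", x))[0]

print(hex(low_word(nearest)), hex(low_word(rounded_up)))

# 12 significant digits (str-like) agree; 17 significant digits
# (repr-like) differ in the tail.
print("%.12g" % nearest, "%.12g" % rounded_up)
print("%.17g" % nearest, "%.17g" % rounded_up)
```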
... when compiling with -fomit-frame-pointer -O2. That still doesn't say anything about what most compilers do - if there is interest, we could perform a comparative study on the subject :-)
No need.
The "would break 754" argument is pretty weak, IMO - gcc, for example, doesn't claim to comply with that standard.
/F's question was about fp. 754 is the only hope he has for any x-platform consistency (C89 alone gives no hope at all, and no basis for answering his question). To the extent that a C compiler ignores 754, it makes x-platform fp consistency impossible (which, btw, Python inherits from C: we can't even manage to get string<->float working consistently across 100% 754-conforming platforms!). Whether that's a weak argument or not depends entirely on how important x-platform consistency is to a given app. In /F's specific case, a sloppy compiler is "good enough".

i'm-the-only-compiler-writer-i-ever-met-who-understood-fp<0.5-wink>-ly y'rs - tim
tim wrote:
Of course, that doesn't say anything about what *most* compilers do.
Doesn't matter in this case; I told /F not to worry about it, having taken all of that into account. Almost all C compilers do a piss-poor job of taking floating-point seriously, but it doesn't really matter for the purpose /F has in mind.
to make it clear for everyone: I'm planning to get rid of the last remaining switch statement in unicodectype.c ("numerical value"), and replace the doubles in there with rationals.

the problem here is that MAL's new test suite uses "str" on the return value from that function, and it would be a bit annoying if we ended up with a Unicode test that might fail on platforms with lousy floating point support...

:::

on the other hand, I'm not sure I think it's a really good idea to have "numeric" return a floating point value. consider this:

>>> import unicodedata
>>> unicodedata.numeric(u"\N{VULGAR FRACTION ONE THIRD}")
0.33333333333333331

(the glyph looks like "1/3", and that's also what the numeric property field in the Unicode database says)

:::

if I had access to the time machine, I'd change it to:

>>> unicodedata.numeric(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)

...but maybe we can add an alternate API that returns the *exact* fraction (as a numerator/denominator tuple)?

>>> unicodedata.numeric2(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)

(hopefully, someone will come up with a better name)

</F>
On Thu, 28 Sep 2000, Fredrik Lundh wrote:
if I had access to the time machine, I'd change it to:

>>> unicodedata.numeric(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)

...but maybe we can add an alternate API that returns the *exact* fraction (as a numerator/denominator tuple)?

>>> unicodedata.numeric2(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)
(hopefully, someone will come up with a better name)
unicodedata.rational might be an obvious choice.

>>> unicodedata.rational(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)

-- ?!ng
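To make the proposal concrete, here is a sketch of what such a function could look like in today's Python 3. The name rational and the max_denominator parameter are hypothetical; no such function exists in the unicodedata module as far as I know. It recovers the fraction from the float that unicodedata.numeric() returns, so it is a heuristic and not a lookup of the exact database field.

```python
import unicodedata
from fractions import Fraction

def rational(ch, max_denominator=1000):
    """Hypothetical helper: return (numerator, denominator) for ch.

    Works backwards from the float returned by unicodedata.numeric(),
    picking the closest fraction with a small denominator.
    """
    value = unicodedata.numeric(ch)  # raises ValueError if ch is not numeric
    frac = Fraction(value).limit_denominator(max_denominator)
    return frac.numerator, frac.denominator

print(rational("\N{VULGAR FRACTION ONE THIRD}"))      # (1, 3)
print(rational("\N{VULGAR FRACTION SEVEN EIGHTHS}"))  # (7, 8)
print(rational("7"))                                  # (7, 1)
```

A real implementation would read the numerator and denominator straight from the database, which avoids the round trip through floating point entirely.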
[/F]
...but maybe we can add an alternate API that returns the *exact* fraction (as a numerator/denominator tuple)?

>>> unicodedata.numeric2(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)
(hopefully, someone will come up with a better name)
[The Ping of Death] LOL! Great name, Ping.
unicodedata.rational might be an obvious choice.

>>> unicodedata.rational(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)

Perfect -- another great name. Beats all heck out of unicodedata.vulgar() too.

leaving-it-up-to-/f-to-decide-what-.rational()-should-return-for-pi- ly y'ts - the timmy of death
tim wrote:
leaving-it-up-to-/f-to-decide-what-.rational()-should-return-for-pi- ly y'ts - the timmy of death
oh, the unicode folks have figured that one out:
>>> unicodedata.numeric(u"\N{GREEK PI SYMBOL}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
</F>
[tim]
leaving-it-up-to-/f-to-decide-what-.rational()-should-return-for-pi- ly y'ts - the timmy of death
[/F]
oh, the unicode folks have figured that one out:
>>> unicodedata.numeric(u"\N{GREEK PI SYMBOL}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character

Ya, except I'm starting to suspect they're not floating-point experts either:

>>> unicodedata.numeric(u"\N{PLANCK CONSTANT OVER TWO PI}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
>>> unicodedata.numeric(u"\N{EULER CONSTANT}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
>>> unicodedata.numeric(u"\N{AIRSPEED OF AFRICAN SWALLOW}")
UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name
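For anyone repeating these experiments on a current interpreter, a small Python 3 sketch. It relies on two things that I believe hold in the modern unicodedata module: numeric() accepts an optional default that is returned instead of raising ValueError, and lookup() raises KeyError for an unknown character name. The helper name numeric_value is made up for this example.

```python
import unicodedata

def numeric_value(name, default=None):
    """Look a character up by name and return its numeric value, if any."""
    ch = unicodedata.lookup(name)            # KeyError for unknown names
    return unicodedata.numeric(ch, default)  # default instead of ValueError

names = (
    "VULGAR FRACTION ONE HALF",
    "GREEK PI SYMBOL",
    "EULER CONSTANT",
    "AIRSPEED OF AFRICAN SWALLOW",
)
for name in names:
    try:
        print(name, "->", numeric_value(name))
    except KeyError:
        print(name, "-> no such character name")
```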
Tim Peters wrote:
[tim]
leaving-it-up-to-/f-to-decide-what-.rational()-should-return-for-pi- ly y'ts - the timmy of death
[/F]
oh, the unicode folks have figured that one out:
>>> unicodedata.numeric(u"\N{GREEK PI SYMBOL}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character

Ya, except I'm starting to suspect they're not floating-point experts either:

>>> unicodedata.numeric(u"\N{PLANCK CONSTANT OVER TWO PI}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
>>> unicodedata.numeric(u"\N{EULER CONSTANT}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
>>> unicodedata.numeric(u"\N{AIRSPEED OF AFRICAN SWALLOW}")
UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name
Perhaps you should submit these for Unicode 4.0 ;-)

But really, I don't suspect that anyone is going to do serious character to number conversion on these esoteric characters. Plain old digits will do just as they always have (or does anyone know of ways to represent irrational numbers on PCs by other means than an algorithm which spits out new digits every now and then?).

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
[Tim]
>>> unicodedata.numeric(u"\N{PLANCK CONSTANT OVER TWO PI}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
>>> unicodedata.numeric(u"\N{EULER CONSTANT}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
>>> unicodedata.numeric(u"\N{AIRSPEED OF AFRICAN SWALLOW}")
UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name
[MAL]
Perhaps you should submit these for Unicode 4.0 ;-)
Note that the first two are already there; they just don't have an associated numerical value. The last one was a hint that I was trying to write a frivolous msg while giving my "<wink>" key a break <wink>.
But really, I don't suspect that anyone is going to do serious character to number conversion on these esoteric characters. Plain old digits will do just as they always have ...
Which is why I have to wonder whether there's *any* value in exposing the numeric-value property beyond regular old digits.
[Tim]
Which is why I have to wonder whether there's *any* value in exposing the numeric-value property beyond regular old digits.
Running (in IDLE or PythonWin with a font that covers most of Unicode like Tahoma):

import unicodedata
for c in range(0x10000):
    x=unichr(c)
    try:
        b = unicodedata.numeric(x)
        #print "numeric:", repr(x)
        try:
            a = unicodedata.digit(x)
            if a != b:
                print "bad" , repr(x)
        except:
            print "Numeric but not digit", hex(c), x.encode("utf8"), "numeric ->", b
    except:
        pass

Finds about 130 characters. The only ones I feel are worth worrying about are the half, quarters and eighths (0xbc, 0xbd, 0xbe, 0x215b, 0x215c, 0x215d, 0x215e) which are commonly used for expressing the prices of stocks and commodities in the US. This may be rarely used but it is better to have it available than to have people coding up their own translation tables.

The 0x302* 'Hangzhou' numerals look like they should be classified as digits.

Neil
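The script above is Python 2 of its day. A rough Python 3 equivalent, offered as a sketch only: it uses the optional default argument of numeric() and digit() in place of the bare except clauses, and drops the "bad" consistency check. The function name is invented here, and the number of hits depends on the Unicode version bundled with the interpreter, so it will not match the figure of about 130 quoted for Unicode 3.0.

```python
import unicodedata

def numeric_but_not_digit(limit=0x10000):
    """Return (code point, numeric value) pairs that have no digit value."""
    found = []
    for code in range(limit):
        ch = chr(code)
        value = unicodedata.numeric(ch, None)
        if value is None:
            continue  # not a numeric character at all
        if unicodedata.digit(ch, None) is None:
            found.append((code, value))
    return found

hits = numeric_but_not_digit()
print(len(hits), "characters")
for code, value in hits[:5]:
    print(hex(code), unicodedata.name(chr(code), "?"), value)
```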
Neil Hodgson wrote:
The 0x302* 'Hangzhou' numerals look like they should be classified as digits.
Can't change the Unicode 3.0 database... so even though this might be useful in some contexts, let's stick to the standard.

-- Marc-Andre Lemburg
"NH" == Neil Hodgson <nhodgson@bigpond.net.au> writes:
NH> Finds about 130 characters. The only ones I feel are worth
NH> worrying about are the half, quarters and eighths (0xbc, 0xbd,
NH> 0xbe, 0x215b, 0x215c, 0x215d, 0x215e) which are commonly used
NH> for expressing the prices of stocks and commodities in the US.
NH> This may be rarely used but it is better to have it available
NH> than to have people coding up their own translation tables.

The US no longer uses fractions to report stock prices. Example:

http://business.nytimes.com/market_summary.asp

LEADERS                              Last    Range            Change
AMERICAN INDL PPTYS REIT (IND)       14.06   13.56 - 14.06    0.25 / 1.81%
R G S ENERGY GROUP INC (RGS)         28.19   27.50 - 28.19    0.50 / 1.81%
DRESDNER RCM GLBL STRT INC (DSF)      6.63    6.63 -  6.63    0.06 / 0.95%
FALCON PRODS INC (FCP)                9.63    9.63 -  9.88    0.06 / 0.65%
GENERAL ELEC CO (GE)                 59.00   58.63 - 59.75    0.19 / 0.32%

Jeremy
tim wrote:
But really, I don't suspect that anyone is going to do serious character to number conversion on these esoteric characters. Plain old digits will do just as they always have ...
Which is why I have to wonder whether there's *any* value in exposing the numeric-value property beyond regular old digits.
the unicode database has three fields dealing with the numeric value: decimal digit value (integer), digit value (integer), and numeric value (integer *or* rational):

"This is a numeric field. If the character has the numeric property, as specified in Chapter 4 of the Unicode Standard, the value of that character is represented with an integer or rational number in this field."

here's today's proposal: let's claim that it's a bug to return a float from "numeric", and change it to return a string instead.

(this will match "decomposition", which is also "broken" -- it really should return a tag followed by a sequence of unicode characters).

</F>
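The three fields can be compared side by side with a short Python 3 sketch. It assumes the modern unicodedata functions decimal(), digit() and numeric(), each called with a default of None so that a missing field shows up as None and not as an exception. The helper name numeric_fields is only for illustration.

```python
import unicodedata

def numeric_fields(ch):
    """Return (decimal digit value, digit value, numeric value) for ch."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

samples = ("7", "\N{SUPERSCRIPT TWO}", "\N{VULGAR FRACTION ONE HALF}", "x")
for ch in samples:
    print(ascii(ch), numeric_fields(ch))
```

An ordinary digit fills all three fields, SUPERSCRIPT TWO has a digit and numeric value but no decimal value, and the fraction has only the numeric value.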
Fredrik Lundh wrote:
tim wrote:
But really, I don't suspect that anyone is going to do serious character to number conversion on these esoteric characters. Plain old digits will do just as they always have ...
Which is why I have to wonder whether there's *any* value in exposing the numeric-value property beyond regular old digits.
the unicode database has three fields dealing with the numeric value: decimal digit value (integer), digit value (integer), and numeric value (integer *or* rational):
"This is a numeric field. If the character has the numeric property, as specified in Chapter 4 of the Unicode Standard, the value of that character is represented with an integer or rational number in this field."
here's today's proposal: let's claim that it's a bug to return a float from "numeric", and change it to return a string instead.
Hmm, how about making the return format an option?

unicodedata.numeric(char, format=('float' (default), 'string', 'fraction'))
(this will match "decomposition", which is also "broken" -- it really should return a tag followed by a sequence of unicode characters).
Same here:

unicodedata.decomposition(char, format=('string' (default), 'tuple'))

I'd opt for making the API more customizable rather than trying to find the one and only true return format ;-)

-- Marc-Andre Lemburg
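As an illustration of the 'tuple' idea for decomposition, here is a Python 3 sketch built on top of the string that unicodedata.decomposition() returns today. The function decomposition_tuple is hypothetical and only one possible reading of "a tag followed by a sequence of unicode characters"; the format= keyword proposed above is not something the module offers, to my knowledge.

```python
import unicodedata

def decomposition_tuple(ch):
    """Hypothetical helper: return (tag or None, decomposed characters)."""
    fields = unicodedata.decomposition(ch).split()
    tag = None
    if fields and fields[0].startswith("<"):
        tag = fields.pop(0)  # compatibility tag such as <super> or <fraction>
    # The remaining fields are hexadecimal code points.
    return tag, "".join(chr(int(code, 16)) for code in fields)

print(decomposition_tuple("\N{SUPERSCRIPT TWO}"))                   # ('<super>', '2')
print(decomposition_tuple("\N{LATIN SMALL LETTER E WITH ACUTE}"))   # (None, 'e' + U+0301)
print(decomposition_tuple("x"))                                     # (None, '')
```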
Tim Peters wrote:
[Tim]
>>> unicodedata.numeric(u"\N{PLANCK CONSTANT OVER TWO PI}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
>>> unicodedata.numeric(u"\N{EULER CONSTANT}")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: not a numeric character
>>> unicodedata.numeric(u"\N{AIRSPEED OF AFRICAN SWALLOW}")
UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character Name
[MAL]
Perhaps you should submit these for Unicode 4.0 ;-)
Note that the first two are already there; they just don't have an associated numerical value. The last one was a hint that I was trying to write a frivolous msg while giving my "<wink>" key a break <wink>.
That's what I meant: you should submit the numeric values for the first two and opt for addition of the last.
But really, I don't suspect that anyone is going to do serious character to number conversion on these esoteric characters. Plain old digits will do just as they always have ...
Which is why I have to wonder whether there's *any* value in exposing the numeric-value property beyond regular old digits.
It is needed for Unicode 3.0 standard compliance and for whoever wants to use this data. Since the Unicode database explicitly contains fractions, I think adding the .rational() API would make sense to provide a different access method to this data.

-- Marc-Andre Lemburg
tim wrote:
unicodedata.rational might be an obvious choice.

>>> unicodedata.rational(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)
Perfect -- another great name. Beats all heck out of unicodedata.vulgar() too.
should I interpret this as a +1, or should I write a PEP on this topic? ;-) </F>
[The Ping of Death suggests unicodedata.rational]

>>> unicodedata.rational(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)
[Timmy replies]
Perfect -- another great name. Beats all heck out of unicodedata.vulgar() too.
[/F inquires]
should I interpret this as a +1, or should I write a PEP on this topic? ;-)
I'm on vacation (but too ill to do much besides alternate sleep & email <snarl>), and I'm not sure we have clear rules about how votes from commercial Python developers count when made on their own time. Perhaps a meta-PEP first to resolve that issue?

Oh, all right, just speaking for myself, I'm +1 on The Ping of Death's name suggestion provided this function is needed at all. But not being a Unicode Guy by nature, I have no opinion on whether the function *is* needed (I understand how digits work in American English, and ord(ch)-ord('0') is the limit of my experience; can't say whether even the current .numeric() is useful for Klingons or Lawyers or whoever it is who expects to get a numeric value out of a character for 1/2 or 1/3).
Tim Peters wrote:
[The Ping of Death suggests unicodedata.rational]

>>> unicodedata.rational(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)
[Timmy replies]
Perfect -- another great name. Beats all heck out of unicodedata.vulgar() too.
[/F inquires]
should I interpret this as a +1, or should I write a PEP on this topic? ;-)
I'm on vacation (but too ill to do much besides alternate sleep & email <snarl>), and I'm not sure we have clear rules about how votes from commercial Python developers count when made on their own time. Perhaps a meta-PEP first to resolve that issue?
Oh, all right, just speaking for myself, I'm +1 on The Ping of Death's name suggestion provided this function is needed at all. But not being a Unicode Guy by nature, I have no opinion on whether the function *is* needed (I understand how digits work in American English, and ord(ch)-ord('0') is the limit of my experience; can't say whether even the current .numeric() is useful for Klingons or Lawyers or whoever it is who expects to get a numeric value out of a character for 1/2 or 1/3).
The reason for "numeric" being available at all is that the UnicodeData.txt file format specifies such a field. I don't believe anyone will make serious use of it though... e.g. 2² would parse as 22 and not evaluate to 4.

-- Marc-Andre Lemburg
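The 2² remark can be reproduced with a naive digit-by-digit parser. This Python 3 sketch assumes such a parser would simply fold each character's digit value into a base-10 total; naive_parse is a made-up name, and this is not how int() behaves.

```python
import unicodedata

def naive_parse(text):
    """Fold every character's digit value into a base-10 integer."""
    total = 0
    for ch in text:
        # digit() accepts SUPERSCRIPT TWO and returns 2; position is ignored.
        total = total * 10 + unicodedata.digit(ch)
    return total

print(naive_parse("2\N{SUPERSCRIPT TWO}"))  # 22
```

The superscript carries the value 2 but nothing in the numeric fields says it denotes an exponent, which is why the result is 22 and not 4.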
Fredrik Lundh wrote:
tim wrote:
unicodedata.rational might be an obvious choice.

>>> unicodedata.rational(u"\N{VULGAR FRACTION ONE THIRD}")
(1, 3)
Perfect -- another great name. Beats all heck out of unicodedata.vulgar() too.
should I interpret this as a +1, or should I write a PEP on this topic? ;-)
+1 from here. I really only chose floats to get all possibilities (digit, decimal and fractions) into one type... Python should support rational numbers some day.

-- Marc-Andre Lemburg
participants (7)
- Fredrik Lundh
- Jeremy Hylton
- M.-A. Lemburg
- Martin von Loewis
- Neil Hodgson
- The Ping of Death
- Tim Peters