For the benefit of those who read this in ASCII, I will include Unicode translations in what follows. I prefer code that is readable in ASCII (as PEP 8 suggests), which is one reason I mildly dislike the proposal: I had to go to the archives to even read the subject line. Nevertheless, I think that, in the Unicode world, the proposal is sound.

The question was asked earlier why the Python int() and float() functions do not accept Greek numbers when they do accept numbers from many other language character sets. The answer is in the documentation for int():
The numeric literals accepted include the digits 0 to 9 or any Unicode equivalent (code points with the Nd property).
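A quick way to see what that sentence means in practice is sketched below. The sample digits (Arabic-Indic and Devanagari) are chosen only for illustration and were not part of the thread; the Greek character is the acrophonic "two" discussed further on.

```python
import unicodedata

# Digits from two positional systems; every character has category 'Nd'.
arabic_indic = '\u0661\u0662\u0663'   # ARABIC-INDIC DIGITS ONE, TWO, THREE
devanagari = '\u096a\u0968'           # DEVANAGARI DIGITS FOUR, TWO

categories = [unicodedata.category(ch) for ch in arabic_indic]
parsed = (int(arabic_indic), int(devanagari), float(arabic_indic))

# A Greek acrophonic numeral is a number, but not a decimal digit,
# so int() refuses it.
greek_two = '\U0001015c'              # GREEK ACROPHONIC THESPIAN TWO
greek_category = unicodedata.category(greek_two)
try:
    int(greek_two)
    greek_accepted = True
except ValueError:
    greek_accepted = False

print(categories, parsed, greek_category, greek_accepted)
```

The point is only that int() keys on the "Nd" category, not on whether Unicode knows a numeric value for the character.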
The "Nd" characters are the decimal digits of systems which use positional notation (i.e. Arabic numbers). The Greeks used decimal numbers, but used different symbols for one, ten, hundred, thousand (etc.) and added them together, much like the system of Roman numbers we are familiar with. The int() parser expects Arabic formatted numbers, so it will not correctly interpret other systems of notation. In order to read such numbers, you need to use a parser which was built for them.

PEP 313 suggested that a parser for Roman formatted numbers be included in Python, and it was rejected. Several algorithms for reading Roman numbers encoded using ASCII letters ('I', 'V', 'X', 'L', etc.) have been published. The one I wrote goes a bit further -- it also tries to read the value of unicodedata.numeric() for each character of its input string, and sums them (sort of). It would therefore convert all of the Greek and other characters mentioned in this thread and return a value for them. If a Greek author followed Roman formatting rules, it would return a _correct_ value, too. If, on the other hand, he put a smaller valued digit on the left side of a larger digit, he would probably not appreciate the resulting subtraction.
>>> import unicodedata
>>> import romanclass as Roman
>>> g2 = '\U0001015c'
>>> unicodedata.name(g2)
'GREEK ACROPHONIC THESPIAN TWO'
>>> g5000 = '\U00010172'
>>> unicodedata.name(g5000)
'GREEK ACROPHONIC THESPIAN FIVE THOUSAND'
>>> g5002 = g5000 + g2  # string concatenation (not addition)
>>> g5002
'\U00010172\U0001015c'
>>> Roman.Roman(g5002)
Roman(5002)
>>> print(Roman.Roman(g5002))
ↁII
>>> # but -- since Roman math subtracts values on the left...
>>> print(Roman.Roman(g2 + g5000))
MↁCMXCVIII
This is all an unimportant side effect of my attempt to support actual Unicode Roman numbers:
>>> u'\u2167'
'Ⅷ'
>>> eight = Roman.Roman(u'\u2167')
>>> print(eight + 10)  # NOTE: mathematical addition
XVIII
This all assumes that we are talking about Acrophonic (or Herodian or Attic) numerals. The Greeks also used Alphabetic (also called Milesian, Alexandrian, or Ionic) numerals. In that system, the value of pi ('\u03c0') is 80 (and has nothing to do with the circumference of a circle). That usage, however, is not recognized by Unicode:
>>> '\u03c0'
'π'
>>> pi = '\u03c0'
>>> unicodedata.name(pi)
'GREEK SMALL LETTER PI'
>>> unicodedata.numeric(pi)
Traceback (most recent call last):
  File "<pyshell#113>", line 1, in <module>
    unicodedata.numeric(pi)
ValueError: not a numeric character
[As a complete side note: Greeks pronounce the name of that letter as "pea", not "pie".]

Giving pi no numeric value is consistent with how Unicode treats the ASCII letters used in Roman numerals, which have no numeric value either:
>>> unicodedata.numeric('X')
Traceback (most recent call last):
  File "<pyshell#114>", line 1, in <module>
    unicodedata.numeric('X')
ValueError: not a numeric character
Any numeric usage requires a definition of how the string is to be parsed:
>>> Roman.Roman('X')
Roman(10)
>>> float(Roman.Roman('X'))
10.0
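To make "a definition of how the string is to be parsed" concrete, here is a minimal sketch of one such definition. It is not the romanclass code; the names ASCII_ROMAN, symbol_value and parse_additive are hypothetical, and the rule (add each symbol, but subtract it when the symbol to its right is larger) is a simplification that does no validation of Roman formatting.

```python
import unicodedata

# Conventional values for the ASCII Roman letters; Unicode assigns them none.
ASCII_ROMAN = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def symbol_value(ch):
    """Value of one symbol: ASCII Roman letter first, else Unicode's numeric value."""
    if ch.isascii() and ch.upper() in ASCII_ROMAN:
        return ASCII_ROMAN[ch.upper()]
    # Raises ValueError for characters with no numeric property (e.g. pi).
    return unicodedata.numeric(ch)

def parse_additive(text):
    """Sum the symbol values, subtracting a symbol smaller than its right neighbour."""
    values = [symbol_value(ch) for ch in text]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total

g2, g5000 = '\U0001015c', '\U00010172'  # the Greek acrophonic characters used earlier
print(parse_additive('X'), parse_additive('\u2167'))
print(parse_additive(g5000 + g2), parse_additive(g2 + g5000))
```

Under this rule the two Greek strings come out as 5002 and 4998, matching the behaviour shown earlier, while a bare pi still raises ValueError because nothing in the definition gives it a value.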
So, forget all of this noise about all of the other possible things that could be done with extended definitions of float(). Any of those would require another definition, and another PEP. This proposal is for only one thing -- to make the following happen:
>>> inf = '\u221e'
>>> float(inf)
inf
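For comparison, float() as it stands raises ValueError for that string. The effect of the proposal can be approximated today with a small wrapper; this is only a sketch, and float_with_infinity is a hypothetical helper name, not anything in the standard library or in the proposal itself.

```python
import math

INFINITY_SIGN = '\u221e'  # INFINITY

def float_with_infinity(text):
    """Like float(), but also accept the infinity sign with an optional ASCII sign."""
    stripped = text.strip()
    # Accept exactly the infinity sign, optionally preceded by one '+' or '-'.
    if stripped.lstrip('+-') == INFINITY_SIGN and len(stripped) <= 2:
        stripped = stripped.replace(INFINITY_SIGN, 'inf')
    return float(stripped)

print(float_with_infinity('\u221e'), float_with_infinity('-\u221e'))
print(float_with_infinity('1.5'))
```

Everything other than the infinity sign is passed straight through to float(), so the wrapper changes nothing else about what is accepted.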
Mark me as +0

--
Vernon Cole