For the benefit of those who read this in ASCII, I will include Unicode translations in the following. I prefer code which is readable in ASCII (as PEP 8 suggests), which is one reason that I mildly dislike the proposal: I had to go to the archives even to read the subject line. Nevertheless, I think that, in the Unicode world, the proposal is sound.
The question was asked earlier why the Python int() and float() functions do not allow Greek numbers, when they do allow numbers from many other language character sets.
The answer is in the documentation for int():
The numeric literals accepted include the digits 0 to 9 or any Unicode equivalent (code points with the Nd property).
The "Nd" characters are decimal digits of systems which use positional notation (i.e. Arabic numbers). The Greeks used decimal numbers, but used different symbols for one, ten, hundred, thousand, (etc.) and added them together, much like the system of Roman numbers we are familiar with.
The int() parser expects Arabic formatted numbers, so it will not correctly interpret other systems of notation. In order to read such numbers, you need to use a parser which was built for them. PEP 313 suggested that a parser for Roman formatted numbers be included in Python, and it was rejected.
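A small illustration of the distinction above, assuming Python 3. The helper name int_accepts is made up for this example; it only checks the Unicode general category that the int() documentation refers to.

```python
import unicodedata

def int_accepts(ch):
    """Hypothetical helper: True if ch is a decimal digit (category Nd)."""
    return unicodedata.category(ch) == 'Nd'

arabic_indic = '\u0664\u0662'  # ARABIC-INDIC DIGIT FOUR, ARABIC-INDIC DIGIT TWO
roman_eight = '\u2167'         # ROMAN NUMERAL EIGHT (a letter number, not Nd)

# Positional Nd digits are parsed just like '42'.
print(int(arabic_indic))

# The Roman numeral has a numeric value in the Unicode database...
print(unicodedata.numeric(roman_eight))

# ...but it is not a positional decimal digit, so int() refuses it.
try:
    int(roman_eight)
except ValueError as exc:
    print('ValueError:', exc)
```

The same refusal applies to any character outside the Nd category, whatever numeric value the Unicode database assigns to it.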
Several algorithms for reading Roman numbers encoded using ASCII values ['i','v','x','L', (etc.)] have been published. The one I wrote goes a bit further -- it also tries to read the value of unicodedata.numeric() for each character of its input string, and sums them (sort of). It would, therefore, convert all of the Greek and other characters mentioned in this thread and return a value for them. If a Greek author followed Roman formatting rules it would return a _correct_ value, too. If, on the other hand, he put a smaller-valued digit to the left of a larger one, he would probably not appreciate the resulting subtraction.
>>> import unicodedata
>>> import romanclass as Roman
>>> g2 = '\U0001015c'
>>> unicodedata.name(g2)
'GREEK ACROPHONIC THESPIAN TWO'
>>> g5000 = '\U00010172'
>>> unicodedata.name(g5000)
'GREEK ACROPHONIC THESPIAN FIVE THOUSAND'
>>> g5002 = g5000 + g2  # string concatenation (not addition)
>>> g5002
'\U00010172\U0001015c'
>>> Roman.Roman(g5002)
Roman(5002)
>>> print(Roman.Roman(g5002))
ↁII
>>> # but -- since Roman math subtracts values on the left...
>>> print(Roman.Roman(g2 + g5000))
MↁCMXCVIII
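The behaviour described above (take unicodedata.numeric() for each character, add the values, and subtract a value that sits to the left of a larger one) can be sketched roughly as follows. This is not the romanclass source, only a minimal guess at the idea, assuming Python 3. The names ASCII_ROMAN, char_value and roman_style_value are invented for the example.

```python
import unicodedata

# Assumed fallback table: Unicode gives plain ASCII letters no numeric value.
ASCII_ROMAN = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def char_value(ch):
    """Numeric value of one character, from Unicode or the ASCII table."""
    try:
        # Truncates fractions such as vulgar-fraction characters to an int.
        return int(unicodedata.numeric(ch))
    except ValueError:
        return ASCII_ROMAN[ch.upper()]  # KeyError if ch is not a numeral

def roman_style_value(text):
    """Sum the values, subtracting any value placed left of a larger one."""
    values = [char_value(ch) for ch in text]
    total = 0
    for i, value in enumerate(values):
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value   # medieval-style subtractive notation
        else:
            total += value
    return total

g2 = '\U0001015c'      # GREEK ACROPHONIC THESPIAN TWO
g5000 = '\U00010172'   # GREEK ACROPHONIC THESPIAN FIVE THOUSAND
print(roman_style_value(g5000 + g2))   # larger value first: plain addition
print(roman_style_value(g2 + g5000))   # smaller value first: subtraction
```

The sketch does no validation at all, so it will happily assign a value to strings that no Roman (or Greek) would have written.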
This is all an unimportant side effect of my attempt to support actual Unicode Roman numbers:
>>> u'\u2167'
'Ⅷ'
>>> eight = Roman.Roman(u'\u2167')
>>> print(eight + 10)  # NOTE: mathematical addition
XVIII
This all assumes that we are talking about Acrophonic (or Herodian or Attic) numerals. The Greeks also used Alphabetic (also called Milesian, Alexandrian, or Ionic) numerals. In that system, the value of pi ('\u03c0') is 80 (and has nothing to do with the circumference of a circle). That usage, however, is not recognized by Unicode:
>>> '\u03c0'
'π'
>>> pi = '\u03c0'
>>> unicodedata.name(pi)
'GREEK SMALL LETTER PI'
>>> unicodedata.numeric(pi)
Traceback (most recent call last):
  File "<pyshell#113>", line 1, in <module>
    unicodedata.numeric(pi)
ValueError: not a numeric character
[As a complete side note: Greeks pronounce the name of that letter as "pea", not "pie".]
That agrees with Unicode's non-recognition of the numeric value of ASCII letters used in Roman numerals:
>>> unicodedata.numeric('X')
Traceback (most recent call last):
  File "<pyshell#114>", line 1, in <module>
    unicodedata.numeric('X')
ValueError: not a numeric character
Any numeric usage requires a definition of how the string is to be parsed:
>>> Roman.Roman('X')
Roman(10)
>>> float(Roman.Roman('X'))
10.0
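As one more illustration of "a definition of how the string is to be parsed", here is a rough sketch for the alphabetic Greek numerals mentioned earlier, in which pi is worth 80. It assumes Python 3 and the usual letter values, with stigma, koppa and sampi standing for 6, 90 and 900. The names ALPHABETIC and alphabetic_value are invented for the example, and the system's marks for thousands are ignored.

```python
# Assumed value table for alphabetic (Milesian) numerals.
ALPHABETIC = {
    'α': 1, 'β': 2, 'γ': 3, 'δ': 4, 'ε': 5,
    '\u03db': 6,    # GREEK SMALL LETTER STIGMA
    'ζ': 7, 'η': 8, 'θ': 9,
    'ι': 10, 'κ': 20, 'λ': 30, 'μ': 40, 'ν': 50, 'ξ': 60, 'ο': 70, 'π': 80,
    '\u03df': 90,   # GREEK SMALL LETTER KOPPA
    'ρ': 100, 'σ': 200, 'τ': 300, 'υ': 400, 'φ': 500, 'χ': 600, 'ψ': 700,
    'ω': 800,
    '\u03e1': 900,  # GREEK SMALL LETTER SAMPI
}

def alphabetic_value(text):
    """Purely additive: every letter contributes its own fixed value."""
    # KeyError for anything outside the table (including final sigma).
    return sum(ALPHABETIC[ch] for ch in text.lower())

print(alphabetic_value('\u03c0'))   # pi on its own
print(alphabetic_value('ρκγ'))      # 100 + 20 + 3
```

Unlike the Roman-style reading, nothing is ever subtracted here, which is exactly why the two systems need separate parsers.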
So, forget all of this noise about all of the other possible things that could be done with extended definitions of float(). Any of those would require another definition, and another PEP. This proposal is for only one thing -- to make the following happen:
>>> inf = '\u221e'
>>> float(inf)
inf
Vernon Cole
On Saturday, 13 July 2013, at 10:32:59 UTC+4, Vernon D. Cole wrote:
> This proposal is for only one thing -- to make the following happen:
> >>> inf = '\u221e'
> >>> float(inf)
> inf
Exactly. But to be more complete:
>>> float(u'∞')
inf
>>> float(u'-∞')
-inf
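Until float() itself behaves that way, the requested result can be approximated with a small wrapper. This is only a sketch of the proposed behaviour, assuming Python 3; float_with_infinity is a hypothetical name, not an existing function.

```python
INFINITY_SIGN = '\u221e'  # the character '∞'

def float_with_infinity(text):
    """Like float(), but also accept '∞', '+∞' and '-∞' (hypothetical)."""
    stripped = text.strip()
    if stripped in (INFINITY_SIGN, '+' + INFINITY_SIGN):
        return float('inf')
    if stripped == '-' + INFINITY_SIGN:
        return float('-inf')
    # Everything else goes through the ordinary float() parser unchanged.
    return float(text)

print(float_with_infinity('\u221e'))
print(float_with_infinity('-\u221e'))
print(float_with_infinity('1.5'))
```

Only the ASCII plus and minus signs are handled; whether other sign characters should count is a separate question.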
One could go further and make '∞' a literal in Python that refers to infinity. But that is only worth discussing if this proposal makes sense.
On 13/07/13 16:32, Vernon D. Cole wrote:
> For the benefit of those who read this in ASCII, I will include Unicode translations in the following. I prefer code which is readable in ASCII
[rant] It's 2013, not 1963. When oh when are we going to catch up with technology that I had on my Mac in 1984???
Oh, another thing... even in 1963, the ASCII standard was obsolete, since you cannot represent standard American characters in common use in 1963 such as ¢. Ironically, you cannot even say "Copyright © 1963 American Standards Association" in ASCII.
I am aware of the reasons for the limitations on ASCII, but its time is long, long gone. It needs to die in peace.
[/rant]
Nice analysis of the Roman numerals issue though. Thanks for that.
(P.S. are you aware that the practice of subtracting Roman numerals on the left was a medieval innovation, and not one that the ancient Romans themselves did?)
-- Steven