Re: [Python-Dev] Shouldn't I be able to print Unicode objects?
[MAL, to Skip]
Huh ? That should not be possible ! Python literals are still ASCII.
ümlaut = 'ümlaut'
  File "<stdin>", line 1
    ümlaut = 'ümlaut'
    ^
SyntaxError: invalid syntax
That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug <wink>.
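Tim's point — that C's isalpha() answers differently depending on the current LC_CTYPE locale — can be sketched from present-day Python via ctypes. This is a minimal illustration, not anything from the thread; it assumes a libc that exports isalpha() as a real function, and the German locale name may simply not be installed on a given system:

```python
import ctypes
import ctypes.util
import locale

# Load the C library so we can call the same isalpha() the old
# tokenizer relied on.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# In the default "C" locale, byte 0xFC (Latin-1 'ü') is not a letter...
locale.setlocale(locale.LC_CTYPE, "C")
print(libc.isalpha(0xFC))        # 0

# ...but under an 8-bit German locale (if installed), it is — which is
# exactly why the locale-dependent tokenizer accepted 'ümlaut'.
try:
    locale.setlocale(locale.LC_CTYPE, "de_DE.ISO8859-1")
    print(libc.isalpha(0xFC))    # nonzero on systems with that locale
except locale.Error:
    pass  # locale not available here; the point stands regardless
```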
Tim Peters wrote:
[MAL, to Skip]
Huh ? That should not be possible ! Python literals are still ASCII.
ümlaut = 'ümlaut'
  File "<stdin>", line 1
    ümlaut = 'ümlaut'
    ^
SyntaxError: invalid syntax
That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug <wink>.
Wasn't me for sure... even in the Unicode age, I believe that Python
source code should maintain readability by not allowing all
alpha(numeric) characters for use in identifiers (there are lots of
them in Unicode).

Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and
'A'...'Z' ?! (same for digits) ?!

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                       http://www.lemburg.com/python/
Barry Scott wrote:
Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!
If you embrace the world then NO. If America is your world then maybe.
Actually, if we were really going to embrace the world we'd need to
handle more than a few European languages!

-- 
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook
if we were really going to embrace the world we'd need to handle more than a few European languages!
-1 on allowing Kanji in python identifiers. :-( I like to be able to at
least imagine some sort of pronunciation for variable names!

Greg Ewing, Computer Science Dept,  +--------------------------------------+
University of Canterbury,           | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand           | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz          +--------------------------------------+
[M.-A. Lemburg]
Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode).
Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!
That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week <wink>). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class <wink>).
Tim Peters wrote:
> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > and 'A'...'Z' ?! (same for digits) ?!
That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it.
I don't get it. If people use non-ascii characters, they're clearly not
using Python. From the language reference:

    ... Python uses the 7-bit ASCII character set for program text and
    string literals. ...

    Identifiers (also referred to as names) are described by the
    following lexical definitions:

        identifier: (letter|"_") (letter|digit|"_")*
        letter:     lowercase | uppercase
        lowercase:  "a"..."z"
        uppercase:  "A"..."Z"
        digit:      "0"..."9"

    Identifiers are unlimited in length. Case is significant ...

Either change the specification, and break every single tool written by
anyone who actually bothered to read the specification [1], or add a
warning to 2.2.

</F>

1) I assume the specification didn't exist when GvR wrote the first
CPython implementation ;-)
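For what it's worth, the lexical definition /F quotes transcribes directly into an ASCII-only check. A minimal sketch (the regex and the function name are illustrative, not anything the tokenizer actually uses):

```python
import re

# The Ref Man grammar, transcribed: (letter|"_") (letter|digit|"_")*
# with letter and digit restricted to 7-bit ASCII.
_SPEC_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def is_spec_identifier(name):
    """True iff `name` matches the ASCII-only lexical definition."""
    return _SPEC_IDENTIFIER.match(name) is not None

print(is_spec_identifier("spam_1"))   # True
print(is_spec_identifier("ümlaut"))   # False: ü is outside a-z/A-Z
print(is_spec_identifier("1abc"))     # False: may not start with a digit
```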
[/F]
I don't get it. If people use non-ascii characters, they're clearly not using Python. from the language reference:
My *first* reply in this thread said the lang ref required this. That doesn't mean people read the ref. IIRC, you were one of the most strident complainers about list.append(1, 2, 3) "breaking", so just rekindle that mindset but intensify it fueled by nationalism <0.5 wink>.
... either change the specification, and break every single tool written by anyone who actually bothered to read the specification [1], or add a warning to 2.2.
This is up to Guido; doesn't affect my code one way or the other (and, yes, e.g., IDLE's parser follows the manual here).
... 1) I assume the specification didn't exist when GvR wrote the first CPython implementation ;-)
Thanks to the magic of CVS, you can see that the BNF for identifiers has remained unchanged since it was first checked in (Thu Nov 21 13:53:03 1991 rev 1.1 of ref1.tex). The problem is that locale was a new-fangled idea then, and I believe Guido simply didn't anticipate isalpha() and isalnum() would vary across non-EBCDIC platforms.
Paul Prescod wrote:
Barry Scott wrote:
Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!
If you embrace the world then NO. If America is your world then maybe.
Actually, if we were really going to embrace the world we'd need to handle more than a few European languages!
I was just suggesting to make the parser actually do what the language
spec defines.

And yes: I don't like non-ASCII identifiers (even though I live in
Europe). This is just bound to cause trouble, e.g. people forgetting
accents on characters, editors displaying code using wild
approximations of what the code author intended to write, etc.
Tim Peters wrote:
[M.-A. Lemburg]
Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode).
Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!
That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week <wink>). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class <wink>).
Ok, point taken... still, it's funny sometimes how pydevs are willing
to break perfectly valid code in some areas while not considering
pointing users to clean up invalid code in other areas.
On Thu, Jun 07, 2001 at 10:42:40AM +0200, M.-A. Lemburg wrote:
still, it's funny sometimes how pydevs are willing to break perfectly valid code in some areas while not considering pointing users to clean up invalid code in other areas.
Well, I consider myself one of the more backward-oriented people on
py-dev (or at least a vocal member of that sub-group ;) and I don't
think changing int et al to be types/class-constructors is a problem.
People who rely on int being a *function*, rather than being a
callable, are either writing a python-specific script, a quick hack,
or really, really know what they are getting into.

I'm also not terribly worried about the use of non-ASCII characters in
identifiers in Python, though a warning for the next one or two
releases would be a good thing -- if anything, it should warn that
that trick won't work for people with different locale settings!

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
[Thomas Wouters]
... I'm also not terribly worried about the use of non-ASCII characters in identifiers in Python, though a warning for the next one or two releases would be a good thing -- if anything, it should warn that that trick won't work for people with different locale settings!
Fine by me! Someone who cares enough to write the warning code and docs should just do so, although it may be wise to secure Guido's blessing first.
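The warning Thomas proposes would of course live in the C tokenizer, but its logic can be sketched in a few lines of Python. Everything here is hypothetical — the helper name, the message text, and the choice of SyntaxWarning are illustrative, not part of any actual patch:

```python
import warnings

def warn_nonascii_identifier(name):
    # Hypothetical helper: flag identifiers containing characters
    # outside the Ref Man's ASCII set (a-z, A-Z, 0-9, "_") instead of
    # silently accepting whatever the current locale's isalpha() allows.
    ok = all(ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z"
             or "0" <= ch <= "9" for ch in name)
    if not ok:
        warnings.warn("identifier %r depends on the current locale's "
                      "isalpha(); it may not parse under other locale "
                      "settings" % name,
                      SyntaxWarning, stacklevel=2)
    return ok

warn_nonascii_identifier("umlaut")   # silent, returns True
warn_nonascii_identifier("ümlaut")   # warns, returns False
```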
participants (8)
- Barry Scott
- Fredrik Lundh
- Greg Ewing
- M.-A. Lemburg
- Paul Prescod
- Thomas Wouters
- Tim Peters
- Tim Peters