Re: [Python-Dev] Shouldn't I be able to print Unicode objects?
[MAL, to Skip]
Huh ? That should not be possible ! Python literals are still ASCII.
ümlaut = 'ümlaut'
  File "<stdin>", line 1
    ümlaut = 'ümlaut'
    ^
SyntaxError: invalid syntax
That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug <wink>.
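Tim's point — that C's isalpha() answers differently depending on the current LC_CTYPE locale — can be sketched from present-day Python via ctypes. This is a minimal illustration, not anything from the thread; it assumes a libc that exports isalpha() as a real function, and the German locale name may simply not be installed on a given system:

```python
import ctypes
import ctypes.util
import locale

# Load the C library so we can call the same isalpha() the old
# tokenizer relied on.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# In the default "C" locale, byte 0xFC (Latin-1 'ü') is not a letter...
locale.setlocale(locale.LC_CTYPE, "C")
print(libc.isalpha(0xFC))        # 0

# ...but under an 8-bit German locale (if installed), it is — which is
# exactly why the locale-dependent tokenizer accepted 'ümlaut'.
try:
    locale.setlocale(locale.LC_CTYPE, "de_DE.ISO8859-1")
    print(libc.isalpha(0xFC))    # nonzero on systems with that locale
except locale.Error:
    pass  # locale not available here; the point stands regardless
```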
Tim Peters wrote:
[MAL, to Skip]
Huh ? That should not be possible ! Python literals are still ASCII.
ümlaut = 'ümlaut'
  File "<stdin>", line 1
    ümlaut = 'ümlaut'
    ^
SyntaxError: invalid syntax
That was Guido's intent, and what the Ref Man says, but the tokenizer uses C's isalpha() so in reality it's locale-dependent. I think at least one German on Python-Dev has already threatened to kill him if he ever fixes this bug <wink>.
Wasn't me for sure... even in the Unicode age, I believe that Python
source code should maintain readability by not allowing all
alpha(numeric) characters for use in identifiers (there are lots of
them in Unicode).

Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and
'A'...'Z' ?! (same for digits) ?!

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                       http://www.lemburg.com/python/
Barry Scott wrote:
Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!
If you embrace the world then NO. If America is your world then maybe.
Actually, if we were really going to embrace the world we'd need to
handle more than a few European languages!

-- 
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook
if we were really going to embrace the world we'd need to handle more than a few European languages!
-1 on allowing Kanji in python identifiers. :-( I like to be able to at
least imagine some sort of pronunciation for variable names!

Greg Ewing, Computer Science Dept,  +--------------------------------------+
University of Canterbury,           | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand           | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz          +--------------------------------------+
[M.-A. Lemburg]
Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode).
Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!
That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week <wink>). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class <wink>).
Tim Peters wrote:
> > Shouldn't we fix the tokenizer to explicitly check for 'a'...'z'
> > and 'A'...'Z' ?! (same for digits) ?!
That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it.
I don't get it. If people use non-ascii characters, they're clearly not
using Python. From the language reference:

    ... Python uses the 7-bit ASCII character set for program text and
    string literals. ...

    Identifiers (also referred to as names) are described by the
    following lexical definitions:

        identifier: (letter|"_") (letter|digit|"_")*
        letter:     lowercase | uppercase
        lowercase:  "a"..."z"
        uppercase:  "A"..."Z"
        digit:      "0"..."9"

    Identifiers are unlimited in length. Case is significant ...

Either change the specification, and break every single tool written by
anyone who actually bothered to read the specification [1], or add a
warning to 2.2.

</F>

1) I assume the specification didn't exist when GvR wrote the first
CPython implementation ;-)
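For what it's worth, the lexical definition /F quotes transcribes directly into an ASCII-only check. A minimal sketch (the regex and the function name are illustrative, not anything the tokenizer actually uses):

```python
import re

# The Ref Man grammar, transcribed: (letter|"_") (letter|digit|"_")*
# with letter and digit restricted to 7-bit ASCII.
_SPEC_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def is_spec_identifier(name):
    """True iff `name` matches the ASCII-only lexical definition."""
    return _SPEC_IDENTIFIER.match(name) is not None

print(is_spec_identifier("spam_1"))   # True
print(is_spec_identifier("ümlaut"))   # False: ü is outside a-z/A-Z
print(is_spec_identifier("1abc"))     # False: may not start with a digit
```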
[/F]
I don't get it. If people use non-ascii characters, they're clearly not using Python. from the language reference:
My *first* reply in this thread said the lang ref required this. That doesn't mean people read the ref. IIRC, you were one of the most strident complainers about list.append(1, 2, 3) "breaking", so just rekindle that mindset but intensify it fueled by nationalism <0.5 wink>.
... either change the specification, and break every single tool written by anyone who actually bothered to read the specification [1], or add a warning to 2.2.
This is up to Guido; doesn't affect my code one way or the other (and, yes, e.g., IDLE's parser follows the manual here).
... 1) I assume the specification didn't exist when GvR wrote the first CPython implementation ;-)
Thanks to the magic of CVS, you can see that the BNF for identifiers has remained unchanged since it was first checked in (Thu Nov 21 13:53:03 1991 rev 1.1 of ref1.tex). The problem is that locale was a new-fangled idea then, and I believe Guido simply didn't anticipate isalpha() and isalnum() would vary across non-EBCDIC platforms.
Paul Prescod wrote:
Barry Scott wrote:
Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!
If you embrace the world then NO. If America is your world then maybe.
Actually, if we were really going to embrace the world we'd need to handle more than a few European languages!
I was just suggesting to make the parser actually do what the language
spec defines.

And yes: I don't like non-ASCII identifiers (even though I live in
Europe). This is just bound to cause trouble, e.g. people forgetting
accents on characters, editors displaying code using wild
approximations of what the code author intended to write, etc.
Tim Peters wrote:
[M.-A. Lemburg]
Wasn't me for sure... even in the Unicode age, I believe that Python source code should maintain readability by not allowing all alpha(numeric) characters for use in identifiers (there are lots of them in Unicode).
Shouldn't we fix the tokenizer to explicitly check for 'a'...'z' and 'A'...'Z' ?! (same for digits) ?!
That's certain to break code, and it's certain that some of those whose code gets broken would scream very loudly about it. OTOH, nobody would come to its defense with a hearty "whew! I'm so glad *that* hole finally got plugged!". I'm sure it would cause less trouble to take away <> as an alternative spelling of != (except that Barry is actually close enough to strangle Guido a few days each week <wink>). Is it worth the hassle? I don't know, but I'd *guess* Guido would rather endure the complaints for something more substantial (like, say, breaking 10 lines of an expert's obscure code that relies on int() being a builtin instead of a class <wink>).
Ok, point taken... still, it's funny sometimes how pydevs are willing
to break perfectly valid code in some areas while not considering
pointing users to clean up invalid code in other areas.
On Thu, Jun 07, 2001 at 10:42:40AM +0200, M.-A. Lemburg wrote:
still, it's funny sometimes how pydevs are willing to break perfectly valid code in some areas while not considering pointing users to clean up invalid code in other areas.
Well, I consider myself one of the more backward-oriented people on
py-dev (or at least a vocal member of that sub-group ;) and I don't
think changing int et al to be types/class-constructors is a problem.
People who rely on int being a *function*, rather than being a
callable, are either writing a python-specific script, a quick hack,
or really, really know what they are getting into.

I'm also not terribly worried about the use of non-ASCII characters in
identifiers in Python, though a warning for the next one or two
releases would be a good thing -- if anything, it should warn that
that trick won't work for people with different locale settings!

-- 
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
[Thomas Wouters]
... I'm also not terribly worried about the use of non-ASCII characters in identifiers in Python, though a warning for the next one or two releases would be a good thing -- if anything, it should warn that that trick won't work for people with different locale settings!
Fine by me! Someone who cares enough to write the warning code and docs should just do so, although it may be wise to secure Guido's blessing first.
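The warning Thomas proposes would of course live in the C tokenizer, but its logic can be sketched in a few lines of Python. Everything here is hypothetical — the helper name, the message text, and the choice of SyntaxWarning are illustrative, not part of any actual patch:

```python
import warnings

def warn_nonascii_identifier(name):
    # Hypothetical helper: flag identifiers containing characters
    # outside the Ref Man's ASCII set (a-z, A-Z, 0-9, "_") instead of
    # silently accepting whatever the current locale's isalpha() allows.
    ok = all(ch == "_" or "a" <= ch <= "z" or "A" <= ch <= "Z"
             or "0" <= ch <= "9" for ch in name)
    if not ok:
        warnings.warn("identifier %r depends on the current locale's "
                      "isalpha(); it may not parse under other locale "
                      "settings" % name,
                      SyntaxWarning, stacklevel=2)
    return ok

warn_nonascii_identifier("umlaut")   # silent, returns True
warn_nonascii_identifier("ümlaut")   # warns, returns False
```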
participants (8)
- Barry Scott
- Fredrik Lundh
- Greg Ewing
- M.-A. Lemburg
- Paul Prescod
- Thomas Wouters
- Tim Peters
- Tim Peters