[PEP 223] Change the Meaning of \x Escapes
data:image/s3,"s3://crabby-images/4c299/4c299dfcd8671c0ce1f071dce620a40b4a7be3e3" alt=""
An HTML version of the attached can be viewed at

    http://python.sourceforge.net/peps/pep-0223.html

This will be adopted for 2.0 unless there's an uproar. Note that it *does* have potential for breaking existing code -- although no real-life instance of incompatibility has yet been reported. This is explained in detail in the PEP; check your code now.

although-if-i-were-you-i-wouldn't-bother<0.5-wink>-ly y'rs  - tim

PEP: 223
Title: Change the Meaning of \x Escapes
Version: $Revision: 1.4 $
Author: tpeters@beopen.com (Tim Peters)
Status: Active
Type: Standards Track
Python-Version: 2.0
Created: 20-Aug-2000
Post-History: 23-Aug-2000

Abstract

Change \x escapes, in both 8-bit and Unicode strings, to consume exactly the two hex digits following. The proposal views this as correcting an original design flaw, leading to clearer expression in all flavors of string, a cleaner Unicode story, better compatibility with Perl regular expressions, and with minimal risk to existing code.

Syntax

The syntax of \x escapes, in all flavors of non-raw strings, becomes

    \xhh

where h is a hex digit (0-9, a-f, A-F). The exact syntax in 1.5.2 is not clearly specified in the Reference Manual; it says

    \xhh...

implying "two or more" hex digits, but one-digit forms are also accepted by the 1.5.2 compiler, and a plain \x is "expanded" to itself (i.e., a backslash followed by the letter x). It's unclear whether the Reference Manual intended either of the 1-digit or 0-digit behaviors.

Semantics

In an 8-bit non-raw string,

    \xij

expands to the character

    chr(int(ij, 16))

Note that this is the same as in 1.6 and before.

In a Unicode string,

    \xij

acts the same as

    \u00ij

i.e. it expands to the obvious Latin-1 character from the initial segment of the Unicode space.

An \x not followed by at least two hex digits is a compile-time error, specifically ValueError in 8-bit strings, and UnicodeError (a subclass of ValueError) in Unicode strings.
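To make the 2.0 rule concrete, here is a minimal Python sketch of the expansion step alone. It is not the compiler's actual code: the function name `expand_x_escapes_20` is hypothetical, it takes the raw characters between the quotes, and it deliberately leaves every escape other than \x unexpanded. Because Python 3 strings are Unicode, `chr(int(hh, 16))` also covers the "\xij acts the same as \u00ij" case.

```python
import string


def expand_x_escapes_20(body):
    """Expand \\x escapes in `body` using the PEP 223 (2.0) rule.

    `body` is the literal text between the quotes. Each \\x must be followed
    by at least two hex digits; exactly two are consumed, and anything else
    raises ValueError. Other backslash escapes are copied through unexpanded
    (a simplification for this sketch).
    """
    out = []
    i, n = 0, len(body)
    while i < n:
        if body[i] == "\\" and i + 1 < n:
            if body[i + 1] == "x":
                digits = body[i + 2:i + 4]
                if len(digits) < 2 or any(c not in string.hexdigits for c in digits):
                    raise ValueError("invalid \\x escape")
                out.append(chr(int(digits, 16)))
                i += 4  # backslash, 'x', and exactly two hex digits
            else:
                # Any other escape (including an escaped backslash) is kept
                # as a two-character unit, so it cannot start a bogus \x.
                out.append(body[i:i + 2])
                i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

Under these rules r"\x123465" expands to chr(0x12) followed by the untouched text "3465", r"\x65" gives "e", and both r"\x1" and r"\x\x" raise ValueError, matching the 2.0 half of the PEP's example.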
Note that if an \x is followed by more than two hex digits, only the first two are "consumed". In 1.6 and before all but the *last* two were silently ignored.

Example

In 1.5.2:

    >>> "\x123465"  # same as "\x65"
    'e'
    >>> "\x65"
    'e'
    >>> "\x1"
    '\001'
    >>> "\x\x"
    '\\x\\x'
    >>>

In 2.0:

    >>> "\x123465"  # \x12 -> \022, "3465" left alone
    '\0223465'
    >>> "\x65"
    'e'
    >>> "\x1"
    [ValueError is raised]
    >>> "\x\x"
    [ValueError is raised]
    >>>

History and Rationale

\x escapes were introduced in C as a way to specify variable-width character encodings. Exactly which encodings those were, and how many hex digits they required, was left up to each implementation. The language simply stated that \x "consumed" *all* hex digits following, and left the meaning up to each implementation. So, in effect, \x in C is a standard hook to supply platform-defined behavior.

Because Python explicitly aims at platform independence, the \x escape in Python (up to and including 1.6) has been treated the same way across all platforms: all *except* the last two hex digits were silently ignored. So the only actual use for \x escapes in Python was to specify a single byte using hex notation.

Larry Wall appears to have realized that this was the only real use for \x escapes in a platform-independent language, as the proposed rule for Python 2.0 is in fact what Perl has done from the start (although you need to run in Perl -w mode to get warned about \x escapes with fewer than 2 hex digits following -- it's clearly more Pythonic to insist on 2 all the time).

When Unicode strings were introduced to Python, \x was generalized so as to ignore all but the last *four* hex digits in Unicode strings.
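For comparison, here is a sketch of the pre-2.0 behavior as the PEP describes it: \x swallows every hex digit that follows and keeps only the last `keep` of them (2 in 8-bit strings, 4 in 1.6 Unicode strings), while a bare \x is left as a backslash and an "x". The function name `expand_x_escapes_old` and the `keep` parameter are illustrative assumptions, not anything taken from the Python sources.

```python
import string


def expand_x_escapes_old(body, keep=2):
    """Sketch of the 1.5.2/1.6 rule for \\x escapes in `body`.

    \\x consumes *all* following hex digits and only the last `keep` of them
    are used (keep=2 for 8-bit strings, keep=4 for 1.6 Unicode strings).
    A \\x with no hex digits after it is left as backslash + 'x'.
    Other backslash escapes are copied through unexpanded.
    """
    out = []
    i, n = 0, len(body)
    while i < n:
        if body[i] == "\\" and i + 1 < n and body[i + 1] == "x":
            j = i + 2
            while j < n and body[j] in string.hexdigits:
                j += 1  # greedily swallow every hex digit
            digits = body[i + 2:j]
            if digits:
                out.append(chr(int(digits[-keep:], 16)))  # earlier digits silently ignored
            else:
                out.append("\\x")  # plain \x "expands" to itself
            i = j
        elif body[i] == "\\" and i + 1 < n:
            out.append(body[i:i + 2])  # other escapes kept as a two-character unit
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

With keep=2 this reproduces the 1.5.2 session: r"\x123465" gives "e", r"\x1" gives chr(1), and r"\x\x" stays as two backslash-x pairs. The same text r"\x123456" yields chr(0x56) with keep=2 but chr(0x3456) with keep=4, so its meaning depends on which kind of string it appears in.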
That generalization caused a technical difficulty for the new regular expression engine: SRE tries very hard to allow mixing 8-bit and Unicode patterns and strings in intuitive ways, and it no longer had any way to guess what, for example, r"\x123456" should mean as a pattern: is it asking to match the 8-bit character \x56 or the Unicode character \u3456?

There are hacky ways to guess, but it doesn't end there. The ISO C99 standard also introduces 8-digit \U12345678 escapes to cover the entire ISO 10646 character space, and it's also desired that Python 2 support that from the start. But then what are \x escapes supposed to mean? Do they ignore all but the last *eight* hex digits then? And if fewer than 8 follow in a Unicode string, all but the last 4? And if fewer than 4, all but the last 2?

This was getting messier by the minute, and the proposal cuts the Gordian knot by making \x simpler instead of more complicated. Note that the 4-digit generalization to \xijkl in Unicode strings was also redundant, because it meant exactly the same thing as \uijkl in Unicode strings. It's more Pythonic to have just one obvious way to specify a Unicode character via hex notation.

Development and Discussion

The proposal was worked out among Guido van Rossum, Fredrik Lundh and Tim Peters in email. It was subsequently explained and discussed on Python-Dev under subject "Go \x yourself", starting 2000-08-03. Response was overwhelmingly positive; no objections were raised.

Backward Compatibility

Changing the meaning of \x escapes does carry risk of breaking existing code, although no instances of incompatibility have yet been discovered. The risk is believed to be minimal.

Tim Peters verified that, except for pieces of the standard test suite deliberately provoking end cases, there are no instances of \xabcdef... with fewer or more than 2 hex digits following, in either the Python CVS development tree, or in assorted Python packages sitting on his machine.
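A check like the one described above can be approximated with the standard `tokenize` and `re` modules. The sketch below (the helper name `suspicious_x_escapes` is hypothetical) looks only at non-raw string literals and reports every \x escape that is not followed by exactly two hex digits, i.e. the ones 2.0 would either reject or read differently. It is an approximation: it reports where the enclosing literal starts rather than the escape itself, and whether f-strings get scanned depends on how the running Python's tokenizer reports them.

```python
import io
import re
import tokenize

# One backslash escape: "\x" plus however many hex digits follow it, or a
# backslash plus any single character (so an escaped backslash is skipped
# as a unit and cannot start a bogus \x match).
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]*|.)", re.DOTALL)


def suspicious_x_escapes(source):
    """Return (line, column, escape) triples for every \\x escape in a
    non-raw string literal of `source` that is not followed by exactly two
    hex digits. Line and column give the start of the enclosing literal."""
    found = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.STRING:
            continue
        text = tok.string
        prefix = text[: len(text) - len(text.lstrip("rRbBuUfF"))]
        if "r" in prefix.lower():
            continue  # in raw strings \x is not an escape at all
        for m in _ESCAPE.finditer(text):
            escape = m.group(1)
            if escape.startswith("x") and len(escape) - 1 != 2:
                found.append((tok.start[0], tok.start[1], m.group(0)))
    return found
```

Because the source is only tokenized, never compiled, the scan also works on files containing escapes that a given interpreter would refuse to compile.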
It's unlikely there are any with fewer than 2, because the Reference Manual implied they weren't legal (although this is debatable!). If there are any with more than 2, Guido is ready to argue they were buggy anyway <0.9 wink>.

Guido reported that the O'Reilly Python books *already* document that Python works the proposed way, likely due to their Perl editing heritage (as above, Perl worked (very close to) the proposed way from its start).

Finn Bock reported that what JPython does with \x escapes is unpredictable today. This proposal gives a clear meaning that can be consistently and easily implemented across all Python implementations.

Effects on Other Tools

Believed to be none. The candidates for breakage would mostly be parsing tools, but the author knows of none that worry about the internal structure of Python strings beyond the approximation "when there's a backslash, swallow the next character". Tim Peters checked python-mode.el, the std tokenize.py and pyclbr.py, and the IDLE syntax coloring subsystem, and believes there's no need to change any of them. Tools like tabnanny.py and checkappend.py inherit their immunity from tokenize.py.

Reference Implementation

The code changes are so simple that a separate patch will not be produced. Fredrik Lundh is writing the code, is an expert in the area, and will simply check the changes in before 2.0b1 is released.

BDFL Pronouncements

Yes, ValueError, not SyntaxError. "Problems with literal interpretations traditionally raise 'runtime' exceptions rather than syntax errors."

Copyright

This document has been placed in the public domain.
data:image/s3,"s3://crabby-images/163a8/163a80a2f5bd494435f25db087401841370a66e9" alt=""
An HTML version of the attached can be viewed at
Nice PEP!
Effects on Other Tools
Believed to be none. [...]
I believe that Fredrik also needs to fix SRE's interpretation of \xhh. Unless he's already done that. --Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
data:image/s3,"s3://crabby-images/4c299/4c299dfcd8671c0ce1f071dce620a40b4a7be3e3" alt=""
[Guido]
Nice PEP!
Thanks! I thought the kids could stand a simple example of what you'd like to read <wink>.
I believe that Fredrik also needs to fix SRE's interpretation of \xhh. Unless he's already done that.
I'm sure he's acutely aware of that, since that's how this started! And he's implementing \x in strings too. I knew you wouldn't read it to the end <0.9 wink>.

put-the-refman-stuff-briefly-at-the-front-and-save-the-blather-for-the-end-ly y'rs  - tim
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
In support of the argument that bad literals should raise ValueError (or a derived exception) rather than SyntaxError, Guido once said:
"Problems with literal interpretations traditionally raise 'runtime' exceptions rather than syntax errors."
This is currently true of overflowing integers and string literals, and hence it has also been so implemented for Unicode literals.

But i want to propose a break with tradition, because some more recent thinking on this has led me to become firmly convinced that SyntaxError is really the right thing to do in all of these cases.

The strongest reason is that a long file with a typo in a string literal somewhere in hundreds of lines of code generates only

    ValueError: invalid \x escape

with no indication of where the error is -- not even which file! I realize this could be hacked upon and fixed, but i think it points to a general inconsistency that ought to be considered and addressed.

1. SyntaxErrors are for compile-time errors. A problem with a literal happens before the program starts running, and it is useful for me, as the programmer, to know whether the error occurred because of some computational process, possibly depending on inputs, or whether it's a permanent mistake that's literally in my source code. In other words, will a debugger do me any good?

2. SyntaxErrors pinpoint the exact location of the problem. In principle, an error is a SyntaxError if and only if you can point to an exact character position as being the cause of the problem.

3. A ValueError means "i got a value that wasn't allowed or expected here". That is not at all what is happening. There is *no* defined value at all. It's not that there was a value and it was wrong -- the value was never even brought into existence.

4. The current implementation of ValueErrors is very unhelpful about what to do about an invalid literal, as explained in the example above. A SyntaxError would be much more useful.

I hope you will agree with me that solving only #4 by changing ValueErrors so they behave a little more like SyntaxErrors in certain particular situations isn't the best solution.

Also, switching to SyntaxError is likely to break very few things.
You can't depend on catching a SyntaxError, precisely because it's a compile-time error. No one could possibly be using "except ValueError" to try to catch invalid literals in their code; that usage, just like "except SyntaxError:", makes sense only when someone is using "eval" or "exec" to interpret code that was generated or read from input. In fact, i bet switching to SyntaxError would actually make some code of the form "try: eval ... except SyntaxError" work better, since the single except clause would catch all possible compilation problems with the input to eval.

-- ?!ng

Happiness comes more from loving than being loved; and often when our affection seems wounded it is only our vanity bleeding. To love, and to be hurt often, and to love again--this is the brave and happy life.
    -- J. E. Buchrose
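Ping's point about eval/exec can be sketched as a small helper (the name `compile_problem` is hypothetical) that compiles generated or user-supplied input and reports any compile-time problem. Under the behavior discussed in this thread a bad literal surfaced as ValueError while other mistakes surfaced as SyntaxError, so the sketch has to catch both; which class a given interpreter actually raises for a bad \x escape depends on its version.

```python
def compile_problem(source, filename="<input>"):
    """Return None if `source` compiles as an expression, otherwise a
    one-line description of the compile-time problem."""
    try:
        compile(source, filename, "eval")
    except (SyntaxError, ValueError) as exc:
        # Catching both classes covers bad literals regardless of which
        # exception the running interpreter uses for them.
        return "%s: %s" % (type(exc).__name__, exc)
    return None
```

For example, the expression text "\x41" compiles and yields None, while the texts "\x1" and 1 + both yield a description. If bad literals were always reported as SyntaxError, the except clause could name SyntaxError alone, which is the simplification Ping is arguing for.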
data:image/s3,"s3://crabby-images/1887d/1887d74aefa167e0775932ca2e5e1ad229548651" alt=""
All good points, except that I still find it hard to flag overflow errors as syntax errors, especially since overflow is platform defined. On one platform, 1000000000000 is fine; on another it's a SyntaxError. That could be confusing. But you're absolutely right about string literals, and maybe it's OK if 1000000000000000000000000000000000000000000000000000000000000000000 is flagged as a syntax error. (After all it's missing a trailing 'L'.) Another solution (borrowing from C): automatically promote int literals to long if they can't be evaluated as ints. --Guido van Rossum (home page: http://www.python.org/~guido/)
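Guido's C-style alternative can be sketched in a few lines. The function name `eval_int_literal` and the `bits` parameter (standing in for the platform's C long size) are illustrative assumptions; the point is that an oversized literal is promoted rather than rejected, so only the *kind* of the result, not whether the program compiles, depends on the platform.

```python
def eval_int_literal(text, bits=32):
    """Evaluate a decimal integer literal C-style: return ("int", value) if
    it fits in a signed machine integer of `bits` bits, otherwise promote it
    and return ("long", value) instead of raising an overflow error.

    `text` is the literal's digits only; a leading minus sign is a separate
    unary operator in Python's grammar, so it is not handled here.
    """
    if not text.isdigit():
        raise ValueError("not a decimal integer literal: %r" % text)
    value = int(text)
    if value <= 2 ** (bits - 1) - 1:
        return ("int", value)
    return ("long", value)
```

With bits=32, 1000000000000 comes back as a long; with bits=64 it is a plain int. Either way the literal is accepted.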
data:image/s3,"s3://crabby-images/ede6d/ede6d2cca33d547132f95e3f8c841d9976575f77" alt=""
Guido:
How about introducing the following hierarchy:

    CompileTimeError
        SyntaxError
        LiteralRangeError

LiteralRangeError could inherit from ValueError as well if you want.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
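Greg's proposed hierarchy could look roughly like this in Python. Since the built-in SyntaxError cannot be moved under a new base class, the sketch uses the hypothetical names `CompileTimeError`, `CompileSyntaxError` and `LiteralRangeError`. `LiteralRangeError` also inherits from ValueError, as Greg suggests, so an existing "except ValueError" clause would still catch it.

```python
class CompileTimeError(Exception):
    """Hypothetical base class for every error detected while compiling."""


class CompileSyntaxError(CompileTimeError):
    """Stand-in for SyntaxError under the proposed hierarchy."""


class LiteralRangeError(CompileTimeError, ValueError):
    """A well-formed literal whose value is out of range (e.g. an integer
    literal too large for a plain int). Also a ValueError, so existing
    'except ValueError' handlers keep working."""


def classify(exc):
    """Report which handlers would see `exc`: (CompileTimeError, ValueError)."""
    return (isinstance(exc, CompileTimeError), isinstance(exc, ValueError))


print(classify(LiteralRangeError("integer literal too large")))  # (True, True)
print(classify(CompileSyntaxError("invalid syntax")))            # (True, False)
```

The multiple inheritance is the design choice that lets new code catch every compile problem with one base class while old code written against ValueError keeps working.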
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
On Wed, 14 Feb 2001, Greg Ewing wrote:
I suppose that's all right, and i wouldn't complain, but i don't think it's all that necessary either. Compile-time errors *are* syntax errors. What else could they be? (Aside from fatal errors or limitations of the compiler implementation, that is, but again that's outside of the abstraction we're presenting to the Python user.) Think of it this way: if there's a problem with your Python program, it's either a problem with *how* it expresses something (syntax), or with *what* it expresses (semantics). The syntactic errors occur at compile-time and the semantic errors occur at run-time. -- ?!ng
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
On Tue, 13 Feb 2001, Guido van Rossum wrote:
I know it may seem weird. I tend to see it as a consequence of the language definition, though, not as the wrong choice of error. If you had to write a truly platform-independent Python language definition (a worthwhile endeavour, by the way, especially given that there are already at least CPython, JPython, and stackless), the decision about this would have to be made there.
On one platform, 1000000000000 is fine; on another it's a SyntaxError. That could be confusing.
So far, Python is effectively defined in such a way that 100000000000 has a meaning on one platform and has no meaning on another. <shrug> So, yeah, that's the way it is.
Another solution (borrowing from C): automatically promote int literals to long if they can't be evaluated as ints.
Quite reasonable, yes. But i'd go further than that. I think everyone so far has been in agreement that the division between ints and long ints should eventually be abolished, and we're just waiting for someone brave enough to come along and make it happen. I know i've got my fingers crossed. :) (And maybe after we deprecate 'L', we can deprecate capital 'J' on numbers and 'R', 'U' on strings too...) toowtdi-ly yours, -- ?!ng
data:image/s3,"s3://crabby-images/5ae7c/5ae7c201824b37c3633187431441e0f369a52a1a" alt=""
On Tue, Feb 13, 2001 at 03:11:10PM -0800, Ka-Ping Yee wrote:
The strongest reason is that a long file with a typo in a string literal somewhere in hundreds of lines of code generates only
ValueError: invalid \x escape
This has nothing to do with the error being a ValueError, but with some (compile-time) errors not being promoted to 'full' errors. See https://sourceforge.net/patch/?func=detailpatch&patch_id=101782&group_id=5470 The same issue came up when importing modules that did 'from foo import *' in a function scope.
Agreed. That could possibly be solved by a better description of the ValueErrors in question, though. (The 'invalid \x escape' message seems pretty obviously a compile-time error to me, but others might not see it that way.)
See above.
Not quite true. It wasn't *compiled*, but it's a literal, so it does exist. The problem is not the value of a compiled \x escape, but the value after the \x.
See #1 :)
I don't, really. The name 'ValueError' is exactly right: what is wrong (in the \x escape example) is the *value* of something (of the \x escape in question). If a syntax error was raised, I would think something was wrong with the syntax. But the \x is placed in the right spot, inside a string literal. The string literal itself is placed right. Why would it be a syntax error?
I'd say you want a 'CompilerError' superclass instead. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Thomas Wouters wrote:
Right, and I think this touches the core of the problem. SyntaxErrors produce a proper traceback while ValueErrors (and others) just print a single line which doesn't even have the filename or line number. I wonder why the PyErr_PrintEx() (pythonrun.c) error handler only tries to parse SyntaxErrors for .filename and .lineno parameters. Looking at compile.c, these should be settable on all exception objects (since these are now proper instances). Perhaps lifting the restriction in PyErr_PrintEx() and making the parse_syntax_error() API a little more robust might do the trick. Then the various direct PyErr_SetString() calls in compile.c should be converted to use com_error() instead (if possible).

-- Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
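Marc-Andre's suggestion concerns the C error printer in pythonrun.c, but the idea can be sketched at the Python level: print a location for *any* exception that carries .filename and .lineno, rather than only for SyntaxError. The function `format_error` below is hypothetical and only approximates the shape of real traceback output; the file name and line number in the usage lines are made up.

```python
def format_error(exc):
    """Format `exc` for display, preceded by a 'File ..., line ...' line
    whenever the exception carries both .filename and .lineno, whatever
    its class."""
    name = type(exc).__name__
    filename = getattr(exc, "filename", None)
    lineno = getattr(exc, "lineno", None)
    if filename is not None and lineno is not None:
        return 'File "%s", line %d\n%s: %s' % (filename, lineno, name, exc)
    return "%s: %s" % (name, exc)


err = ValueError("invalid \\x escape")
err.filename, err.lineno = "spam.py", 412  # made-up location the compiler could attach
print(format_error(err))
```

A ValueError without those attributes still prints as the familiar single line, so the change only adds information when the compiler has supplied it.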
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
I wrote:
Thomas Wouters wrote:
This has nothing to do with the error being a ValueError, but with some (compile-time) errors not being promoted to 'full' errors. See
I think they are entirely related. All ValueErrors should be run-time errors; a ValueError should never occur during compilation. The key issue is communicating clearly with the user, and that's just not what ValueError *means*. M.-A. Lemburg wrote:
This follows sensibly from the fact that SyntaxErrors are always compile-time errors (and therefore have no traceback or frame at the level where the error occurred). ValueErrors are usually run-time errors, so .filename and .lineno attributes would be redundant; this information is already available in the associated frame object.
That sounds like a significant amount of work, and i'm not sure it's the right answer. If we just clarify the boundary by making sure that all, and only, compile-time errors are SyntaxErrors, everything would work properly and the meaning of the various exception classes would be clearer. The only exceptions that don't currently conform, as far as i know, have to do with invalid literals. -- ?!ng
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Ka-Ping Yee wrote:
Those attributes are added to the error object by set_error_location() in compile.c. Since the error objects are Python instances, the function will set those attributes on any error which the compiler raises, and IMHO, this would be a good thing.
Changing all compile time errors to SyntaxError requires much the same amount of work... you'd have to either modify the code to use com_error() or check for errors and then redirect them to com_error() (e.g. for codec errors).
Well, there are also system and memory errors and the codecs are free to raise any other kind of error as well. -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
data:image/s3,"s3://crabby-images/1b296/1b296e86cd8b01ddca1413c4cc5ae7c186edc52a" alt=""
[Ka-Ping Yee]
Pretty much, but nothing's *that* easy. Other examples:

+ If there are too many nested blocks, it raises SystemError(!).

+ MemoryError is raised if a dotted name is too long.

+ OverflowError is raised if a string is too long.

Note that those don't have to do with syntax, they're arbitrary implementation limits. So that's the rule: raise

    SystemError    if something is bigger than 20
    MemoryError    if it's bigger than 1000
    OverflowError  if it's bigger than an int

Couldn't be clearer <wink>.

+ SystemErrors are raised in many other places in the role of internal assertions failing. Those needn't be changed.
data:image/s3,"s3://crabby-images/1887d/1887d74aefa167e0775932ca2e5e1ad229548651" alt=""
[Tim]
Note that MemoryErrors are also raised whenever new objects are created, which happens all the time during the course of compilation (both Jeremy's symbol table code and of course code objects). These needn't be changed either. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
On Wed, 14 Feb 2001, Thomas Wouters wrote:
No, it doesn't exist -- not in the Python world, anyway. There is no Python object corresponding to the literal. That's what i meant by not existing. I think this is an okay choice of meaning for "exist", since, after all, the point of the language is to abstract away lower levels so programmers can think in that higher-level "Python world".
The previous paragraph pretty much answers this, but i'll clarify. My understanding of ValueError, as it holds in all other situations but this one, is that a Python value of the right type was supplied but it was otherwise wrong -- illegal, or unexpected, or something of that sort. The documentation on the exceptions module says:

    ValueError
        Raised when a built-in operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError.

That doesn't apply to "\xgh" or 1982391879487124.
If a syntax error was raised, I would think something was wrong with the syntax.
But there is. "\x45" is syntax for the letter E. It generates the semantics "the character object with ordinal 69 (corresponding to the uppercase letter E in ASCII)". "\xgh" doesn't generate any semantics -- we stop before we get there, because the syntax is wrong. -- ?!ng
data:image/s3,"s3://crabby-images/1b296/1b296e86cd8b01ddca1413c4cc5ae7c186edc52a" alt=""
[Thomas Wouters]
Oh, why not <wink>. The syntax of an \x escape is

    "\\" "x" hexdigit hexdigit

and to call something that doesn't match that syntax a SyntaxError isn't much of a stretch. Neither is calling it a ValueError.

[Guido]
Another solution (borrowing from C): automatically promote int literals to long if they can't be evaluated as ints.
Yes! The user-visible distinction between ints and longs causes more problems than it solves. Would also get us one step closer to punting the incomprehensible "because the grammar implies it" answer to the FAQlet:

    Yo, Phyton d00dz!  What's up with this?

    >>> x = "-2147483648"
    >>> int(x)
    -2147483648
    >>> eval(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    OverflowError: integer literal too large
    >>>
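The "because the grammar implies it" answer can be made visible with today's `ast` module (which did not exist in this form in 2001). The minus sign is a separate unary operator, so the literal the compiler must turn into an int is 2147483648, one more than the largest 32-bit int, whereas `int()` parses the sign itself. The helper name `literal_behind` is hypothetical, and the 32-bit limit reflects the platforms Tim's example assumes.

```python
import ast

MAX_32BIT_INT = 2 ** 31 - 1  # largest plain int on a 32-bit build of that era


def literal_behind(expr):
    """Return the integer literal the compiler actually sees in `expr`.

    For "-N" the parse tree is UnaryOp(USub, N), so this returns N.
    """
    node = ast.parse(expr, mode="eval").body
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        node = node.operand
    return node.value


lit = literal_behind("-2147483648")
print(lit, lit > MAX_32BIT_INT)                   # 2147483648 True
print(int("-2147483648") >= -MAX_32BIT_INT - 1)   # True: int() handles the sign itself
```

This is why eval() overflowed on a 32-bit int while int() did not: the literal is evaluated before the negation is applied.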
data:image/s3,"s3://crabby-images/163a8/163a80a2f5bd494435f25db087401841370a66e9" alt=""
An HTML version of the attached can be viewed at
Nice PEP!
Effects on Other Tools
Believed to be none. [...]
I believe that Fredrik also needs to fix SRE's interpretation of \xhh. Unless he's already done that. --Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
data:image/s3,"s3://crabby-images/4c299/4c299dfcd8671c0ce1f071dce620a40b4a7be3e3" alt=""
[Guido]
Nice PEP!
Thanks! I thought the kids could stand a simple example of what you'd like to read <wink>.
I believe that Fredrik also needs to fix SRE's interpretation of \xhh. Unless he's already done that.
I'm sure he's acutely aware of that, since that's how this started! And he's implementing \x in strings too. I knew you wouldn't read it to the end <0.9 wink>. put-the-refman-stuff-briefly-at-the-front-and-save-the-blather-for- the-end-ly y'rs - tim
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
In support of the argument that bad literals should raise ValueError (or a derived exception) rather than SyntaxError, Guido once said:
"Problems with literal interpretations traditionally raise 'runtime' exceptions rather than syntax errors."
This is currently true of overflowing integers and string literals, and hence it has also been so implemented for Unicode literals. But i want to propose a break with tradition, because some more recent thinking on this has led me to become firmly convinced that SyntaxError is really the right thing to do in all of these cases. The strongest reason is that a long file with a typo in a string literal somewhere in hundreds of lines of code generates only ValueError: invalid \x escape with no indication to where the error is -- not even which file! I realize this could be hacked upon and fixed, but i think it points to a general inconsistency that ought to be considered and addressed. 1. SyntaxErrors are for compile-time errors. A problem with a literal happens before the program starts running, and it is useful for me, as the programmer, to know whether the error occurred because of some computational process, possibly depending on inputs, or whether it's a permanent mistake that's literally in my source code. In other words, will a debugger do me any good? 2. SyntaxErrors pinpoint the exact location of the problem. In principle, an error is a SyntaxError if and only if you can point to an exact character position as being the cause of the problem. 3. A ValueError means "i got a value that wasn't allowed or expected here". That is not at all what is happening. There is *no* defined value at all. It's not that there was a value and it was wrong -- the value was never even brought into existence. 4. The current implementation of ValueErrors is very unhelpful about what to do about an invalid literal, as explained in the example above. A SyntaxError would be much more useful. I hope you will agree with me that solving only #4 by changing ValueErrors so they behave a little more like SyntaxErrors in certain particular situations isn't the best solution. Also, switching to SyntaxError is likely to break very few things. 
You can't depend on catching a SyntaxError, precisely because it's a compile-time error. No one could possibly be using "except ValueError" to try to catch invalid literals in their code; that usage, just like "except SyntaxError:", makes sense only when someone is using "eval" or "exec" to interpret code that was generated or read from input. In fact, i bet switching to SyntaxError would actually make some code of the form "try: eval ... except SyntaxError" work better, since the single except clause would catch all possible compilation problems with the input to eval. -- ?!ng Happiness comes more from loving than being loved; and often when our affection seems wounded it is is only our vanity bleeding. To love, and to be hurt often, and to love again--this is the brave and happy life. -- J. E. Buchrose
data:image/s3,"s3://crabby-images/1887d/1887d74aefa167e0775932ca2e5e1ad229548651" alt=""
All good points, except that I still find it hard to flag overflow errors as syntax errors, especially since overflow is platform defined. On one platform, 1000000000000 is fine; on another it's a SyntaxError. That could be confusing. But you're absolutely right about string literals, and maybe it's OK if 1000000000000000000000000000000000000000000000000000000000000000000 is flagged as a syntax error. (After all it's missing a trailing 'L'.) Another solution (borrowing from C): automatically promote int literals to long if they can't be evaluated as ints. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/ede6d/ede6d2cca33d547132f95e3f8c841d9976575f77" alt=""
Guido:
How about introducing the following hierarchy: CompileTimeError SyntaxError LiteralRangeError LiteralRangeError could inherit from ValueError as well if you want. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
On Wed, 14 Feb 2001, Greg Ewing wrote:
I suppose that's all right, and i wouldn't complain, but i don't think it's all that necessary either. Compile-time errors *are* syntax errors. What else could they be? (Aside from fatal errors or limitations of the compiler implementation, that is, but again that's outside of the abstraction we're presenting to the Python user.) Think of it this way: if there's a problem with your Python program, it's either a problem with *how* it expresses something (syntax), or with *what* it expresses (semantics). The syntactic errors occur at compile-time and the semantic errors occur at run-time. -- ?!ng
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
On Tue, 13 Feb 2001, Guido van Rossum wrote:
I know it may seem weird. I tend to see it as a consequence of the language definition, though, not as the wrong choice of error. If you had to write a truly platform-independent Python language definition (a worthwhile endeavour, by the way, especially given that there are already at least CPython, JPython, and stackless), the decision about this would have to be made there.
On one platform, 1000000000000 is fine; on another it's a SyntaxError. That could be confusing.
So far, Python is effectively defined in such a way that 100000000000 has a meaning on one platform and has no meaning on another. <shrug> So, yeah, that's the way it is.
Another solution (borrowing from C): automatically promote int literals to long if they can't be evaluated as ints.
Quite reasonable, yes. But i'd go further than that. I think everyone so far has been in agreement that the division between ints and long ints should eventually be abolished, and we're just waiting for someone brave enough to come along and make it happen. I know i've got my fingers crossed. :) (And maybe after we deprecate 'L', we can deprecate capital 'J' on numbers and 'R', 'U' on strings too...) toowtdi-ly yours, -- ?!ng
data:image/s3,"s3://crabby-images/5ae7c/5ae7c201824b37c3633187431441e0f369a52a1a" alt=""
On Tue, Feb 13, 2001 at 03:11:10PM -0800, Ka-Ping Yee wrote:
The strongest reason is that a long file with a typo in a string literal somewhere in hundreds of lines of code generates only
ValueError: invalid \x escape
This has nothing to do with the error being a ValueError, but with some (compile-time) errors not being promoted to 'full' errors. See https://sourceforge.net/patch/?func=detailpatch&patch_id=101782&group_id=5470 The same issue came up when importing modules that did 'from foo import *' in a function scope.
Agreed. That could possibly be solved by a better description of the valueerrors in question, though. (The 'invalid \x escape' message seems pretty obvious a compiletime-error to me, but others might not.)
See above.
Not quite true. It wasn't *compiled*, but it's a literal, so it does exist. The problem is not the value of a compiled \x escape, but the value after the \x.
See #1 :)
I don't, really. The name 'ValueError' is exactly right: what is wrong (in the \x escape example) is the *value* of something (of the \x escape in question.) If a syntax error was raised, I would think something was wrong with the syntax. But the \x is placed in the right spot, inside a string literal. The string literal itself is placed right. Why would it be a syntax error ?
I'd say you want a 'CompilerError' superclass instead. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Thomas Wouters wrote:
Right and I think this touches the core of the problem. SyntaxErrors produce a proper traceback while ValueErrors (and others) just print a single line which doesn't even have the filename or line number. I wonder why the PyErr_PrintEx() (pythonrun.c) error handler only tries to parse SyntaxErrors for .filename and .lineno parameters. Looking at compile.c these should be settable on all exception object (since these are now proper instances). Perhaps lifting the restriction in PyErr_PrintEx() and making the parse_syntax_error() API a little more robust might do the trick. Then the various direct PyErr_SetString() calls in compile.c should be converted to use com_error() instead (if possible). -- Marc-Andre Lemburg ______________________________________________________________________ Company: http://www.egenix.com/ Consulting: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
I wrote:
Thomas Wouters wrote:
This has nothing to do with the error being a ValueError, but with some (compile-time) errors not being promoted to 'full' errors. See
I think they are entirely related. All ValueErrors should be run-time errors; a ValueError should never occur during compilation. The key issue is communicating clearly with the user, and that's just not what ValueError *means*. M.-A. Lemburg wrote:
This follows sensibly from the fact that SyntaxErrors are always compile-time errors (and therefore have no traceback or frame at the level where the error occurred). ValueErrors are usually run-time errors, so .filename and .lineno attributes would be redundant; this information is already available in the associated frame object.
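Ka-Ping's point is that run-time errors already carry their location in frame objects. This can be seen from Python: the traceback attached to a ValueError records the file and line of the failing code. The following is a minimal sketch that assumes Python 3, since `__traceback__` is Python 3 only. `runtime_location` and `parse_hex` are illustrative names.

```python
def runtime_location(exc: BaseException):
    """Return (filename, lineno) of the innermost Python frame in exc's traceback."""
    tb = exc.__traceback__
    if tb is None:
        return None  # never raised, so there is no frame information
    while tb.tb_next is not None:  # walk down to the innermost frame
        tb = tb.tb_next
    return tb.tb_frame.f_code.co_filename, tb.tb_lineno


def parse_hex(text):
    return int(text, 16)  # raises ValueError at run time for non-hex text


try:
    parse_hex("gh")
except ValueError as exc:
    print(runtime_location(exc))
```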
That sounds like a significant amount of work, and i'm not sure it's the right answer. If we just clarify the boundary by making sure that all, and only, compile-time errors are SyntaxErrors, everything would work properly and the meaning of the various exception classes would be clearer. The only exceptions that don't currently conform, as far as i know, have to do with invalid literals. -- ?!ng
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Ka-Ping Yee wrote:
Those attributes are added to the error object by set_error_location() in compile.c. Since the error objects are Python instances, the function will set those attributes on any error that the compiler raises, and IMHO this would be a good thing.
Changing all compile-time errors to SyntaxError requires much the same amount of work... you'd have to either modify the code to use com_error() or check for errors and then redirect them to com_error() (e.g. for codec errors).
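The second option, catching codec errors and redirecting them as compile errors, might look like this in Python. This is a sketch under several assumptions. `decode_literal` is hypothetical. The body is assumed to be ASCII, because the unicode_escape codec treats other bytes as Latin-1. The SyntaxError stands in for whatever com_error() would raise in compile.c.

```python
import codecs


def decode_literal(body: str, filename: str, lineno: int) -> str:
    """Decode backslash escapes in a literal body, reporting failures as a
    SyntaxError with the literal's location instead of a bare codec error."""
    try:
        return codecs.decode(body, "unicode_escape")
    except UnicodeDecodeError as exc:
        err = SyntaxError(f"invalid escape in string literal: {exc.reason}")
        err.filename = filename
        err.lineno = lineno
        raise err from exc  # keep the original codec error as __cause__


print(decode_literal(r"\x45", "spam.py", 3))  # a valid \x escape
```

Chaining with `from exc` keeps the codec's own diagnosis available to anyone who wants it, while the user sees an error tied to a file and line.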
Well, there are also system and memory errors and the codecs are free to raise any other kind of error as well. -- Marc-Andre Lemburg
data:image/s3,"s3://crabby-images/1b296/1b296e86cd8b01ddca1413c4cc5ae7c186edc52a" alt=""
[Ka-Ping Yee]
Pretty much, but nothing's *that* easy. Other examples:

+ If there are too many nested blocks, it raises SystemError(!).
+ MemoryError is raised if a dotted name is too long.
+ OverflowError is raised if a string is too long.

Note that those don't have to do with syntax, they're arbitrary implementation limits. So that's the rule:

    raise SystemError if something is bigger than 20
          MemoryError if it's bigger than 1000
          OverflowError if it's bigger than an int

Couldn't be clearer <wink>.

+ SystemErrors are raised in many other places in the role of internal assertions failing. Those needn't be changed.
data:image/s3,"s3://crabby-images/1887d/1887d74aefa167e0775932ca2e5e1ad229548651" alt=""
[Tim]
Note that MemoryErrors are also raised whenever new objects are created, which happens all the time during the course of compilation (both Jeremy's symbol table code and of course code objects). These needn't be changed either. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/264c7/264c722c1287d99a609fc1bdbf93320e2d7663ca" alt=""
On Wed, 14 Feb 2001, Thomas Wouters wrote:
No, it doesn't exist -- not in the Python world, anyway. There is no Python object corresponding to the literal. That's what i meant by not existing. I think this is an okay choice of meaning for "exist", since, after all, the point of the language is to abstract away lower levels so programmers can think in that higher-level "Python world".
The previous paragraph pretty much answers this, but i'll clarify. My understanding of ValueError, as it holds in all other situations but this one, is that a Python value of the right type was supplied but it was otherwise wrong -- illegal, or unexpected, or something of that sort. The documentation on the exceptions module says:

    ValueError
        Raised when a built-in operation or function receives an argument
        that has the right type but an inappropriate value, and the
        situation is not described by a more precise exception such as
        IndexError.

That doesn't apply to "\xgh" or 1982391879487124.
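The distinction Ka-Ping draws can be checked in current Python 3, where the two cases land in different exception classes. `int("gh", 16)` receives a str (the right type) with an inappropriate value and raises ValueError at run time. The literal "\xgh", by contrast, never becomes an object and is rejected by the compiler. This sketch assumes Python 3 behavior, which differs from the 2.0-era interpreter under discussion. The helper names are illustrative.

```python
def run_time_error():
    try:
        int("gh", 16)  # right type (str), inappropriate value
    except ValueError as exc:
        return exc


def compile_time_error(source: str):
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        return exc
    return None


print(type(run_time_error()).__name__)                   # ValueError
print(type(compile_time_error(r's = "\xgh"')).__name__)  # SyntaxError
```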
If a syntax error was raised, I would think something was wrong with the syntax.
But there is. "\x45" is syntax for the letter E. It generates the semantics "the character object with ordinal 69 (corresponding to the uppercase letter E in ASCII)". "\xgh" doesn't generate any semantics -- we stop before we get there, because the syntax is wrong. -- ?!ng
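Ka-Ping's reading is that "\x45" is syntax for E and "\xgh" fails before any semantics exist. This matches the PEP 223 rule that \x consumes exactly two hex digits, and the rule is small enough to write out as a scanner. This is a hypothetical sketch, not the tokenizer's code. `expand_x_escapes` is an illustrative name, and every other backslash pair is copied through unchanged as a simplification. The exception class is a parameter because that choice is exactly what this thread is arguing about.

```python
from string import hexdigits


def expand_x_escapes(body: str, error: type = SyntaxError) -> str:
    """Expand \\xhh escapes in a literal body per the PEP 223 rule:
    exactly two hex digits must follow \\x, otherwise `error` is raised."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
        elif body[i + 1:i + 2] == "x":
            digits = body[i + 2:i + 4]
            if len(digits) == 2 and all(d in hexdigits for d in digits):
                out.append(chr(int(digits, 16)))  # \xij -> chr(int(ij, 16))
                i += 4
            else:
                raise error(f"invalid \\x escape at position {i}")
        else:
            # Simplification: any other backslash pair is copied verbatim.
            out.append(body[i:i + 2])
            i += 2
    return "".join(out)


print(expand_x_escapes(r"\x45"))  # E
```

Whether the `raise` line should produce a SyntaxError or a ValueError does not change the grammar at all. It only changes how the failure is reported.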
data:image/s3,"s3://crabby-images/1b296/1b296e86cd8b01ddca1413c4cc5ae7c186edc52a" alt=""
[Thomas Wouters]
Oh, why not <wink>. The syntax of an \x escape is

    "\\" "x" hexdigit hexdigit

and to call something that doesn't match that syntax a SyntaxError isn't much of a stretch. Neither is calling it a ValueError.

[Guido]
Another solution (borrowing from C): automatically promote int literals to long if they can't be evaluated as ints.
Yes! The user-visible distinction between ints and longs causes more problems than it solves. Would also get us one step closer to punting the incomprehensible "because the grammar implies it" answer to the FAQlet:

    Yo, Phyton d00dz! What's up with this?

    >>> x = "-2147483648"
    >>> int(x)
    -2147483648
    >>> eval(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    OverflowError: integer literal too large
    >>>
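The "because the grammar implies it" answer can be made concrete with the ast module. The literal in Tim's example parses as unary minus applied to 2147483648, and that positive literal is what did not fit in a 32-bit C long. The sketch assumes Python 3.8 or later, for ast.Constant. The 32-bit limit and the `literal_kind` helper are hypothetical stand-ins for Guido's promote-to-long idea, since Python 3 ints are unbounded.

```python
import ast

INT_MAX = 2**31 - 1  # assumption: a 32-bit C long, as on typical 2001 platforms


def literal_kind(value: int) -> str:
    """Hypothetical promotion rule: literals that don't fit become longs."""
    return "int" if value <= INT_MAX else "long"


node = ast.parse("-2147483648", mode="eval").body
# The minus sign is an operator applied to the literal, not part of it.
print(type(node).__name__, type(node.op).__name__, node.operand.value)
print(literal_kind(node.operand.value))  # the literal itself needs promotion
```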
participants
- Fred L. Drake, Jr.
- Greg Ewing
- Guido van Rossum
- Ka-Ping Yee
- M.-A. Lemburg
- Thomas Wouters
- Tim Peters