
Hi,

I looked through the man page for Python's interpreter, and it appears that there is no way to properly distinguish between error messages the interpreter writes to stderr and output a user program writes to stderr. What I would really like to have are two things:

1) An option to output interpreter-generated messages to a specified file, whether these messages are uncatchable syntax errors or catchable runtime errors that result in the termination of the interpreter. This feature would allow a wrapper program to distinguish between user output and Python interpreter output.

2) An option to provide structured error output in some common, easy-to-parse, and extensible format that can be used to associate the file, line number, and error type/number in some post-processing error handler. This feature would make the parsing of error messages more deterministic, and would be of significant benefit if other compilers/interpreters also provided the same functionality in the same common format.

Does anyone know if there is already a way to do what I've asked? If not, do you think such features, if added to Python, would actually be accepted?

Thanks,
Bryce Boe

On 4/25/2012 3:27 PM, Bryce Boe wrote:
That should be a false distinction. User programs should only print error messages to stderr. Some modify error messages before they get printed. Some raise exceptions themselves with messages. The interpreter makes no distinction between user code, 3rd party code, and stdlib code.
'Raw' interpreter error messages start with 'SyntaxError' or 'Traceback'. Runtime errors do not seem to go to the normal stderr channel.
Exception instances have a .__traceback__ attribute that is used to print the default traceback message, so it has, or can generate, much of what you request. I believe traceback objects are documented somewhere. Some apps wrap everything in try: run_app() except Exception as e: custom_handle(e) -- Terry Jan Reedy
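A minimal sketch of the try/except wrapper Terry describes, assuming Python 3 and the standard traceback module. The names run_app and custom_handle come from his message; their bodies and the JSON field names ("type", "message", "file", "line") are hypothetical choices for illustration, not an established format.

```python
import json
import sys
import traceback


def custom_handle(exc):
    """Turn an exception into a structured record (hypothetical schema).

    The last entry extracted from exc.__traceback__ is normally the frame
    where the exception was raised, so its file and line are reported.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1] if frames else None
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "file": last.filename if last else None,
        "line": last.lineno if last else None,
    }


def run_app():
    # Stand-in for the real application entry point.
    return 1 / 0


def main():
    try:
        run_app()
    except Exception as e:
        # One JSON object per line keeps post-processing simple.
        sys.stderr.write(json.dumps(custom_handle(e)) + "\n")
        return 1
    return 0
```

A script would finish with sys.exit(main()). Note that this only covers exceptions raised while run_app() executes; a syntax error in the wrapper script itself is still reported the default way.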

On Wed, Apr 25, 2012 at 2:33 PM, Terry Reedy <tjreedy@udel.edu> wrote:
The sys module also has an excepthook (1) which can be overridden to customize the exception handling. Note, however, that it is not invoked for uncaught exceptions raised in threads started with the threading module. (1) http://docs.python.org/library/sys.html#sys.excepthook

Perhaps I wasn't very clear. I want to write a tool to collect error messages when I run a program. Ideally the tool should be language-agnostic and able to identify syntax errors, parser errors, and runtime errors. While I can parse both the stdout and stderr streams to find this information, from what I can tell there is no way to distinguish between a real syntax error (output to stderr):

  File "./test.py", line 5
    class
         ^
SyntaxError: invalid syntax

and a program that writes that exact output to stderr and exits with status 1. This "channel" sharing of control (error messages) and data is a problem that affects more than just the Python interpreter. I am hoping to start with Python and provide a way to separate the control and data information, so I can be certain that output on the "control" file descriptor is guaranteed to have been generated by the interpreter.
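The ambiguity Bryce describes can be reproduced with a small sketch, assuming Python 3.7+ (for capture_output and text). It runs one child that hits a genuine syntax error and a second, valid child that simply replays the first child's stderr and exits with status 1. A wrapper that only sees exit codes and stderr cannot tell them apart.

```python
import subprocess
import sys

# 1) A genuine syntax error reported by the interpreter itself.
real = subprocess.run(
    [sys.executable, "-c", "class"],
    capture_output=True,
    text=True,
)

# 2) A valid program that writes the same text to stderr and exits 1.
impostor_source = (
    "import sys\n"
    "sys.stderr.write(sys.argv[1])\n"
    "sys.exit(1)\n"
)
fake = subprocess.run(
    [sys.executable, "-c", impostor_source, real.stderr],
    capture_output=True,
    text=True,
)

# From the outside, both runs look the same.
print(real.returncode == fake.returncode, real.stderr == fake.stderr)
```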
> Exception instances have a .__traceback__ instance that is used to print the default traceback message
I am aware I can obtain this information and output it however I want from my own program (except for syntax errors); however, the goal is to run third-party code and provide a more detailed error report. -Bryce

On Wed, 25 Apr 2012 15:06:16 -0700 Bryce Boe <bboe@cs.ucsb.edu> wrote:
Correct. And it isn't a problem, because it shouldn't matter if the programmer wrote a bit of code that caused the compiler to raise a syntax error, raised the syntax error themselves, evaled a string that raised the syntax error, or wrote the message explicitly to standard error: all those cases represent a syntax error to the programmer.
If your program is writing *data* to stderr, it's badly designed. You should fix it instead of trying to get Python changed to accommodate it. Or maybe you're trying to draw a distinction between messages purposely generated by the programmer, and messages the programmer didn't want? I think that's an artificial distinction. An error is an error is an error, and by any other name would be just as smelly. Whether it's an exception raised by the interpreter, an exception raised by the programmer, or just a message printed by the programmer really doesn't matter. Python programmers have complete control over all of this, and can make it do what they want. Trying to make distinctions based on default behaviors is misguided. <mike -- Mike Meyer <mwm@mired.org>

Perhaps you cannot envision a case where it matters because you don't work with people that intentionally try to cheat or mislead a system. I am merely pointing out that currently there is no way to distinguish between these behaviors, and I would personally like to add that support because I have a need to deterministically differentiate between them. I don't want to break backwards compatibility, so what I propose would only take place via a command line argument.
That is a generalization which is not true. Counterexample: I have students writing a simple interpreter in Python, and their compiler should output syntax errors to stderr when the program their interpreter is interpreting contains one. Now say their errors look very similar to Python errors: how does one distinguish between an error in their implementation of their interpreter and an error raised by their interpreter? Furthermore, having this separation is somewhat pointless without the structured part, as ideally I would like all compilers and interpreters to produce similar output so I could easily measure how many errors beginning programmers make in various languages and group them by type. I have to start somewhere with this project, and I was hoping the Python community would be in favor of adding such support, as I feel the changes are relatively trivial. -Bryce
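If every tool emitted records in one common format, the "group them by type" step Bryce mentions becomes a simple count. This sketch assumes a hypothetical JSON-lines schema with "language" and "type" fields; the three sample records are made-up input for illustration only, not real data.

```python
import json
from collections import Counter

# Made-up sample records in a hypothetical common format.
sample_log = """\
{"language": "python", "type": "SyntaxError", "file": "a.py", "line": 3}
{"language": "python", "type": "NameError", "file": "b.py", "line": 7}
{"language": "python", "type": "SyntaxError", "file": "c.py", "line": 1}
"""


def count_by_type(lines):
    """Count error records per (language, error type) pair."""
    counts = Counter()
    for line in lines:
        if line.strip():
            record = json.loads(line)
            counts[(record["language"], record["type"])] += 1
    return counts


print(count_by_type(sample_log.splitlines()))
```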

On Thu, Apr 26, 2012 at 10:30 AM, Bryce Boe <bboe@cs.ucsb.edu> wrote:
And we're telling you that no, the changes you're interested in are not trivial - the use of stderr is embedded deep within many parts of the interpreter. I suggest just raising the bar for your students and require that they write their errors to both stderr *and* to an error log. Interpreter generated errors will show up in the former, but will never appear in the latter. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
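Nick's suggestion can be sketched as follows. report_error and interpreter_only are hypothetical helper names: the student's interpreter reports each of its own errors to both stderr and a private log, and anything that shows up on stderr but not in the log must have come from somewhere else, such as the Python interpreter.

```python
import sys


def report_error(message, log):
    """Write a student-interpreter error to stderr *and* to a private log."""
    sys.stderr.write(message + "\n")
    log.write(message + "\n")
    log.flush()


def interpreter_only(stderr_lines, log_lines):
    """Return the stderr lines that the student's code never logged."""
    logged = set(log_lines)
    return [line for line in stderr_lines if line not in logged]
```

This relies on the students actually routing every error through the helper, which is the "raising the bar" part of the suggestion.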

This is the first constructive comment I've received thus far. Perhaps I am a bit optimistic that grepping for output to stderr and replacing the write or fprintf calls with a function call would be appropriate. Maybe a tedious procedure, but it still seems trivial. It's not like trying to replace the GIL ;)
The example I gave was simply an example. Sure, we could ask the students to do more, but ideally this tool would work on any recent Python source without requiring source modification. Many of these solutions have been presented before; however, they fail to work in the general case.
I realize that even if there were another output stream, the user could write to it via os.write(3, "foobar"); however, static checks can be made on the code to detect such function calls. Also, I'm curious: can a Python program suppress later syntax errors?
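The os.write(3, ...) concern can be illustrated with a POSIX-only sketch (pass_fds is not supported on Windows). A parent hands a pipe's write end to a child as a would-be "control" descriptor; the child, being ordinary user code, writes to it by number, so a dedicated descriptor alone does not prove who produced a message.

```python
import os
import subprocess
import sys

# Child program: writes to whatever descriptor number it is handed.
CHILD = "import os, sys\nos.write(int(sys.argv[1]), b'foobar\\n')\n"

read_end, write_end = os.pipe()
# pass_fds keeps write_end open in the child under the same number.
subprocess.run(
    [sys.executable, "-c", CHILD, str(write_end)],
    pass_fds=(write_end,),
    check=True,
)
os.close(write_end)  # parent's copy; the child has already exited
with os.fdopen(read_end, "rb") as control:
    received = control.read()

print(received)
```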
Again, this was just a contrived example that demonstrates my desire to differentiate between the two. Whether or not the students get stuck isn't the problem; the problem is that it's not possible to make the distinction. But simply put, why not allow for such differentiation? What is lost by doing so?
It is my opinion that most people make do with what they have available to them. Of course, I can do exactly what I want to do without modifying the interpreter; however, it suffers from the ambiguity problem I've already mentioned. One of the great things about open source software is the ability to adapt it to suit your own needs, and thus I prefer to take the approach of making things better. Perhaps no one before me has even considered separating interpreter output from the output of the programs the interpreter runs, but I find that quite hard to believe. This really isn't a problem with compiled code, because the compilation and type-checking process is separate from the execution process, though I'll admit the same problem exists with runtime errors such as segmentation faults, but these can be detected with proper signal handling.
First, the two changes should work in tandem, so both interpreters would have a flag, say --structured-error-output, that takes a filename. With such a flag, directing the error output to different files is quite trivial. However, even if they went to the same stream (poor design in my opinion), the structured messages could have an attribute indicating which interpreter produced the message, thus allowing for differentiation. Of course, if they went to the same stream, you would still have the possibility of spoofing the other program, which is why the separation is necessary. Anyway, I appreciate the argument. It is fairly clear that if I were to implement this support, it would not be integrated into Python, so it's not worth my time. I'll take the band-aid approach as everyone before me has. Thanks, Bryce

On Wed, Apr 25, 2012 at 07:04:15PM -0700, Bryce Boe wrote:
I don't think so. Or at least, not easily.

    funcs = [len, str, eval, map]
    value = funcs[2]("__imp" + "port"[1:] + "__('sys')")
    f = getattr(value, ''.join(map(chr, (115, 116, 100, 101, 114, 114))))
    y = getattr(f, ''.join(map(chr, (119, 114, 105, 116, 101))))
    y("statically check this!\n")
> Also, I'm curious, can a python program suppress later syntax errors?
    try:
        exec("x = )(")
    except SyntaxError:
        print("nothing to see here, move along")

[...]
Apart from simplicity? You risk infinite regress. stderr exists to differentiate "good" output from "error" output. So you propose a new stream to differentiate "good errors" (those raised by the students' interpreter, call it stderr2) from "bad errors" (those raised by the interpreter running the students' interpreter). At some point, someone will have some compelling (to them, not necessarily everyone else) use-case that needs to distinguish between "real" good errors going to stderr2 and "fake" good errors going to stderr2, and propose stderr3. And deeper and deeper down the rabbit hole we go... At some point, people will just say "Enough!". I suggest that the distinction between stdout and stderr is exactly that point. (Although, having said that, I wish there was a stdinfo for informational messages that are neither the intended program output nor unintended program errors, e.g. status messages, progress indicators, etc.) -- Steven

On Thu, Apr 26, 2012 at 12:44 PM, Steven D'Aprano <steve@pearwood.info> wrote:
And, indeed, such a channel exists: it's called the logging system, which separates event *generation* (calls to the logging message API) from event *display* (configuration of logging handlers). In particular, see the following table in the logging HOWTO guide: http://docs.python.org/howto/logging.html#when-to-use-logging Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
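A minimal sketch of the logging approach Nick points to, giving Steven's "stdinfo" its own destination: INFO records go to one stream and WARNING-and-above to another. The logger name and the MaxLevelFilter helper are hypothetical; the handler, filter, and formatter calls are standard logging APIs.

```python
import io
import logging


class MaxLevelFilter(logging.Filter):
    """Pass only records strictly below a given level (hypothetical helper)."""

    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno < self.level


def build_logger(info_stream, error_stream):
    logger = logging.getLogger("student_interpreter")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()

    info_handler = logging.StreamHandler(info_stream)  # the "stdinfo" channel
    info_handler.setLevel(logging.INFO)
    info_handler.addFilter(MaxLevelFilter(logging.WARNING))

    error_handler = logging.StreamHandler(error_stream)
    error_handler.setLevel(logging.WARNING)

    fmt = logging.Formatter("%(levelname)s:%(name)s:%(message)s")
    for handler in (info_handler, error_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


# Usage: in-memory streams stand in for real files or sys.stderr.
info, errors = io.StringIO(), io.StringIO()
log = build_logger(info, errors)
log.info("parsing input.txt")
log.error("line 3: unexpected token")
```

Because generation (log.info/log.error) is separate from display (the handlers), the destinations can be changed without touching the code that emits messages.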

On 26/04/12 14:04, Bryce Boe wrote:
I don't see how that would solve your problem anyway. If your student's interpreter code, written in Python, raises e.g. a TypeError, and nothing catches it, the error message for it will get printed by the very same printf call as any other uncaught exception. Seems to me the solution to your problem lies in sandboxing the student's code inside something that catches any exceptions emanating from it and logs them in a distinctive way. You could also replace sys.stdout and sys.stderr with objects that perform a similar function. -- Greg
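Greg's sandboxing idea might look roughly like this sketch, assuming Python 3. run_sandboxed is a hypothetical name. It runs a student script with runpy.run_path, captures whatever the script writes to sys.stdout and sys.stderr, and records any escaping exception (including a SyntaxError raised while the script is compiled) in a separate structured record. Redirection only swaps the sys.stdout/sys.stderr objects, so writes made directly to file descriptor 2 (for example os.write(2, ...) or from C code) bypass it.

```python
import contextlib
import io
import runpy
import traceback


def run_sandboxed(path):
    """Run a script; return (its stdout, its stderr, error record or None)."""
    out, err = io.StringIO(), io.StringIO()
    record = None
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            runpy.run_path(path, run_name="__main__")
        except SystemExit:
            pass  # the script chose to exit; its own messages stay in `err`
        except Exception as exc:
            if isinstance(exc, SyntaxError):
                file, line = exc.filename, exc.lineno
            else:
                frames = traceback.extract_tb(exc.__traceback__)
                file, line = (frames[-1].filename, frames[-1].lineno) if frames else (None, None)
            record = {"type": type(exc).__name__, "message": str(exc),
                      "file": file, "line": line}
    return out.getvalue(), err.getvalue(), record
```

A script that merely prints a fake "SyntaxError" and exits ends up with its text in the captured stderr and no error record. A SyntaxError raised explicitly by the script would still be recorded, which is the point Mike makes later in the thread.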

Bryce Boe <bboe@cs.ucsb.edu> wrote:
Yes, a Python program can suppress later syntax errors.
For sorting out error messages in modern languages, compilation and execution are not necessarily separate. You can get compilation errors at pretty much any point in the execution of the program. Most such languages (including Python, but also languages that compile to machine or JVM code) have both the ability to import uncompiled source, compiling it along the way, and the ability to compile and run code fragments (aka "eval") within the program. Both of these can generate compilation errors in the middle of runtime. If my program imports a config module the user provided and it has a syntax error in it, is that syntax error a runtime error or a compilation error?
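Mike's config-module case can be sketched with the built-in compile() and exec(). load_config and the filename "user_config.py" are hypothetical; the point is that the SyntaxError is raised while the host program is already running, and it can be caught like any other exception.

```python
def load_config(source, filename="user_config.py"):
    """Compile and run user-supplied configuration at runtime.

    filename is a hypothetical label used only in error reports.
    """
    namespace = {}
    code = compile(source, filename, "exec")  # a SyntaxError surfaces here
    exec(code, namespace)
    return namespace


config = load_config("debug = True\n")
try:
    load_config("debug = )(\n")
except SyntaxError as err:
    print(err.filename, err.lineno)
```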
It is? How do you distinguish between an actual syntax error and a syntax error raised by an explicit raise statement? And which of those two cases would a syntax error raised by passing a bad code fragment to eval be, or is that a third case requiring yet another flag?
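To make Mike's question concrete, here is a small sketch assuming Python 3: a SyntaxError detected by the compiler and one raised explicitly with the same arguments produce identical text from traceback.format_exception_only, which is what the default handler prints as the final part of a traceback. The source "x = )(" and filename "./test.py" are borrowed from earlier in the thread.

```python
import traceback


def formatted(exc):
    # File/line, source text, caret, and "SyntaxError: ..." message.
    return "".join(traceback.format_exception_only(type(exc), exc))


# A syntax error genuinely detected by the compiler...
try:
    compile("x = )(\n", "./test.py", "exec")
except SyntaxError as exc:
    genuine = exc

# ...and one raised explicitly by user code with the same arguments.
try:
    raise SyntaxError(*genuine.args)
except SyntaxError as exc:
    forged = exc

print(formatted(genuine) == formatted(forged))
```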
I'm still waiting for a proposal solid enough to evaluate. I like the idea of more structured error output, and think it might be a nice addition to the interpreter. A python programmer already has complete control over all error messages, though it can be hard to get to. Making that easier is a worthy goal, but it's got to be more than the ability to send some ill-defined set of exceptions to a different output stream. <mike

participants (7)
- Bryce Boe
- Chris Kaynor
- Greg Ewing
- Mike Meyer
- Nick Coghlan
- Steven D'Aprano
- Terry Reedy