Hi,
I looked through the man page for python's interpreter and it appears that there is no way to properly distinguish between error messages output to stderr by the interpreter and output produced by a user program to stderr. What I would really like to have are two things:
1) an option to output interpreter generated messages to a specified file, whether these messages are uncatchable syntax errors, or catchable runtime errors that result in the termination of the interpreter. This feature would allow a wrapper program to distinguish between user-output and python interpreter output.
2) an option to provide a structured error output in some common easy-to-parse and extendable format that can be used to associate the file, line number, error type/number in some post-processing error handler. This feature would make the parsing of error messages more deterministic, and would be of significant benefit if other compilers/interpreters also provide the same functionality in the same common format.
Does anyone know if there is already such a way to do what I've asked? If not, do you think having such features added to python would be something that would actually be included?
Thanks, Bryce Boe
On 4/25/2012 3:27 PM, Bryce Boe wrote:
Hi,
I looked through the man page for python's interpreter and it appears that there is no way to properly distinguish between error messages output to stderr by the interpreter and output produced by a user program to stderr.
That should be a false distinction. User programs should only print error messages to stderr. Some modify error messages before they get printed. Some raise exceptions themselves with messages. The interpreter makes no distinction between user code, 3rd party code, and stdlib code.
What I would really like to have are two things:
1) an option to output interpreter generated messages to a specified file, whether these messages are uncatchable syntax errors, or catchable runtime errors that result in the termination of the interpreter. This feature would allow a wrapper program to distinguish between user-output and python interpreter output.
'Raw' interpreter error messages start with 'SyntaxError' or 'Traceback'. Runtime errors do not seem to go to the normal stderr channel.
2) an option to provide a structured error output in some common easy-to-parse and extendable format that can be used to associate the file, line number, error type/number in some post-processing error handler. This feature would make the parsing of error messages more deterministic, and would be of significant benefit if other compilers/interpreters also provide the same functionality in the same common format.
Exception instances have a .__traceback__ attribute that is used to print the default traceback message. So it has or can generate much of what you request. I believe traceback objects are documented somewhere. Some apps wrap everything in
    try:
        run_app()
    except Exception as e:
        custom_handle(e)
-- Terry Jan Reedy
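As a rough illustration of the point about .__traceback__, the sketch below turns a caught exception into a small dictionary using the standard traceback module. The record layout and the names exception_to_record and run_app are assumptions made for this example, not an existing Python facility.

```python
import json
import traceback


def exception_to_record(exc):
    """Turn an exception into a plain dict (hypothetical record layout)."""
    frames = traceback.extract_tb(exc.__traceback__)
    last = frames[-1] if frames else None  # innermost frame, where it was raised
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "file": last.filename if last else None,
        "line": last.lineno if last else None,
    }


def run_app():
    # Stand-in for the real application entry point.
    return 1 / 0


try:
    run_app()
except Exception as e:
    record = exception_to_record(e)
    print(json.dumps(record))
```

This only covers exceptions raised while run_app is executing; a syntax error in the file that contains the try block itself is reported before any of this code runs.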
On Wed, Apr 25, 2012 at 2:33 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 4/25/2012 3:27 PM, Bryce Boe wrote:
2) an option to provide a structured error output in some common easy-to-parse and extendable format that can be used to associate the file, line number, error type/number in some post-processing error handler. This feature would make the parsing of error messages more deterministic, and would be of significant benefit if other compilers/interpreters also provide the same functionality in the same common format.
Exception instances have a .__traceback__ attribute that is used to print the default traceback message. So it has or can generate much of what you request. I believe traceback objects are documented somewhere. Some apps wrap everything in
    try:
        run_app()
    except Exception as e:
        custom_handle(e)
The sys module also has an excepthook (1) which can be overridden to customize the exception handling. Note that it does not function with the threading module, however. (1) http://docs.python.org/library/sys.html#sys.excepthook
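A minimal sketch of overriding sys.excepthook might look like the following. The JSON-lines layout and the helper name make_structured_hook are assumptions for illustration; only sys.excepthook, traceback.extract_tb and json are real standard-library pieces.

```python
import io
import json
import sys
import traceback


def make_structured_hook(stream):
    """Build an excepthook that writes one JSON line per uncaught exception."""
    def hook(exc_type, exc_value, exc_tb):
        frames = traceback.extract_tb(exc_tb)
        record = {
            "type": exc_type.__name__,
            "message": str(exc_value),
            "frames": [[f.filename, f.lineno, f.name] for f in frames],
        }
        stream.write(json.dumps(record) + "\n")
        stream.flush()
    return hook


# Demo: call the hook directly with a caught exception instead of crashing.
buffer = io.StringIO()
hook = make_structured_hook(buffer)
try:
    int("not a number")
except ValueError:
    hook(*sys.exc_info())
print(buffer.getvalue(), end="")
```

In real use the hook would be installed with sys.excepthook = make_structured_hook(some_open_file). The hook has to be installed by code that runs before the failure, so a syntax error in the main script itself cannot be reported this way unless a separate wrapper installs the hook first.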
-- Terry Jan Reedy
I looked through the man page for python's interpreter and it appears that there is no way to properly distinguish between error messages output to stderr by the interpreter and output produced by a user program to stderr.
That should be a false distinction. User programs should only print error messages to stderr. Some modify error messages before they get printed. Some raise exceptions themselves with messages. The interpreter makes no distinction between user code, 3rd party code, and stdlib code.
Perhaps I wasn't very clear. I want to write a tool to collect error messages when I run a program. Ideally the tool should be agnostic to what language is used and should be able to identify syntax errors, parser errors, and runtime errors. While I can parse both the stdout and stderr streams to find this information, from what I can tell there is no way to distinguish between a real syntax error (output to stderr):
  File "./test.py", line 5
    class
        ^
SyntaxError: invalid syntax
and a program that outputs that exact output to stderr and exits with status 1. This "channel" sharing of control (error messages) and data is a problem that affects more than just the python interpreter. I am hoping to start with python and provide a way to separate the control and data information so I can be certain that output on the "control" file descriptor is guaranteed to be generated by the interpreter.
Exception instances have a .__traceback__ attribute that is used to print the default traceback message
I am aware I can obtain this information and output it however I want from my own program (less syntax errors), however, the goal is to run third party code and provide a more detailed error report. -Bryce
On Wed, 25 Apr 2012 15:06:16 -0700 Bryce Boe <bboe@cs.ucsb.edu> wrote:
Perhaps I wasn't very clear. I want to write a tool to collect error messages when I run a program. Ideally the tool should be agnostic to what language is used and should be able to identify syntax errors, parser errors, and runtime errors. While I can parse both the stdout and stderr streams to find this information, from what I can tell there is no way to distinguish between a real syntax error (output to stderr):
  File "./test.py", line 5
    class
        ^
SyntaxError: invalid syntax
and a program that outputs that exact output to stderr and exits with status 1.
Correct. And it isn't a problem, because it shouldn't matter if the programmer wrote a bit of code that caused the compiler to raise a syntax error, raised the syntax error themselves, evaled a string that raised the syntax error, or wrote the message explicitly to standard error: all those cases represent a syntax error to the programmer.
This "channel" sharing of control (error messages) and data is a problem that affects more than just the python interpreter. I am hoping to start with python and provide a way to separate the control and data information so I can be certain that output on the "control" file descriptor is guaranteed to be generated by the interpreter.
If your program is writing *data* to stderr, it's badly designed. You should fix it instead of trying to get Python changed to accommodate it. Or maybe you're trying to draw a distinction between messages purposely generated by the programmer, and messages the programmer didn't want? I think that's an artificial distinction. An error is an error is an error, and by any other name would be just as smelly. Whether it's an exception raised by the interpreter, an exception raised by the programmer, or just a message printed by the programmer really doesn't matter. Python programmers have complete control over all of this, and can make it do what they want. Trying to make distinctions based on default behaviors is misguided. <mike -- Mike Meyer <mwm@mired.org> http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
Correct. And it isn't a problem, because it shouldn't matter if the programmer wrote a bit of code that caused the compiler to raise a syntax error, raised the syntax error themselves, evaled a string that raised the syntax error, or wrote the message explicitly to standard error: all those cases represent a syntax error to the programmer.
Perhaps you cannot envision a case where it matters because you don't work with people that intentionally try to cheat or mislead a system. I am merely pointing out that currently there is no way to distinguish between these behaviors, and I would personally like to add that support because I have a need to deterministically differentiate between them. I don't want to break backwards compatibility, so what I propose would only take place via a command line argument.
This "channel" sharing of control (error messages) and data is a problem that affects more than just the python interpreter. I am hoping to start with python and provide a way to separate the control and data information so I can be certain that output on the "control" file descriptor is guaranteed to be generated by the interpreter.
If your program is writing *data* to stderr, it's badly designed. You should fix it instead of trying to get Python changed to accommodate it.
That is a generalization which is not true. Counter example: I have students writing a simple interpreter in python and their compiler should output syntax errors to stderr when the program their interpreter is interpreting contains them. Now say their errors look very similar to python errors, how does one distinguish between an error in their implementation of their interpreter, or an error raised by their interpreter? Furthermore, having this separation is somewhat pointless without the structured part, as ideally I would like it if all compilers and interpreters produced similar output so I could easily measure how many errors beginning programmers have in various languages and group them by type. I have to start somewhere with this project and I was hoping the python community would be in favor of adding such support as I feel the changes are relatively trivial. -Bryce
On Thu, Apr 26, 2012 at 10:30 AM, Bryce Boe <bboe@cs.ucsb.edu> wrote:
Furthermore, having this separation is somewhat pointless without the structured part, as ideally I would like it if all compilers and interpreters produced similar output so I could easily measure how many errors beginning programmers have in various languages and group them by type. I have to start somewhere with this project and I was hoping the python community would be in favor of adding such support as I feel the changes are relatively trivial.
And we're telling you that no, the changes you're interested in are not trivial - the use of stderr is embedded deep within many parts of the interpreter. I suggest just raising the bar for your students and require that they write their errors to both stderr *and* to an error log. Interpreter generated errors will show up in the former, but will never appear in the latter. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
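One way to sketch the "stderr and an error log" suggestion is with the standard logging module and two handlers. The logger name, the file location and the sample message below are placeholders invented for this example.

```python
import logging
import os
import sys
import tempfile


def make_error_logger(log_path):
    """Logger that sends each error to stderr and to a separate log file."""
    logger = logging.getLogger("student_interpreter")  # hypothetical name
    logger.setLevel(logging.ERROR)
    logger.propagate = False      # keep messages out of the root logger
    logger.handlers.clear()       # avoid duplicate handlers on repeated calls
    logger.addHandler(logging.StreamHandler(sys.stderr))
    logger.addHandler(logging.FileHandler(log_path, mode="w"))
    return logger


# Demo: the message appears on stderr and in the log file.
log_path = os.path.join(tempfile.mkdtemp(), "errors.log")
logger = make_error_logger(log_path)
logger.error("SyntaxError: unexpected token at line 3")
```

An uncaught exception in the student's own implementation would still reach stderr through the interpreter, but it would not appear in the log file, which is the distinction the suggestion relies on.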
And we're telling you that no, the changes you're interested in are not trivial - the use of stderr is embedded deep within many parts of the interpreter.
This is the first constructive comment I've received thus far. Perhaps I am a bit optimistic that grepping for output to stderr and replacing the write or fprintf calls with a function call would be appropriate. Maybe a tedious procedure, but it still seems trivial. It's not like trying to replace the GIL ;)
I suggest just raising the bar for your students and require that they write their errors to both stderr *and* to an error log. Interpreter generated errors will show up in the former, but will never appear in the latter.
The example I gave was simply an example. Sure we could ask the students to do more, but ideally this tool would work on any recent python source without requiring source modification. Many of these solutions have been presented before, however, they fail to work in the general case.
Oh, I work with such cases often enough to know that if they've got a complete programming language as a tool, you've already lost. And if the people writing the code are antagonistic, there is no way to differentiate those behaviors. Anything the python interpreter can do the programmer can also do. And, for that matter, suppress.
I realize that even if there was another output stream the user could write to it via, os.write(3, "foobar"), however, static checks can be made on the code to detect such function calls. Also, I'm curious, can a python program suppress later syntax errors?
That is a generalization which is not true. Counter example: I have students writing a simple interpreter in python and their compiler should output syntax errors to stderr when the program their interpreter is interpreting contains them. Now say their errors look very similar to python errors, how does one distinguish between an error in their implementation of their interpreter, or an error raised by their interpreter?
That's not writing data to stderr, that's writing errors. The problem in this case is that the program in question isn't handling errors in the implementation properly. If your students aren't bright enough to figure out how to catch errors in their implementation and flag them as such, flunk them.
Again this was just a contrived example that demonstrates my want to differentiate between the two. Whether or not the students get stuck isn't the problem, the problem is that it's not possible to do the differentiation. But simply put, why not allow for such differentiation? What is lost by doing so?
Trying to get all language processors to produce similar error messages is tilting at windmills. The existence of IDE's that parse error message and let the user go through them in order hasn't been sufficient to cause that to happen. Some abstract wish to study beginners errors will have even less effect.
It is my opinion that most people make do with what they have available to them. Of course, I can do exactly what I want to do without modifying the interpreter, however, it suffers from the ambiguity problem I've already mentioned. One of the great things about open source software is the ability to adapt it to suit your own needs, and thus I prefer to take the approach of making things better. Perhaps no one before me has even considered separating interpreter output from the programs that the interpreter interprets, but I find that quite hard to believe. This really isn't a problem with compiled code, because the compilation and type checking process is separate from the execution process, though I'll admit there is the same problem with runtime errors such as segmentation faults but these can be discovered with proper signal handling.
But you claim the structure is the important part. Want to give an example of how you would "structure the error output" so that errors in a program processing program source can be distinguished from errors in the processed source, yet at the same time be similar enough so that some tool could be used on both sets of errors?
First, the two changes should work in tandem thus both interpreters would have a flag, say --structured-error-output that takes a filename. With such a flag directing the error output to different files is quite trivial. However, even if they went to the same stream (poor design in my opinion) the structured messages can have an attribute indicating which interpreter produced the message thus allowing for differentiation. Of course, if they went to the same stream, then you still have the possibility of spoofing the other program which is why the separation is necessary. Anyway, I appreciate the argument. It is fairly clear that if I were to implement this support it is not something that would be integrated in python thus it's not worth my time. I'll take the band-aid approach as everyone before me has. Thanks, Bryce
On Wed, Apr 25, 2012 at 07:04:15PM -0700, Bryce Boe wrote:
I realize that even if there was another output stream the user could write to it via, os.write(3, "foobar"), however, static checks can be made on the code to detect such function calls.
I don't think so. Or at least, not easily.
    funcs = [len, str, eval, map]
    value = funcs[2]("__imp" + "port"[1:] + "__('sys')")
    f = getattr(value, ''.join(map(chr, (115, 116, 100, 101, 114, 114))))
    y = getattr(f, ''.join(map(chr, (119, 114, 105, 116, 101))))
    y("statically check this!\n")
Also, I'm curious, can a python program suppress later syntax errors?
    try:
        exec("x = )(")
    except SyntaxError:
        print("nothing to see here, move along")
[...]
Again this was just a contrived example that demonstrates my want to differentiate between the two. Whether or not the students get stuck isn't the problem, the problem is that it's not possible to do the differentiation. But simply put, why not allow for such differentiation? What is lost by doing so?
Apart from simplicity? You risk infinite regress. stderr exists to differentiate "good" output from "error" output. So you propose a new stream to differentiate "good errors" (those raised by the students' interpreter, call it stderr2) from "bad errors" (those raised by the interpreter running the students' interpreter). At some point, someone will have some compelling (to them, not necessarily everyone else) use-case that needs to distinguish between "real" good errors going to stderr2 and "fake" good errors going to stderr2, and propose stderr3. And deeper and deeper down the rabbit hole we go... At some point, people will just say "Enough!". I suggest that the distinction between stdout and stderr is exactly that point. (Although, having said that, I wish there was a stdinfo for informational messages that are neither the intended program output nor unintended program errors, e.g. status messages, progress indicators, etc.) -- Steven
On Thu, Apr 26, 2012 at 12:44 PM, Steven D'Aprano <steve@pearwood.info> wrote:
(Although, having said that, I wish there was a stdinfo for informational messages that are neither the intended program output nor unintended program errors, e.g. status messages, progress indicators, etc.)
And, indeed, such a channel exists: it's called the logging system, which separates event *generation* (calls to the logging message API) from event *display* (configuration of logging handlers). In particular, see the following table in the logging HOWTO guide: http://docs.python.org/howto/logging.html#when-to-use-logging Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 26/04/12 14:04, Bryce Boe wrote:
Perhaps I am a bit optimistic that grepping for output to stderr and replacing the write or fprintf calls with a function call would be appropriate.
I don't see how that would solve your problem anyway. If your student's interpreter code, written in Python, raises e.g. a TypeError, and nothing catches it, the error message for it will get printed by the very same printf call as any other uncaught exception. Seems to me the solution to your problem lies in sandboxing the student's code inside something that catches any exceptions emanating from it and logs them in a distinctive way. You could also replace sys.stdout and sys.stderr with objects that perform a similar function. -- Greg
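The sandboxing idea could be sketched roughly as follows: a wrapper runs the script with runpy, captures whatever the script writes to sys.stderr, and records any exception that escapes in a separate structure. The function name run_sandboxed, the record layout and the two throwaway scripts are assumptions for this example. This is not a security sandbox, and code that deliberately raises its own SyntaxError would still be recorded as one.

```python
import io
import json
import os
import runpy
import tempfile
import traceback
from contextlib import redirect_stderr


def run_sandboxed(path):
    """Run a script; return (captured_stderr, error_record_or_None)."""
    captured = io.StringIO()
    record = None
    with redirect_stderr(captured):
        try:
            runpy.run_path(path, run_name="__main__")
        except SystemExit:
            pass  # the script asked to exit; not treated as an error here
        except Exception as exc:
            if isinstance(exc, SyntaxError):
                # Compile-time failure: location is stored on the exception.
                filename, lineno = exc.filename, exc.lineno
            else:
                last = traceback.extract_tb(exc.__traceback__)[-1]
                filename, lineno = last.filename, last.lineno
            record = {"type": type(exc).__name__, "message": str(exc),
                      "file": filename, "line": lineno}
    return captured.getvalue(), record


# Demo: one script with a real syntax error, one that only prints a fake one.
workdir = tempfile.mkdtemp()
real_path = os.path.join(workdir, "real_error.py")
fake_path = os.path.join(workdir, "fake_error.py")
with open(real_path, "w") as fh:
    fh.write("class\n")
with open(fake_path, "w") as fh:
    fh.write('import sys\nsys.stderr.write("SyntaxError: invalid syntax\\n")\n')

real_stderr, real_record = run_sandboxed(real_path)
fake_stderr, fake_record = run_sandboxed(fake_path)
print(json.dumps(real_record))
print(repr(fake_stderr), fake_record)
```

The printed imitation ends up in the captured text with no record, while the genuine compile failure produces a record and nothing on the captured stream.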
Bryce Boe <bboe@cs.ucsb.edu> wrote:
I realize that even if there was another output stream the user could write to it via, os.write(3, "foobar"), however, static checks can be made on the code to detect such function calls. Also, I'm curious, can a python program suppress later syntax errors?
Yes, a python program can suppress later syntax errors.
This really isn't a problem with compiled code, because the compilation and type checking process is separate from the execution process
For sorting out error messages in modern languages, compilation and execution are not necessarily separate. You can get compilation errors at pretty much any point in the execution of the program. Most such languages, including python but also languages that compile to machine or JVM code, include both the ability to import uncompiled source, compiling it along the way, and the ability to compile and run code fragments (aka "eval") in the program. Both of these can generate compilation errors in the middle of runtime. If my program imports a config module the user provided and it has a syntax error in it, is that syntax error a runtime error or a compilation error?
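To make the "compilation errors in the middle of runtime" point concrete, here is a small sketch using the built-in compile function on source text supplied while the program is already running. The helper name check_source and the "<config>" filename are made up for the example.

```python
def check_source(source, filename="<config>"):
    """Compile a source string; return None, or a dict describing the syntax error."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        # SyntaxError carries its location as attributes.
        return {"file": err.filename, "line": err.lineno,
                "offset": err.offset, "message": err.msg}
    return None


good = check_source("x = 1\n")
bad = check_source("x = 1\ny = )(\n")
print(good)
print(bad)
```

The second call reports a syntax error on line 2 of "<config>" even though the surrounding program compiled and started without any problem.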
But you claim the structure is the important part. Want to give an example of how you would "structure the error output" so that errors in a program processing program source can be distinguished from errors in the processed source, yet at the same time be similar enough so that some tool could be used on both sets of errors?
First, the two changes should work in tandem thus both interpreters would have a flag, say --structured-error-output that takes a filename. With such a flag directing the different errors to different files is quite trivial.
It is? How do you distinguish between an actual syntax error and a syntax error raised by an explicit raise statement? And which of those two cases would a syntax error raised by passing a bad code fragment to eval be, or is that a third case requiring yet another flag?
Anyway, I appreciate the argument. It is fairly clear that if I were to implement this support it is not something that would be integrated in python thus it's not worth my time. I'll take the band-aid approach as everyone before me has.
I'm still waiting for a proposal solid enough to evaluate. I like the idea of more structured error output, and think it might be a nice addition to the interpreter. A python programmer already has complete control over all error messages, though it can be hard to get to. Making that easier is a worthy goal, but it's got to be more than the ability to send some ill-defined set of exceptions to a different output stream. <mike -- Sent from my Android tablet. Please excuse my swyping.
participants (7)
- Bryce Boe
- Chris Kaynor
- Greg Ewing
- Mike Meyer
- Nick Coghlan
- Steven D'Aprano
- Terry Reedy