Catching "return" and "return expr" at compile time

Attached is a context diff against the latest version of Python/compile.c that checks at compile time for functions that both return expressions and execute return statements with no expression (or, equivalently, fall off the end of the function). I figured I'd post it here to get a little friendly feedback and bug discovery before shooting it off to c.l.py. I modified compile.c instead of some preexisting PyLint script because I don't know what's popular out there. On the other hand, I'm sure most people who would be interested in this sort of thing have access to the C source...
The basic idea is that each straight-line chunk of code is terminated in one of four ways:

1. return with no expression
2. return an expression
3. raise an exception
4. fall off the end of the chunk
Falling off the end of the function is obviously treated like return with no expression. (This is, after all, what motivated me to do this. ;-)
This information is recorded in a new bit vector added to the struct compiling object that's carried around during the compilation. Compound statements simply aggregate the bit vectors for their various clauses in ways appropriate to their semantics.
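The aggregation can be sketched in Python (the flag names and helper functions below are invented for illustration; the actual patch manipulates C bit vectors inside compile.c):

```python
# Illustrative sketch only -- these names are not from the patch itself.
RETURN_NONE = 1   # "return" with no expression, or falling off the end
RETURN_EXPR = 2   # "return expr"
RAISE       = 4   # the chunk ends by raising an exception

def if_else(then_bits, else_bits):
    # Either branch of an if/else may be taken, so the set of possible
    # exits for the compound statement is the union of its clauses.
    return then_bits | else_bits

def warn_needed(bits):
    # Warn only when the function can exit both with and without a
    # value; a raise is compatible with either style of return.
    return bool(bits & RETURN_NONE) and bool(bits & RETURN_EXPR)

# One branch returns a value, the other falls off the end: warn.
print(warn_needed(if_else(RETURN_EXPR, RETURN_NONE)))   # True
# One branch returns a value, the other raises: no warning.
print(warn_needed(if_else(RETURN_EXPR, RAISE)))         # False
```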
At the end of a function's compilation, the set of return bits computed up to that point tells you whether or not to spit out a warning. Note that it does nothing to recognize constant expressions. The following function will generate a warning:
    def f():
        i = 0
        while 1:
            i = i + 1
            if i > 10:
                return i
even though the only way to return from the function is the return statement. To get the above to shut up the compiler you'd have to do something like
class CantGetHere: pass
    def f():
        i = 0
        while 1:
            i = i + 1
            if i > 10:
                return i
        raise CantGetHere
Raise statements are treated as a valid way to "return" from a function. Registering them as separate styles of returns serves effectively to turn off the "no return" bit for a block of code. Raise is compatible with either form of return, though they aren't compatible with each other.
The code is run whenever a module is compiled. I didn't bother to add a new flag to the python interpreter to enable/disable warnings during compilation, though a -w switch for Python has been mentioned before.
I ran the modified byte code compiler over a silly test module as well as executing
    ./python Lib/test/regrtest.py
    ./python Lib/compileall.py
It uncovered some interesting programming practices and one item I think is an actual bug. In Lib/gzip.py, GzipFile._read returns EOFError at one point instead of raising it. At other points in the method it raises EOFError. There are no other return statements in the function. (I haven't taken the time/had the nerve to run it against my own Python code yet. ;-)
I'm firmly of the opinion that more subtle bugs exist in the way people write functions that return and raise values than in the code that calls those functions, contrary to a few vocal folks on c.l.py who may believe otherwise.
Enjoy,
Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/~skip/ 847-971-7098 | Python: Programming the way Guido indented...

This is a valuable service! Even though I'm sure that it will cause some pain for people who were used to this programming style...
I'm not sure I like the fact that you can't turn it off -- traditionally, Python has had a "no warnings" policy. That has been diluted a bit (python -t prints warnings) but so far it has been the default.
I'm wondering if we should introduce a general '-w' flag to turn on warnings like this (which would subsume -t)? Or perhaps there should be a -W flag ("no warnings") and warnings should be the default?
There are also platform problems, e.g. on the Mac, stderr doesn't always exist, and on Windows, it doesn't exist if pythonw.exe is used...
--Guido van Rossum (home page: http://www.python.org/~guido/)

On 07 September 1999, Guido van Rossum said:
This is a valuable service! Even though I'm sure that it will cause some pain for people who were used to this programming style...
I'm not sure I like the fact that you can't turn it off -- traditionally, Python has had a "no warnings" policy. That has been diluted a bit (python -t prints warnings) but so far it has been the default.
I'm wondering if we should introduce a general '-w' flag to turn on warnings like this (which would subsume -t)? Or perhaps there should be a -W flag ("no warnings") and warnings should be the default?
Yes yes yes! While adding "-w" is a long way from having a comprehensive set of compile-time warnings in place, it at least means that someone is *thinking* about it.
Also, I would suggest that there should be some standard internal mechanism for reporting errors rather than just calling 'PySys_WriteStderr()'. Something as simple as this would probably do the trick:
    void
    Py_Warning(char *filename, int line, char *msg)
    {
        if (on_a_platform_where_stderr_means_something)
            PySys_WriteStderr("warning: file %s, line %d: %s",
                              filename, line, msg);
        else
            do_whatever_it_takes_for_this_platform();
    }
Well, you get the idea. I make no claim that this is an appropriate name for this function, nor do I have anything to say about where it should live. It should also be smart about unknown filename or line number (eg. skip filename if filename == NULL, skip line number if line == -1).
Oh, and of course we'll need to add a global variable $^W so that programmers can turn run-time warnings on and off as needed. *duck* Maybe sys.show_warnings? ;-) (Of course, that's assuming a run-time warning system in addition to the compile-time warnings of -t and Skip's patch.)
Greg -- Greg Ward - software developer gward@cnri.reston.va.us Corporation for National Research Initiatives 1895 Preston White Drive voice: +1-703-620-8990 Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913

Greg Ward wrote:
On 07 September 1999, Guido van Rossum said:
This is a valuable service! Even though I'm sure that it will cause some pain for people who were used to this programming style...
I'm not sure I like the fact that you can't turn it off -- traditionally, Python has had a "no warnings" policy. That has been diluted a bit (python -t prints warnings) but so far it has been the default.
I'm wondering if we should introduce a general '-w' flag to turn on warnings like this (which would subsume -t)? Or perhaps there should be a -W flag ("no warnings") and warnings should be the default?
Yes yes yes! While adding "-w" is a long way from having a comprehensive set of compile-time warnings in place, it at least means that someone is *thinking* about it.
I would recommend no warnings by default, and -Wfeature to add specific types of warnings. This pattern follows that used by gcc (well, gcc has *some* warnings by default). Rather than invent a new set of switches, I'd rather steal an existing semantic :-)
Also, I would suggest that there should be some standard internal mechanism for reporting errors rather than just calling 'PySys_WriteStderr()'. Something as simple as this would probably do
Why? Why not just use PySys_WriteStderr() as your requested function? It can easily determine "oops. no stderr. let's do something else."
... Maybe sys.show_warnings? ;-) (Of course, that's assuming a run-time warning system in addition to the compile-time warnings of -t and Skip's patch.)
There is no such thing as run-time vs compile-time warnings. You always have a compiler at run-time, and it can be used at any time. Therefore, you just have "(compilation) warnings" (I could imagine that people will come up with other kinds of warnings once the feature is provided).
I would suggest sys.warnings be a dictionary.
python -Wbad-return -Wlines-per-func=50
print sys.warnings
{'bad-return': None, 'lines-per-func': '50'}
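The command-line handling that would feed such a dictionary is straightforward; a minimal sketch (function name invented, value semantics exactly as in the example above):

```python
# Hypothetical sketch: turn gcc-style -W options into the proposed
# sys.warnings dictionary.  "-Wname" maps to None, "-Wname=value"
# maps to the value as a string.
def parse_warning_options(argv):
    warnings = {}
    for arg in argv:
        if arg.startswith('-W'):
            opt = arg[2:]
            if '=' in opt:
                name, value = opt.split('=', 1)
            else:
                name, value = opt, None
            warnings[name] = value
    return warnings

print(parse_warning_options(['-Wbad-return', '-Wlines-per-func=50']))
# {'bad-return': None, 'lines-per-func': '50'}
```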
Cheers, -g
-- Greg Stein, http://www.lyra.org/

On 07 September 1999, Greg Stein said:
Also, I would suggest that there should be some standard internal mechanism for reporting errors rather than just calling 'PySys_WriteStderr()'. Something as simple as this would probably do
Why? Why not just use PySys_WriteStderr() as your requested function? It can easily determine "oops. no stderr. let's do something else."
Hmm, that makes sense for the "what's the local equivalent of stderr?" determination. Probably that actually belongs in mywrite() (the static function in Python/sysmodule.c that PySys_WriteStdout() and PySys_WriteStderr() both invoke), so that the same thing can be done for stdout and stderr.
However, I still think a separate function for printing source-code-based warnings is a good idea. This is mainly so that the association from
(filename, line_number, message)
to
"warning: file %s, line %s: %s" % (filename, line_number, message)
is done in *one* place, rather than everywhere a warning message is generated. For instance, platforms that don't have stderr, but instead pop up a window with all your compile-time warnings nicely formatted, could take advantage of knowing the filename and line number separately to nicely format those warnings. (Of course, this argues *against* putting the "what's the local equivalent of stderr?" determination in the low-level mywrite() function... arg...)
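Rendered in Python for brevity (the real function would live in C), the one-place formatting with the unknown-filename/line handling suggested earlier might look like:

```python
# Sketch only: centralize the (filename, line, message) -> text mapping,
# skipping the filename when it is None and the line when it is -1.
def format_warning(filename, line, msg):
    parts = []
    if filename is not None:
        parts.append('file %s' % filename)
    if line != -1:
        parts.append('line %d' % line)
    where = ', '.join(parts)
    if where:
        return 'warning: %s: %s' % (where, msg)
    return 'warning: %s' % msg

print(format_warning('spam.py', 12, 'inconsistent returns'))
# warning: file spam.py, line 12: inconsistent returns
```

A platform that pops up a window instead of writing to stderr would call the same function and route the resulting string wherever it likes.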
Maybe sys.show_warnings? ;-) (Of course, that's assuming a run-time warning system in addition to the compile-time warnings of -t and Skip's patch.)
There is no such thing as run-time vs compile-time warnings. You always have a compiler at run-time, and it can be used at any time. Therefore, you just have "(compilation) warnings" (I could imagine that people will come up with other kinds of warnings once the feature is provided).
Well, currently that's true, since currently Python's only warning is the tab warning from -t -- clearly a compile-time warning. (Is this true? I'm no expert on the internals, but I've certainly not seen any other warnings from Python, and I've included plenty of bugs in my code -- umm, just seeing if it would catch them, yeah that's it...)
However, one could certainly envision a world where Python issues runtime warnings. If my time machine were working, I'd zip back and suggest to Guido that mistakes with the % operator should issue warnings rather than raising exceptions. (Ignore the language philosophy issue and presume this would be worthwhile.)
There are probably other situations where, ignoring past history and language philosophy, it would make sense to issue a warning and march on ahead rather than blowing up immediately. Sometimes Python has a bit of an itchy trigger finger for that ol' TypeError...
Anyways, the focus should probably be on compile-time warnings: I can't think of any major runtime errors offhand that Python currently does *nothing* about, so there's not a great need to go scattering the code with runtime warnings. But they *are* a theoretical possibility, and there *is* a difference with compile-time warnings.
I would suggest sys.warnings be a dictionary.
python -Wbad-return -Wlines-per-func=50
print sys.warnings
{'bad-return': None, 'lines-per-func': '50'}
Makes sense -- true to warn (possibly giving some extra meaning to "truth", as in this example), and false to not warn. Or maybe None to not warn, not-None to warn. Of course, if there are only compile-time warnings, then modifying sys.warnings will only affect future imports, execs, evals, etc.
Greg

[ ...Lots of stuff about details of warning implementation snipped... ]
That's why I only provided the code to check for inconsistent use of returns, not the flag to turn it on and off!
In a message I accidentally sent only to Guido and myself on the subject, I outlined my take on things, which really does about exhaust my knowledge/interest on how/when to enable warnings:
Guido> I'm not sure I like the fact that you can't turn it off --
Guido> traditionally, Python has had a "no warnings" policy.  That has
Guido> been diluted a bit (python -t prints warnings) but so far it has
Guido> been the default.
The only reason for not being able to turn it off was that would require introducing some sort of -w flag, which wasn't the point of the exercise. We can have the -w/-t/-W discussion now. I haven't any particular opinion on the best way to do it, although I would much prefer it be a run-time as opposed to compile-time option. One other issue might be whether or not to ignore an existing .pyc file and always recompile .py's if warnings are enabled. Of course, we're still all adults here (I think), so perhaps it's sufficient to remind people in the docs to delete the desired .pyc files before running with warnings enabled.
Guido> I'm wondering if we should introduce a general '-w' flag to turn
Guido> on warnings like this (which would subsume -t)?  Or perhaps there
Guido> should be a -W flag ("no warnings") and warnings should be the
Guido> default?
-w sounds fine to me.
Guido> There are also platform problems, e.g. on the Mac, stderr doesn't
Guido> always exist, and on Windows, it doesn't exist if pythonw.exe is
Guido> used...
Perhaps on those platforms a file could be opened in a standard location to catch stderr (I presume you can detect the lack of stderr at run-time?). While that would force some (more) Unix conventions on programmers on those platforms, it would also provide more cross-platform uniformity.
Skip

Perhaps on those platforms a file could be opened in a standard location to catch stderr (I presume you can detect the lack of stderr at run-time?).
Not really - there are 2 scenarios here. pythonw.exe, for example, always has a valid stdout handle - it just goes nowhere. When Python is embedded in certain COM servers (such as ASP), the stdout handle is invalid - operations on it will fail (perversely, this also means a single "print" statement in your Python code can raise an exception and make your code fail - and seeing as print statements are the debugging state of the art at the moment, this is less than ideal - but I digress).
So I'm not sure we can check this reasonably at runtime - invalid handles are easy, but valid handles that go nowhere useful (as in pythonw.exe, and therefore the majority of cases we care about) are obviously hard to detect.
OTOH, pythonw.exe doesn't print tracebacks either. Although not ideal, people aren't loudly complaining about this - they know to develop and debug using python.exe. As the warnings we are discussing are compile-time warnings, we could simply document that they should run "compileall" over their scripts to generate the warnings before attempting to embed them in some sort of weird system.
On my third hand, I would _really_ like to see this in a lint tool rather than in the core. I realize there is no such tool at the moment, but IMO that is where we should be heading. Skip's return statement warnings are fine and a nice addition, but in my experience account for a trivial number of my errors. Stuff like warning about a variable name used only once, for example, will probably never get into core Python but in my opinion is far more valuable. So adding this "-w" switch is fine, but still doesn't give us the framework we need to actually create a truly useful package of warnings for the Python developer.
[And I am slowly and painfully starting work on this - a lint tool based on the Python parser module. Don't hold your breath though :-]
Mark.

Perhaps on those platforms a file could be opened in a standard location to catch stderr (I presume you can detect the lack of stderr at run-time?).
Not really - there are 2 scenarios here. pythonw.exe, for example, always has a valid stdout handle - it just goes nowhere. When Python is embedded in certain COM servers (such as ASP), the stdout handle is invalid - operations on it will fail (perversely, this also means a single "print" statement in your Python code can raise an exception and make your code fail - and seeing as print statements are the debugging state of the art at the moment, this is less than ideal - but I digress).
So I'm not sure we can check this reasonably at runtime - invalid handles are easy, but valid handles that go nowhere useful (as in pythonw.exe, and therefore the majority of cases we care about) are obviously hard to detect.
OTOH, pythonw.exe doesn't print tracebacks either. Although not ideal, people aren't loudly complaining about this - they know to develop and debug using python.exe. As the warnings we are discussing are compile-time warnings, we could simply document that they should run "compileall" over their scripts to generate the warnings before attempting to embed them in some sort of weird system.
Hmm... Perhaps pythonw.exe could use freopen() to point stdout and stderr to a log file in a temp directory? The wizards will know where to look...
On my third hand, I would _really_ like to see this in a lint tool rather than in the core. I realize there is no such tool at the moment, but IMO that is where we should be heading. Skip's return statement warnings are fine and a nice addition, but in my experience account for a trivial number of my errors. Stuff like warning about a variable name used only once, for example, will probably never get into core Python but in my opinion is far more valuable. So adding this "-w" switch is fine, but still doesn't give us the framework we need to actually create a truly useful package of warnings for the Python developer.
[And I am slowly and painfully starting work on this - a lint tool based on the Python parser module. Don't hold your breath though :-]
Eventually, I also plan to have some kind of lint in IDLE. If the CP4E money comes, I'll start working on that in earnest...
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido> Eventually, I also plan to have some kind of lint in IDLE.  If
Guido> the CP4E money comes, I'll start working on that in earnest...
Speaking of which, just where *is* IDLE? I get the Python source via CVS, but I'll be damned if I have anything called idle.py or IDLE.py or even anything that matches "*idle*" or "*IDLE*" glob patterns. I just executed "cvs update -A ." from the top of my tree and checked again. Still nothing. Is it a separate module from the main Python source?
Thx,
Skip

Skip Montanaro writes:
Speaking of which, just where *is* IDLE? I get the Python source via CVS, but I'll be damned if I have anything called idle.py or IDLE.py or even
Look in Tools/idle/
-Fred
-- Fred L. Drake, Jr. fdrake@acm.org Corporation for National Research Initiatives

Mark> On my third hand, I would _really_ like to see this in a lint
Mark> tool rather than in the core.  I realize there is no such tool at
Mark> the moment, but IMO that is where we should be heading.  Skip's
Mark> return statement warnings are fine and a nice addition, but in my
Mark> experience account for a trivial number of my errors.  Stuff like
Mark> warning about a variable name used only once, for example, will
Mark> probably never get into core Python but in my opinion is far more
Mark> valuable.  So adding this "-w" switch is fine, but still doesn't
Mark> give us the framework we need to actually create a truly useful
Mark> package of warnings for the Python developer.
I'm not sure the stuff I wrote belongs in the core either, certainly not in C code. As I mentioned when I posted it though, I wasn't sure where a PyLint type program already existed that I could simply graft onto. I've fiddled around enough with the compile.c code in the past couple of years that I understand it fairly well already.
I do have some Python code that does peephole optimization on Python bytecode. I could have put it in there (it already divides functions into basic blocks), but again, not many people have it laying about to play with.
Can we start/settle on a Python-based source code framework for this sort of thing? Ideally, I'd like to see a framework that brings the parser module's output up to a level where mere mortals like me can reason about Python code.
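The kind of wrapper being asked for might look something like the following (the node numbers and names below are invented for illustration; the parser module's real trees are nested tuples whose heads are grammar and token numbers):

```python
# Toy sketch: rename numeric node ids to symbolic names and collapse
# trivial one-child wrapper nodes, so the tree reads at a human level.
NODE_NAMES = {257: 'file_input', 268: 'return_stmt', 1: 'NAME'}

def simplify(tree):
    head, rest = tree[0], tree[1:]
    if len(rest) == 1 and isinstance(rest[0], tuple):
        # A node with a single subtree adds no information; skip it.
        return simplify(rest[0])
    name = NODE_NAMES.get(head, head)
    children = [simplify(c) if isinstance(c, tuple) else c for c in rest]
    return (name,) + tuple(children)

raw = (257, (268, (1, 'return'), (1, 'x')))
print(simplify(raw))
# ('return_stmt', ('NAME', 'return'), ('NAME', 'x'))
```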
Skip

Skip Montanaro writes:
thing? Ideally, I'd like to see a framework that brings the parser module's output up to a level where mere mortals like me can reason about Python
This was exactly what I wanted to prevent when I created the parser module! ;-) I think a wrapper that simplifies the parse tree wouldn't be too hard to do; you simply have to be sure that the simplified version can be re-elaborated to pass back to the byte-code compiler via parser.sequence2ast(<seq>).compile(). Otherwise you can't modify the tree without losing line number information, which would be nice to keep around!
-Fred
-- Fred L. Drake, Jr. fdrake@acm.org Corporation for National Research Initiatives

Fred L. Drake, Jr. wrote:
Skip Montanaro writes:
thing? Ideally, I'd like to see a framework that brings the parser module's output up to a level where mere mortals like me can reason about Python
This was exactly what I wanted to prevent when I created the parser module! ;-) I think a wrapper that simplifies the parse tree wouldn't be too hard to do; you simply have to be sure that the simplified version can be re-elaborated to pass back to the byte-code compiler via parser.sequence2ast(<seq>).compile(). Otherwise you can't modify the tree without losing line number information, which would be nice to keep around!
This has already been done. Grab the Python2C distribution from http://www.mudlib.org/~rassilon/p2c/. There is a module named "transformer.py" which does just what you're thinking -- it converts Python's deeply-nested trees into something human-readable. Each of the resulting node types are doc'd at the top of the module.
It is also over a couple years old, so it has had some decent debugging/stabilization.
Cheers, -g
-- Greg Stein, http://www.lyra.org/

Can we start/settle on a Python-based source code framework for this sort of thing? Ideally, I'd like to see a framework that brings the parser module's output up to a level where mere mortals like me can reason about Python code.
Actually, I struggled with this a _lot_, then found that P2C has a module called "transform" which flattens the parse tree down to something I can understand (which is good :-)
I could simply attach it, but it is grafted to P2C IIRC. If there is interest I will rip it out...
Mark.

Mark Hammond wrote:
... Actually, I struggled with this a _lot_, then found that P2C has a module called "transform" which flattens the parse tree down to something I can understand (which is good :-)
I could simply attach it, but it is grafted to P2C IIRC. If there is interest I will rip it out...
transformer.py operates quite independently.
It is a relatively large module, though (33k), so I would recommend people just grab the P2C distribution from http://www.mudlib.org/~rassilon/p2c/.
I have started to put together a page at http://www.lyra.org/greg/python/ that includes the various modules that I've "published". I'll get transformer over there in the next day or two.
Cheers, -g
-- Greg Stein, http://www.lyra.org/

Oops - for some reason I thought we were on Python-help, where Greg doesn't hang out...
transformer.py operates quite independently.
It does - however, the only simple way to see what it does is to use P2C. All I had in mind was a simple "if __name__ == '__main__':" block to demonstrate its output.
Mark.

[Mark Hammond]
... On my third hand, I would _really_ like to see this in a lint tool rather than in the core. I realize there is no such tool at the moment, but IMO that is where we should be heading.
Following the lead taken by other modern languages, like javalint, c++lint, perllint, dylanlint and even vblint <wink>? C lint was a hack needed due to the combination of bad language design choices and poor compilers, but C compilers are smarter now than lint ever was. Who still uses lint? It's dead, and it's not missed.
Skip's return statement warnings are fine and a nice addition, but in my experience account for a trivial number of my errors. Stuff like warning about a variable name used only once, for example, will probably never get into core Python but in my opinion is far more valuable.
The notion that a valuable idea will never get into the core is disturbing. I don't really care how it's implemented, but a *visibly* separate "checking" tool is bad UI, one that even the C community left behind with relief.
So adding this "-w" switch is fine, but still doesnt give us the framework we need to actually create a truly useful package of warnings for the Python developer.
No, but adding the "-W" <wink> switch does give us the means by which (perhaps the illusion of) "a" smarter compiler can be invoked & controlled.
[And I am slowly and painfully starting work in this - a lint tool based on the Python parser module. Dont hold your breath though :-]
Aaron W has had a capable pylint tool for a couple years, & it remains virtually unknown; and, far as I can tell, Aaron reciprocated the lack of interest by dropping active development.
So why was C lint successful in its day while every crack at pylint flops (btw, yours will too <0.5 wink>)? I think it's two sides of the same coin: C lint found dozens of deadly problems that infested almost all C code (remember the pre-prototype days?). Versions of pylint offer very little beyond pointing out unique vrbl names, perhaps indentation checking, and ...? I'm drawing a blank here. I suppose they should strive to give better msgs for runaway triple-quoted strings. What else? Skip's "return" checker, and far as I can tell then we're already at the point of diminishing returns.
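To make the "unique vrbl names" check concrete, here is a deliberately crude sketch (regex-based, keyword list abbreviated, purely illustrative - a real tool would use the parser and special-case definitions like the function name):

```python
# Toy check: flag identifiers that appear exactly once in a function's
# source, which often indicates a typo such as "totla" for "total".
import re

KEYWORDS = {'def', 'return', 'while', 'if', 'for', 'in'}  # abbreviated

def used_once(source):
    counts = {}
    for name in re.findall(r'\b[a-zA-Z_]\w*\b', source):
        if name not in KEYWORDS:
            counts[name] = counts.get(name, 0) + 1
    return sorted(n for n, c in counts.items() if c == 1)

src = "def f(x):\n    totla = 0\n    total = total + x\n    return total\n"
print(used_once(src))
# ['f', 'totla']   (the function name f is also a singleton here; a real
#                   tool would exclude names bound by "def")
```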
My claim is that pylints don't get used both because they're a separate step, and because the odds of them catching something interesting are comparatively tiny. Python simply doesn't have many errors that *can* be caught at compile-time. It's like me firing up the spell-checker at this point to verify "compile-time" -- the expected payoff is negative.
There's little enough useful a pylint could do that a mod to add those few smarts to the core would be a fraction of the size & effort of yet another separate tool. Better, in the core, it would actually do people some good because it would actually get used.
Which specific problems do you expect your lint tool to uncover? Perhaps there's a world of mechanically-checkable Python errors I haven't yet bumped into.
you-windows-guys-write-strange-code<wink>-ly y'rs - tim

[Tim laments the death of lint for C :-]
The notion that a valuable idea will never get into the core is disturbing.
Agreed. I based my assessment simply on my perception of what is likely to happen, not my opinion of what _should_ happen. I do agree that it is far far preferable for Python itself to be capable of issuing these warnings, and if Guido feels that is the best direction then it would be very cool. Only Guido can state if he would support such efforts, and probably should right about now (the funk soul brother - sorry - just got the Fat Boy Slim CD, and it's going around in my head :-)
Aaron W has had a capable pylint tool for a couple years, & it remains virtually unknown; and, far as I can tell, Aaron reciprocated the lack of interest by dropping active development.
Which tends to be the biggest problem with it. A number of people have tried to use it, but often get stymied by the lack of 1.5.?isms - i.e., "raise" (i.e., re-raise) and "assert". It bombs at these statements, and there is some real magic I didn't want to understand. Aaron agrees that a parser-module-based one would be better.
But your original point still remains - I agree having Python do this is a better solution all round.
(remember the pre-prototype days?). Versions of pylint offer very little beyond pointing out unique vrbl names, perhaps indentation checking, and ...? I'm drawing a blank here. I suppose they should strive to give better msgs for runaway triple-quoted strings. What else? Skip's "return" checker, and far as I can tell then we're already at the point of diminishing returns.
Agreed. However, all of these would be very valuable and account for the vast majority of my errors.
you-windows-guys-write-strange-code<wink>-ly y'rs - tim
Only cos we use a strange OS <wink>
Mark.

Agreed. I based my assessment simply on my perception of what is likely to happen, not my opinion of what _should_ happen. I do agree that it is far far preferable for Python itself to be capable of issuing these warnings, and if Guido feels that is the best direction then it would be very cool. Only Guido can state if he would support such efforts, and probably should right about now (the funk soul brother - sorry - just got the Fat Boy Slim CD, and it's going around in my head :-)
I agree it should happen, and Tim's argument about keeping lint and compiler together is a convincing one.
What stands in the way?
(a) There's so much else to do...
(b) *Someone* needs to design a good framework for spitting out warnings, and for giving the programmer a way to control which warnings to ignore. I've seen plenty of good suggestions here; now somebody should simply go off and come up with a proposal too good to refuse.
(c) I have a different agenda with CP4E -- I think it would be great if the editor could do syntax checking and beyond *while you type*, like the spell checker in MS Word. (I *like* Word's spell checker, even though I hate the style checker [too stupid], and I gladly put up with the spell checker's spurious complaints -- it's easy to turn off, easy to ignore, and it finds lots of good stuff.)
Because the editor has to deal with incomplete and sometimes ungrammatical things, and because it has to work in real time (staying out of the way when you're frantically typing, catching up when your fingers take a rest), it needs a different kind of parser.
But that's another project, and once the Python core has a warning framework in place, I'm sure we'll find more things that are worth warning about.
I'm not always in agreement with Tim Peters when he says that Python is so dynamic that it's impossible to check for certain errors. It may be impossible to say *for sure* that something is an error, but there sure are lots of situations where you're doing something that's *likely* to be an error.
E.g. if the compiler sees len(1), and there's no local or global variable named len, it *could* be the case that the user has set up a parallel universe where the len() built-in accepts an integer argument, but more likely it's an honest mistake, and I would like to get a warning about it.
The hard part here is to issue this warning for len(x) where x is some variable or expression that is likely to have a non-sequence value (barring alternate universes); this might require global analysis that's hard or expensive enough that we can't put it in the core (yet). This may be seen as an argument for a separate lint...
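The easy half of that check - a literal len(1) with no rebinding of len in sight - is mechanical. A toy, regex-based sketch of just that half (illustrative only; a real checker would work on the parse tree and do the scope analysis properly):

```python
# Toy check: flag calls of the builtin len on an integer literal,
# unless the source rebinds the name "len" somewhere.
import re

def check_len_calls(source):
    if re.search(r'\blen\s*=', source):
        return []    # a "parallel universe" len: stay quiet
    return [m.group(0)
            for m in re.finditer(r'\blen\s*\(\s*\d+\s*\)', source)]

print(check_len_calls("x = len(1)"))            # ['len(1)']
print(check_len_calls("len = int\nx = len(1)")) # []
```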
--Guido van Rossum (home page: http://www.python.org/~guido/)

[MarkH]
... I based my assesment simply on my perception of what is likely to happen, not my opinion of what _should_ happen.
I based mine on what Guido was waiting for someone to say <wink>.
We worry too much about disagreeing here; different opinions are great! Guido will squash the ones he can't stand anyway.
[about Aaron's pylint's lack of 1.5.2 smarts]
... Aaron agrees that a parser module based one would be better.
You can't beat a real parse, no argument there. Luckily, the compiler parses too.
[Guido]
What stands in the way?
(a) There's so much else to do...
How did Perl manage to attract 150 people with nothing to do except hack on Perl internals? "Wow, that code's such a mess I bet even *I* could get something into it" <0.6 wink>.
(b) *Someone* needs to design a good framework for spitting out warnings, and for giving the programmer a way to control which warnings to ignore. I've seen plenty of good suggestions here; now somebody should simply go off and come up with a proposal too good to refuse.
The response has been ... absent. Anyone doing this? I liked JimF's push to make cmd-line options available to Python programs too. Somehow they seem related to me.
(c) I have a different agenda with CP4E -- I think it would be great if the editor could do syntax checking and beyond *while you type*, like the spell checker in MS Word. (I *like* Word's spell checker, even though I hate the style checker [too stupid], and I gladly put up with the spell checker's spurious complaints -- it's easy to turn off, easy to ignore, and it finds lots of good stuff.)
Because the editor has to deal with incomplete and sometimes ungrammatical things, and because it has to work in real time (staying out of the way when you're frantically typing, catching up when your fingers take a rest), it needs a different kind of parser.
Different from what? Python's own parser for sure. IDLE has at least two distinct parsers of its own that have nothing in common with Python's parser and little in common with each other. Using the horrid tricks in PyParse.py, it may even be possible to write the kind of parser you need in Python and still have it run fast enough.
For parsing-on-the-fly from random positions, take my word for it and Barry's as insurance <wink>: the single most frequent question you need to have a fast and reliable answer for is "is this character in a string?". Unfortunately, turns out that's the hardest question to answer too. The next one is "am I on a continuation line, and if so where's the start?". Given rapid & bulletproof ways to answer those, the rest is pretty easy.
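To make the question concrete, here's a toy (and deliberately naive) answerer for the single-line case. It handles only one-line strings with backslash escapes and comments -- nothing like the real PyParse.py machinery, which also has to cope with triple quotes, continuation lines, and restarting from arbitrary positions:

```python
def in_string(text, pos):
    """Naive sketch: is the character at text[pos] inside a
    single-line string literal?  Illustrative only."""
    quote = None                    # the quote char we're inside, or None
    i = 0
    while i < pos:
        c = text[i]
        if quote:
            if c == '\\':
                i += 1              # skip the escaped character
            elif c == quote:
                quote = None        # string closed
        elif c in '\'"':
            quote = c               # string opened
        elif c == '#':
            # rest of the line is a comment, so pos can't be in a string
            j = text.find('\n', i)
            if j < 0 or j >= pos:
                return False
            i = j
        i += 1
    return quote is not None
```

Even this toy shows why the question is expensive: answering it for one position means rescanning everything before it, which is exactly what an on-the-fly editor parser can't afford to do from scratch on every keystroke.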
But that's another project, and once the Python core has a warning framework in place, I'm sure we'll find more things that are worth warning about.
That was frequently predicted for various pylint projects too <wink>.
I'm not always in agreement with Tim Peters when he says that Python is so dynamic that it's impossible to check for certain errors. It may be impossible to say *for sure* that something is an error, but there sure are lots of situations where you're doing something that's *likely* to be an error.
We have no disagreement there. What a compiler does must match the advertised semantics of the language-- or its own advertised deviations from those --without compromise. A warning system has no such constraint; to the contrary, in the case of a semantic mess like Perl, most of its value is in pointing out *legal* constructs that are unlikely to work the way you intended.
E.g. if the compiler sees len(1), and there's no local or global variable named len, it *could* be the case that the user has set up a parallel universe where the len() built-in accepts an integer argument, but more likely it's an honest mistake, and I would like to get a warning about it.
Me too. More: I'd also like to get a warning for *having* a local or global variable named len! Masking the builtin names is simply bad practice, and is also usually an honest mistake.
BTW, I was surprised that the most frequent gotcha among new Python users at Dragon turned out to be exactly that: dropping a "len" or a "str" or whatever (I believe len, str and list were most common) into their previously working code-- because they just learned about that builtin --and getting runtime errors as a result. That is, they already had a local var of that name, and forgot. Then they were irked that Python didn't nag them from the start (with a msg they understood, of course).
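The masked-builtin warning is cheap to sketch. Here's a hypothetical version using the modern ast module (which of course didn't exist in 1999); it only catches plain assignments, not function arguments or imports:

```python
# Illustrative sketch, not compile.c code: flag assignments that
# shadow a builtin name, like "len = 3" or "str = f(x)".
import ast
import builtins

def shadowed_builtins(source):
    """Yield (lineno, name) for every assignment masking a builtin."""
    names = set(dir(builtins))
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in names:
                yield node.lineno, node.id
```

A real warning pass would also want to look at function parameters and for-loop targets, since those mask builtins just as effectively.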
The hard part here is to issue this warning for len(x) where x is some variable or expression that is likely to have a non-sequence value (barring alternate universes); this might require global analysis that's hard or expensive enough that we can't put it in the core (yet). This may be seen as an argument for a separate lint...
Curiously, Haskell is statically type-safe but doesn't require declarations of any kind -- it does global analysis, and has a 100% reliable type inference engine (the language was, of course, designed to make this true). Yet I don't think I've ever seen a Haskell program on the web that didn't explicitly declare the type of every global anyway. I think this is natural, too: while it's a PITA to declare the type of every stinking local that lives for two lines and then vanishes, the types of non-local names aren't self-evident: type decls really help for them.
So if anyone is thinking of doing the kind of global analysis Guido mentions here, and is capable of doing it <wink>, I'd much rather they put their effort into optional static type decls for Python2. Many of the same questions need to be answered either way (like "what's a type?", and "how do we spell a type?" -- the global analysis warnings won't do any good if you can't communicate the substance of an error <wink>), and optional decls are likely to have bigger bang for the buck.
[Skip Montanaro]
... Perl's experience with -w seems to suggest that it's best to always enable whatever warnings you can as well.
While that's my position, I don't want to oversell the Perl experience. That language allows so many goofy constructs, and does so many wild guesses at runtime, that Perl is flatly unusable without -w for non-trivial programs. Not true of Python, although the kinds of warnings people have suggested so far certainly do seem worthy of loud complaint by default.
(More and more I see people using gcc's -Wall flag as well.)
If you have to write portable C++ code, and don't enable every warning you can get on every compiler you have, and don't also turn on "treat warnings as errors", non-portable code will sneak into the project rapidly. That's my experience, over & over. gcc catches stuff MS doesn't, and vice versa, and MetroWerks yet another blob, and platform-specific cruft *still* gets in. It's irksome.
Now, my return consistency stuff was easy enough to write in C for two reasons. One, I'm fairly comfortable with the compile.c code.
I don't anticipate dozens of people submitting new warning code. It would be unprecedented if even two of us decided this was our thing. It would be almost unprecedented if even one of us followed up on it <0.6 wink>.
Two, adding my checks required no extra memory management overhead.
Really good global analysis likely requires again as much C code as already exists. Luckily, I don't think putting in some warnings requires that all conceivable warnings be implemented at once <wink>. For stuff that complex, I'd rather make it optional and write it in Python; I don't believe any law prevents the compiler from running Python code.
Consider a few other checks you might conceivably add to the byte code compiler:
* tab nanny stuff (already enabled with -t, right?)
Very timidly, yes <wink>. Doesn't complain by default, and you need -tt to make it an error. Only catches 4 vs 8 tab size ambiguities, but that's good enough for almost everyone almost all the time.
* variables set but not used
* variables used before set
These would be wonderful. The Perl/pylint "gripe about names unique in a module" is a cheap approximation that gets a surprising percentage of the benefit for the cost of a dict and an exception list.
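The dict-plus-exception-list approximation really is about this cheap. A sketch in the spirit of Perl's gripe (names and the exception list here are illustrative, not from any actual pylint):

```python
# Gripe about identifiers that appear exactly once in a module --
# a crude but surprisingly effective proxy for "set but not used"
# and typos.  Costs one dict and an exception list.
import keyword
import re

IGNORE = {'self', 'main', '_'}      # the "exception list"

def unique_names(source):
    counts = {}
    for name in re.findall(r'[A-Za-z_]\w*', source):
        if not keyword.iskeyword(name):
            counts[name] = counts.get(name, 0) + 1
    return sorted(n for n, c in counts.items()
                  if c == 1 and n not in IGNORE)
```

It has no idea about scopes or strings, which is exactly why it's cheap -- and why it still catches a surprising percentage of real typos.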
If all of this sort of stuff is added to the compiler proper, I predict a couple major problems will surface:
* The complexity of the code will increase significantly, making it harder to maintain and enhance
The complexity of the existing code should be almost entirely unaffected, because non-trivial semantic analysis belongs in a new subsystem with its own code.
* Fewer and fewer people will be able to approach the code, making it less likely that new checks are added
As opposed to what? The status quo, with no checks at all? Somehow, facing the prospect of *some* checks doesn't frighten me away <wink>. Besides, I don't buy the premise: if someone takes this on as their project, worrying that they'll decline to add new valuable checks is like MarkH worrying that I wouldn't finish adding full support for stinking tabs to the common IDLE/PythonWin editing components. People take pride in their hackery.
* Future extensions like pluggable virtual machines will be harder to add because their byte code compilers will be harder to integrate into the core
If you're picturing adding this stuff sprayed throughout the guts of the existing com_xxx routines, we've got different pictures in mind.
Semantic analysis is usually a pass between parsing and code generation, transforming the parse tree and complaining about source it thinks is fishy. If done in any conventional way, it has no crosstalk at all with either the parsing work that precedes it or the code generation that follows it. It's a pipe stage between them, whose output is of the same type as its input. That is, it's a "pluggable component" in its own right, and doesn't even need to be run. So potential for interference just isn't there.
At present, Python is very unusual both in:
1) Having no identifiable semantic pass at all, parsing directly to byte code, and enforcing its few semantic constraints (like "'continue' not properly in loop") lumped in with both of those.
and
2) Having one trivial optimization pass-- 76 lines of code instead of the usual 76,000 <wink> --running after the byte code has been generated. However, the sole transformation made here (distinguishing local from non-local names) is much more properly viewed as being a part of semantic analysis than as being "an optimization". It's deducing trivial info about what names *mean* (i.e., basic semantics), and is called optimization here only because Python didn't do it at first.
So relating this to a traditional compiler, I'd say that "optimize()" is truly Python's semantic analysis pass, and all that comes before it is the parsing pass -- a parsing pass with output in a form that's unfortunately clumsy for further semantic analysis, but so it goes. The byte code is such a direct reflection of the parse tree that there's really little fundamental difference between them.
So for minimal disruption, I'd move "optimize" into a new module and call it the semantic analysis pass, and it would work with the byte code. Just as now, you wouldn't *need* to call it at all. Unlike now, the parsing pass probably needs to save away some more info (e.g., I don't *think* it keeps track of what all is in a module now in any usable way).
For Python2, I hope Guido adopts a more traditional structure (i.e., parsing produces a parse tree, codegen produces bytecode from a parse tree, and other tree->tree transformers can be plugged in between them). Almost all compilers follow this structure, and not because compiler writers are unimaginative droids <wink>. Compile-time for Python isn't anywhere near being a problem now, even on my creaky old 166MHz machine; I suspect the current structure reflects worry about that on much older & slower machines.
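That hoped-for structure is easy to show in miniature. A sketch using the modern ast module (purely illustrative -- nothing like the actual compile.c of the time): parsing produces a tree, any number of tree->tree passes run in between, and codegen consumes the result.

```python
# Parse -> pluggable tree->tree passes -> codegen.  Each pass takes
# an AST and returns an AST; skipping all passes is always legal.
import ast

def compile_with_passes(source, passes=()):
    tree = ast.parse(source)
    for p in passes:                # each p: tree -> tree
        tree = p(tree)
    return compile(tree, '<string>', 'exec')
```

A semantic-analysis or warning pass slots in as just another element of `passes`, with no crosstalk into parsing or code generation -- which is the whole point of the pipe-stage argument above.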
Some of the most useful Perl msgs need to be in com_xxx, though, or even earlier. The most glaring example is runaway triple-quoted strings. Python's "invalid token" pointing at the end of the file is maddeningly unhelpful; Perl says it looks like you have a runaway string, and gives the line number it thinks it may have started on. That guess is usually correct, or points you to what you *thought* was the end of a different string. Either way your recovery work is slashed. (Of course IDLE is even better: the whole tail of the file changes to "string color", and you just need to look up until the color changes!)
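The Perl-style runaway-string hint doesn't need the real tokenizer to demonstrate. A naive sketch (it ignores string prefixes, escapes, and single-quoted strings nested in the other kind, so it's a guess -- but as Tim notes, the guess is usually right):

```python
def runaway_start(source):
    """Return the line number where an unterminated triple-quoted
    string appears to begin, or None.  Illustrative sketch only."""
    in_string = None                 # delimiter we're inside, or None
    start_line = None
    for lineno, line in enumerate(source.splitlines(), 1):
        i = 0
        while i < len(line):
            if in_string:
                j = line.find(in_string, i)
                if j < 0:
                    break            # string continues past this line
                in_string = None
                i = j + 3
            else:
                for delim in ('"""', "'''"):
                    if line.startswith(delim, i):
                        in_string, start_line = delim, lineno
                        i += 3
                        break
                else:
                    i += 1
    return start_line if in_string else None
```

Reporting `start_line` instead of pointing at end-of-file is the entire improvement: your recovery work is slashed because you start looking where the problem probably began.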
In addition, more global checks probably won't be possible (reasoning about code across module boundaries for instance) because the compiler's view of the world is fairly narrow.
As above, I don't think there's enough now even to analyze one module in isolation.
I think lint-like tools should be implemented in Python (possibly with the support of an extension module for performance-critical sections) which is then called from the compiler proper under appropriate circumstances (warnings enabled, all necessary modules importable, etc).
I have no objection to that. I do object to the barely conceivable getting in the way of the plainly achievable, though -- the pylint tools out there now, just like your return consistency checker, do a real service already without any global analysis. Moving that much into the core (implemented in Python if possible, recoded in C if not) gets a large chunk of the potential benefit for maybe 5% of the eventual work.
It's nice that Guido is suddenly keen on global analysis too, but I don't see him volunteering to do any work either <wink>.
I believe the code would be much more easily maintained and extended.
If it's written in Python, of course.
You'd be able to swap in a new byte code compiler without risking the loss of your checking code.
I never understood this one; even if there *were* a competing byte code compiler out there <0.1 wink>, keeping as much as possible out of com_xxx should render it a non-issue. If I'm missing your point and there is some fundamental conflict here, fine, then it's another basis on which bytecode compilers will compete.
more-concerned-about-things-that-exist-than-things-that-don't-ly y'rs - tim

Skip> * Future extensions like pluggable virtual machines will be harder
Skip>   to add because their byte code compilers will be harder to
Skip>   integrate into the core

Tim> If you're picturing adding this stuff sprayed throughout the guts
Tim> of the existing com_xxx routines, we've got different pictures in
Tim> mind.
This was precisely my example, because that's the way I implemented the return warning stuff, by modifying the com_xxx routines. I believe that's the wrong way to go in the long run, and I see by the rest of your message you feel the same way as well. To the greatest extent possible, I think this stuff should be implemented in Python. (We may disagree on that point.) Being able to plug in new parse tree analysis/transformation modules between parse tree creation and code generation could at least be controlled from Python.
Skip
P.S. Something I just noticed: Since the node typedef (node.h) and the macros that manipulate nodes are shared across multiple files shouldn't they be named something slightly less likely to clash with other packages?

"GW" == Greg Ward gward@cnri.reston.va.us writes:
GW> However, one could certainly envision a world where Python
GW> issues runtime warnings.  If my time machine were working, I'd
GW> zip back and suggest to Guido that mistakes with the %
GW> operator should issue warnings rather than raising exceptions.
GW> (Ignore the language philosophy issue and presume this would
GW> be worthwhile.)
Moderately off-topic, but since you brought it up, here's what I use in Mailman (since site-admins can make mistakes editing their templates, which contains %(keys)s... we'd like to make Mailman more robust so it doesn't totally crap out when that happens).
We (hopefully) always interpolate with a SafeDict instead of a raw Python dictionary.
-Barry
from UserDict import UserDict
from types import StringType

class SafeDict(UserDict):
    """Dictionary which returns a default value for unknown keys.

    This is used in maketext so that editing templates is a bit more
    robust.
    """

    def __init__(self, d):
        # optional initial dictionary is a Python 1.5.2-ism.  Do it
        # this way for portability
        UserDict.__init__(self)
        self.update(d)

    def __getitem__(self, key):
        try:
            return self.data[key]
        except KeyError:
            if type(key) == StringType:
                return '%(' + key + ')s'
            else:
                return '<Missing key: %s>' % `key`
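A quick usage sketch, redefined compactly as a modern dict subclass so it stands alone (the 1.5.2-era class above behaves the same way): a typo'd %(key)s in a template survives interpolation literally instead of raising KeyError.

```python
# Modern stand-in for Barry's SafeDict: unknown keys come back as
# the literal %(key)s text, so a mis-edited template degrades
# gracefully instead of crashing.
class SafeDict(dict):
    def __missing__(self, key):
        return '%(' + key + ')s'

d = SafeDict(listname='python-dev')
result = '%(listname)s wins, %(typo)s stays' % d
# result == 'python-dev wins, %(typo)s stays'
```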

Greg Ward wrote:
...
I would suggest sys.warnings be a dictionary.
    python -Wbad-return -Wlines-per-func=50

    >>> print sys.warnings
    {'bad-return': None, 'lines-per-func': '50'}
Makes sense -- true to warn (possibly giving some extra meaning to "truth", as in this example), and false to not warn. Or maybe None to not warn, not-None to warn. Of course, if there are only compile-time warnings, then modifying sys.warnings will only affect future imports, execs, evals, etc.
Actually, I had intended *presence* in the dictionary to mean "enabled." I don't think we'd want to pre-populate the dict with all possible flags ahead of time, then check for each of them on the command line. (startup time!) However, if we "simply" parsed the command line, extracted all -W options and dropped them into the dict, then we're set.
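A minimal cut at that scheme (function and option names are illustrative): sweep the -W options out of the command line and drop them into a dict, so mere presence of a key means the warning is enabled, and no pre-population is needed.

```python
# Extract -W options from an argv-style list into a dict.
# '-Wfoo' maps foo to None; '-Wfoo=val' maps foo to 'val'.
def parse_warnings(argv):
    warnings = {}
    rest = []
    for arg in argv:
        if arg.startswith('-W'):
            opt, sep, value = arg[2:].partition('=')
            warnings[opt] = value if sep else None
        else:
            rest.append(arg)
    return warnings, rest
```

Checking whether a warning is on is then just `'bad-return' in warnings` -- no startup cost beyond the options actually given.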
Cheers, -g
-- Greg Stein, http://www.lyra.org/

[Guido]
... I'm wondering if we should introduce a general '-w' flag to turn on warnings like this (which would subsume -t)? Or perhaps there should be a -W flag ("no warnings") and warnings should be the default?
The latter, if for no other reason than that new users should get bludgeoned into good practice from their first day. If something's serious enough to trigger a warning, and you insist on doing it anyway, then you should at least know enough about Python to be able to find the -W switch <wink>.
Note that in response to 1,379 distinct complaints about insane Perl semantics, TomC's stock answer is that every serious Perl programmer runs with -w and "use strict". He's right! Every serious Perl programmer does. Perl picked the wrong default, letting naive programmers hang themselves 1,379 distinct ways by default.
Besides, warning by default will enhance your enviable reputation as a ruthless dictator opposed to freedom and creativity <wink>.
There are also platform problems, e.g. on the Mac, stderr doesn't always exist, and on Windows, it doesn't exist if pythonw.exe is used...
But this is already a problem for, e.g., reporting fatal syntax errors, yes? That is, -w/-W isn't creating a new problem here, it's making the lack of a solution to an old problem more evident.
all's-for-the-best-in-this-best-of-all-possible-worlds-ly y'rs - tim

On 07 September 1999, Tim Peters said:
The latter, if for no other reason than that new users should get bludgeoned into good practice from their first day. If something's serious enough to trigger a warning, and you insist on doing it anyway, then you should at least know enough about Python to be able to find the -W switch <wink>.
Note that in response to 1,379 distinct complaints about insane Perl semantics, TomC's stock answer is that every serious Perl programmer runs with -w and "use strict". He's right! Every serious Perl programmer does. Perl picked the wrong default, letting naive programmers hang themselves 1,379 distinct ways by default.
I agree, but I'm only willing to do so publicly because Tim has. So does the Perl documentation (ie. Tom C., I assume); from "man perl":
DIAGNOSTICS
    The -w switch produces some lovely diagnostics.
    [...]
    Did we mention that you should definitely consider using the -w
    switch?

BUGS
    The -w switch is not mandatory.
D'you think that's a hint?
Obviously, there *must* be a way to turn off warnings, so we can continue to run our crufty, bug-ridden old code without too many problems.
Greg S.'s suggestion for being able to customize *which* warnings are printed is also important. Much hair was pulled when Perl 5.004 was released with a whole bunch of new warning messages -- lots of people had to go back and "fix" working code, or remove the -w switch from production scripts to clean up the mess on their stderr, etc. I suspect most of those people (myself included) were enlightened by the new warnings, but annoyed by having to go and fix what wasn't necessarily broken. A lot of people now recommend using -w only when developing, and removing it for production use, simply because of the risk of new warning messages when you upgrade Perl.
Greg

Greg Ward wrote:
On 07 September 1999, Tim Peters said:
The latter, if for no other reason than that new users should get bludgeoned into good practice from their first day. If something's serious enough to trigger a warning, and you insist on doing it anyway, then you should at least know enough about Python to be able to find the -W switch <wink>.
Note that in response to 1,379 distinct complaints about insane Perl semantics, TomC's stock answer is that every serious Perl programmer runs with -w and "use strict". He's right! Every serious Perl programmer does. Perl picked the wrong default, letting naive programmers hang themselves 1,379 distinct ways by default.
I agree, but I'm only willing to do so publicly because Tim has. So does the Perl documentation (ie. Tom C., I assume); from "man perl":
DIAGNOSTICS
    The -w switch produces some lovely diagnostics.
    [...]
    Did we mention that you should definitely consider using the -w
    switch?

BUGS
    The -w switch is not mandatory.
D'you think that's a hint?
Obviously, there *must* be a way to turn off warnings, so we can continue to run our crufty, bug-ridden old code without too many problems.
Greg S.'s suggestion for being able to customize *which* warnings are printed is also important. Much hair was pulled when Perl 5.004 was released with a whole bunch of new warning messages -- lots of people had to go back and "fix" working code, or remove the -w switch from production scripts to clean up the mess on their stderr, etc. I suspect most of those people (myself included) were enlightened by the new warnings, but annoyed by having to go and fix what wasn't necessarily broken. A lot of people now recommend using -w only when developing, and removing it for production use, simply because of the risk of new warning messages when you upgrade Perl.
I'd suggest using a -W <opt>[=<value>] kind of command line option interface for warnings, and also adding an environment variable to customize the standard settings, e.g. PYTHONWARNINGS.
About enabling warnings per default: you should consider the fact that much code out there will probably produce such warnings, even though it is perfectly valid (e.g. consider Skip's example with while 1:...). Enabling them by default is definitely not a good idea for production code -- it is a good idea during development.
Since production code is likely to run using -O, I suggest disabling warnings when -O is used and enabling them otherwise.
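That suggestion in miniature: -O already zeroes __debug__, so gating warnings on it makes production runs quiet for free. A sketch only -- `complain` is a hypothetical helper, not a proposed API:

```python
# Warnings tied to -O: under "python -O", __debug__ is False and
# the write is skipped entirely.
import sys

def complain(msg):
    if __debug__:                    # False under python -O
        sys.stderr.write('warning: %s\n' % msg)
```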
Also, I'd like to second GregS' idea with the sys.warnings dict. Together with a generic -W <opt>=<value> interface this would be great for adding customized warnings to Python scripts (i.e. not only the ones that the interpreter itself produces).
participants (9)
- Barry A. Warsaw
- Fred L. Drake, Jr.
- Greg Stein
- Greg Ward
- Guido van Rossum
- M.-A. Lemburg
- Mark Hammond
- Skip Montanaro
- Tim Peters