I think that if we're going to do string interpolation we might as well go all the way and have one unified string interpolation model.

1. There should be no string-prefix. Instead the string \$ should be magical in all non-raw literal strings as \x, \n etc. are. (If you want to do string interpolation on a raw string, you could do it using the method version below.)
from __future__ import string_interp
a = "acos(.5) = \$(acos(.5))"
Embrace the __future__!

2. There should be a transition period where literal strings containing "\$" are flagged. This is likely rare but may occur here and there. And by the way, unused \-sequences should probably be proactively reserved now instead of silently "failing" as they do today. What's the use of making "\" special if sometimes it isn't special?

3. I think that it would be clearest if any expression other than a simple variable name required "\$(parens.around.it())". But that's a minor decision.

4. Between the $-sign and the opening paren, it should be possible to put a C-style formatting specification: "pi = \$5.3f(math.pi)". There is no reason to force people to switch to a totally different language feature to get that functionality. I never use it myself but presume that scientists do!

5. The interpolation functionality is useful enough to be available for use on runtime-generated strings. But at runtime it should have a totally different syntax. Now that Python has string methods it is clear that "%" could (and IMO should) have been implemented that way:

    newstr = mystr.interp(variabledict, evaluate_expressions=0)

By default evaluate_expressions is turned off. That means that all it does is look up variables in the dictionary and insert them into the string where it sees \$. If you want full interpolation behaviour you would flip the evaluate_expressions switch. May Guido have mercy on your soul.

6. People should be discouraged from using the "%" version. Some day far in the future it could be officially deprecated. We'll tell our children stories about the days when we modulo'd strings, tuples and dictionaries in weird and wonderful ways.

Once the (admittedly long) transition period is over, we would simply have a better way to do everything we can do today. Code using the new model will be easier to read, more concise, more consistent, more like other scripting languages, abuse syntax less and use fewer logical concepts.
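For concreteness, here is a minimal sketch of what such an interp() might do, written as a free function. The name, the runtime $-syntax, and the regular expression are my assumptions; the proposal never pins down the exact runtime spelling.

```python
import re

def interp(template, variables, evaluate_expressions=False):
    """Hypothetical sketch of the proposed interp() behaviour:
    substitute $name (and, when enabled, $(expression)) using the
    supplied dictionary."""
    def substitute(match):
        expr = match.group(1) or match.group(2)
        if match.group(1) and not evaluate_expressions:
            # expressions are only honoured when the switch is flipped
            raise ValueError("expressions disabled: %r" % expr)
        if evaluate_expressions:
            # full interpolation: evaluate against the dictionary
            return str(eval(expr, {"__builtins__": {}}, variables))
        # default: plain dictionary lookup only
        return str(variables[expr])
    return re.sub(r"\$\((.*?)\)|\$(\w+)", substitute, template)

print(interp("pi = $pi", {"pi": 3.14}))            # pi = 3.14
print(interp("$(a + b)", {"a": 2, "b": 3}, True))  # 5
```

With evaluate_expressions off, the function is a pure template fill; turning it on gives the "May Guido have mercy on your soul" behaviour.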
Arguably, functions like vars(), locals() and globals() could be relegated to an "introspection" module where no newbie will ever look at them again. (Okay, now I'm over-reaching.)

There will undoubtedly be language-change backlash. Guido will take the heat, not me. He would have to decide if it was worth the pain. I think, however, that the resulting language would be an improvement for experts and newbies alike. And as with other changes -- sooner is better than later. The year after next year is going to be the Year of Python, so let's get our changes in before then!

Paul Prescod
On Tue, 15 Jan 2002, Paul Prescod wrote:
I think that if we're going to do string interpolation we might as well go all the way and have one unified string interpolation model.
Nice pie in the sky; my comments inserted below.
1. There should be no string-prefix. Instead the string \$ should be magical in all non-raw literal strings as \x, \n etc. are. (if you want to do string interpolation on a raw string, you could do it using the method version below)
+1 on no prefix, -0 on \$. To my eyes, \(whatever) looks much cleaner, tho I'm not sure how that would work with the evaluate_expressions flag in (5).
2. There should be a transition period where literal strings containing "\$" are flagged. This is likely rare but may occur here and there. And by the way, unused \-sequences should probably be proactively reserved now instead of silently "failing" as they do today. What's the use of making "\" special if sometimes it isn't special?
+1 on making undefined \-sequences raise SyntaxError.
3. I think that it would be clearest if any expression other than a simple variable name required "\$(parens.around.it())". But that's a minor decision.
+1 on parens, but see my comments to (1).
4. Between the $-sign and the opening paren, it should be possible to put a C-style formatting specification.
"pi = \$5.3f(math.pi)".
There is no reason to force people to switch to a totally different language feature to get that functionality. I never use it myself but presume that scientists do!
Eek -- feeping creaturism. -2. The only reason to add this here is to be able to remove the % operator on strings, and I'm not convinced that is the right way to go. Anyways, this just begs to be spelled something like \%5.3f(math.pi). Printf-like format specifications without a %-character seems just weird.
5. The interpolation functionality is useful enough to be available for use on runtime-generated strings. But at runtime it should have a totally different syntax. Now that Python has string methods it is clear that "%" could (and IMO should) have been implemented that way:
newstr = mystr.interp(variabledict, evaluate_expressions=0)
By default evaluate_expressions is turned off. That means that all it does is look up variables in the dictionary and insert them into the string where it sees \$. If you want full interpolation behaviour you would flip the evaluate_expressions switch. May Guido have mercy on your soul.
-0. Here I think is a good place to draw the line before the returns diminish too far. I see the major part of the usefulness of string interpolation coming from compile time usage, and that also nicely matches how all other \-sequences are handled.

/Paul
Paul Svensson wrote:
....
+1 on no prefix, -0 on \$. To my eyes, \(whatever) looks much cleaner, tho I'm not sure how that would work with the evaluate_expressions flag in (5).
An offline correspondent suggested that and also suggested perhaps \`. \` is nicely reminiscent of `abc` and it does basically the same thing, only in strings, so I kind of like it.
    `5+3`
    '8'
    "\`5 + 3` is enough"
    8 is enough
The downside is that larger characters like $ and % are much more clear to my eyes. Plus there is the whole apos-backtick confusion. The problem with \( is that that is likely to already be a popular string in regular expressions.
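For readers on later Pythons: backquotes were eventually removed from the language, and repr() is the surviving spelling of `expr`, so the backtick half of the example behaves like this (my restatement, not part of the original thread):

```python
# `5+3` in the Pythons of this thread was shorthand for repr(5 + 3);
# backquotes are gone in Python 3, where repr() is the only spelling.
result = repr(5 + 3)
print(result)         # the one-character string "8"
print(result == "8")  # True
```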
...
4. Between the $-sign and the opening paren, it should be possible to put a C-style formatting specification.
"pi = \$5.3f(math.pi)".
There is no reason to force people to switch to a totally different language feature to get that functionality. I never use it myself but presume that scientists do!
Eek -- feeping creaturism. -2.
The feature is already there and sometimes used. We either keep two different ways to spell interpolation or we incorporate it.
The only reason to add this here is to be able to remove the % operator on strings, and I'm not convinced that is the right way to go. Anyways, this just begs to be spelled something like \%5.3f(math.pi). Printf-like format specifications without a %-character seems just weird.
The offline correspondent also had this idea and I'm coming around to it.
... -0. Here I think is a good place to draw the line before the returns diminish too far. I see the major part of the usefulness of string interpolation coming from compile time usage, and that also nicely matches how all other \-sequences are handled.
And what do we do about templating at runtime? Modulo? string.replace? Or just don't provide that feature? Also, how to handle interpolation in raw strings?

Paul Prescod
On Tue, 15 Jan 2002, Paul Prescod wrote:
Paul Svensson wrote:
....
+1 on no prefix, -0 on \$. To my eyes, \(whatever) looks much cleaner, tho I'm not sure how that would work with the evaluate_expressions flag in (5).
An offline correspondent suggested that and also suggested perhaps \`. \` is nicely reminiscent of `abc` and it does basically the same thing, only in strings, so I kind of like it.
    `5+3`
    '8'
    "\`5 + 3` is enough"
    8 is enough
The downside is that larger characters like $ and % are much more clear to my eyes. Plus there is the whole apos-backtick confusion.
I thought of \` as well, but didn't suggest it, mainly for those reasons.
The problem with \( is that that is likely to already be a popular string in regular expressions.
In which case it should either be a raw string, or spelled \\(. (We _really_ need to issue syntax errors on undefined \-sequences)
...
4. Between the $-sign and the opening paren, it should be possible to put a C-style formatting specification.
"pi = \$5.3f(math.pi)".
There is no reason to force people to switch to a totally different language feature to get that functionality. I never use it myself but presume that scientists do!
Eek -- feeping creaturism. -2.
The feature is already there and sometimes used. We either keep two different ways to spell interpolation or we incorporate it.
I don't think interpolation and variable formatting are similar enough to conflate in a single notation -- wasn't it the ungainliness of using the existing variable formatting to interpolate that started this thread ?
The only reason to add this here is to be able to remove the % operator on strings, and I'm not convinced that is the right way to go. Anyways, this just begs to be spelled something like \%5.3f(math.pi). Printf-like format specifications without a %-character seems just weird.
The offline correspondent also had this idea and I'm coming around to it.
I'm not particularly happy with that idea; simply mimicking the syntax it was supposed to replace, for little gain. I also think there could be some cause for confusion between \%(foo)s looking in vars() and %(foo)s using the other side of the % operator.
... -0. Here I think is a good place to draw the line before the returns diminish too far. I see the major part of the usefulness of string interpolation coming from compile time usage, and that also nicely matches how all other \-sequences are handled.
And what do we do about templating at runtime? Modulo? string.replace? Or just don't provide that feature? Also, how to handle interpolation in raw strings?
Since the whole point of raw strings is to _not_ touch what's inside the quotes, I don't see how string interpolation makes much sense there.

As for runtime templating, a string method to replace \-sequences seems like a very straightforward idea, that shouldn't need much discussion. Call it "".eval([globals, [locals]]), to get some educational synergy from teaching all the newbies not to give unchecked user input to eval().

I still think compile-time templating would be the more common use, and thus should be the driving issue behind the design.

/Paul
Paul Prescod wrote:
I think that if we're going to do string interpolation we might as well go all the way and have one unified string interpolation model.
1. There should be no string-prefix. Instead the string \$ should be magical in all non-raw literal strings as \x, \n etc. are. (if you want to do string interpolation on a raw string, you could do it using the method version below)
from __future__ import string_interp
a = "acos(.5) = \$(acos(.5))"
Embrace the __future__!
-1. Too dangerous.

If string interpolation makes it into the core, then please use a *new* construct. '\$' is currently interpreted as '\$' and this should not be changed (heck, just think what would happen to all the shell script snippets encoded in Python strings).

BTW, why don't you wrap all this interpolation stuff into a module and then call a function to have it apply all the magic you want. If I remember correctly, someone else has already written such a module for Python.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
"M.-A. Lemburg" wrote:
...
Embrace the __future__!
-1.
Too dangerous.
It isn't dangerous. That's precisely what __future__ is for! It is no more dangerous than any other feature that uses __future__.
... If string interpolation makes it into the core, then please use a *new* construct. '\$' is currently interpreted as '\$' and this should not be changed (heck, just think what would happen to all the shell script snippets encoded in Python strings).
No, this should be changed. Completely ignoring string interpolation, I am strongly in favour of changing the behaviour of the literal string parser so that unknown \-combinations raise a SyntaxError. If you don't want a backslash to be interpreted as an escape sequence start, you should use a raw string.

The Python documentation and grammar already say:

    escapeseq ::= "\" <any ASCII character>

The documentation says:

"Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)"

That's a weird thing to say. What could be more helpful for debugging than a good old SyntaxError???
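The behaviour being objected to is easy to demonstrate, and CPython still works this way today (recent versions do at least emit a SyntaxWarning for it):

```python
# An unrecognized escape like \q is silently left alone: the
# backslash stays in the string instead of raising SyntaxError.
s = "\q"
print(len(s))      # 2 -- the backslash plus 'q'
print(s == "\\q")  # True
```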
BTW, why don't you wrap all this interpolation stuff into a module and then call a function to have it apply all the magic you want.
We've been through that in this discussion already. In fact, that's how the discussion started.

Paul Prescod
On Wed, 16 Jan 2002, Paul Prescod wrote:
The documentation says:
"Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)"
That's a weird thing to say. What could be more helpful for debugging than a good old SyntaxError???
The usefulness is relative; it's arguably easier to find the problem and fix it if the \ remains in the string than if it's simply removed (as C does, tho most compilers issue a warning). It could also be argued that you get more nutritional value by eating only the black raisins from the cake than by eating just the golden raisins...

/Paul
Paul Svensson wrote:
...
The usefulness is relative; it's arguably easier to find the problem and fix it if the \ remains in the string than if it's simply removed (as C does, tho most compilers issue a warning).
Yeah, I understood that. I just don't understand why it isn't like most other things in Python. Python tends to be strict about things that are likely mistakes, rather than helping you "debug them" after passing them through silently.

Paul Prescod
Yeah, I understood that. I just don't understand why it isn't like most other things in Python. Python tends to be strict about things that are likely mistakes, rather than helping you "debug them" after passing them through silently.
Paul Prescod
The "why" is that long ago Python didn't have raw strings but it did have regular expressions. I thought it would be painful to have to double all backslashes used for the regex syntax. It would be hard to change this policy now.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
...
The "why" is that long ago Python didn't have raw strings but it did have regular expressions. I thought it would be painful to have to double all backslashes used for the regex syntax.
Aha.
It would be hard to change this policy now.
How about an optional warning which, after a year or so, would be turned on by default, and then a year or so after that would be an error?

This same issue may affect some eventual merging of literal strings and Unicode literals, because \N, \u etc. are treated differently in strings than in Unicode literals. And even if literal strings and Unicode strings are never merged, \N could be useful in ordinary strings.

Paul Prescod
How about an optional warning which, after a year or so, would be turned on by default, and then a year or so after that would be an error?
This same issue may affect some eventual merging of literal strings and Unicode literals, because \N, \u etc. are treated differently in strings than in Unicode literals. And even if literal strings and Unicode strings are never merged, \N could be useful in ordinary strings.
-1. I don't find this enough of a problem to invoke the heavy gun of a language change.

--Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, 16 Jan 2002, Paul Prescod wrote:
Guido van Rossum wrote:
...
The "why" is that long ago Python didn't have raw strings but it did have regular expressions. I thought it would be painful to have to double all backslashes used for the regex syntax.
Aha.
It would be hard to change this policy now.
How about an optional warning which, after a year or so, would be turned on by default, and then a year or so after that would be an error?
Such a warning might prove to be a useful debugging tool, even if the language never changed. Maybe it would be a useful addition to PyChecker or some similar tool?

foot-in-the-door-ly,
/Paul
On Wed, 16 Jan 2002, Guido van Rossum wrote:
Yeah, I understood that. I just don't understand why it isn't like most other things in Python. Python tends to be strict about things that are likely mistakes, rather than helping you "debug them" after passing them through silently.
Paul Prescod
The "why" is that long ago Python didn't have raw strings but it did have regular expressions. I thought it would be painful to have to double all backslashes used for the regex syntax.
It would be hard to change this policy now.
Yeah, it would be like, say, changing the semantics of integer division. Sometimes it's better to do what's right than what's easy.

/Paul
Paul Prescod wrote:
"M.-A. Lemburg" wrote:
...
Embrace the __future__!
-1.
Too dangerous.
It isn't dangerous. That's precisely what __future__ is for! It is no more dangerous than any other feature that uses __future__.
It is. Currently Python strings are just that: immutable strings. Now you suddenly add dynamics to them. This will cause nightmares in terms of security. Note that Python hasn't really had a need for Perl's "taint" because of this. I wouldn't want to see that change in any way.

If you really need this, either use a string prefix or call a specific function which implements string interpolation. At least then things are obvious and explicit.
... If string interpolation makes it into the core, then please use a *new* construct. '\$' is currently interpreted as '\$' and this should not be changed (heck, just think what would happen to all the shell script snippets encoded in Python strings).
No, this should be changed.
Huh ? I bet RedHat and thousands of sysadmins who have switched from shell or Perl to Python would have strong objections.
Completely ignoring string interpolation, I am strongly in favour of changing the behaviour of the literal string parser so that unknown \-combinations raise a SyntaxError. If you don't want a backslash to be interpreted as an escape sequence start, you should use a raw string.
The Python documentation and grammar already says:
escapeseq ::= "\" <any ASCII character>
The documentation says:
"Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)"
That's a weird thing to say. What could be more helpful for debugging than a good old SyntaxError???
If there's nothing wrong with the escape why raise a SyntaxError ?
BTW, why don't you wrap all this interpolation stuff into a module and then call a function to have it apply all the magic you want.
We've been through that in this discussion already. In fact, that's how the discussion started.
I've jumped in at a rather late point. Perhaps you ought to rewind the discussion then and start discussing in a different direction :-) E.g. about the syntax to be used in the interpolation and where, when and in which context to evaluate the strings.

There are so many options that I can't really see any benefit from choosing only one and hard-coding it into the language. Other users will have other requirements which are likely not to combine well with the one implementation you have in mind.

-- Marc-Andre Lemburg, eGenix.com
On Thu, 17 Jan 2002, M.-A. Lemburg wrote:
Paul Prescod wrote:
The documentation says:
"Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)"
That's a weird thing to say. What could be more helpful for debugging than a good old SyntaxError???
If there's nothing wrong with the escape why raise a SyntaxError ?
I would certainly claim that an unrecognized escape sequence _is_ wrong.

/Paul
Paul Svensson wrote:
On Thu, 17 Jan 2002, M.-A. Lemburg wrote:
Paul Prescod wrote:
The documentation says:
"Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.)"
That's a weird thing to say. What could be more helpful for debugging than a good old SyntaxError???
If there's nothing wrong with the escape why raise a SyntaxError ?
I would certainly claim that an unrecognized escape sequence _is_ wrong.
Depending on how you see it, an "unrecognized escape sequence" is not an escape sequence to begin with :-)

-- Marc-Andre Lemburg, eGenix.com
I would certainly claim that an unrecognized escape sequence _is_ wrong.
Then you are wrong. Go away and design your own language.

--Guido van Rossum (home page: http://www.python.org/~guido/)
On Thu, 17 Jan 2002, Guido van Rossum wrote:
I would certainly claim that an unrecognized escape sequence _is_ wrong.
Then you are wrong. (---)
Then maybe the Python Reference Manual (2.4.1) needs to be updated, since the paragraph concerning unrecognized escape sequences doesn't mention them other than as being "mistyped" or "broken". (Do "mistyped" and "broken" qualify as "wrong"?)

/Paul
Guido van Rossum wrote:
Paul Svensson:
I would certainly claim that an unrecognized escape sequence _is_ wrong.
Then you are wrong. Go away and design your own language.
Hey! That's a bit harsh. I'm not going to campaign to make unrecognized escape sequences a syntax error, but not raising a syntax error does seem to be against Python's principles.

--
Aahz (@pobox.com)  http://www.rahul.net/aahz/
Hugs and backrubs -- I break Rule 6  <*>
Androgynous poly kinky vanilla queer het Pythonista
We must not let the evil of a few trample the freedoms of the many.
Paul Svensson:
I would certainly claim that an unrecognized escape sequence _is_ wrong.
Guido van Rossum wrote:
Then you are wrong. Go away and design your own language.
Aahz:
Hey! That's a bit harsh. I'm not going to campaign to make unrecognized escape sequences a syntax error, but not raising a syntax error does seem to be against Python's principles.
Whatever. Who is Paul Svensson and what is he doing in python-dev?

--Guido van Rossum (home page: http://www.python.org/~guido/)
I think that something in particular that Paul S. said got under your skin (and there was something he said that could certainly get under a person's skin). I'm pretty sure it isn't now a policy to rudely reject suggestions from people you haven't heard of! Until I went back through the thread I felt, as Aahz did, that your rejection was somewhat severe in tone.

I think you (still) agree that people should not be afraid of (politely) stating their opinions in python-dev, even when those opinions disagree with yours. Or if there is an unspoken rule that unproven developers shouldn't be in python-dev then maybe we should just make it a spoken rule. But I'm most confident of the theory that you snapped at one person in particular because of something he said.

Paul Prescod

Guido van Rossum wrote:
Paul Svensson:
I would certainly claim that an unrecognized escape sequence _is_ wrong.
Guido van Rossum wrote:
Then you are wrong. Go away and design your own language.
Aahz:
Hey! That's a bit harsh. I'm not going to campaign to make unrecognized escape sequences a syntax error, but not raising a syntax error does seem to be against Python's principles.
Whatever. Who is Paul Svensson and what is he doing in python-dev?
--Guido van Rossum (home page: http://www.python.org/~guido/)
I think that something in particular that Paul S. said got under your skin (and there was something he said that could certainly get under a person's skin). I'm pretty sure it isn't now a policy to rudely reject suggestions from people you haven't heard of! Until I went back through the thread I felt as Aahz did that your rejection was somewhat severe in tone. I think you (still) agree that people should not be afraid of (politely) stating their opinions in python-dev, even when those opinions disagree with yours. Or if there is an unspoken rule that unproven developers shouldn't be in python-dev then maybe we should just make it a spoken rule. But I'm most confident of the theory that you snapped at one person in particular because of something he said.
Paul Prescod
He harped on the same issue in three consecutive messages without explaining his position.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Paul Prescod:
[...] But I'm most confident of the theory that you snapped at one person in particular because of something he said.
Guido:
He harped at the same issue in three consecutive message without explaining his position.
Actually I was quite happy with the thread.

At runtime, Python tends to complain about iffy situations, even situations that other languages might silently accept. For example:

    print 50 + " percent"        # TypeError
    x = [1, 2, 3]; x.remove(4)   # ValueError
    x = {}; print x[3]           # KeyError
    a, b = "x,y,z,z,y".split()   # ValueError
    x.append(1, 2)               # TypeError, recently
    print u"\N{EURO SIGN}"       # UnicodeError

I'm not complaining. I like the pickiness. But the Python compiler (that is, Python's syntax) tends to be more forgiving. Examples:

- Inconsistent use of tabs and spaces. (Originally handled by tabnanny.py; now an optional warning in Python itself.)

- Useless or probably-useless expressions, like these:

      def g(f):
          os.environ['EDITOR']      # does nothing with value
          f.write(xx), f.write(yy)  # should be ; not ,
          f.close                   # obvious mistake

  (PyChecker catches the last one.)

- Non-escaping backslashes in strings (there is a well-known reason for this one; but the reason no longer exists, in new code anyway, since 1.5.)

So we catch things like this with static analysis tools like tabnanny.py, or lately PyChecker. If Guido finds any of these syntax checks compelling enough, he can always incorporate them into Python whenever (but don't hold your breath).

Again, you'll get no complaints from me on this. But I am curious. Is this apparent difference in pickiness a design choice? Or is it just harder to write picky compilers than picky libraries? Or am I seeing something that's not really there?

## Jason Orendorff
http://www.jorendorff.com/
(I'm changing the topic :-)
At runtime, Python tends to complain about iffy situations, even situations that other languages might silently accept.
"Other languages" being Perl or JavaScript? The situations you show here would all be errors in most languages that are compiled to machine code.
For example:
    print 50 + " percent"        # TypeError
    x = [1, 2, 3]; x.remove(4)   # ValueError
    x = {}; print x[3]           # KeyError
    a, b = "x,y,z,z,y".split()   # ValueError
    x.append(1, 2)               # TypeError, recently
    print u"\N{EURO SIGN}"       # UnicodeError
I'm not complaining. I like the pickiness.
That's why you're using Python. :-)
But the Python compiler (that is, Python's syntax) tends to be more forgiving. Examples:
- Inconsistent use of tabs and spaces. (Originally handled by tabnanny.py; now an optional warning in Python itself.)

- Useless or probably-useless expressions, like these:

      def g(f):
          os.environ['EDITOR']      # does nothing with value
          f.write(xx), f.write(yy)  # should be ; not ,
          f.close                   # obvious mistake

  (PyChecker catches the last one.)

- Non-escaping backslashes in strings (there is a well-known reason for this one; but the reason no longer exists, in new code anyway, since 1.5.)
So we catch things like this with static analysis tools like tabnanny.py, or lately PyChecker. If Guido finds any of these syntax-checks compelling enough, he can always incorporate them into Python whenever (but don't hold your breath).
Again, you'll get no complaints from me on this. But I am curious. Is this apparent difference in pickiness a design choice? Or is it just harder to write picky compilers than picky libraries? Or am I seeing something that's not really there?
There's no unifying reason why these examples are not errors. The first and last can be considered historical raisins -- the tabs/spaces mix was considered a good thing in the days when Python only ran on Unixoid systems where nobody would think about changing the display size for tabs; we know the reason for the last. But it's hard to change these without inconveniencing users, and there are other ways to deal with them (like picky tools).

The three examples in the second item have in common that they are syntactically expressions but are used in a statement context. The problem here is one that any language designer is faced with: you would want to allow expressions with an obvious side-effect, but you would want to disallow expressions that obviously have no side-effects. But where to draw the line? Traditional parsing technology such as used in Python makes it hard to be very differentiating here; a good analysis of which expressions "make sense" and which ones don't can only be done during a later pass of the compiler.

I believe that eventually some PyChecker-like technology will be incorporated in the Python compiler. The same happened to C compilers: the lint program became useless once GCC incorporated the same technology. But these warnings will always have a different status than purely syntactical errors: there are often cases where the user knows better (for example, sometimes an attribute reference can have a desirable side effect).

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
I believe that eventually some PyChecker-like technology will be incorporated in the Python compiler. The same happened to C compilers: the lint program became useless once GCC incorporated the same technology.
pychecker was (and still is) an experiment to me. But I think it would be great if the lessons from pychecker could be integrated into the compiler. Currently, I think there are 2 or 3 warnings which definitely fit this class: no global found, using ++/--, and expressions with no effect as Jason described.

I have posted a patch on SF to demonstrate the feasibility of warning about expressions with no effect:

https://sourceforge.net/tracker/index.php?func=detail&aid=505826&group_id=5470&atid=305470

It should be pretty easy to warn about ++ and --. No global found would probably require another pass of the code after compilation.

I'd be happy to help the process of integrating warnings into the compiler, however, I'm not sure how to proceed. Should pychecker be put into the standard library (users can now do: import pychecker.checker, and all modules imported are checked by installing an __import__ hook)? Should pychecker be added as a tool? Should a PEP be written? Etc.
But these warnings will always have a different status than purely syntactical error: there are often cases where the user knows better (for example, sometimes an attribute reference can have a desirable side effect).
I agree. Neal
Neal Norwitz:
Guido van Rossum:
But these warnings will always have a different status than purely syntactical error: there are often cases where the user knows better (for example, sometimes an attribute reference can have a desirable side effect).
I agree.
Here's what PyChecker finds in the standard library (as of 2.2). In each case, the expression is intended to raise an exception if the named variable or attribute doesn't exist. Each one could be rewritten (I'm curious as to the prevailing stylistic opinions on this):

=== code.py (lines 217 and 221)

    try: sys.ps1
    except AttributeError: sys.ps1 = ">>> "
    try: sys.ps2
    except AttributeError: sys.ps2 = "... "

Could be rewritten:

    if not hasattr(sys, 'ps1'): sys.ps1 = ">>> "
    if not hasattr(sys, 'ps2'): sys.ps2 = "... "

=== locale.py (line 721)

    try: LC_MESSAGES
    except: pass
    else: __all__.append("LC_MESSAGES")

Could be rewritten:

    if globals().has_key("LC_MESSAGES"): __all__.append("LC_MESSAGES")

=== pickle.py (line 58)

    try: UnicodeType
    except NameError: UnicodeType = None

Could be rewritten:

    globals().setdefault('UnicodeType', None)

## Jason Orendorff http://www.jorendorff.com/
Each one could be rewritten (I'm curious as to the prevailing stylistic opinions on this):
I think those rewrites do not improve the code, see detailed comments below.
Could be rewritten:

    if not hasattr(sys, 'ps1'): sys.ps1 = ">>> "
    if not hasattr(sys, 'ps2'): sys.ps2 = "... "
Using string literals when you mean attribute names is bad style. It just helps to trick the checker. Sometimes, you cannot avoid this style, but if you can, you should.
    if globals().has_key("LC_MESSAGES"): __all__.append("LC_MESSAGES")
This combines the previous issue with the usage of globals(). I find it confusing to perform function calls to check for the presence of names.
    try: UnicodeType
    except NameError: UnicodeType = None
Could be rewritten:

    globals().setdefault('UnicodeType', None)
Same issue here. If this needs to be rewritten, I'd prefer

    try: from types import UnicodeType
    except ImportError: UnicodeType = None

Somebody might also change the "from types import *" to explicitly list the set of names that are requested, when changing this fragment.

Regards, Martin
"Martin v. Loewis" wrote:
...
Could be rewritten:

    if not hasattr(sys, 'ps1'): sys.ps1 = ">>> "
    if not hasattr(sys, 'ps2'): sys.ps2 = "... "
Using string literals when you mean attribute names is bad style. It just helps to trick the checker.
Just for the record, I think that Jason's rewrites were clearer in every case because they said exactly what he was trying to do. "If the sys module has the attribute ps1 then ..." is much clearer than "Get the ps1 attribute from the sys module and throw it away." Python has functions specifically for checking for the existence of attributes and keys. Why not use them?

Plus, I think that exceptions should be (as far as possible) reserved for exceptional situations. Using them as tests is not as compact, not as readable and not as runtime efficient. But more to the point, any of these could have been rewritten as:

    _junk = sys.ps1

That would shut up compiler messages without forcing you to use the has_key/hasattr style.

Paul Prescod
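The two styles under debate sit side by side in this sketch; it uses a stand-in object rather than the real sys module so it is self-contained, but the shape is exactly that of the code.py fragment quoted above:

```python
# The two idioms being compared: exception-as-test vs. hasattr.
# A plain object stands in for the sys module here.
class Namespace:
    pass

ns = Namespace()

# Exception style, as found in code.py -- PyChecker flags the bare
# "ns.ps1" as an expression with no effect:
try:
    ns.ps1
except AttributeError:
    ns.ps1 = ">>> "

# hasattr style, which states the intent directly and draws no warning:
if not hasattr(ns, 'ps2'):
    ns.ps2 = "... "
```

Both leave the attribute set to its default when it was missing; the disagreement in the thread is purely about which reads better, not about behaviour.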
"NN" == Neal Norwitz
writes:
NN> Guido van Rossum wrote:
I believe that evertually some PyChecker-like technology will be incorporated in the Python compiler. The same happened to C compilers: the lint program became useless once GCC incorporated the same technology.
NN> pychecker was (and still is) an experiment to me. But I think
NN> it would be great if the lessons from pychecker could be
NN> integrated into the compiler.

Me, too.

NN> I'd be happy to help the process of integrating warnings into
NN> the compiler, however, I'm not sure how to proceed. Should
NN> pychecker be put into the standard library (users can now do:
NN> import pychecker.checker and all modules imported are checked by
NN> installing an __import__)? Should pychecker be added as a tool?
NN> Should a PEP be written? etc.

How much of pychecker's work could be done by the compiler itself? I'd like to see more of the warnings generated during compilation, but I agree with Michael Hudson that extending the compiler is a lot of work. Perhaps it's time to redesign the compiler.

A PEP is probably good for more than one reason. One reason is to document the warnings that are generated and the rationale for them. If you integrate it into the compiler, the PEP is a good place to capture some design info.

Jeremy
We could talk about this at the conference. Jeremy
Guido van Rossum wrote:
Jason Orendorff wrote:
At runtime, Python tends to complain about iffy situations, even situations that other languages might silently accept.
"Other languages" being Perl or JavaScript? The situations you show here would all be errors in most languages that are compiled to machine code.
For example:

    print 50 + " percent"           # TypeError
    x = [1, 2, 3]; x.remove(4)      # ValueError
    x = {}; print x[3]              # KeyError
    a, b = "x,y,z,z,y".split()      # ValueError
    x.append(1, 2)                  # TypeError, recently
    print u"\N{EURO SIGN}"          # UnicodeError
Not to bicker, but Java only manages to reject 2 of the 6, both at compile time. The other 4 silently pass through the standard library without complaint. None cause exceptions during execution. ML makes no distinction between append(1, 2) and append((1, 2)), but that's a syntax thing... C++ STL remove() doesn't complain if it doesn't find anything to remove; nor does the C++ map<>::operator[]() complain if no entry exists.
I'm not complaining. I like the pickiness.
That's why you're using Python. :-)
(laugh) You sell yourself short, Guido. :) I would still use Python even if (50 + " percent") started evaluating to "50 percent" tomorrow. ## Jason Orendorff http://www.jorendorff.com/
"MAL" == M
writes:
MAL> It is. Currently Python strings are just that: immutable
MAL> strings. Now, you suddenly add dynamics to them. This will
MAL> cause nightmares in terms of security. Note that Python
MAL> hasn't really had a need for Perl's "taint" because of
MAL> this. I wouldn't want to see that change in any way.

Bingo!

MAL> I've jumped in at a rather late point. Perhaps you ought to
MAL> rewind the discussion then and start discussing in a
MAL> different direction :-) E.g. about the syntax to be used in
MAL> the interpolation and where, when and in which context to
MAL> evaluate the strings.

Proponents of this feature can start by updating the PEP.

-Barry
"M.-A. Lemburg" wrote:
...
It is. Currently Python strings are just that: immutable strings. Now, you suddenly add dynamics to them.
I don't want to go through this whole thread from the beginning again. PEP 215 does not add "dynamics" to anything. In fact, PEP 215 is a more static mechanism than the current idiom. Even if we make PEP 215's behaviour the default for strings, it is still NOT DYNAMIC.
... This will cause nightmares in terms of security.
There is a thread called "PEP 215 does not introduce security issues". Please read it. Everyone involved who initially thought that PEP 215 had security issues backed down and agreed that it did not. Once again, whether there is a string prefix or not is irrelevant to this question. PEP 215's semantics are *not dynamic*.
... Note that Python hasn't really had a need for Perl's "taint" because of this. I wouldn't want to see that change in any way.
I am certainly not a Perl programmer, but Python is also attackable through the sorts of holes that "taint" is intended to avoid:

    username = raw_input()
    os.system("cp %s.new %s.old" % (username, username))

Perl considers this "dangerous" and so it has taint. It has *nothing* to do with interpolation syntax.
... Huh ? I bet RedHat and thousands of sysadmins who have switched from shell or Perl to Python would have strong objections.
Python has a construct called a "raw string" which is perfect for when you don't want backslashes treated specially. Paul Prescod
"M.-A. Lemburg" wrote:
... Note that Python hasn't really had a need for Perl's "taint" because of this. I wouldn't want to see that change in any way.
On Thu, 17 Jan 2002, Paul Prescod wrote:
I am certainly not a Perl programmer but Python is also attackable through the sorts of holes that "taint" is intended to avoid.
Paul is right on the money. Tainting is a completely separate issue.

That said, however, i wonder why security rarely comes up as an issue for Python. Is it because nobody expects security properties from the language? Does anyone know how much the restricted execution feature gets used? Is there anyone here that would use a tainting feature if it existed?

It would be interesting to explore the possibilities for safe distributed programming in Python. Restricted execution mode and the ability to hook __import__ seem like a pretty strong starting point, and given a suitable cryptographic comm library, it might be feasible to get from there to capability-style distributed programming. IMHO, simplicity and readability are extremely important for a secure programming language, so that gives Python a great head start.

(By the way, i'm planning to be at Python 10, and hope to see many of you there. As i'm looking for ways to keep costs down, would anyone be interested in splitting the cost of a hotel room in exchange for a roommate with a strange hairstyle? I'll be there Feb 4 to 7, three nights.)

-- ?!ng
That said, however, i wonder why security rarely comes up as an issue for Python. Is it because nobody expects security properties from the language? Does anyone know how much the restricted execution feature gets used? Is there anyone here that would use a tainting feature if it existed?
In my understanding, tainting is needed if you allow data received from remote sources to invoke arbitrary operations. In Python, there is only a short list of places where this might cause a problem:

- invoking exec or eval on a string of unknown origin
- unpickling an arbitrary string
- performing getattr with a parameter of unknown origin

Because there are so few places where tainted data may cause problems, it never is an issue: people just intuitively know to avoid them.
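The getattr item on Martin's list is the subtlest: an attacker-controlled name can reach attributes like `__class__` or `__dict__` that were never meant to be exposed. A common defensive sketch (my illustration, not code from the thread) is to dispatch through an explicit whitelist instead:

```python
# Whitelisted dispatch instead of raw getattr(obj, untrusted_name).
ALLOWED = {"start", "stop", "status"}

def dispatch(obj, name):
    """Call a method chosen by untrusted input, but only if it is
    on the explicit allow-list."""
    if name not in ALLOWED:
        raise ValueError("operation not allowed: %r" % name)
    return getattr(obj, name)()

class Service:
    def status(self):
        return "ok"
```

The point is that the set of reachable operations is now closed and auditable, which is the guarantee tainting would otherwise have to provide dynamically.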
It would be interesting to explore the possibilities for safe distributed programming in Python.
Not sure what this has to do with tainting, though: if you want to execute code you receive from untrusted sources, a sandbox is closer to what you need. Regards, Martin
"MvL" == Martin v Loewis
writes:
| - invoking exec or eval on a string of unknown origin
| - unpickling an arbitrary string
| - performing getattr with a parameter of unknown origin

Don't forget os.system(), popen(), and friends, i.e. passing unsanitized strings to the shell. In my now long-rusty Perl experience, this was the most common reason to use tainted strings. Python OTOH really has very little need to call out to the shell; almost everything you'd want to do that way can be done in pure Python. There are some opportunities for improving string sanitization for the few instances where os.system() is necessary.

Most of the security issues I've had to deal with in Mailman have been in library modules -- or the use thereof -- not in the language itself. Things like vulnerabilities in Cookie.py or pickle/marshal, or cross-site scripting exploits, that kind of thing. There are also more subtle issues that would be interesting to explore, like DoS attacks with thru-the-web regular expression searching, deliberate form confuddling, and some of the ttw code execution stuff that e.g. Zope gets into. Rexec is an incomplete solution to the latter.

-Barry
Barry A. Warsaw wrote:
"MvL" == Martin v Loewis
writes: | - invoking exec or eval on a string of unknown origin | - unpickling an arbitrary string | - performing getattr with a parameter of unknown origin.
Don't forget os.system(), popen(), and friends, i.e. passing unsanitized strings to the shell. In my my long rusty Perl experience, this was the most common reason to use taint strings.
More precisely, because Perl culture developed as a superset of shell scripts, it used to be all-too-common for Perl scripts to get their data by parsing the output of a Unix utility (instead of calling a library function directly). This necessarily spawned a subshell where malicious input could be a security problem. (When I was learning Perl, the available books often taught this programming style.)

I've heard that Perl culture has changed, but the taint capability is still there because too many Perlers stick to their trusty poor habits. Pythonistas, of course, never learned bad habits. ;-)

--
Aahz (@pobox.com)           Hugs and backrubs -- I break Rule 6 <*>
http://www.rahul.net/aahz/  Androgynous poly kinky vanilla queer het Pythonista

We must not let the evil of a few trample the freedoms of the many.
On Sun, Jan 20, 2002 at 05:38:59PM -0800, Aahz Maruch wrote:
More precisely, because Perl culture developed as a superset of shell scripts, it used to be all-too-common for Perl scripts to get their data by parsing the output of a Unix utility (instead of calling a library function directly). This necessarily spawned a subshell where malicious input could be a security problem.
Not so. This is what taint is: taint tells you where there's some shit you want to clean up.

If you ask the user for a filename to write to, taint tells you that you'd better check for leading slashes, double dots and the like before writing to it. If you're about to run an external program, taint tells you that you might not want to believe the user's idea of what $PATH ought to be. If you're getting a URL from somewhere, taint tells you that you should probably think twice before happily passing back file:///etc/shadow. And so on and so forth. None of these examples are about input to a subshell.

I'm not in a position to say whether or not Python needs taint; if it had it, I probably wouldn't use the feature. But let's not misunderstand what it's for.

--
Thermodynamics in a nutshell:
1st Law: You can't win.                 (Energy is conserved)
2nd Law: You can't break even.          (Entropy)
0th Law: You can't even quit the game.  (Closed systems)
    -- Taki Kogoma
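The filename case Simon describes -- leading slashes, double dots -- is the check taint would nag you into writing. A minimal POSIX-path sketch of that check (my illustration, assuming files must stay under one base directory):

```python
# Reject user-supplied paths that escape a fixed base directory,
# the kind of sanitization a tainted filename would demand.
import os

def safe_join(base, user_path):
    """Join user input onto `base`, refusing absolute paths and
    '..' sequences that would climb out of it (POSIX paths assumed)."""
    candidate = os.path.normpath(os.path.join(base, user_path))
    if not candidate.startswith(os.path.join(base, "")):
        raise ValueError("path escapes base directory: %r" % user_path)
    return candidate
```

Note the trailing-separator comparison: checking against `"/data/"` rather than `"/data"` is what stops a crafted `../database` from slipping past as a sibling directory.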
On Sun, Jan 20, 2002 at 11:37:11PM +0100, Martin v. Loewis wrote:
In my understanding, tainting is needed if you allow data received from remote to invoke arbitrary operations. In Python, there is only a short list where this might cause a problem:
- invoking exec or eval on a string of unknown origin
- unpickling an arbitrary string
- performing getattr with a parameter of unknown origin
Ka-Ping Yee wrote:
...
That said, however, i wonder why security rarely comes up as an issue for Python.
I guess you didn't read comp.lang.python this week. ;) http://www.securityfocus.com/archive/1/250580
... Is it because nobody expects security properties from the language?
Remember that people for a long time thought of Perl as a "CGI language". And early uses of CGI would probably have depended heavily on the Perl equivalents of "popen" and "system". Plus, those features are so easy to get at in the language. Compare:

    print `ls`

versus:

    import os
    print os.popen("ls").read()

If you were a newbie in each of these languages, what is the percentage chance of you using either of these features versus the list-dir equivalent? List-dir is available in each language.
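The "list-dir equivalent" Paul alludes to is os.listdir: no subshell, no external process, and therefore nothing for malicious input to exploit. A trivial sketch:

```python
# Listing a directory in pure Python -- the safe alternative to
# shelling out with os.popen("ls").
import os

def list_directory(path="."):
    # Sorted for deterministic output, like ls.
    return sorted(os.listdir(path))
```

The security difference is structural: `os.popen("ls " + path)` hands `path` to a shell that will happily interpret `;`, backticks and `$VARS`, while os.listdir treats it only as a filename.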
... Does anyone know how much the restricted execution feature gets used?
I personally would not trust it because I don't know if anyone is following its progress from one version of Python to another. I also know that even languages that were designed from scratch to be safe (Java and JavaScript) have had leaky implementations, so I don't really hold out much hope for Python until I hear that someone is actively researching this.
... Is there anyone here that would use a tainting feature if it existed?
I'd like to think I've internalized taint's rules by osmosis...
(By the way, i'm planning to be at Python 10, and hope to see many of you there. As i'm looking for ways to keep costs down, would anyone be interested in splitting the cost of a hotel room in exchange for a roommate with a strange hairstyle? I'll be there Feb 4 to 7, three nights.)
Maybe there should be a bulletin board or something for people to find each other. I think one of the Python conferences had something like that...for hotels and also to share cabs from the airport. Paul Prescod
participants (12)
- aahz@rahul.net
- barry@zope.com
- Guido van Rossum
- Jason Orendorff
- Jeremy Hylton
- Ka-Ping Yee
- M.-A. Lemburg
- Martin v. Loewis
- Neal Norwitz
- Paul Prescod
- Paul Svensson
- Simon Cozens