PEP 216 (string interpolation) alternative EvalDict
Since PEP 216 on string interpolation is still active, I'd appreciate it if some of its supporters would comment on my revised alternative solution (posted on comp.lang.python and available through Google at):
http://groups.google.com/groups?start=25&hl=en&group=comp.lang.python&selm=mailman.1010877739.27776.python-list%40python.org
I didn't get any feedback on the first version that was posted --
http://groups.google.com/groups?hl=en&th=36990ccf4bc5e931&rnum=18
particularly whether the syntax was acceptable, or if a 'magic string' solution was still preferred. -- Steve Majewski
Steven Majewski wrote:
...
particularly whether the syntax was acceptable, or if a 'magic string' solution was still preferred.
IMHO, string interpolation should be one of the easiest things in the language. It should be something you learn in the first half of your first day learning Python. Any extra level of logical indirection seems misplaced to me. Paul Prescod
Paul Prescod wrote:
IMHO, string interpolation should be one of the easiest things in the language. It should be something you learn in the first half of your first day learning Python. Any extra level of logical indirection seems misplaced to me.
+1 ## Jason Orendorff http://www.jorendorff.com/
On Mon, 14 Jan 2002, Paul Prescod wrote:
particularly whether the syntax was acceptable, or if a 'magic string' solution was still preferred.
IMHO, string interpolation should be one of the easiest things in the language. It should be something you learn in the first half of your first day learning Python. Any extra level of logical indirection seems misplaced to me.
Do you have any comments or suggestions about a substitution syntax, Paul? I think anything except PEP 216's magic initial u" for strings is able to be done with an object extension rather than a syntax change, including the substitution syntax within the magic string. I kept '%' rather than '$' because I assumed that particular char choice was a rather arbitrary part of the design patterned after Tcl or Perl, and that by keeping '%' I could do it with a dict. If a different syntax is desired, then it can be done by extending string to a magic format string object (rather than a magic string syntax).

I'm not sure what you mean by logical indirection here: is that a comment on the syntax, or do you object to the idea of not implementing substitution by a language syntax change? ( But if what you mean is you want fewer chars for a double substitution, that's something that can be fixed.)

One reason I would prefer a "magic object" implementation, rather than a 'magic syntax' one, is that, after playing around with this for a bit, I can see that there are a lot of possibilities for various substitution and template languages. A language syntax change, once accepted, is cast in stone (and a new revised proposal is much less likely to be considered) while we can muck about and experiment with object extensions both before and after they get put into the standard lib. -- Steve Majewski
On Mon, 14 Jan 2002, Jason Orendorff wrote:
Paul Prescod wrote:
IMHO, string interpolation should be one of the easiest things in the language. It should be something you learn in the first half of your first day learning Python. Any extra level of logical indirection seems misplaced to me.
+1
Was that +1 for PEP 216, my alternative proposal, or Paul's comments? I think I agree with his comment above, but I'm not sure whether it was intended as a comment on the syntax (which is probably justified), or as an objection to solving the problem other than by changing the language syntax (which I don't agree with), or just as a statement of principles. -- Steve
+1
Was that +1 for PEP 216?, my alternative proposal? or Paul's comments?
It was a +1 for Paul's comments, both its principles and as maybe a -0.3 criticism of your alternative. No opinion on PEP 215. ## Jason Orendorff http://www.jorendorff.com/
Steven Majewski wrote:
...
I'm not sure what you mean by logical indirection here: is that a comment on the syntax, or do you object to the idea of not implementing substitution by a language syntax change.
Sorry I wasn't clear. Let's say it's the second hour of our Perl/Python class. Here's Perl:

    $a = 5;
    $b = 6;
    print "$a $b";

Lots of yucky extra chars in that code but you can't find much negative stuff to say about the complexity of the string interpolation! Here's Python:

    a = 5;
    b = 6;
    print "%(a)s %(b)s" % vars()

Extra indirection: What does % do? What does vars() do? What does the "s" mean? How does this use of % relate to the traditional meanings of either percentage or modulus?

This is one of the two problems I would like PEP 215 to solve. The other one is to allow simple function calls and array lookups etc. to be done "inline" to avoid setting up trivial vars or building unnecessary dictionaries. If I understand your proposal correctly, I could only get the evaluation behaviour by making the "indirection" problem even worse...by adding in yet another function call (well, class constructor call), tentatively called EvalDict.

Another benefit of the PEP 215 model is that something hard-coded in the syntax is much more amenable to compile time analysis. String interpolation is actually quite compatible with standard compilation techniques. You just rip the expressions out of the string, compile them to byte-code and replace them with pointers to the evaluated results.

As PEP 215 mentions, this also has advantages for reasoning about security. If I tell a new programmer to avoid the use of "eval" unless they consult with me, I'll have to tell them to avoid EvalDict also. My usual approach is to consider eval and exec to be advanced (and rarely used) features that I don't even teach new programmers.

I don't know that Jython allows me today to ship a JAR without the Python parser and evaluator but I could imagine a future version that would give me that option. Widespread use of EvalDict would render that option useless.

Re: $ versus %. $ is "the standard" in other languages and shells. % is the current standard in Python. $ has the advantage that it doesn't have to work around Python's current C-inspired syntax. So I guess I reluctantly favor $.

Also, EvalDict should be called evaldict to match the other constructors in __builtins__.

So while I understand the advantage of non-syntactic solutions, in this case I am still in favor of the syntax.

Paul Prescod
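The indirection Paul describes can be seen in a minimal sketch, written here in Python 3 syntax (the thread itself uses the Python 2 print statement). The function name `demo` is an assumption made for illustration only.

```python
def demo():
    a = 5
    b = 6
    # "%(name)s" looks each name up in a mapping; vars() supplies the
    # local namespace as that mapping, and "s" means "format with str()".
    by_name = "%(a)s %(b)s" % vars()
    # The positional form needs no mapping but repeats the variable list.
    by_position = "%s %s" % (a, b)
    return by_name, by_position


print(demo())
```

Both forms produce the same text; the difference is how many separate concepts (the % operator, the mapping, the conversion character) a newcomer has to learn first.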
[ Oops. Initial subject line said incorrectly PEP 216] On Mon, 14 Jan 2002, Paul Prescod wrote:
[...] As PEP 215 mentions, this also has advantages for reasoning about security. If I tell a new programmer to avoid the use of "eval" unless they consult with me, I'll have to tell them to avoid EvalDict also. My usual approach is to consider eval and exec to be advanced (and rarely used) features that I don't even teach new programmers.
But if you're going to allow interpolation of the results of arbitrary functions into a string, it's going to be a security problem whether or not you use 'eval' to do it. My code hides the eval in the object's python code. u" strings would hide the eval in the C code. How is one more or less secure than the other?

The security issue seems to be an argument for a non-language-syntax implementation, as it means that the hidden evals could be controlled with a restricted execution environment. ( Also the same advantages I cited to easily experiment with alternatives -- we could roll out a solution without having to tackle the security issue right away.)

Also, although I agree with most of your other comments on making it simple and easy, the security issue argues against making it TOO simple. For example, I was considering making the current namespace of the call a default, so you wouldn't need globals() -- but I was worried that because of security and other issues, maybe that was too much "magic". I think maybe how much magic is enough and how much is too much is one of the issues to discuss.

Thanks for expanding on your initial comment. I think you're right that it needs to be simpler. But, for several reasons, security among them, I'm still -1 on PEP 215. -- Steve
On Mon, 14 Jan 2002, Steven Majewski wrote:
[...] I think maybe how much magic is enough and how much is too much is one of the issues to discuss.
Thanks for expanding on your initial comment. I think you're right that it needs to be simpler. But, for several reasons, security among them, I'm still -1 on PEP 215.
In fact, I think "too much magic" is my main objection to PEP 215. Having a magic string, which looks like it's a constant, with no operators or function calls associated with it, being the implicit source of a whole series of function calls and possibly unbounded computations is just hiding too much magic for me to swallow. u"$$main()" ? -- Steve
Steven Majewski wrote:
....
But if you're going to allow interpolation of the results of arbitrary functions into a string, it's going to be a security problem whether or not you use 'eval' to do it. My code hides the eval in the object's python code. u" strings would hide the eval in the C code. How is one more or less secure than the other?
I think you mean $" strings, not u" strings. Given:

    a = $"foo.bar: $foo.bar(abc, 5)"

I can translate that *at compile time* to:

    a = "foo.bar: %s" % foo.bar(abc, 5)

No runtime evaluation is necessary. So I see no security issues here. On the other hand, evaldict really does have the same semantics as an eval, right? Probably it is no more or less dangerous if you only do a single level of EvalDict-ing. But once you get into multiple levels you could get into a situation where user-provided code is being evaluated. The first level of EvalDict incorporates the user-provided code into the string and the second level evaluates it.

Ping's current runtime implementation does use "eval" but you could imagine an alternate implementation that actually parses the relevant parts of the string according to the Python grammar, and merely applies the appropriate semantics. It would use "." to trigger getattr, "()" to trigger apply, "[]" to trigger getitem and so forth. Then there would be no eval and thus no way to eval user-provided code.

Paul Prescod
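The eval-free alternative Paul outlines could be sketched roughly as follows. This is not Ping's implementation or anything from the PEP: it is a hedged illustration that borrows the modern standard-library `ast` module (and assumes Python 3.9 or later for the subscript handling). The names `resolve`, `Foo` and `namespace` are hypothetical.

```python
import ast


def resolve(expression, namespace):
    """Evaluate a 'NAME trailer*' style expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Name):
            return namespace[node.id]
        if isinstance(node, ast.Attribute):      # "." triggers getattr
            return getattr(walk(node.value), node.attr)
        if isinstance(node, ast.Subscript):      # "[]" triggers getitem
            return walk(node.value)[walk(node.slice)]
        if isinstance(node, ast.Call) and not node.keywords:  # "()" triggers a call
            return walk(node.func)(*[walk(arg) for arg in node.args])
        if isinstance(node, ast.Constant):       # literal arguments such as 5
            return node.value
        # Anything else (operators, lambdas, comprehensions) is refused.
        raise ValueError("unsupported syntax: " + type(node).__name__)

    return walk(ast.parse(expression, mode="eval").body)


class Foo:
    def bar(self, text, count):
        return text * count


namespace = {"foo": Foo(), "abc": "ab", "items": ["x", "y"]}
print(resolve("foo.bar(abc, 5)", namespace))
print(resolve("items[1].upper()", namespace))
```

Note that the attribute lookups, indexing and calls still run whatever code the objects define, which is the point Steve raises elsewhere in the thread; the sketch only removes the general-purpose eval.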
Paul> Sorry I wasn't clear. Let's say it's the second hour of our
Paul> Perl/Python class.

Paul> Here's Perl:

Paul> $a = 5;
Paul> $b = 6;
Paul> print "$a $b";
...
Paul> Here's Python:

Paul> a = 5;
Paul> b = 6;
Paul> print "%(a)s %(b)s" % vars()

So? There are some things Perl does better than Python, some things Python does better than Perl. Maybe this is a (small) notch in Perl's gun. It just doesn't seem significantly better enough to me to warrant a language change. I would have written the Python example as

    print a, b

For the simple examples that would normally arise in an introductory programming class, I think Python's print statement works just fine. For more hairy cases, Perl probably wins. That's life.

but-that's-just-me-ly, y'rs,
-- Skip Montanaro (skip@pobox.com - http://www.mojam.com/)
On Mon, 14 Jan 2002, Paul Prescod wrote:
Steven Majewski wrote:
....
But if you're going to allow interpolation of the results of arbitrary functions into a string, it's going to be a security problem whether or not you use 'eval' to do it. My code hides the eval in the object's python code. u" strings would hide the eval in the C code. How is one more or less secure than the other?
I think you mean $" strings, not u" strings. Given:
Oops. Yes.
a = $"foo.bar: $foo.bar(abc, 5)"
I can translate that *at compile time* to:
a = "foo.bar: %s" % foo.bar(abc, 5)
No runtime evaluation is necessary. So I see no security issues here. On the other hand, evaldict really does have the same semantics as an eval, right? Probably it is no more or less dangerous if you only do a single level of EvalDict-ing. But once you get into multiple levels you could get into a situation where user-provided code is being evaluated. The first level of EvalDict incorporates the user-provided code into the string and the second level evaluates it.
The multiple level was an addition to the last version because that was what some people expressed a desire for in the earlier string interpolation discussion. EvalDict2 does a single level eval. ( Again: that seems to me to be an argument for several alternative object versions rather than one builtin syntax change. )
Ping's current runtime implementation does use "eval" but you could imagine an alternate implementation that actually parses the relevant parts of the string according to the Python grammar, and merely applies the appropriate semantics. It would use "." to trigger getattr, "()" to trigger apply, "[]" to trigger getitem and so forth. Then there would be no eval and thus no way to eval user-provided code.
The same thing holds for an object implementation. eval isn't required for an implementation. But EVERY implementation of that semantics allows implicit function calls. ( I was going to say 'hidden' function calls, but I'll admit that may be provocative/argumentative.)

Your point about compile time optimization holds here: yes, the builtin syntax version allows much of that analysis to be done at compile time, while the object version would need to do all of the analysis on the fly at execution. However, as I noted -- the object implementation would allow customizing a restricted environment ( which is a simpler security implementation than code analysis.) And having an explicit argument for the namespace allows more control, as well as reminding you of the magic going on behind the curtains. At least if there's a security problem, you have somewhere to look for holes other than the Python C source code.

If I keep an eval based implementation, I probably ought to make a restricted __builtin__ the default. -- Steve
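For readers who have not seen the posted code, the general shape of an eval-based mapping with an explicit namespace might look like the sketch below. This is not Steve's EvalDict; the class name `EvalDictSketch` and its `allow_builtins` parameter are hypothetical, and emptying `__builtins__` is shown only to illustrate the "restricted by default" idea. It is not a real sandbox.

```python
class EvalDictSketch:
    """Mapping whose keys are expressions, evaluated on lookup (illustrative)."""

    def __init__(self, namespace, allow_builtins=False):
        # The caller names the namespace explicitly; nothing is implied.
        self.namespace = dict(namespace)
        # An empty __builtins__ hides names such as open(). This narrows
        # what an expression can reach but is NOT a security boundary.
        self.globals = {} if allow_builtins else {"__builtins__": {}}

    def __getitem__(self, expression):
        # The eval is hidden here, in ordinary Python code.
        return eval(expression, self.globals, self.namespace)


values = EvalDictSketch({"a": 5, "b": 6, "words": ["spam", "eggs"]})
text = "%(a + b)s and %(words[1].upper())s" % values
print(text)
```

Because the existing % operator accepts any object that supports item lookup, no syntax change is needed for this style; the cost is that every expression is parsed and evaluated at run time.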
But if you're going to allow interpolation of the results of arbitrary functions into a string, it's going to be a security problem whether or not you use 'eval' to do it. My code hides the eval in the object's python code. u" strings would hide the eval in the C code. How is one more or less secure than the other?
There is no security issue with PEP 215. $"$a and $b make $c" <==> ("%s and %s make %s" % (a, b, c)) These two are completely equivalent under PEP 215, and therefore equally secure. ## Jason Orendorff http://www.jorendorff.com/
The Jython 2 cents: with an eval-based implementation, Jython code that uses it cannot be run in a Java sandbox context, because eval does not work there.
If I keep an eval based implementation, I probably ought to make a restricted __builtin__ the default.
Jython does not support CPython restricted execution. Probably never will. For what it's worth, I don't care about having string interpolation a la Perl in Python. cheers, Samuele Pedroni.
Steven Majewski writes:
How does Perl handle it if the tokens aren't whitespace separated? Is there an optional enclosing bracket as in shell syntax ?
Yes.
How do you do: "%(word)sly yours" % vocabulary ?
I've not a clue... manually scan the format string, perhaps? -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation
Paul> Sorry I wasn't clear. Let's say it's the second hour of our Paul> Perl/Python class.
Paul> Here's Perl:
Paul> $a = 5;
Paul> $b = 6;
Paul> print "$a $b";
...
Paul> Here's Python:
Paul> a = 5;
Paul> b = 6;
Paul> print "%(a)s %(b)s" % vars()
How does Perl handle it if the tokens aren't whitespace separated? Is there an optional enclosing bracket as in shell syntax ? How do you do: "%(word)sly yours" % vocabulary ? (Sorry-- I stopped Perling somewhere around version 4.) -- Steve Majewski
Would someone please explain to me what is seen as a "possible security issue" in PEP 215? Can anyone propose some real-life situation where PEP 215 causes a vulnerability, and the corresponding % syntax doesn't? ## Jason Orendorff http://www.jorendorff.com/
On Mon, 14 Jan 2002, Jason Orendorff wrote:
But if you're going to allow interpolation of the results of arbitrary functions into a string, it's going to be a security problem whether or not you use 'eval' to do it. My code hides the eval in the object's python code. u" strings would hide the eval in the C code. How is one more or less secure than the other?
There is no security issue with PEP 215.
$"$a and $b make $c" <==> ("%s and %s make %s" % (a, b, c))
These two are completely equivalent under PEP 215, and therefore equally secure.
You're right. I'm confusing PEP 215 with the discussion on PEP 215, where that feature was requested. However, if you allow array and member access as well, which Paul suggests, then you open the security problem back up unless you do some code analysis (as he also suggests) to make sure that [index] or .member doesn't perform a hidden function call ( A virus infected __getitem__ for example. ) -- Steve
Jason Orendorff wrote:
There is no security issue with PEP 215.
$"$a and $b make $c" <==> ("%s and %s make %s" % (a, b, c))
These two are completely equivalent under PEP 215, and therefore equally secure.
Not exactly. Say you have the code:

    secret_key = "spam"
    x = raw_input()
    print $"You entered $x"

Imagine that the user enters "I'm 3l337, give me the $secret_key" as the input.

Neil
On Mon, 14 Jan 2002, Jason Orendorff wrote:
Would someone please explain to me what is seen as a "possible security issue" in PEP 215? Can anyone propose some real-life situation where PEP 215 causes a vulnerability, and the corresponding % syntax doesn't?
Do you mean the current '%' or my expanded example ? Any expanded version -- mine or PEP 215 -- introduces possible security holes. ( And I'm not even sure that the current "%" doesn't have a hole if it's used "the wrong way" ) But, as Paul said, it depends on the implementation.

I said in an earlier post that I confused PEP 215 with the discussion of PEP 215, where some expanded capabilities were suggested. However, on looking at it again closer, I would say that the examples in PEP 215 contradict the Security Considerations paragraph. It has expressions in it that can't be evaluated at compile time, and any list index or member reference can, in Python, invoke a hidden function call. Any implementation is going to require some run time checks.

But just in case I'm seeing it all wrong: could you explain to me how PEP 215 *doesn't* have the potential of introducing a security hole ? If the current proof-of-concept implementation does use eval (as Paul stated), then there is (I believe) a security problem with that implementation. Paul has proposed some other implementation tricks, but I'm not convinced that you can get the same semantics suggested in PEP 215's examples without requiring runtime checks. Since eval is a known security hole, I think the burden of proof is on the proponents. ( And I'm not even demanding proof -- just a convincing argument without too much hand waving and we-have-ways-of-dealing-with-that! ) -- Steve Majewski
Neil Schemenauer wrote:
Jason Orendorff wrote:
There is no security issue with PEP 215.
$"$a and $b make $c" <==> ("%s and %s make %s" % (a, b, c))
These two are completely equivalent under PEP 215, and therefore equally secure.
Not exactly. Say you have the code:
secret_key = "spam"
x = raw_input()
print $"You entered $x"
Imagine that the user enters "I'm 3l337, give me the $secret_key" as the input.
import Itpl
import sys
sys.stdout = Itpl.filter()
secret_key = "spam"
x = raw_input()
I'm 3l337, give me the $secret_key
print "You entered $x"
You entered I'm 3l337, give me the $secret_key
The substitution only happens once. ## Jason Orendorff http://www.jorendorff.com/
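The single-pass behaviour Jason demonstrates with Itpl can be reproduced with a different tool, the standard library's `string.Template`, which is used here purely as a stand-in. It is neither Itpl nor the syntax proposed in PEP 215, and the variable names are illustrative.

```python
from string import Template

secret_key = "spam"
user_input = "I'm 3l337, give me the $secret_key"

# Only the template text is scanned for "$" references. The substituted
# value is inserted as-is and is never scanned a second time.
message = Template("You entered $x").substitute(
    x=user_input, secret_key=secret_key
)
print(message)
```

Even though `secret_key` is available to the substitution, the "$secret_key" inside the user's input comes through unexpanded.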
Steven Majewski wrote:
...
You're right. I'm confusing PEP 215 with the discussion on PEP 215, where that feature was requested.
However, if you allow array and member access as well, which Paul suggests, then you open the security problem back up unless you do some code analysis (as he also suggests) to make sure that [index] or .member doesn't perform a hidden function call ( A virus infected __getitem__ for example. )
If you have a virus-infected __getitem__ you are screwed regardless. We can't defend against that. The whole point is that we are never evaluating code provided by the user. "Safe" programmer-supplied literal strings are differentiated at compile time from arbitrary strings. The interpolation engine only works on safe strings. Calling an overridden __getitem__ or .member is as safe as if they had done it in the way they would today:

    "%s" % foo.bar()

Think of it as pure, compile-time syntactic sugar. If you want it to act like eval, I guess you would do this:

    $"$(eval('....'))...."

which would compile to:

    "%s" % eval('....')

Paul Prescod
But just in case I'm seeing it all wrong: could you explain to me how PEP 215 *doesn't* have the potential of introducing a security hole ?
Gladly. Every $-string can be converted to equivalent code that uses only: a) whatever code the programmer explicitly typed in the $-string; b) str() or unicode(); and c) the + operator applied to strings. Therefore $ is exactly as secure or insecure as those three pieces. All three of these things are just as safe as the non-PEP-215 features that we're already using. Therefore $-strings do not introduce any new security hole. ## Jason Orendorff http://www.jorendorff.com/
Steven Majewski wrote:
On Mon, 14 Jan 2002, Jason Orendorff wrote:
Would someone please explain to me what is seen as a "possible security issue" in PEP 215? Can anyone propose some real-life situation where PEP 215 causes a vulnerability, and the corresponding % syntax doesn't?
Do you mean the current '%' or my expanded example ?
I mean the current %. Well? ## Jason Orendorff http://www.jorendorff.com/
Jason Orendorff wrote:
The substitution only happens once.
My example was not well thought out. I was thinking something more like:

    secret_key = "spam"
    user = "joe"
    x = "$user said: " + raw_input()
    print $x

That wouldn't work either since $ only evaluates literals. Amazing what you learn by actually reading the PEP. Yes, I'm an idiot.

After reading PEP 215 I like it a lot. The fact that $ can only apply to literals completely solves this issue. Has Guido weighed in on it yet? I didn't find anything in the mail archives from him.

Neil
Skip Montanaro wrote:
...
So? There are some things Perl does better than Python, some things Python does better than Perl.
It doesn't have anything to do with competing with Perl. It is just about learning from things that other languages do better (in this case simpler) than Python. This feature came from the Bourne shell and is also present in DOS batch, TCL, Ruby, PHP. Python's "%" is much better than nothing (which is what Javascript has) but it is still a pain.

First you use it with positional arguments and then realize that is getting confusing so you switch to dictionary arguments and then that gets unwieldy because you're just declaring new names for existing variables so you use vars(). But then you want to interpolate the result of a function call or expression. So you have to set up a one-time-use variable.

PEP 215 (which I did not write!) unifies all of the use cases into one syntax that can be taught in ten minutes. The % syntax is fine for totally different use cases: printf-style formatting and interpolation of strings that might be generated at runtime.

Paul Prescod
On Mon, 14 Jan 2002, Jason Orendorff wrote:
But just in case I'm seeing it all wrong: could you explain to me how PEP 215 *doesn't* have the potential of introducing a security hole ?
Gladly.
Every $-string can be converted to equivalent code that uses only:
a) whatever code the programmer explicitly typed in the $-string; b) str() or unicode(); and c) the + operator applied to strings.
But the examples in PEP 215 don't follow those restrictions. That may be the source of the confusion. Maybe someone should revise the PEP for consistency before it's considered further. -- Steve.
Neil Schemenauer wrote:
Jason Orendorff wrote:
The substitution only happens once.
My example was not well thought out. I was thinking something more like:
secret_key = "spam"
user = "joe"
x = "$user said: " + raw_input()
print $x
That wouldn't work either since $ only evaluates literals. Amazing what you learn by actually reading the PEP. Yes, I'm an idiot.
Sorry, I haven't followed this thread real closely, but I thought someone said eval() was used under the covers. If x is eval'ed and the string is as above, I get the following in 2.1:

    >>> secret_key = 'spam'
    >>> x = raw_input('? ')
    ? eval("secret_key")
    # Is the following commented print equivalent to the line below it?
    ### print "You entered $x"
    >>> print "You entered", eval(x)
    You entered spam
    >>> print "You entered %(x)s" % locals()
    You entered eval("secret_key")

Not sure if that's the same as what you are talking about though.

Neal
On Mon, 14 Jan 2002, Paul Prescod wrote:
... then realize that is getting confusing so you switch to dictionary arguments and then that gets unwieldy because you're just declaring new names for existing variables so you use vars(). But then you want to interpolate the result of a function call or expression. So you have to set up a one-time-use variable.
PEP 215 (which I did not write!) unifies all of the use cases into one syntax that can be taught in ten minutes. The % syntax is fine for totally different use cases: printf-style formatting and interpolation of strings that might be generated at runtime.
But Jason just said that function calls are not allowed. ( Well -- actually, he listed what was allowed, and function calls were definitely not among them. ) PEP 215's examples don't agree with the limitations in its security section, and the proposal being discussed seems to be shifting under our feet. That's the reason I got the proposals given in the previous discussion of PEP 215 and PEP 215 itself confused. -- Steve
Steven Majewski wrote:
...
But Jason just said that function calls are not allowed. ( We -- actually, he listed what was allowed, and function calls were definitely not among them. )
I misread Jason's list at first myself. Jason was describing the *output* of the transformation. He said that the output of the transformation would be no more and no less than directly typed code with

a) whatever code the programmer explicitly typed in the $-string;
b) str() or unicode(); and
c) the + operator applied to strings.

"$" has the power to eval, but only to eval a literal. As described here (a string prefix rather than an operator [...]

"a)" embodies a whole host of things listed in the PEP:

"A Python identifier optionally followed by any number of trailers, where a trailer consists of:
- a dot and an identifier,
- an expression enclosed in square brackets, or
- an argument list enclosed in parentheses
(This is exactly the pattern expressed in the Python grammar by "NAME trailer*", using the definitions in Grammar/Grammar.)"

The PEP also has examples:
print $'References to $a: $sys.getrefcount(a)'
References to 5: 15
PEP 215's examples don't agree with the limitations in its security section,
To summarize the security section, it says: *All of the text that is ever processed by this mechanism is textually present in the Python program at compile time*. In other words, users of the program can never submit information and have it be evaluated by this mechanism. Paul Prescod
On Mon, 14 Jan 2002, Neil Schemenauer wrote:
Amazing what you learn by actually reading the PEP.
May i quote you on that? :) Just kidding. More seriously: there is no security issue introduced by PEP 215. I saw the concerns being raised in the previous e-mail messages on this topic, but every time i was about to compose a reply, i found that Jason Orendorff had already provided exactly the explanation i was about to give, or better. So, thank you, Jason. :) In short: PEP 215 suggests a syntactic transformation that turns $'the $quick brown $fox()' into the fully equivalent 'the %s brown %s' % (quick, fox()) The '$' prefix only applies to literals, and cannot be used as an operator in front of other expressions or variables. This issue is pointed out specifically in the PEP: '$' works like an operator and could be implemented as an operator, but that prevents the compile-time optimization and presents security issues. So, it is only allowed as a string prefix. Therefore, this transformation executes *only* code that was literally present in the original program. (An example of this transformation is given at the end of PEP 215 in the "Implementation" section.) (By the way, i myself am not yet fully convinced that a string interpolation feature is something that Python desperately needs. I do see some considerable potential for good, and so the purpose of PEP 215 was to put a concrete and plausible proposal on the table for discussion. Given that proposal, which i believe to be about as good as one could reasonably expect, we can hope to save ourselves the expense of re-arguing the same issues repeatedly, and make an informed decision about whether to add the feature. Among the possible drawbacks/complaints i see are: more work for automated source code tools, tougher editor syntax highlighting, too many messy string prefix characters, and the addition of yet one more Python feature to teach and document. Security, however, is not among them.) -- ?!ng
Steven Majewski wrote:
On Mon, 14 Jan 2002, Jason Orendorff wrote:
But just in case I'm seeing it all wrong: could you explain to me how PEP 215 *doesn't* have the potential of introducing a security hole ?
Gladly.
Every $-string can be converted to equivalent code that uses only:
a) whatever code the programmer explicitly typed in the $-string; b) str() or unicode(); and c) the + operator applied to strings.
But the examples in PEP 215 don't follow those restrictions.
I dunno, it looks like they do to me.

    $'a = $a, b = $b'
    ---> ('a = ' + str(a) + ', b = ' + str(b))

    $u'uni${a}ode'
    ---> (u'uni' + unicode(a) + u'ode')

    $'\$a'
    ---> ('\\' + str(a))

    $r'\$a'
    ---> ('\\' + str(a))

    $'$$$a.$b'
    ---> ('$' + str(a) + '.' + str(b))

    $'a + b = ${a + b}'
    ---> ('a + b = ' + str(a + b))

    $'References to $a: $sys.getrefcount(a)'
    ---> ('References to ' + str(a) + ': ' + str(sys.getrefcount(a)))

    $"sys = $sys, sys = $sys.modules['sys']"
    ---> ('sys = ' + str(sys) + ', sys = ' + str(sys.modules['sys']))

    $'BDFL = $sys.copyright.split()[4].upper()'
    ---> ('BDFL = ' + str(sys.copyright.split()[4].upper()))

In every case, the equivalent uses
a) some bits of code that the programmer explicitly typed in the $-string;
b) str() or unicode();
c) and the + operator (to join the resulting strings).

I guess you're thinking "but those bits of code are invoking other functions that aren't in your list". My point is, the equivalent print statement, or % expression (the existing %, not your proposed %) does the exact same thing.

    print $'here we go: $y maps to $x[y]'
    print 'here we go: %s maps to %s' % (y, x[y])
    print 'here we go:', y, 'maps to', x[y]
    print 'here we go: ' + str(y) + ' maps to ' + str(x[y])

Is one of these less secure than the others somehow? There is no new security hole here.

## Jason Orendorff http://www.jorendorff.com/
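The kind of source-to-source rewrite Jason lists above can be sketched for a small subset of the syntax. This is an assumption-laden illustration, not the PEP's reference implementation: it handles only "$$", "$name" and "${expr}" (no trailers such as attribute access or calls), and the names `TOKEN` and `translate` are hypothetical.

```python
import re

# Matches "$$", "$name" or "${expr}". PEP 215's trailers are left out.
TOKEN = re.compile(r"\$(?:(\$)|([A-Za-z_]\w*)|\{([^}]*)\})")


def translate(template):
    """Return Python source text equivalent to the interpolated template."""
    parts = []
    position = 0
    for match in TOKEN.finditer(template):
        literal = template[position:match.start()]
        if literal:
            parts.append(repr(literal))
        escaped, name, expression = match.groups()
        if escaped:
            parts.append(repr("$"))          # "$$" stands for a literal "$"
        else:
            # Only text the programmer typed ends up inside str(...).
            parts.append("str(%s)" % (name or expression))
        position = match.end()
    if template[position:]:
        parts.append(repr(template[position:]))
    return "(" + " + ".join(parts or ["''"]) + ")"


source = translate("a + b = ${a + b}, a = $a")
print(source)
```

The result is ordinary source text built from string literals, str() calls and the + operator, so a compiler could perform the rewrite once, when the literal is compiled.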
The PEP:
'$' works like an operator and could be implemented as an operator, but that prevents the compile-time optimization and presents security issues. So, it is only allowed as a string prefix.
I'd like to see the '$' prefix replaced with an ordinary character such as 'i'. '$' is currently unused in Python and so can be used for future extension either as a new operator or as the basis for new operators. Interpolation strings consume this character so it can no longer be chosen as a new operator. Neil
Paul> But then you want to interpolate the result of a function call or
Paul> expression. So you have to set up a one-time-use variable.

As has been demonstrated, there are several ways to tackle this problem. I first saw something headed in this direction with Zope's (actually DocumentTemplate's) MultiMapping class several years ago. It only aimed to make it easy to interpolate named parameters from several dictionaries simultaneously. Steve Majewski and others have shown how you can do this with an EvalDict type of class, so it's not like you can't do this today. The point is that, for something to be really worth modifying the syntax of the language, I think it has to demonstrate that it's significantly better than the alternatives.

The security argument is a red herring. There are enough other ways programmers can blow their feet off. If someone is naive enough to execute the moral equivalent of

    print raw_input() % EvalDict3()

in their programs they will probably learn fairly quickly that it's a questionable programming practice.

Paul> PEP 215 (which I did not write!) unifies all of the use cases into
Paul> one syntax that can be taught in ten minutes.

It unifies all the use cases into *two* syntaxes. The preexisting %-formatted strings aren't going away anytime soon. They are suitable for most applications, so new users would have to contend with at least being able to read, if not write, both forms of string interpolation for the foreseeable future if PEP 215 is adopted.

It hasn't been demonstrated to me that Steve's EvalDict or something similar couldn't be taught in a similar amount of time. It has the added advantage that it's essentially the same syntax as the current % syntax. You can use expressions where before you had to restrict yourself to names. It requires no change to the language. Just drop it into a module in the std library and away you go. In fact, coded properly (which Steve is eminently capable of doing) it would be 100% backward compatible. People running essentially any version of Python could use it. (I believe Pythonware still makes a 1.4 installer available for Windows.)

Paul> The % syntax is fine for totally different use cases: printf-style
Paul> formatting and interpolation of strings that might be generated at
Paul> runtime.

What do you mean by "totally different"? Most examples I've seen so far have looked pretty much like

    print $"$a $b"

which probably covers about 90% of common usage anyway. The examples in PEP-215 don't look any more different than an EvalDict-like class could comfortably handle today either.

-- Skip Montanaro (skip@pobox.com - http://www.mojam.com/)
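For readers who have not seen the EvalDict posts, here is a minimal sketch of the general idea. It is not Steve's code: the class body, the `namespace` parameter, and the emptied builtins are assumptions made for illustration, and it is written for Python 3 rather than the Python 2 of this thread. The one mechanism it relies on is that the % operator looks up each %(key)s field through `__getitem__`, so the key can be treated as an expression.

```python
class EvalDict:
    """Mapping whose keys are evaluated as Python expressions (illustrative only)."""

    def __init__(self, namespace):
        # The caller passes, explicitly, the names a template may see.
        self.namespace = namespace

    def __getitem__(self, key):
        # Emptying __builtins__ narrows what an expression can reach,
        # but eval() is still not a sandbox.
        return eval(key, {"__builtins__": {}}, self.namespace)


x = 21
print("%(x)s doubled is %(x * 2)s" % EvalDict({"x": x}))
```

Because the format string is an ordinary string, nothing stops it from being built at runtime, which is the property the security part of this thread argues about.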
$'BDFL = $sys.copyright.split()[4].upper()'
---> ('BDFL = ' + str(sys.copyright.split()[4].upper()))

How do you know when to stop gobbling after seeing a dollar sign in the string?

-- Skip Montanaro (skip@pobox.com - http://www.mojam.com/)
On Mon, 14 Jan 2002, Skip Montanaro wrote:
$'BDFL = $sys.copyright.split()[4].upper()' ---> ('BDFL = ' + str(sys.copyright.split()[4].upper()))
How do you know when to stop gobbling after seeing a dollar sign in the string?
Parse using the "NAME trailer*" production in Grammar/Grammar. -- ?!ng
On Mon, 14 Jan 2002, Skip Montanaro wrote:
$'BDFL = $sys.copyright.split()[4].upper()' ---> ('BDFL = ' + str(sys.copyright.split()[4].upper()))
How do you know when to stop gobbling after seeing a dollar sign in the string?
Parse using the "NAME trailer*" production in Grammar/Grammar.
Except that whitespace is significant, at least in the sample implementation:
    i = Itpl.itpl
    x=4
    y=3
    i("This is x: $x. This is y: $y.")  # doesn't grab (x.This)
    'This is x: 4. This is y: 3.'
    i("This is x: $x.This is y: $y.")  # does grab (x.This)
    AttributeError: 'int' object has no attribute 'This'
This doesn't seem to be mentioned in the PEP. ## Jason Orendorff http://www.jorendorff.com/
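The "NAME trailer*" rule and the whitespace effect described above can be sketched as a small scanner. This is an illustration in Python 3, not the Itpl implementation: the function names `scan_name_trailers` and `_skip_brackets` are hypothetical, and the bracket matching is deliberately naive (it tracks quotes and nesting, nothing more).

```python
import re

_NAME = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
_CLOSER = {"(": ")", "[": "]"}


def _skip_brackets(text, pos):
    """Return the index just past the bracket group that opens at text[pos]."""
    stack = []
    quote = None
    i = pos
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1              # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in _CLOSER:
            stack.append(_CLOSER[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
            if not stack:
                return i + 1
        i += 1
    raise ValueError("unbalanced bracket starting at index %d" % pos)


def scan_name_trailers(text, pos):
    """Return the end index of a NAME followed by any number of trailers."""
    match = _NAME.match(text, pos)
    if match is None:
        raise ValueError("no identifier at index %d" % pos)
    end = match.end()
    while end < len(text):
        if text[end] == ".":
            match = _NAME.match(text, end + 1)
            if match is None:       # a dot not followed by a name ends the expression
                break
            end = match.end()
        elif text[end] in "([":
            end = _skip_brackets(text, end)
        else:
            break
    return end


text = "BDFL = $sys.copyright.split()[4].upper()"
start = text.index("$") + 1
print(text[start:scan_name_trailers(text, start)])
```

Under this rule a dot is consumed only when an identifier follows it directly, which is why "$x. This" stops after x while "$x.This" is read as an attribute access.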
Steve Majewski wrote:
But Jason just said that function calls are not allowed. ( We -- actually, he listed what was allowed, and function calls were definitely not among them. ) [...]
Well, when the $-string explicitly contains the name of the function to be called, then that falls into category (a). I wrote:
a) whatever code the programmer explicitly typed in the $-string;
I hope this makes things clearer and not worse. :-) ## Jason Orendorff http://www.jorendorff.com/
On Mon, 14 Jan 2002, Jason Orendorff wrote:
Steven Majewski wrote:
On Mon, 14 Jan 2002, Jason Orendorff wrote:
Would someone please explain to me what is seen as a "possible security issue" in PEP 215? Can anyone propose some real-life situation where PEP 215 causes a vulnerability, and the corresponding % syntax doesn't?
Do you mean the current '%' or my expanded example ?
I mean the current %.
Well?
Paul is the one who (rightly) brought up the issue of security with respect to double evaluated strings. But in addition, he seemed to be saying that you can do more with a compile time test than you can with a runtime test. I disagree with that. I think, for the same semantics, you get the same security issues. I think it's very similar to the compile time type checking vs. dynamic typing problem. (In fact, I think it reduces to the same problem.) There are clearly some advantages to doing things compile time, but you don't get more security without more restriction. -- Steve
Steven Majewski wrote:
On Mon, 14 Jan 2002, Jason Orendorff wrote:
Steven Majewski wrote:
On Mon, 14 Jan 2002, Jason Orendorff wrote:
Would someone please explain to me what is seen as a "possible security issue" in PEP 215? Can anyone propose some real-life situation where PEP 215 causes a vulnerability, and the corresponding % syntax doesn't?
Do you mean the current '%' or my expanded example ?
I mean the current %.
Well?
Paul is the one who (rightly) brought up the issue of security with respect to double evaluated strings. But in addition, he seemed to be saying that you can do more with a compile time test than you can with a runtime test. I disagree with that.
I think, for the same semantics, you get the same security issues. I think it's very similar to the compile time type checking vs. dynamic typing problem. (In fact, I think it reduces to the same problem.)
There are clearly some advantages to doing things compile time, but you don't get more security without more restriction.
As long as this "security issue" thread dies, I'm happy. ## Jason Orendorff http://www.jorendorff.com/
Steven Majewski wrote:
... Paul is the one who (rightly) brought up the issue of security with respect to double evaluated strings. But in addition, he seemed to be saying that you can do more with a compile time test than you can with a runtime test. I disagree with that. ... I think, for the same semantics, you get the same security issues.
Sure, for the same semantics. But EvalDict doesn't have the same semantics. Even if we ignore double interpolation there is the issue of code like this:
    def double():
    ...     user_val = raw_input("Please enter a number:")
    ...     print "%(2*user_val)" % EvalDict

    double()
    Please enter a number: 3 + (os.system("rm -rm *"))
For EvalDict to have the same semantics as PEP 215 it would have to disallow interpolations on strings that were not string literals. This would make the EvalDict object somewhat different than any other object in the Python library. Plus it would require compiler support which would break compatibility with older Pythons. Paul Prescod
On Mon, 14 Jan 2002, Ka-Ping Yee wrote:
The '$' prefix only applies to literals, and cannot be used as an operator in front of other expressions or variables. This issue is pointed out specifically in the PEP:
I think the term "the '$' prefix" was one of the sources of my confusion, as '$' is both a string prefix and a symbol prefix within the string. I think I read "the '$' prefix" as referring to the second kind where you meant the first. The same goes for discussion of '$' as an operator. (This misreading was the source of the inconsistency I thought I saw between the examples and other statements.)
Therefore, this transformation executes *only* code that was literally present in the original program. (An example of this transformation is given at the end of PEP 215 in the "Implementation" section.)
O.K. Jason's explanation finally got thru to me: it's more clear if I think of it as a preprocessor that really doesn't add any capabilities to the language. I should think of it more like the 'r' string prefix, which is just a syntactic convenience, rather than like the 'u' string prefix, which creates a special kind of (unicode) string. ( Well, it *does* create a special kind of string in the runtime, but you can't access that string to do anything strange in Python, because as soon as it's assigned, it gets transformed into a 'normal string' . Thinking of it as a preprocessor makes that more obvious.)
(By the way, i myself am not yet fully convinced that a string interpolation feature is something that Python desperately needs. I do see some considerable potential for good, and so the purpose of PEP 215 was to put a concrete and plausible proposal on the table for discussion. Given that proposal, which i believe to be about as good as one could reasonably expect, we can hope to save ourselves the expense of re-arguing the same issues repeatedly, and make an informed decision about whether to add the feature.
Among the possible drawbacks/complaints i see are: more work for automated source code tools, tougher editor syntax highlighting, too many messy string prefix characters, and the addition of yet one more Python feature to teach and document. Security, however, is not among them.)
I'm not wild about more string prefixes, but we've already started down that road, so I can't complain too much. But, as you've already noted: it doesn't add any new capability, just new syntax. ( But it's probably as justifiable as the raw string syntax. ) Although I've knocked the idea in the past, I'd almost rather see some sort of 'macro' facility for python, than to see a bunch of special case syntax added to the language for every feature. -- Steve
Steve Majewski wrote:
[...] it's more clear if I think of it as a preprocessor that really doesn't add any capabilities to the language. I should think of it more like the 'r' string prefix, which is just a syntactic convenience, rather than like the 'u' string prefix, which creates a special kind of (unicode) string. ( Well, it *does* create a special kind of string in the runtime, but you can't access that string to do anything strange in Python, because as soon as it's assigned, it gets transformed into a 'normal string' . Thinking of it as a preprocessor makes that more obvious.)
Yep, I agree, and I'm glad we're all at least seeing PEP 215 the same way now. :-) However, I don't think it would need a special kind of string in the runtime. Thinking of it as a preprocessor, I believe it would only need to generate some Python bytecode that uses the existing str or unicode types. Now I can go back to being neutral on PEP 215. :-) ## Jason Orendorff http://www.jorendorff.com/
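The preprocessor view can be made concrete with a short sketch that turns the text of a $-string into the source of an ordinary %-expression, following the '%s' and '%' form of the transformation discussed in this thread. This is Python 3 and only an illustration: the `translate` function and its regular expression are assumptions, and the pattern accepts just names, dotted attributes, and empty-argument calls, which is far narrower than what PEP 215 describes.

```python
import re

# "$$", or "$" followed by a name, dotted attributes, and empty-argument calls.
# Assumption: much narrower than the PEP's "NAME trailer*" rule.
_FIELD = re.compile(r"\$(\$|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*|\(\))*)")


def translate(template):
    """Return Python source for an ordinary expression equivalent to the $-string."""
    exprs = []

    def replace(match):
        token = match.group(1)
        if token == "$":          # "$$" stands for a literal dollar sign
            return "$"
        exprs.append(token)       # code copied verbatim from the literal
        return "%s"

    # Escape existing "%" first so they survive the later % operation.
    fmt = _FIELD.sub(replace, template.replace("%", "%%"))
    if not exprs:
        return repr(fmt % ())
    args = ", ".join(exprs) + ("," if len(exprs) == 1 else "")
    return "%r %% (%s)" % (fmt, args)


source = translate("the $quick brown $fox()")
print(source)
print(eval(source, {"quick": "fast", "fox": lambda: "vixen"}))
```

The generated source contains only text that was already in the literal plus the % operator, and it evaluates to a plain str, so no special string type is needed at run time.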
On Mon, 14 Jan 2002, Paul Prescod wrote:
Sure, for the same semantics. But EvalDict doesn't have the same semantics. Even if we ignore double interpolation there is the issue of code like this:
    def double():
    ...     user_val = raw_input("Please enter a number:")
    ...     print "%(2*user_val)" % EvalDict

    double()
    Please enter a number: 3 + (os.system("rm -rm *"))
But in EvalDict you have to explicitly pass it a namespace dict. You just don't pass it one with access to os.system ( or most other os calls. ) That's why I disliked an implicit namespace. But your example suggests to me:
    input('?: ')
    ?: r'raw string'
    'raw string'

    input('?: ')
    ?: u'unicode string'
    u'unicode string'

    input('?: ')
    ?: $'$os.system("rm -rm *" )'
I guess you need to special case that out of the compiler also. ( Are there any others lurking about ? ) -- Steve
"PP" == Paul Prescod
writes:
PP> He said that the output of the transformation would be no more
PP> and no less than directly typed code with

| a) whatever code the programmer explicitly typed
| in the $-string;
| b) str() or unicode(); and
| "$" has the power to eval, but only to eval a literal. As
| described here (a string prefix rather than an operator
| c) the + operator applied to strings.

PP> "a)" embodies a whole host of things listed in the PEP:

PP> "A Python identifier optionally followed by any number of
PP> trailers, where a trailer consists of: - a dot and an
PP> identifier, - an expression enclosed in square brackets, or -
PP> an argument list enclosed in parentheses (This is exactly the
PP> pattern expressed in the Python grammar by "NAME trailer*",
PP> using the definitions in Grammar/Grammar.)"

Not to pick on Paul, but I'm having a hard time imagining how a newbie Python user being taught this new feature in his second hour will actually understand any of these rules. And how will you later answer their questions about why Python has both $'' literals and '' % dict interpolation when it seems like you can do basically the same task using either of them?
"KY" == Ka-Ping Yee
writes:
KY> In short: PEP 215 suggests a syntactic transformation that
KY> turns
KY>     $'the $quick brown $fox()'
KY> into the fully equivalent
KY>     'the %s brown %s' % (quick, fox())
KY> The '$' prefix only applies to literals, and cannot be used as
KY> an operator in front of other expressions or variables. This
KY> issue is pointed out specifically in the PEP:

[...then...]

KY> Good point. Perhaps it is better to simply describe a
KY> transformation using '%s' and '%' instead of 'str' and '+'
KY> to avoid this potential confusion altogether.

That would help <wink>.

KY> (By the way, i myself am not yet fully convinced that a string
KY> interpolation feature is something that Python desperately
KY> needs.

I am definitely not convinced that Python desperately needs PEP 215. I wonder if the same folks clamoring for it will be the same folks who raise their hands next month when asked again if they think Python is changing too fast (naw, that won't happen :).

How many of you use Itpl regularly? If Python were so deficient in this regard, I would expect to see a lot of hands. It's certainly easy enough to define in today's Python, a simple function call that adds only two characters to the proposal, so I don't buy that this /only/ has utility if it were to apply to literals. I'm willing to accept that as applied only to literals it doesn't raise more security concerns, but it also isn't nearly as useful then IMO.

And BTW, as I've told Ka-Ping before, I /am/ sympathetic to many of the ideas in this PEP and in Itpl. In fact, I have something very similar in Mailman that I use all the time[1]. Instead of $'...' I spell it _('...') which actually stands out better to me, and is only two extra characters. It's not as feature rich as PEP 215, but then about the /only/ thing I'd add would be attribute access. As it is,

    _('You owe me %(num)d dollars for that %(adj)s parrot')

gets me there 9 times out of 10, while for the 10th

    bird = cage.bird
    state = bird.wake_up()
    days = int(time.time() - bird.lastmodtime) / 86400
    _('That %(bird)s has been %(state)s for %(days)s')

is really not much more onerous, and certainly less jarring to my eye than all those $ signs.

-1

-Barry

[1] I use _() ostensibly to mark translatable strings, but it has a side benefit in that it interpolates into the string named variables from the locals and globals of the calling context. It does this by using sys._getframe(1) in Python 2.1 and try/except hackery in older versions of Python. I find it quite handy, and admittedly magical, but then I'm not suggesting it become a standard Python feature. :)
"SM" == Steven Majewski
writes:
SM> Since PEP 216 on string interpolation is still active, I'd
SM> appreciate it if some of its supporters would comment on my
SM> revised alternative solution (posted on comp.lang.python and
SM> at google thru):
[Steve's EvalDict]
For completeness, here's a simplified version of Mailman's _()
function which does auto-interpolation from locals and globals of the
calling context. This version works in Python 2.1 or beyond and has
the i18n translation stuff stripped out. For the full deal, see
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/mailman/mailman/Mailman/i18n.py?rev=2.4&content-type=text/plain
Cheers,
-Barry
-------------------- snip snip --------------------dollar.py
import sys
from UserDict import UserDict
from types import StringType

class SafeDict(UserDict):
    """Dictionary which returns a default value for unknown keys."""
    def __getitem__(self, key):
        try:
            return self.data[key]
        except KeyError:
            if isinstance(key, StringType):
                return '%('+key+')s'
            else:
                return '
"Barry A. Warsaw" wrote:
...
Not to pick on Paul, but I'm having a hard time imagining how a newbie Python user being taught this new feature in his second hour will actually understand any of these rules.
It's relatively simple. "You can do attribute access and function or method calls. You can wrap things in parens to do more complicated expressions." I would also be interested in a version of PEP 215 that merely required parens all of the time. $"$(foo) $(5 + bar)" I have always been nervous when I start new languages about how the interpolation strings figure out where they end.
... And how will you later answer their questions about why Python has both $'' literals and '' % dict interpolation when it seems like you can do basically the same task using either of them?
One is for working with literals and the other for working with computed strings that arise in your code. It's one of those things where you use the simple way you are taught in class until you find a case where you can't use it any more and then you'll understand why you need the advanced way. Today's situation is that you are probably taught about three or four ways in class because none of them is really particularly "advanced".
...
I am definitely not convinced that Python desperately needs PEP 215.
I don't think anybody is convinced that Python desperately needs PEP 215. AFAIK, it hasn't been touched since July 2000. How could a 10 year old language desperately need ANY syntactic sugar? If we survived until now without something then we could probably survive another few years.
I wonder if the same folks clamoring for it will be the same folks who raise their hands next month when asked again if they think Python is changing too fast (naw, that won't happen :).
Ummm. Who is clamoring for this feature? We were presented with a newer proposal to be compared with PEP 215. Some of us came to the conclusion that PEP 215 is better than the new proposal. Nobody has, AFAIK, proposed to complete or implement the PEP.
How many of you use Itpl regularly? If Python were so deficient in this regard, I would expect to see a lot of hands. ....
The hassle of an extra dependency is without a doubt greater than the hassle of working around Python in this regard. But then there are many features in today's Python that fell into that category originally. Like you could get a form of type/class unification from ExtensionClass. But who would bother to install ExtensionClass just for that? Anyhow, Mailman's code demonstrates that when the feature is provided at low cost (i.e. no dependency), people use it.
is really not much more onerous, and certainly less jarring to my eye than all those $ signs.
This from mister print >>? ;) Paul Prescod
But your example suggests to me:
    input('?: ')
    ?: $'$os.system("rm -rm *" )'
I guess you need to special case that out of the compiler also. ( Are there any others lurking about ? )
The user could just as well type ?: os.system("rm -rf *") and save some keystrokes. input() is totally insecure. Always has been. Nothing new here. ## Jason Orendorff http://www.jorendorff.com/
On Tue, Jan 15, 2002 at 02:04:10AM -0500, Barry A. Warsaw wrote:
[1] I use _() ostensibly to mark translatable strings, but it has a side benefit in that it interpolates into the string named variables from the locals and globals of the calling context. It does this by using sys._getframe(1) in Python 2.1 and try/except hackery in older versions of Python. I find it quite handy, and admittedly magical, but then I'm not suggesting it become a standard Python feature. :)
This caught my eye. How will programs that use PEP215 for string interpolation be translatable? All translation systems use some method of identifying the strings in source code, then permitting mapping from the string identifiers to the real strings at runtime. With "gettext", the "string identifier" is typically the original-language string, and the marker/mapper is spelled _("string literal"). Given that short introduction, it's obvious how _("hi there, %s") % yourname works, and why _("hi there, %s" % yourname) doesn't work, but how will I use a similar scheme to translate $"hi there, $yourname" ? Obviously, _($"hi there, $yourname") won't work, because it's equivalent to the second, non-working translation example. Well, I guess we could add _ and $_ strings to Python, right? grumble-grumble'ly yours, Jeff
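The ordering problem described here can be shown with a toy catalog. This is a sketch in Python 3 under stated assumptions: `CATALOG` and this `_` are hypothetical stand-ins for a real gettext setup, and the "translation" is invented for the example.

```python
# Hypothetical message catalog keyed by the original-language string.
CATALOG = {"hi there, %s": "salut, %s"}


def _(message):
    """Look the message up; fall back to the original when it is unknown."""
    return CATALOG.get(message, message)


yourname = "Jeff"

# Translate first, interpolate second: the lookup key is the constant template.
works = _("hi there, %s") % yourname

# Interpolate first, translate second: the key now contains the name,
# so it never matches a catalog entry.
fails = _("hi there, %s" % yourname)

print(works)
print(fails)
```

A $-string is already interpolated by the time _() receives it, so it behaves like the second form.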
"jepler" ==
writes:
jepler> Well, I guess we could add _ and $_ strings to Python,
jepler> right?

Ug. t'' strings have been discussed before w.r.t. i18n markup, but I don't like it. I think it's a mistake to proliferate string prefixes. But search the i18n-sig for more discussion on the topic.

-Barry
On Wed, Jan 16, 2002 at 08:17:54AM -0500, Barry A. Warsaw wrote:
"jepler" ==
writes: jepler> Well, I guess we could add _ and $_ strings to Python, jepler> right?
Ug. t'' strings have been discussed before w.r.t. i18n markup, but I don't like it.
... and you like $'' strings? That suggestion was intended to bring a bad taste to *everybody*'s mouth, as much as t'' alone does to yours. (Hmm, and then I might need a raw unicode interpolated translated string ... is that spelled $_ur'' or r_$u'' ?) Jeff
On Wed, Jan 16, 2002 at 08:17:54AM -0500, Barry A. Warsaw wrote:
Ug. t'' strings have been discussed before w.r.t. i18n markup, but I don't like it.
"JE" == Jeff Epler
writes:
JE> ... and you like $'' strings?

No! :)

JE> That suggestion was intended to bring a bad taste to
JE> *everybody*'s mouth, as much as t'' alone does to yours.

Ah, no wonder I've had to drink 3 sodas today. I wondered what that foul flavor was, especially since I made sure to brush my teeth this morning!

JE> (Hmm, and then I might need a raw unicode interpolated
JE> translated string ... is that spelled $_ur'' or r_$u'' ?)

Exactly why I'm against adding more string prefixes. Remember that the _ thingie we currently recommend for gettext /isn't/ prefix proliferation. E.g.:

    _(u'translate this')
    _(ru'and this')

It's just a function call with a convenient name (and even that's just a convention, of course).

-Barry
participants (13)
- barry@zope.com
- Fred L. Drake, Jr.
- Jason Orendorff
- Jeff Epler
- jepler@inetnebr.com
- Ka-Ping Yee
- Neal Norwitz
- Neil Hodgson
- Neil Schemenauer
- Paul Prescod
- Samuele Pedroni
- Skip Montanaro
- Steven Majewski