Yet another string formatting proposal
    "\(a) + \(b) = \(a+b)\n"

The expressions embedded in the string are parsed at compile time, and any syntax errors in them are detected during compilation.

The use of the backslash as introducer makes it unnecessary to add a new magic character ("$"), along with a new escaping convention for when this character needs to appear in the string ("$$"), and a new string prefix (PEP 215) or method (PEP 292) to instruct the system to perform additional processing on the string.

One advantage of using an operator, method or function over in-line formatting is that it enables the use of a template. A new string method can provide run-time evaluation of the same format:

    "\(a) + \(b) = \(a+b)\n"
    r"\(a) + \(b) = \(a+b)\n".cook()

A raw string is used to defer the evaluation of all backslash escape sequences to some later time. The cook method evaluates backslash escapes in the string, including any embedded expressions. This runtime version may be used for internationalization, for example.

By default, the cook method uses the global and local namespace of the calling scope, just like the built-in function eval(). Dictionary and/or named arguments may be used to override the namespace in which embedded expressions are evaluated:

    s = formatstring.cook(a=5, b=6)
    s = formatstring.cook(sys._getframe().f_locals)

Security issues:

Compile-time expression embedding should not raise any special security concerns, since no data from untrusted sources is parsed (if your SOURCE CODE is not trusted I can't help you there).

To protect against evaluation of arbitrary code when an attacker has access to the format strings, the cook() method could be limited to variable names only. A separate cook_eval() method would support full expressions. The 'eval' in the method name should remind the programmer that it is potentially as dangerous as eval().

Drawbacks:

Must use the full format "I like \(traffic) lights". There is no option for the shorter version "I like \traffic lights" because these combinations are already taken. May be considered an advantage: "There should be one-- and preferably only one --obvious way to do it."

Not as familiar as $ for programmers from other languages. May also be considered an advantage :-)

Oren
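To make the proposed split between cook() and cook_eval() concrete, here is a minimal sketch in present-day Python. It is an illustration under stated assumptions, not the proposed implementation: the proposal describes string methods that default to the caller's namespace, while the hypothetical module-level functions below require the namespace to be passed explicitly. The scanner matches parentheses by counting, so an expression containing a parenthesis inside a string literal is not handled, and ordinary escapes are decoded with the "unicode_escape" codec, which is only reliable for ASCII templates.

```python
import codecs
import re

# A plain variable name, the only thing the restricted cook() accepts.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def _cook(template, namespace, allow_expressions):
    pieces = []   # finished output fragments
    literal = []  # literal text whose ordinary escapes are still unprocessed

    def flush():
        if literal:
            # Decode escapes such as a backslash followed by n (ASCII-only simplification).
            pieces.append(codecs.decode("".join(literal), "unicode_escape"))
            literal.clear()

    i = 0
    while i < len(template):
        if template[i] != "\\":
            literal.append(template[i])
            i += 1
        elif template.startswith("\\(", i):
            flush()
            # Find the matching close parenthesis by counting nesting depth.
            depth, j = 1, i + 2
            while j < len(template) and depth:
                if template[j] == "(":
                    depth += 1
                elif template[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated \\( in format string")
            expr = template[i + 2:j - 1]
            if allow_expressions:
                # As dangerous as eval(): the expression runs arbitrary code.
                value = eval(expr, dict(namespace))
            elif _IDENTIFIER.match(expr):
                value = namespace[expr]  # plain lookup, nothing is evaluated
            else:
                raise ValueError("only variable names are allowed: %r" % expr)
            pieces.append(str(value))
            i = j
        else:
            # Keep the backslash together with the character it escapes.
            literal.append(template[i:i + 2])
            i += 2
    flush()
    return "".join(pieces)


def cook(template, mapping=None, **names):
    """Hypothetical restricted version: embedded fields must be variable names."""
    return _cook(template, dict(mapping or {}, **names), False)


def cook_eval(template, mapping=None, **names):
    """Hypothetical full version: embedded fields may be arbitrary expressions."""
    return _cook(template, dict(mapping or {}, **names), True)


formatstring = r"\(a) + \(b) = \(a+b)\n"
s = cook_eval(formatstring, a=5, b=6)   # "5 + 6 = 11\n"
```

With these definitions, cook(formatstring, a=5, b=6) raises ValueError because "a+b" is not a plain name, which is the protection the restricted method is meant to offer when format strings come from an untrusted source.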
On Thu, Nov 21, 2002 at 08:24:54PM +0100, Fredrik Lundh wrote:
> oren won't give up:
>> "\(a) + \(b) = \(a+b)\n"
>> The expressions embedded in the string are parsed at compile time and any syntax errors in them are detected during compilation.
> note that "\(" is commonly used to escape parentheses in regular expression strings.
Yes, it might break some existing code that doesn't use proper \\ escaping or raw strings for regular expressions. Note that such code is already broken in the sense that it uses an undefined escape.

If this turns out to be a real problem, a possible alternative is to use curly braces. There is a precedent for this in u"\N{UNICODE CHAR NAMES}". Braces are also more visually distinctive and less confusing when the expression itself contains parentheses:

    print "X=\{x}, y=\{calc_y(x)}"

Oren
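The brace variant can be sketched at run time in a few lines of present-day Python. This is only an illustration under assumptions: expand_braces is a hypothetical helper, not part of the proposal, the namespace must be passed explicitly, and expressions that themselves contain braces (a dict literal, for example) are not supported by the simple pattern used here.

```python
import re

# A backslash, an opening brace, any run of non-brace characters, a closing brace.
_FIELD = re.compile(r"\\\{([^{}]*)\}")


def expand_braces(template, **namespace):
    # Hypothetical helper: replace each brace field with str(eval(field)).
    def substitute(match):
        return str(eval(match.group(1), dict(namespace)))
    return _FIELD.sub(substitute, template)


def calc_y(x):
    return x * x


text = expand_braces(r"X=\{x}, y=\{calc_y(x)}", x=3, calc_y=calc_y)
# text == "X=3, y=9"
```

Because the delimiter is a brace, the parentheses of the call calc_y(x) need no special handling, which is the readability point made above.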
Oren Tirosh wrote:
> Yes, it might break some existing code that doesn't use proper \\ escaping or raw strings for regular expression. Note that such code is already broken in the sense that it uses an undefined escape.
"not proper"? "broken"? "undefined"? Please read the section on string escapes in the *Python* language reference, and try again. Start here: http://www.python.org/doc/current/ref/strings.html
> If this turns out to be a real problem a possible alternative is to use curly braces. There is a precedent for this in u"\N{UNICODE CHAR NAMES}"
Last time I checked, the N in \N was a character, not a curly brace. </F>
On Thu, Nov 21, 2002 at 10:22:48PM +0100, Fredrik Lundh wrote:
> Oren Tirosh wrote:
>> Yes, it might break some existing code that doesn't use proper \\ escaping or raw strings for regular expression. Note that such code is already broken in the sense that it uses an undefined escape.
> "not proper"? "broken"? "undefined"?
> Please read the section on string escapes in the *Python* language reference, and try again. Start here:
Unlike Standard C, all UNRECOGNIZED escape sequences are left in the string unchanged, i.e., the backslash is left in the string. (This behavior is useful when deBUGging: if an escape sequence is MISTYPED, the resulting output is more easily recognized as BROKEN.)

My mistake. I should have RTFM. There was no excuse for me calling such escape sequences "undefined" and not "proper" when the documentation describes escape sequences not listed in the table as merely "unrecognized" or possibly "mistyped". Sorry.

Oren
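The documented behavior quoted above can be checked directly. The sketch below is an illustration, not part of the thread: literal_value is a hypothetical helper that evaluates a string literal given as source text, with warnings silenced because recent Python versions warn about unrecognized escape sequences at compile time. It shows that "\(" keeps its backslash, which is why a regular expression written without a raw string still escapes its parentheses.

```python
import ast
import re
import warnings


def literal_value(source):
    # Hypothetical helper: evaluate a string literal from its source text,
    # hiding the warning that newer Pythons emit for unrecognized escapes.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


unrecognized = literal_value(r'"\(a\)"')  # what the literal "\(a\)" evaluates to
recognized = literal_value(r'"\n"')       # what the literal "\n" evaluates to

# The unrecognized escapes are left unchanged: both backslashes survive.
assert unrecognized == r"\(a\)" and len(unrecognized) == 5
# A recognized escape collapses to a single character.
assert recognized == "\n" and len(recognized) == 1
# So the non-raw pattern still matches literal parentheses.
assert re.fullmatch(unrecognized, "(a)") is not None
```

Under the proposal, the same literal would instead be read as an embedded expression, which is the compatibility concern raised earlier in the thread.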