[Python-checkins] r54109 - sandbox/trunk/pep3101/pep_differences.txt
patrick.maupin
python-checkins at python.org
Sat Mar 3 21:28:36 CET 2007
Author: patrick.maupin
Date: Sat Mar 3 21:28:34 2007
New Revision: 54109
Added:
sandbox/trunk/pep3101/pep_differences.txt
Log:
Added pep_differences.txt to document initial implementation target.
Updated README.txt to move info into pep_differences.
Cleaned up escape-to-markup processing to fix bug and enable
easy alternate syntax testing.
Changed version number in setup.py to reflect the fact we're not at 1.0 yet.
Added: sandbox/trunk/pep3101/pep_differences.txt
==============================================================================
--- (empty file)
+++ sandbox/trunk/pep3101/pep_differences.txt Sat Mar 3 21:28:34 2007
@@ -0,0 +1,299 @@
+
+This file describes differences between PEP 3101 and the C implementation
+in this directory, and describes the reasoning behind the differences.
+
+PEP 3101 is a well-thought-out, excellent starting point for advanced string
+formatting, but as one might expect, there are a few gaps in it which were
+not noticed until implementation, and there are almost certainly gaps in
+the implementation which will not be noticed until the code is widely used.
+Fortunately, the schedules for both Python 2.6 and Python 3.0 have enough
+slack in them that if we work diligently, we can widely distribute a working
+implementation, not just a theoretical document, well in advance of the code
+freeze dates. This should allow for a robust discussion about the merits or
+drawbacks of some of the fine points of the PEP and the implementation by
+people who are actually **using** the code.
+
+This nice schedule has made at least one of the implementers bold enough
+to consider the first cut of the implementation "experimental" in the sense
+that, since there is time to correct any problems, the implementation can
+diverge from the PEP (in documented ways!) both for perceived flaws in
+the PEP, and also to add minor enhancements. The code is being structured
+so that it should be easy to subsequently modify the operation to conform
+to consensus opinion.
+
+
+GOALS:
+
+ Replace %
+
+The primary goal of the advanced string formatting is to replace the %
+operator, not in a coercive fashion, but by being good enough that
+nobody wants to use the % operator.
+
+
+ Modular design for subfunction reuse
+
+The PEP explicitly disclaims any attempt to replace string.Template,
+concentrating exclusively on the % operator. While this narrow focus
+is very useful in removing things like conditionals and looping from
+the discussion about the PEP, it ignores the reality that it might
+be useful to REUSE some of the C implementation code (particularly
+the per-field formatting) in templating systems. So the design of
+the implementation adds the goal of being able to expose some lower-
+level functions.
+
+
+ Efficiency
+
+It is not claimed that the initial implementation is particularly
+efficient, but it is desirable to tweak the specification in such
+a fashion that an efficient implementation IS possible. Since the
+goal is to replace the % operator, it is particularly important
+that the formatting of small strings is not prohibitively expensive.
+
+
+ Security
+
+Security is a stated goal of the PEP, with an apparent goal of being
+able to accept a string from J. Random User and format it without
+potential adverse consequences. This may or may not be an achievable
+goal; the PEP certainly has some features that should help with this
+such as the restricted number of operators, and the implementation has
+some additional features, such as not allowing leading underscores
+on attributes by default, but these may be attempts to solve an
+intractable problem, similar to the original restricted Python
+execution mode.
+
+In any case, security is a goal, and anything reasonable we can do to
+support it should be done. Unreasonable things to support security
+include things which would be very costly in terms of execution time,
+and things which rely on the by now very much discredited "security
+through obscurity" approach.
+
+
+ Older Python Versions
+
+Some of the implementers have very strong desires to use this formatting
+on older Python versions, and Guido has mentioned that any 3.0 features
+which do not break backward compatibility are potential candidates for
+inclusion in 2.6.
+
+
+ No global state
+
+The PEP states "The string formatting system has two error handling modes,
+which are controlled by the value of a class variable." As has been
+discussed on the developers' list, this might be problematic, especially in
+large systems where components are being aggregated from multiple sources.
+One component might deliberately throw and catch exceptions in the string
+processing, and disabling this on a global basis might cause this component
+to stop working properly. If the ability to control this on a global
+basis is desirable, it is easy enough to add in later, but if it is not
+desirable, then deciding that after the fact and changing the code could
+break code which has grown to rely on the feature.
+
+
+FORMATTING METADATA
+
+The basic desired operation of the PEP is to be able to write:
+
+ 'some format control string'.format(param1, param2, keyword1=whatever, ...)
+
+Unfortunately, there needs to be some mechanism to handle out of band
+data for some formatting and error handling options. This could
+be really costly, if multiple options are looked up in the **keywords
+on every single call on even short strings, so some tweaks on the
+initial implementation are designed to reduce the overhead of looking
+up metadata. Two techniques are used:
+
+ 1) Lazy evaluation where possible. For example, the code does not
+ need to look up error-handling options until an error occurs.
+
+ 2) Metadata embedded in the string where appropriate. This
+ saves a dictionary lookup on every call. However this
+ is only appropriate when (a) the metadata arguably relates
+ to the actual control string and not the function where it
+ is being used; and (b) there are no security implications.
+
+
+DIFFERENCES BETWEEN PEP AND INITIAL IMPLEMENTATION:
+
+ Support for old Python versions
+
+The original PEP is Python 3000 only, which implies a lack of regular
+string support (unicode only). To make the code compatible with 2.6,
+it has been written to support regular strings as well, and to make
+the code compatible with earlier versions, it has been written to be
+usable as an extension module in addition to, or instead of, a string method:
+
+ from pep3101 import format
+ format('control string', parameter1, ...)
+
+
+ format_item function
+
+A large portion of the code in the new advanced formatter is the code
+which formats a single field according to the given format specifier.
+(Thanks, Eric!) This code is useful on its own, especially for template
+systems or other custom formatting solutions. The initial implementation
+will have a format_item function which takes a format specifier and a
+single object and returns a formatted result for that object and specifier.
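
As a rough illustration of the per-field idea, the sketch below uses the built-in format() function found in later Python versions, which formats one object according to one specifier. The name format_item and its (specifier, object) argument order follow the description above, but the body is only an assumption about how such a helper could behave, not the C implementation in this directory.

```python
def format_item(spec, value):
    """Hypothetical helper: format a single value with a single specifier.

    Delegates to the built-in format(); the real pep3101 function may differ.
    """
    return format(value, spec)


def render_row(cells):
    """Toy template-system use: each cell is a (specifier, value) pair."""
    return " | ".join(format_item(spec, value) for spec, value in cells)


if __name__ == "__main__":
    print(format_item(">6.2f", 3.14159))
    print(render_row([("<5", "id"), ("x", 255)]))
```

A template engine could call such a helper once per field while handling loops and conditionals itself.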
+
+
+ comments
+
+The PEP does not have a mechanism for comments embedded in the format
+strings. The usefulness of comments inside format strings may be
+debatable, but the implementation is simple and the syntax is easy to read:
+
+ {#This is a comment}
+
+
+ errors and exceptions
+
+The PEP defines a global flag for "strict" or "lenient" mode. The
+implementation eschews the use of a global flag (see more information
+in the goals section, above), and splits out the various error
+features discussed by the PEP into different options. It also adds
+an option.
+
+The first error option is controlled by the optional _leading_underscores
+keyword argument. If this is present and evaluates non-zero, then leading
+underscores are allowed on identifiers and attributes in the format string.
+The implementation will lazily look for this argument the first time it
+encounters a leading underscore.
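
The leading-underscore rule can be approximated with the string.Formatter class from the later standard library. This is a sketch under assumptions: the class name GuardedFormatter and the constructor flag are hypothetical, the flag is passed up front rather than looked up lazily, and only the first name and dotted attributes are checked.

```python
import re
import string


class GuardedFormatter(string.Formatter):
    """Hypothetical formatter that rejects leading underscores by default."""

    def __init__(self, allow_leading_underscores=False):
        super().__init__()
        self.allow_leading_underscores = allow_leading_underscores

    def get_field(self, field_name, args, kwargs):
        # Reject a leading underscore on the first name or on any attribute
        # that follows a dot, unless the caller opted in.
        if not self.allow_leading_underscores and re.search(r"(^|\.)_", field_name):
            raise ValueError("leading underscore not allowed: %r" % field_name)
        return super().get_field(field_name, args, kwargs)


if __name__ == "__main__":
    print(GuardedFormatter().format("{0.real}", 3))
    permissive = GuardedFormatter(allow_leading_underscores=True)
    print(permissive.format("{0.__class__.__name__}", 3))
```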
+
+The next error option is controlled by metadata embedded in the string.
+If "{!useall}" appears in the string, then a check is made that all
+arguments are converted. The decision to embed this metadata in the
+string can certainly be changed later; the reasons for doing it this
+way in the initial implementation are as follows:
+
+ 1) In the original % operator, the error reporting that an
+ extra argument is present is orthogonal to the error reporting
+ that not enough arguments are present. Both these errors are
+ easy to commit, because it is hard to count arguments and %s,
+ etc. In theory, the new string formatting should make it easier
+ to get the arguments right, because all arguments in the format
+ string are numbered or even named.
+
+ 2) It is arguably not Pythonic to check that all arguments to
+ a function are actually used by the execution of the function,
+ and format() is, after all, just another function. So it seems
+ that the default should be to not check that all the arguments
+ are used. In fact, there are similar reasons for not using
+ all the arguments here as with any other function. For example,
+ for customization, the format method of a string might be called
+ with a superset of all the information which might be useful to
+ view.
+
+ 3) Assuming that the normal case is to not check all arguments,
+ it is much cheaper (especially for small strings) to notice
+ the {! and process the metadata in the strings that want it
+ than it is to look for a keyword argument for every string.
+
+XXX -- need to add info on displaying exceptions in string vs. passing
+them up for looked-up errors. Also adding or not of string position
+information.
+
+
+ Getattr and getindex rely on underlying object exceptions
+
+For attribute and index lookup, the PEP specifies that digits will be
+treated as numeric values, and non-digits should be valid Python
+identifiers. The implementation does not rigorously enforce this,
+instead deferring to the object's getattr or getindex to throw an
+exception for an invalid lookup. The only time this is not true
+is for leading underscores, which are disallowed by default.
+
+
+ User-defined Python format function
+
+The PEP specifies that an additional string method, cformat, can be
+used to call the same formatting machinery, but with a "hook" function
+that can intercept formatting on a per-field basis.
+
+The implementation does not have an additional cformat function/method.
+Instead, user format hooks are accomplished as follows:
+
+ 1) A format hook function, with call signature and semantics
+ as described in the PEP, may be passed to format() as the
+ keyword argument _hook. This argument will be lazily evaluated
+ the first time it is needed.
+
+ 2) If "{!hook}" appears in the string, then the hook function
+ will be called on every single format field.
+
+ 3) If the last character (the type specifier) in a format field
+ is "h" (for hook) then the hook function will be called for
+ that field, even if "{!hook}" has not been specified.
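
The three hook rules above can be sketched by overriding format_field in string.Formatter from the later standard library. Everything here is an assumption for illustration: the class name HookFormatter, passing the hook to the constructor instead of as a lazily evaluated _hook keyword, the hook_all flag standing in for "{!hook}", and the hook signature hook(value, spec) returning a string or None to request default formatting.

```python
import string


class HookFormatter(string.Formatter):
    """Hypothetical formatter with a per-field user hook."""

    def __init__(self, hook=None, hook_all=False):
        super().__init__()
        self.hook = hook          # stands in for the _hook keyword argument
        self.hook_all = hook_all  # stands in for "{!hook}" in the string

    def format_field(self, value, format_spec):
        wants_hook = self.hook_all or format_spec.endswith("h")
        if wants_hook and self.hook is not None:
            result = self.hook(value, format_spec)
            if result is not None:
                return result
        # Fall back to default formatting, dropping the trailing "h" type.
        if format_spec.endswith("h"):
            format_spec = format_spec[:-1]
        return format(value, format_spec)


def shout(value, spec):
    """Example hook: upper-case strings, decline everything else."""
    return value.upper() if isinstance(value, str) else None


if __name__ == "__main__":
    print(HookFormatter(hook=shout).format("{0:h} {1:h} {0}", "hi", 7))
    print(HookFormatter(hook=shout, hook_all=True).format("{0} {1:>3}", "hi", 7))
```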
+
+
+ User-specified dictionary
+
+The call machinery to deal with keyword arguments is quite expensive,
+especially for large numbers of arguments. For this reason, the
+implementation supports the ability to pass in a dictionary as the
+_dict argument. The _dict argument will be lazily retrieved the first
+time the template requests a named parameter which was not passed
+in as a keyword argument.
+
+
+ Name mapping
+
+To support the user-specified dictionary, a name mapper will first
+look up names in the passed keyword arguments, then in the passed
+_dict (if any).
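
The lookup order described here (keyword arguments first, then _dict) can be sketched with collections.ChainMap and string.Formatter.vformat from the later standard library. The function name format_with_dict is hypothetical, and this sketch builds the mapping eagerly rather than retrieving _dict lazily.

```python
import string
from collections import ChainMap


def format_with_dict(template, *args, _dict=None, **kwargs):
    """Hypothetical helper: resolve names in kwargs first, then in _dict."""
    names = ChainMap(kwargs, _dict if _dict is not None else {})
    return string.Formatter().vformat(template, args, names)


if __name__ == "__main__":
    settings = {"host": "localhost", "port": 8080}
    # The keyword argument shadows the dictionary entry of the same name.
    print(format_with_dict("{host}:{port}", _dict=settings, port=9090))
```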
+
+
+ Automatic locals/globals lookup
+
+This is likely to be a contentious feature, but it seems quite useful,
+so in it goes for the initial implementation. For security reasons,
+this happens only if format() is called with no parameters. Since
+the whole purpose of format() is to apply parameters to a string,
+a call to format() without any parameters would otherwise be a
+silly thing to do. We can turn this degenerate case into something
+useful by using the caller's locals and globals. An example from
+Ian Bicking:
+
+ assert x < 3, "x has the value of {x} (should be < 3)".format()
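
One way to sketch the caller-namespace lookup in pure Python is with sys._getframe, which is CPython-specific. The function name format_from_caller is hypothetical; the implementation in this directory is written in C and may work differently. Locals take precedence over globals in this sketch.

```python
import string
import sys
from collections import ChainMap


def format_from_caller(template):
    """Hypothetical helper: fill fields from the caller's locals and globals."""
    frame = sys._getframe(1)  # the frame of whoever called this function
    names = ChainMap(dict(frame.f_locals), frame.f_globals)
    return string.Formatter().vformat(template, (), names)


def check(x):
    # Mirrors the assert example above, without a format() string method.
    return format_from_caller("x has the value of {x} (should be < 3)")


if __name__ == "__main__":
    print(check(5))
```

Because the template can read any name visible to the caller, a helper like this should only be used with format strings the program itself controls.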
+
+
+ Syntax modes
+
+The PEP correctly notes that the mechanism used to delineate markup
+vs. text is likely to be one of the most controversial features,
+and gives reasons why the chosen mechanism is better than others.
+
+The chosen mechanism is quite readable and reasonable, but different
+problem domains might have differing requirements. For example,
+C code generated using the current mechanism could get quite ugly
+with a large number of "{" and "}" characters.
+
+The initial implementation supports the notion of different syntax
+modes. This is bad from the "more than one way to do it" perspective,
+but is not quite so bad if the template itself has to indicate if it
+is not using the default mechanism. To give reviewers an idea of
+how this could work, the implementation supports 4 different modes:
+
+ "{!syntax0}" -- the mode as described in the PEP
+ "{!syntax1}" -- same as mode 0, except close-braces
+ do not need to be doubled
+ "{!syntax2}" -- Uses "${" for escape to markup, "$${" for
+ literal "${"
+ "{!syntax3}" -- Like syntax0 "{" for escape to markup,
+ except literal "{" is denoted by "{ "
+ or "{\n" (where the space is removed but
+ the newline isn't).
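
To make the "{!syntax2}" idea concrete, the sketch below translates a "${...}" template into the default brace syntax so that an ordinary str.format call can process it. The function name syntax2_to_default is hypothetical, the sketch assumes the "{!syntax2}" marker has already been removed, and it does not handle nested fields inside a format specifier.

```python
import re

# Matches, in order: a literal "$${", a "${field}" markup span, or a lone brace.
_TOKEN = re.compile(r"\$\$\{|\$\{([^{}]*)\}|[{}]")


def syntax2_to_default(template):
    """Hypothetical translator from "${...}" markup to "{...}" markup."""

    def replace(match):
        text = match.group(0)
        if text == "$${":
            return "${{"                      # literal "${" in the output
        if text.startswith("${"):
            return "{" + match.group(1) + "}"  # markup field
        return text * 2                        # lone brace: double to escape

    return _TOKEN.sub(replace, template)


if __name__ == "__main__":
    c_template = "int f() { return ${value}; }"
    print(syntax2_to_default(c_template).format(value=3))
```

The C-like template needs no doubled braces in its source form, which is the readability benefit the syntax modes are after.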
+
+
+ Syntax for metadata in strings
+
+There have been several examples in this document of metadata
+embedded inside strings, for "hook", "useall", and "syntax".
+
+The basic metadata syntax is "{!<keyword>}". To allow more readable
+templates, if the "}" of a metadata field is immediately followed by
+"\n" or "\r\n", that line ending will not appear in the formatted
+output.
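
The metadata syntax, including the swallowed line ending, can be sketched with a regular expression. The function name split_metadata is hypothetical, and the sketch treats every "{!word}" occurrence as metadata, which is an assumption about the intended grammar.

```python
import re

# "{!keyword}" optionally followed by one "\n" or "\r\n", which is swallowed.
_METADATA = re.compile(r"\{!(\w+)\}(?:\r?\n)?")


def split_metadata(template):
    """Hypothetical helper: return (keywords, template without metadata)."""
    keywords = _METADATA.findall(template)
    stripped = _METADATA.sub("", template)
    return keywords, stripped


if __name__ == "__main__":
    keywords, body = split_metadata("{!useall}\n{!syntax0}\r\nHello {0}")
    print(keywords, repr(body))
```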