PEP 292 is slated for inclusion in Python 2.4, according to PEP 320. At Pycon I checked in my code for this into the sandbox, which I've since updated, adding unit tests. I believe it's ready for inclusion in dist CVS, but I want to get some feedback first.
My new stuff provides two classes, dstring() as described in PEP 292 and astring() as hinted at in the PEP. It also provides two dictionary subclasses called safedict() and nsdict() which are not required, but work nicely with dstring() and astring() -- safedict re-expands keys instead of throwing exceptions, and nsdict does namespace lookup and attribute path expansion.
Brett and I (I forget who else was there for this) talked about where to situate the PEP 292 support. The interesting idea came up to turn the string module into a package, providing backward support for the existing string module API, then exporting my PEP 292 modules into this namespace. This would make the 'import string' useful again since it would be a place to collect future string related functionality without having to claim some dumb name like 'stringlib'. I believe we can still someday deprecate the old string module functions, retaining the useful constants, as well as new string-y features.
This is actually not hard to do, and I have this working (and passing the existing unit tests) in a local checkout. You simply create the Lib/string directory, copy or move Lib/string.py to Lib/string/string.py and do a bit of import-* in Lib/string/__init__.py. The unit tests all passed with no changes necessary. I'd drop my pep292.py file and safedict.py file into Lib/string and import the useful names out of there, exposing them in the string namespace.
Is this a good idea? I dunno, but it seems better to me than adding two more top-level modules with largely contrived names; nothing better jumps out to me.
I also really want to include safedict.py if we're including pep292.py because they're quite useful and complementary, IMO, and I can't think of a better place to put those classes either. I'm open to suggestions. I have not yet written docs for these new classes, but will do so once we agree on where they're getting added. The code and test cases are in python/nondist/sandbox/string. -Barry
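Barry describes nsdict as doing namespace lookup and attribute path expansion, but the thread does not show that code. As a rough illustration only, here is one way the attribute-path half could work with ordinary % formatting, assuming a dict subclass whose keys may be dotted paths; AttrPathDict, User and greet are hypothetical names, not the sandbox API:

```python
class AttrPathDict(dict):
    """Hypothetical sketch: a dict whose keys may be dotted attribute paths.

    d['user.name'] looks up 'user' in the dict itself, then follows
    getattr() for each remaining component of the path.
    """

    def __getitem__(self, key):
        head, *rest = key.split('.')
        obj = dict.__getitem__(self, head)   # plain lookup for the first name
        for attr in rest:
            obj = getattr(obj, attr)         # AttributeError if the path is bad
        return obj


class User:
    def __init__(self, name):
        self.name = name


def greet():
    user = User('Guido')
    # Pass the local namespace explicitly, as an nsdict-style helper
    # might do implicitly.
    return 'Hello, %(user.name)s!' % AttrPathDict(locals())
```

The first path component is a plain dictionary lookup and each later component is a getattr(), so a bad path surfaces as KeyError or AttributeError rather than being silently ignored.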
PEP 292 is slated for inclusion in Python 2.4,
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}. For instance, what should happen with:
    "A dangling $"
    "A $!invalid_identifier"
    "A $identifier&followed_by_nonwhitespace_punctuation"
My new stuff provides two classes, dstring() as described in PEP 292 and astring() as hinted at in the PEP. It also provides two dictionary subclasses called safedict() and nsdict() which are not required, but work nicely with dstring() and astring() -- safedict re-expands keys instead of throwing exceptions, and nsdict does namespace lookup and attribute path expansion.
The names dstring(), astring(), safedict(), and nsdict() could likely be improved to be more suggestive of what they do.
Brett and I (I forget who else was there for this) talked about where to situate the PEP 292 support. The interesting idea came up to turn the string module into a package, providing backward support for the existing string module API, then exporting my PEP 292 modules into this namespace. This would make the 'import string' useful again since it would be a place to collect future string related functionality without having to claim some dumb name like 'stringlib'. I believe we can still someday deprecate the old string module functions, retaining the useful constants, as well as new string-y features.
-1
The inclusion of string.py breathes life into something that needs to disappear. One of the reasons for deprecating these functions is to reduce the number of things you need to learn and remember. Interspersing a handful of new functions and classes is contrary to that goal. It becomes hard to introduce simplified substitutions without talking about all the other string functions that you're better off not knowing about. A separate module is preferable. Also, I don't see any benefit in rolling a package with safedict and nsdict in a separate module from dstring and astring.
I also really want to include safedict.py if we're including pep292.py because they're quite useful and complimentary, IMO, and I can't think of a better place to put those classes either.
Can safedict.safedict() be made more general so that it has value outside of string substitutions? Ideally, the default format would be customizable and would include an option to leave the entry unchanged. Right now, the implementation is specific to string substitution formats. It is not even useful with normal % formatting.
I'm open to suggestions. I have not yet written docs for these new classes, but will do so once we agree on where they're getting added. The code and test cases are in python/nondist/sandbox/string.
Given the simplicity of the PEP, the sandbox implementation is surprisingly intricate. Is it possible to simplify it with a function based rather than class based approach? I can imagine alternatives which encapsulate the whole idea in something similar to this:
    import re

    nondotted = re.compile(r'(\${2})|\$([_a-z][_a-z0-9]*)|\$(\{[_a-z][_a-z0-9]*\})', re.IGNORECASE)
    dotted = re.compile(r'(\${2})|\$([_a-z][_.a-z0-9]*)|\$(\{[_a-z][_.a-z0-9]*\})', re.IGNORECASE)

    def _convert(m):
        'Convert $ formats to % formats'
        escaped, straight, bracketed = m.groups()
        if escaped is not None:
            return '$'
        if straight is not None:
            return '%(' + straight + ')s'
        return '%(' + bracketed[1:-1] + ')s'   # strip the braces

    def subst(fmtstr, mapping, fmtcode=nondotted, _cache={}):
        # Cache on (string, pattern) so dotted and nondotted conversions don't collide
        key = (fmtstr, fmtcode)
        if key not in _cache:
            _cache[key] = fmtcode.sub(_convert, fmtstr)
        return _cache[key] % mapping

    >>> fmtstr = '$who owes me $$${what}.'
    >>> mapping = dict(who='Guido', what='money')
    >>> print(subst(fmtstr, mapping))
    Guido owes me $money.
Raymond
Raymond Hettinger wrote:
PEP 292 is slated for inclusion in Python 2.4,
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}. For instance, what should happen with:
    "A dangling $"
    "A $!invalid_identifier"
    "A $identifier&followed_by_nonwhitespace_punctuation"
Or, to pick a more common example: "$Id: rtp.py,v 1.40 2004/03/07 14:41:39 anthony Exp $"
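One way to make the PEP's rule concrete is to treat only $$, $identifier and ${identifier} as placeholders and copy every other $ through unchanged. The sketch below assumes that policy (raising an error would be the other obvious choice); expand() is a hypothetical helper, not the sandbox code:

```python
import re

# Only these three forms are placeholders; any other '$' is left as-is
# (an assumed policy for illustration).
_PLACEHOLDER = re.compile(
    r'\$(?:(?P<escaped>\$)'
    r'|(?P<named>[_a-z][_a-z0-9]*)'
    r'|\{(?P<braced>[_a-z][_a-z0-9]*)\})',
    re.IGNORECASE)


def expand(template, mapping):
    def replace(match):
        if match.group('escaped') is not None:
            return '$'                      # '$$' collapses to a literal '$'
        key = match.group('named') or match.group('braced')
        return str(mapping[key])            # strict: KeyError if missing
    return _PLACEHOLDER.sub(replace, template)
```

Under this policy the dangling $ and $!invalid_identifier pass through, and $identifier stops at the &. Note that $Id in Anthony's CVS keyword is itself a valid identifier, so with strict lookup that string raises KeyError unless Id is in the mapping; only the trailing lone $ is left alone.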
On Tue, 2004-06-15 at 17:10, Raymond Hettinger wrote:
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}.
Good point, I've pushed out an update.
The names dstring(), astring(), safedict(), and nsdict() could likely be improved to be more suggestive of what they do.
The 'd' is a mnemonic for 'dollar strings'. Similarly 'a' is for 'attribute path'. 'safedict' is meant to imply that it will not throw KeyError exceptions, and 'nsdict' indicates that namespace lookups are used. I'm certainly open to alternative suggestions, although sorry Tim, I'll reject 'hamstring' outright.
Brett and I (I forget who else was there for this) talked about where to situate the PEP 292 support. The interesting idea came up to turn the string module into a package
-1
:(
The inclusion of string.py breathes life into something that needs to disappear. One of the reasons for deprecating these functions is to reduce the number of things you need to learn and remember. Interspersing a handful of new functions and classes is contrary to that goal. It becomes hard to introduce simplified substitutions without talking about all the other string functions that you're better off not knowing about.
A separate module is preferable. Also, I don't see any benefit in rolling a package with safedict and nsdict in a separate module from dstring and astring.
Here's the point: we know that some of the names in the current string module will always be useful. I'd hate to see us have to come up with some contrived new module for things like string.letters to live in (e.g. 'stringlib' would suck). 'string' seems like such a useful name to keep as a place to collect future useful string-related constants, utilities, and functionality, of which PEP 292 support is perhaps just the first example.
I'd be perfectly happy splitting string.py into two parts after moving it into Lib/string. One would be named 'deprecated.py' and that would contain all the someday-to-be-deleted functions. The other might be called 'constants.py' for lack of a better name, and would contain things we know we want to keep, like letters, hexdigits, etc. string/__init__.py can hide all that and it would be a simple matter once we ever decide to actually remove the deprecated functions <wink> to do it in two steps (strawman: remove the 'from deprecated import *' from Lib/string/__init__.py but leave the module around for diehards, then eventually remove the module).
I don't think documentation is a problem. I'd propose (and would even write) splitting the current string module so that the deprecated functions are described in a subsection that doesn't appear on the main module page. That way, the documentation just describes the constants we want to keep and the new PEP 292 support (perhaps in another new subsection).
Can safedict.safedict() be made more general so that it has value outside of string substitutions?
It's such a trivial matter to subclass from dict and write your own __getitem__() that I doubt it's worth it.
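To illustrate Barry's point, a safedict-style class can be little more than a __getitem__ override. This sketch assumes the template has already been converted to %(name)s form and that a missing name should reappear in the braced ${name} spelling; SafeDict is a hypothetical name:

```python
class SafeDict(dict):
    """Sketch: re-expand missing keys as ${name} instead of raising KeyError."""

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Put the placeholder back so a later pass (or a human) can see it.
            return '${%s}' % key


# A template already converted from '$who owes me $what' to %-form.
converted = '%(who)s owes me %(what)s'
print(converted % SafeDict(who='Guido'))
```

Only lookups made through __getitem__ (as % formatting does) get the fallback; methods such as dict.get() bypass it.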
Given the simplicity of the PEP, the sandbox implementation is surprisingly intricate. Is it possible to simplify it with a function based rather than class based approach?
Take away all the comments, and it's really a fairly simple implementation. I really want to use traditional % syntax to perform the substitutions since that's the Pythonically natural way to spell string interpolation. The only complication in the implementation is the cache of the converted-to-%s string in the subclass, but this is critical. In an i18n application you need the original string for catalog lookup, and the transformed string is only useful for the mod operation. -Barry
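Barry's justification for a class is that the object must still be the original $-string (so it can serve as a catalog key) while the converted %-string is cached for the % operation. A minimal sketch of that idea, assuming a str subclass that converts lazily on first use; DollarStr and _to_percent are hypothetical, and escaping literal % signs is an added assumption:

```python
import re

_DOLLAR = re.compile(
    r'\$(?:(\$)|([_a-z][_a-z0-9]*)|\{([_a-z][_a-z0-9]*)\})', re.IGNORECASE)


def _to_percent(match):
    escaped, named, braced = match.groups()
    if escaped is not None:
        return '$'                       # '$$' -> literal '$'
    return '%%(%s)s' % (named or braced)  # '$x' or '${x}' -> '%(x)s'


class DollarStr(str):
    """Sketch: equal to (and hashes like) the original $-template, but
    caches its %-converted form for use by the % operator."""

    def __mod__(self, mapping):
        try:
            converted = self._converted
        except AttributeError:
            # Double literal '%' first so the later % operation keeps them.
            converted = _DOLLAR.sub(_to_percent, self.replace('%', '%%'))
            self._converted = converted  # cache for subsequent uses
        return converted % mapping
```

Because DollarStr compares and hashes like the original text, it can be looked up in a translation catalog unchanged; only __mod__ sees the converted form.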
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}.
Good point, I've pushed out an update.
Thanks.
The names dstring(), astring(), safedict(), and nsdict() could likely be improved to be more suggestive of what they do.
The 'd' is a mnemonic for 'dollar strings'. Similarly 'a' is for 'attribute path'. 'safedict' is meant to imply that it will not throw KeyError exceptions, and 'nsdict' indicates that namespace lookups are used. I'm certainly open to alternative suggestions
Since this is in a string module, the "string" part of the name can be more abbreviated and the qualifier should be less abbreviated:
    dstring:  dollarstr, formatstr, dollarfmt, template, kwdonly
    astring:  attrstr, attrlookup, dottedfmt, kwdattr
    safedict: defaultdict
    nsdict:   nslookup, namespace, envdict
Cheetah has been through several versions. Perhaps they have worked out some better naming conventions.
I don't think documentation is a problem. I'd propose (and would even write) splitting the current string module so that the deprecated functions are described in a subsection that doesn't appear on the main module page. That way, the documentation just describes the constants we want to keep and the new PEP 292 support (perhaps in another new subsection).
That's reasonable. A string module is the natural place to locate the simplified substitutions. Splitting out the old functions seems like a good way to re-use the string module without breathing life into things that I was hoping that people would forget ever existed (otherwise, we will never be rid of them). Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything into packages (which make it more difficult to understand and maintain as a collective unit).
Can safedict.safedict() be made more general so that it has value outside of string substitutions?
It's such a trivial matter to subclass from dict and write your own __getitem__() that I doubt it's worth it.
True enough. Do consider having an optional argument for setting the default string. Ideally, the class should be useful with both $ formatted and % formatted strings (for instance, make it return the key unchanged when the key is not found). Also, since the implementation is so tightly bound to $ formatting, it makes no sense to put it in a separate module.
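Raymond's suggestion can be sketched as a dict subclass that takes the missing-key behaviour as an optional argument, defaulting to returning the key unchanged. FallbackDict and the default= keyword are hypothetical; a callable is used here instead of a fixed default string so that one class can serve both $-style and %-style output:

```python
def _identity(key):
    return key


class FallbackDict(dict):
    """Sketch: a dict whose value for a missing key is built by a
    caller-supplied function (by default, the key itself)."""

    def __init__(self, *args, default=_identity, **kwargs):
        super().__init__(*args, **kwargs)
        self._default = default

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self._default(key)


fmt = '%(who)s owes me %(what)s'
plain = FallbackDict(who='Guido')                                   # key unchanged
keep_pct = FallbackDict(who='Guido', default=lambda k: '%%(%s)s' % k)  # keep %-placeholder
keep_dollar = FallbackDict(who='Guido', default=lambda k: '${%s}' % k) # keep $-placeholder
print(fmt % plain)
```

The same class then covers the "leave the entry unchanged" case for ordinary % formatting as well as re-expansion for $ templates.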
Given the simplicity of the PEP, the sandbox implementation is surprisingly intricate. Is it possible to simplify it with a function based rather than class based approach?
Take away all the comments, and it's really a fairly simple implementation. I really want to use traditional % syntax to perform the substitutions since that's the Pythonically natural way to spell string interpolation.
The overall goal of the PEP is simplification. It takes very little complexity before $ formatting becomes more complicated than % formatting. The % syntax has its share of issues (hard to find in the docs; precedence is more appropriate for integer modulo; tuple vs. single string argument). If you give up the % syntax, you get perfectly pythonic method calls and an opportunity to do the whole job with only one exposed class, differentiating the approaches with various method names:
    t = Template('$who owes me ${what}')
    t.subst_from_dict(mydict)
    t.subst_from_env()
    t.subst_from_attr()
    t.subst(mydict, noexception=True)
Something like this would mean that you don't need several different classes to do the job. Also, compare the following for obviousness and readability:
    Template('$name loves $spouse').subst(mydict, noexception=True)
    dstring('$name loves $spouse') % SafeDict(mydict)
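A rough sketch of the single-class, method-based design Raymond describes, assuming a regex-based implementation; Template, subst() and subst_from_env() follow the names in his message, but the code is illustrative only, and subst_from_dict()/subst_from_attr() are omitted:

```python
import os
import re


class Template(object):
    """Sketch of a method-based API: one class, with behaviour chosen by
    method name or keyword argument instead of % plus a dict subclass."""

    _pattern = re.compile(
        r'\$(?:(?P<escaped>\$)'
        r'|(?P<named>[_a-z][_a-z0-9]*)'
        r'|\{(?P<braced>[_a-z][_a-z0-9]*)\})',
        re.IGNORECASE)

    def __init__(self, template):
        self.template = template

    def subst(self, mapping, noexception=False):
        def replace(match):
            if match.group('escaped') is not None:
                return '$'
            key = match.group('named') or match.group('braced')
            if noexception and key not in mapping:
                return match.group(0)   # leave the placeholder as written
            return str(mapping[key])
        return self._pattern.sub(replace, self.template)

    def subst_from_env(self, noexception=False):
        # Look names up in the process environment.
        return self.subst(os.environ, noexception)
```

With noexception=True an unknown placeholder is left exactly as written instead of raising KeyError, which covers the safedict use case without a separate dictionary class.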
In an i18n application you need the original string for catalog lookup, and the transformed string is only useful for the mod operation.
That settles that one. Raymond
On Wed, Jun 16, 2004, Raymond Hettinger wrote:
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything into packages (which make it more difficult to understand and maintain as a collective unit).
That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith, c.l.py
On Jun 24, 2004, at 10:27 AM, Aahz wrote:
On Wed, Jun 16, 2004, Raymond Hettinger wrote:
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything into packages (which make it more difficult to understand and maintain as a collective unit).
That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``?
If nothing else, that would cause hell for people who would like to use a backport of the package for Python N, where N is less than the first version that had this feature but still had the string module. -bob
On Thu, Jun 24, 2004, Bob Ippolito wrote:
On Jun 24, 2004, at 10:27 AM, Aahz wrote:
On Wed, Jun 16, 2004, Raymond Hettinger wrote:
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything into packages (which make it more difficult to understand and maintain as a collective unit).
That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``?
If nothing else, that would cause hell for people who would like to use a backport of the package for Python N, where N is less than the first version that had this feature but still had the string module.
This actually makes it *easier* to backport; you only take the submodule you want.
On Jun 28, 2004, at 7:05 AM, Aahz wrote:
On Thu, Jun 24, 2004, Bob Ippolito wrote:
On Jun 24, 2004, at 10:27 AM, Aahz wrote:
On Wed, Jun 16, 2004, Raymond Hettinger wrote:
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything into packages (which make it more difficult to understand and maintain as a collective unit).
That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``?
If nothing else, that would cause hell for people who would like to use a backport of the package for Python N, where N is less than the first version that had this feature but still had the string module.
This actually makes it *easier* to backport; you only take the submodule you want.
Why is calling it string instead of text easier? You can't easily replace string, because site-packages comes late in sys.path. -bob
[Aahz]
[Raymond Hettinger]
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything into packages (which make it more difficult to understand and maintain as a collective unit).
That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``?
Please do not use, as package names, identifiers that users would likely want to keep for themselves. `text' and `string' are bad ideas for package names. `stringlib' seems much less likely to hurt people. I know that `string' and `socket' exist, and although `string' is evanescent, they surely forced users to choose other identifiers where `string' and `socket' would have been perfect. It is very good news that, now in Python 2.3, `string' is unneeded most of the time. Let us not repeat previous mistakes, or even nail them further by trying to be compatible with them. -- François Pinard http://www.iro.umontreal.ca/~pinard
François Pinard wrote:
I know that `string' and `socket' exist, and although `string' is evanescent, they surely forced users to choose other identifiers where `string' and `socket' would have been perfect. It is very good news that, now in Python 2.3, `string' is unneeded most of the time. Let us not repeat previous mistakes, or even nail them further by trying to be compatible with them.
I would suggest that bare type names are rarely appropriate for use as variable names, except in toy examples. If I'm reading someone else's code, and they create a string or a socket, I want to know what it is _for_, rather than the mere fact that it is a string or a socket. If the type is all that is important, then prepending some simple word such as 'a_string' or 'the_string' or 'my_string' makes it clear to the maintainer that the object doesn't really have any significant semantic meaning beyond its type. Regards, Nick. -- Nick Coghlan | Brisbane, Australia Email: ncoghlan@email.com | Mobile: +61 409 573 268
[Nick Coghlan]
François Pinard wrote:
I know that `string' and `socket' [modules] exist, and although `string' is evanescent, they surely forced users to choose other identifiers where `string' and `socket' would have been perfect.
I would suggest that bare type names are rarely appropriate for use as variable names, except in toy examples.
Or small enough functions. Small functions are not necessarily toys.
If I'm reading someone else's code, and they create a string or a socket, I want to know what it is _for_, rather than the mere fact that it is a string or a socket.
If I write a function receiving a string as an argument, and the effect of the function is already documented, I see no point in writing `parameter_string' or `the_argument_of_the_function' instead of `string', which is clear, clean and simple. Some people would write `s' instead, but for my part, I stopped overly liking algebraic notation in programs after I left FORTRAN :-). When you speak to someone else about the argument of a simple function, don't you say "then the function takes the string, it massages the string this way, etc."? I like naming my variables the way I would speak about them! :-)
If the type is all that is important, then prepending some simple word such as 'a_string' or 'the_string' or 'my_string' makes it clear to the maintainer that the object doesn't really have any significant semantic meaning beyond its type.
Come on, be serious! :-)
François Pinard wrote:
[Nick Coghlan]
I would suggest that bare type names are rarely appropriate for use as variable names, except in toy examples.
Or small enough functions. Small functions are not necessarily toys.
Hmm, I hadn't considered that case. I guess I tend not to write too many support functions where generic names would be appropriate (most of my Python code is very domain specific). Cheers, Nick.
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}.
Good point, I've pushed out an update.
One other thought, please reconsider the key lookup for ${identifier}. I think retaining the braces in the key is a mistake. The purpose of the braces was to allow trailing characters without intervening whitespace. Extending it to have special meaning for SafeDicts was probably not the way to go. As a result, the example in the PEP doesn't work anymore:
    mapping = dict(name='Guido', country='the Netherlands')
    s = dstring('${name} was born in ${country}')
    print s % mapping

    Traceback (most recent call last):
      File "C:\nondist\sandbox\string\pep292.py", line 124, in -toplevel-
        print s % mapping
      File "C:\nondist\sandbox\string\pep292.py", line 108, in __mod__
        return self._modstr % other
    KeyError: '{name}'
Raymond
P.S. The PEP example is also missing the rightmost single quotation mark.
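The behaviour Raymond asks for is that $name and ${name} look up the same key, with the braces serving only as delimiters. A minimal sketch of a conversion step that strips the braces (to_percent is a hypothetical helper; doubling literal % signs is an added assumption):

```python
import re

_PATTERN = re.compile(
    r'\$(?:(\$)|([_a-z][_a-z0-9]*)|\{([_a-z][_a-z0-9]*)\})', re.IGNORECASE)


def to_percent(template):
    """Convert a $-template to %-form so that $name and ${name} both
    look up the key 'name'."""
    def convert(match):
        escaped, named, braced = match.groups()
        if escaped is not None:
            return '$'
        # Use the bare identifier; the braces only delimit it.
        return '%%(%s)s' % (named if named is not None else braced)
    return _PATTERN.sub(convert, template.replace('%', '%%'))


mapping = dict(name='Guido', country='the Netherlands')
print(to_percent('${name} was born in ${country}') % mapping)
```

With the braces stripped, the PEP example needs no special dictionary, and ${name}ian-style trailing characters still work.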
Barry Warsaw wrote:
On Tue, 2004-06-15 at 17:10, Raymond Hettinger wrote:
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}.
Good point, I've pushed out an update.
The names dstring(), astring(), safedict(), and nsdict() could likely be improved to be more suggestive of what they do.
The 'd' is a mnemonic for 'dollar strings'. Similarly 'a' is for 'attribute path'. 'safedict' is meant to imply that it will not throw KeyError exceptions, and 'nsdict' indicates that namespace lookups are used. I'm certainly open to alternative suggestions, although sorry Tim, I'll reject 'hamstring' outright.
Ah, that's why. Perhaps we can denote this fact in the final docs if the name is kept? I personally have no issue with it now that I know what they stand for. +0.
[SNIP]
The inclusion of string.py breathes life into something that needs to disappear. One of the reasons for deprecating these functions is to reduce the number of things you need to learn and remember. Interspersing a handful of new functions and classes is contrary to that goal. It becomes hard to introduce simplified substitutions without talking about all the other string functions that you're better off not knowing about.
A separate module is preferable. Also, I don't see any benefit in rolling a package with safedict and nsdict in a separate module from dstring and astring.
Here's the point: we know that some of the names in the current string module will always be useful. I'd hate to see us have to come up with some contrived new module for things like string.letters to live in (e.g. 'stringlib' would suck). 'string' seems like such a useful name to keep as a place to collect future useful string-related constants, utilities, and functionality, of which PEP 292 support is perhaps just the first example.
I'd be perfectly happy splitting string.py into two parts after moving it into Lib/string. One would be named 'deprecated.py' and that would contain all the someday-to-be-deleted functions. The other might be called 'constants.py' for lack of a better name, and would contain things we know we want to keep, like letters, hexdigits, etc. string/__init__.py can hide all that and it would be a simple matter once we ever decide to actually remove the deprecated functions <wink> to do it in two steps (strawman: remove the 'from deprecated import *' from Lib/string/__init__.py but leave the module around for diehards, then eventually remove the module).
I don't think documentation is a problem. I'd propose (and would even write) splitting the current string module so that the deprecated functions are described in a subsection that doesn't appear on the main module page. That way, the documentation just describes the constants we want to keep and the new PEP 292 support (perhaps in another new subsection).
It all sounds good to me. Unless str is going to be renamed 'string' in Python 3, sticking with 'string' seems fine (but then, as Barry said, we discussed this at PyCON so I have supported it for a while =). I know Guido suggested 'strings', and short of 'strtools', 'string' is the only other reasonable name to me. Tacking on 'lib' to every package will become rather tedious quickly, especially when the stdlib is reorganized in 3.0. And Barry's factoring out stuff that can stand to go away also works for me. Making things we don't want people to use a little harder to reach, but still easily accessible in the docs seems like a good solution. +1 -Brett
Barry Warsaw wrote:
PEP 292 is slated for inclusion in Python 2.4, according to PEP 320. At Pycon I checked in my code for this into the sandbox, which I've since updated, adding unit tests. I believe it's ready for inclusion in dist CVS, but I want to get some feedback first.
My new stuff provides two classes, dstring() as described in PEP 292 and astring() as hinted at in the PEP.
I find these names to be arbitrary and not mnemonic or suggestive. How about "template" or "format" for "dstring"? I don't know what "astring" is. Paul Prescod
Barry Warsaw wrote:
Brett and I (I forget who else was there for this) talked about where to situate the PEP 292 support. The interesting idea came up to turn the string module into a package, providing backward support for the existing string module API, then exporting my PEP 292 modules into this namespace.
I'm fine with providing dstring inside the string module. -1 on making it a package. I see no advantage for having a package with three files, some of them containing only a single class, over having it all in a single module. Regards, Martin
What is the motivation for "safedict"? I can imagine two uses. One seems like it could lead to some kind of security problem. The "harmless" (?) use would be in debugging, so that the program would continue when a key was missing, but the programmer could see after the fact what that key was. The harmful case would be one where the string is substituted in several stages. Just like % substitutions, $-substitutions are not safe for repeated expansion. Here's an example:
    def something(user_controlled_string):
        mypassword = "drowssap"
        bar = "1/8 x 1 inch aluminum bar"
        s = dstring("${foo} is ${bar}")
        s = s % safedict({'foo': user_controlled_string})
        s = s % nsdict()
        print s
The malicious user supplies user_controlled_string:
    http://python.example.com/something?user_controlled_string=%24mypassword
and gets back
    drowssap is 1/8 x 1 inch aluminum bar
Jeff
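Jeff's point can be reproduced without the sandbox classes. In this sketch expand() is a hypothetical single-pass substitution function standing in for dstring plus safedict/nsdict; the two_stage flag compares his two-pass pattern with a single pass over the original template:

```python
import re

_PATTERN = re.compile(
    r'\$(?:(\$)|([_a-z][_a-z0-9]*)|\{([_a-z][_a-z0-9]*)\})', re.IGNORECASE)


def expand(template, mapping, safe=False):
    """Substitute $-placeholders in one pass. With safe=True, unknown
    names are left as written (safedict-like) instead of raising KeyError.
    '$$' always becomes '$'; keeping escapes intact between passes is
    not handled here."""
    def replace(match):
        escaped, named, braced = match.groups()
        if escaped is not None:
            return '$'
        key = named or braced
        if safe and key not in mapping:
            return match.group(0)
        return str(mapping[key])
    return _PATTERN.sub(replace, template)


def something(user_controlled_string, two_stage=True):
    mypassword = "drowssap"
    bar = "1/8 x 1 inch aluminum bar"
    template = "${foo} is ${bar}"
    if two_stage:
        # Stage 1 fills in foo only and keeps unknown names for later...
        partial = expand(template, {'foo': user_controlled_string}, safe=True)
        # ...stage 2 fills the rest from the local namespace, including any
        # placeholder the user smuggled in through foo.
        return expand(partial, locals())
    # One pass over the original template: user text is never re-scanned.
    return expand(template, {'foo': user_controlled_string, 'bar': bar})
```

re.sub() never rescans text it has already inserted, so the single pass leaves the user's $mypassword as literal text; the leak comes from feeding substituted output back in as a new template.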
participants (11)
- "Martin v. Löwis"
- Aahz
- Anthony Baxter
- Barry Warsaw
- Bob Ippolito
- Brett C.
- François Pinard
- Jeff Epler
- Nick Coghlan
- Paul Prescod
- Raymond Hettinger