PEP 292 for Python 2.4 - Python-Dev


Barry Warsaw

16 Jun 2004

3:05 a.m.

PEP 292 is slated for inclusion in Python 2.4, according to PEP 320. At PyCon I checked my code for this into the sandbox, which I've since updated, adding unit tests. I believe it's ready for inclusion in dist CVS, but I want to get some feedback first.

My new stuff provides two classes, dstring() as described in PEP 292 and astring() as hinted at in the PEP. It also provides two dictionary subclasses called safedict() and nsdict() which are not required, but work nicely with dstring() and astring() -- safedict re-expands keys instead of throwing exceptions, and nsdict does namespace lookup and attribute path expansion.

Brett and I (I forget who else was there for this) talked about where to situate the PEP 292 support. The interesting idea came up to turn the string module into a package, providing backward support for the existing string module API, then exporting my PEP 292 modules into this namespace. This would make 'import string' useful again since it would be a place to collect future string-related functionality without having to claim some dumb name like 'stringlib'. I believe we can still someday deprecate the old string module functions, retaining the useful constants, as well as new string-y features.

This is actually not hard to do, and I have this working (and passing the existing unit tests) in a local checkout. You simply create the Lib/string directory, copy or move Lib/string.py to Lib/string/string.py and do a bit of import-* in Lib/string/__init__.py. The unit tests all passed with no changes necessary. I'd drop my pep292.py file and safedict.py file into Lib/string and import the useful names out of there, exposing them in the string namespace.

Is this a good idea? I dunno, but it seems better to me than adding two more top-level modules with largely contrived names; nothing better jumps out at me.

I also really want to include safedict.py if we're including pep292.py because they're quite useful and complementary, IMO, and I can't think of a better place to put those classes either.

I'm open to suggestions. I have not yet written docs for these new classes, but will do so once we agree on where they're getting added. The code and test cases are in python/nondist/sandbox/string.

-Barry


Raymond Hettinger

15 Jun

9:10 p.m.

...

PEP 292 is slated for inclusion in Python 2.4,

For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}. For instance, what should happen with:

    "A dangling $"
    "A $!invalid_identifier"
    "A $identifier&followed_by_nonwhitespace_punctuation"

...

My new stuff provides two classes, dstring() as described in PEP 292 and astring() as hinted at in the PEP. It also provides two dictionary subclasses called safedict() and nsdict() which are not required, but work nicely with dstring() and astring() -- safedict re-expands keys instead of throwing exceptions, and nsdict does namespace lookup and attribute path expansion.

The names dstring(), astring(), safedict(), and nsdict() could likely be improved to be more suggestive of what they do.

...

Brett and I (I forget who else was there for this) talked about where to situate the PEP 292 support. The interesting idea came up to turn the string module into a package, providing backward support for the existing string module API, then exporting my PEP 292 modules into this namespace. This would make the 'import string' useful again since it would be a place to collect future string related functionality without having to claim some dumb name like 'stringlib'. I believe we can still someday deprecate the old string module functions, retaining the useful constants, as well as new string-y features.

-1

The inclusion of string.py breathes life into something that needs to disappear. One of the reasons for deprecating these functions is to reduce the number of things you need to learn and remember. Interspersing a handful of new functions and classes is contrary to that goal. It becomes hard to introduce simplified substitutions without talking about all the other string functions that you're better off not knowing about.

A separate module is preferable. Also, I don't see any benefit in rolling a package with safedict and nsdict in a separate module from dstring and astring.

...

I also really want to include safedict.py if we're including pep292.py because they're quite useful and complementary, IMO, and I can't think of a better place to put those classes either.

Can safedict.safedict() be made more general so that it has value outside of string substitutions? Ideally, the default format would be customizable and would include an option to leave the entry unchanged. Right now, the implementation is specific to string substitution formats. It is not even useful with normal % formatting.

...

I'm open to suggestions. I have not yet written docs for these new classes, but will do so once we agree on where they're getting added. The code and test cases are in python/nondist/sandbox/string.

Given the simplicity of the PEP, the sandbox implementation is surprisingly intricate. Is it possible to simplify it with a function based rather than class based approach? I can imagine alternatives which encapsulate the whole idea in something similar to this:

    import re

    nondotted = re.compile(r'(\${2})|\$([_a-z][_a-z0-9]*)|\$({[_a-z][_a-z0-9]*})', re.IGNORECASE)
    dotted = re.compile(r'(\${2})|\$([_a-z][_.a-z0-9]*)|\$({[_a-z][_.a-z0-9]*})', re.IGNORECASE)

    def _convert(m):
        'Convert $ formats to % formats'
        escaped, straight, bracketed = m.groups()
        if escaped is not None:
            return '$'
        if straight is not None:
            return '%(' + straight + ')s'
        return '%(' + bracketed[1:-1] + ')s'

    def subst(fmtstr, mapping, fmtcode=nondotted, _cache={}):
        # Key the cache on (pattern, string) so that the dotted and
        # non-dotted conversions of the same string cannot collide.
        key = (fmtcode, fmtstr)
        if key not in _cache:
            _cache[key] = fmtcode.sub(_convert, fmtstr)
        return _cache[key] % mapping

    >>> fmtstr = '$who owes me $$${what}.'
    >>> mapping = dict(who='Guido', what='money')
    >>> print subst(fmtstr, mapping)
    Guido owes me $money.

Raymond

Anthony Baxter

16 Jun

4:03 p.m.

Raymond Hettinger wrote:

...

...
PEP 292 is slated for inclusion in Python 2.4,

For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}. For instance, what should happen with:

    "A dangling $"
    "A $!invalid_identifier"
    "A $identifier&followed_by_nonwhitespace_punctuation"

Or, to pick a more common example: "$Id: rtp.py,v 1.40 2004/03/07 14:41:39 anthony Exp $"
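The corner cases raised above can be tried against the regex-based subst() idea that Raymond sketched earlier in the thread. What follows is a minimal Python 3 port under stated assumptions, not the sandbox implementation and not what the PEP specifies: the braces in the pattern are escaped, the braced name is captured without its braces, and there is no cache. With this particular pattern a dangling $ or a $! passes through untouched, while the CVS keyword fails because $Id looks like a placeholder.

```python
import re

# Assumption: same three alternatives as the pattern posted in the thread
# ($$, $identifier, ${identifier}), with the braces escaped for clarity.
PATTERN = re.compile(
    r'(\${2})|\$([_a-z][_a-z0-9]*)|\$\{([_a-z][_a-z0-9]*)\}',
    re.IGNORECASE,
)


def _convert(match):
    """Turn one $-placeholder into the equivalent %(name)s placeholder."""
    escaped, straight, braced = match.groups()
    if escaped is not None:
        return '$'
    return '%(' + (straight or braced) + ')s'


def subst(fmtstr, mapping):
    """Hypothetical helper: convert $ formats to % formats, then interpolate."""
    return PATTERN.sub(_convert, fmtstr) % mapping


CVS_KEYWORD = '$Id: rtp.py,v 1.40 2004/03/07 14:41:39 anthony Exp $'

if __name__ == '__main__':
    print(subst('$who owes me $$${what}.', dict(who='Guido', what='money')))
    print(subst('A dangling $', {}))            # no placeholder matches
    print(subst('A $!invalid_identifier', {}))  # no placeholder matches
    try:
        subst(CVS_KEYWORD, {})
    except KeyError as exc:
        print('KeyError:', exc)                 # $Id is read as a placeholder
```

Whether such strings should pass through silently or raise an error is exactly the question put to the PEP; the sketch only shows what one regex happens to do.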

Barry Warsaw

7:35 p.m.

On Tue, 2004-06-15 at 17:10, Raymond Hettinger wrote:

...

For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}.

Good point, I've pushed out an update.

...

The names dstring(), astring(), safedict(), and nsdict() could likely be improved to be more suggestive of what they do.

The 'd' is a mnemonic for 'dollar strings'. Similarly 'a' is for 'attribute path'. 'safedict' is meant to imply that it will not throw KeyError exceptions, and 'nsdict' indicates that namespace lookups are used. I'm certainly open to alternative suggestions, although sorry Tim, I'll reject 'hamstring' outright.

...

...
Brett and I (I forget who else was there for this) talked about where to situate the PEP 292 support. The interesting idea came up to turn the string module into a package

...

-1

...

The inclusion of string.py breathes life into something that needs to disappear. One of the reasons for deprecating these functions is to reduce the number of things you need to learn and remember. Interspersing a handful of new functions and classes is contrary to that goal. It becomes hard to introduce simplified substitutions without talking about all the other string functions that you're better off not knowing about.

A separate module is preferable. Also, I don't see any benefit in rolling a package with safedict and nsdict in a separate module from dstring and astring.

Here's the point: we know that some of the names in the current string module will always be useful. I'd hate to see us have to come up with some contrived new module for things like string.letters to live in (e.g. 'stringlib' would suck). 'string' seems like such a useful name to keep as a place to collect future useful string-related constants, utilities, and functionality, of which PEP 292 support is perhaps just the first example.

I'd be perfectly happy splitting string.py into two parts after moving it into Lib/string. One would be named 'deprecated.py' and that would contain all the someday-to-be-deleted functions. The other might be called 'constants.py' for lack of a better name, and would contain things we know we want to keep, like letters, hexdigits, etc. string/__init__.py can hide all that and it would be a simple matter once we ever decide to actually remove the deprecated functions <wink> to do it in two steps (strawman: remove the 'from deprecated import *' from Lib/string/__init__.py but leave the module around for diehards, then eventually remove the module).

I don't think documentation is a problem. I'd propose (and would even write) splitting the current string module so that the deprecated functions are described in a subsection that doesn't appear on the main module page. That way, the documentation just describes the constants we want to keep and the new PEP 292 support (perhaps in another new subsection).

...

Can safedict.safedict() be made more general so that it has value outside of string substitutions?

It's such a trivial matter to subclass from dict and write your own __getitem__() that I doubt it's worth it.

...

Given the simplicity of the PEP, the sandbox implementation is surprisingly intricate. Is it possible to simplify it with a function based rather than class based approach?

Take away all the comments, and it's really a fairly simple implementation. I really want to use traditional % syntax to perform the substitutions since that's the Pythonically natural way to spell string interpolation. The only complication in the implementation is the cache of the converted-to-%s string in the subclass, but this is critical. In an i18n application you need the original string for catalog lookup, and the transformed string is only useful for the mod operation. -Barry
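The caching point can be illustrated with a small str subclass. This is a hedged Python 3 sketch, not the sandbox pep292.py: the class name DollarString is hypothetical, the _modstr attribute name is borrowed from a traceback posted elsewhere in the thread, and doubling literal % signs before conversion is an added assumption. The object still compares and hashes like the original $-string, so it can serve as a catalog key, while the converted %-string is computed once and reused by the mod operation.

```python
import re

# Assumption: a simplified placeholder pattern ($$, $name, ${name}).
_PATTERN = re.compile(
    r'(\${2})|\$([_a-z][_a-z0-9]*)|\$\{([_a-z][_a-z0-9]*)\}',
    re.IGNORECASE,
)


def _convert(match):
    escaped, straight, braced = match.groups()
    if escaped is not None:
        return '$'
    return '%(' + (straight or braced) + ')s'


class DollarString(str):
    """Hypothetical stand-in for dstring: a str that interpolates $names."""

    def __new__(cls, value):
        self = super().__new__(cls, value)
        # Convert once and cache; literal % signs are doubled first so the
        # later % operation does not misread them (an assumption here).
        self._modstr = _PATTERN.sub(_convert, value.replace('%', '%%'))
        return self

    def __mod__(self, mapping):
        return self._modstr % mapping


# The original text is what a translation catalog would be keyed on.
catalog = {'$who owes me $$${what}.': '$who me doit $$${what}.'}
message = DollarString('$who owes me $$${what}.')
translated = DollarString(catalog.get(message, message))

if __name__ == '__main__':
    print(message % dict(who='Guido', what='money'))
    print(translated % dict(who='Guido', what='money'))
```

A function-based design would need to carry the original string and the converted string around separately to get the same effect.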

Raymond Hettinger

10:37 a.m.

...

...
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}.

Good point, I've pushed out an update.

Thanks.

...

...
The names dstring(), astring(), safedict(), and nsdict() could likely be improved to be more suggestive of what they do.

The 'd' is a mnemonic for 'dollar strings'. Similarly 'a' is for 'attribute path'. 'safedict' is meant to imply that it will not throw KeyError exceptions, and 'nsdict' indicates that namespace lookups are used. I'm certainly open to alternative suggestions

Since this is in a string module, the "string" part of the name can be more abbreviated and the qualifier should be less abbreviated.

    dstring: dollarstr, formatstr, dollarfmt, template, kwdonly
    astring: attrstr, attrlookup, dottedfmt, kwdattr
    safedict: defaultdict
    nsdict: nslookup, namespace, envdict

Cheetah has been through several versions. Perhaps they have worked out some better naming conventions.

...

I don't think documentation is a problem. I'd propose (and would even write) splitting the current string module so that the deprecated functions are described in a subsection that doesn't appear on the main module page. That way, the documentation just describes the constants we want to keep and the new PEP 292 support (perhaps in another new subsection).

That's reasonable. A string module is the natural place to locate the simplified substitutions. Splitting out the old functions seems like a good way to re-use the string module without breathing life into things that I was hoping that people would forget ever existed (otherwise, we will never be rid of them). Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything in to packages (which make it more difficult to understand and maintain as a collective unit).

...

...
Can safedict.safedict() be made more general so that it has value outside of string substitutions?

It's such a trivial matter to subclass from dict and write your own __getitem__() that I doubt it's worth it.

True enough. Do consider having an optional argument for setting the default string. Ideally, the class should be useful with both $ formatted and % formatted strings (for instance, make it return the key unchanged when the key is not found). Also, since the implementation is so tightly bound to $ formatting, it makes no sense to put it in a separate module.

...

...
Given the simplicity of the PEP, the sandbox implementation is surprisingly intricate. Is it possible to simplify it with a function based rather than class based approach?

Take away all the comments, and it's really a fairly simple implementation. I really want to use traditional % syntax to perform the substitutions since that's the Pythonically natural way to spell string interpolation.

The overall goal of the PEP is simplification. It takes very little complexity before $ formatting becomes more complicated than % formatting.

The % syntax has its share of issues (hard to find in the docs; precedence is more appropriate for integer modulo; tuple vs single string argument). If you give up the % syntax, you get perfectly pythonic method calls and an opportunity to do the whole job with only one exposed class, differentiating the approaches with various method names:

    t = Template('$who owes me ${what}')
    t.subst_from_dict(mydict)
    t.subst_from_env()
    t.subst_from_attr()
    t.subst(mydict, noexception=True)

Something like this would mean that you don't need several different classes to do the job. Also, compare the following for obviousness and readability:

    Template('$name loves $spouse').subst(mydict, noexception=True)
    dstring('$name loves $spouse') % SafeDict(mydict)

...

In an i18n application you need the original string for catalog lookup, and the transformed string is only useful for the mod operation.

That settles that one. Raymond
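The suggestion of an optional default string, combined with the remark that subclassing dict and writing __getitem__() is trivial, might look roughly like this. It is a sketch only: the class name, the default parameter, and its '${%s}' value are assumptions for illustration and are not taken from the sandbox safedict.py.

```python
class SafeDict(dict):
    """Hypothetical dict that returns a formatted placeholder for missing keys."""

    def __init__(self, mapping=(), default='${%s}'):
        super().__init__(mapping)
        # Format applied to a missing key; must contain exactly one %s.
        self.default = default

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            # Never raise: hand back the placeholder so it survives in the output.
            return self.default % (key,)


if __name__ == '__main__':
    # Default behaviour suits $-style templates: the key is re-expanded.
    print('%(name)s loves %(spouse)s' % SafeDict({'name': 'Guido'}))
    # A different default leaves a plain %-style placeholder in place.
    print('%(name)s loves %(spouse)s'
          % SafeDict({'name': 'Guido'}, default='%%(%s)s'))
```

Because the % operator looks keys up through __getitem__, the same class works with ordinary %-formatted strings as well as with strings converted from $ syntax.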

Aahz

24 Jun

2:27 p.m.

On Wed, Jun 16, 2004, Raymond Hettinger wrote:

...

Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything in to packages (which make it more difficult to understand and maintain as a collective unit).

That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith, c.l.py

Bob Ippolito

25 Jun

1:47 a.m.

On Jun 24, 2004, at 10:27 AM, Aahz wrote:

...

On Wed, Jun 16, 2004, Raymond Hettinger wrote:

...
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything in to packages (which make it more difficult to understand and maintain as a collective unit).

That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``?

If nothing else, that would cause hell for people who would like to use a backport of the package for Python N, where N is less than the first version that had this feature but still had the string module. -bob

Aahz

28 Jun

2:05 p.m.

On Thu, Jun 24, 2004, Bob Ippolito wrote:

...

On Jun 24, 2004, at 10:27 AM, Aahz wrote:

...
On Wed, Jun 16, 2004, Raymond Hettinger wrote:

...
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything in to packages (which make it more difficult to understand and maintain as a collective unit).

That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``?

If nothing else, that would cause hell for people who would like to use a backport of the package for Python N, where N is less than the first version that had this feature but still had the string module.

This actually makes it *easier* to backport; you only take the submodule you want. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith, c.l.py

Bob Ippolito

2:18 p.m.

On Jun 28, 2004, at 7:05 AM, Aahz wrote:

...

On Thu, Jun 24, 2004, Bob Ippolito wrote:

...
On Jun 24, 2004, at 10:27 AM, Aahz wrote:

...
On Wed, Jun 16, 2004, Raymond Hettinger wrote:

...
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything in to packages (which make it more difficult to understand and maintain as a collective unit).

That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``?

If nothing else, that would cause hell for people who would like to use a backport of the package for Python N, where N is less than the first version that had this feature but still had the string module.

This actually makes it *easier* to backport; you only take the submodule you want.

Why is calling it string instead of text easier? You can't easily replace string, because site-packages comes late in sys.path. -bob

François Pinard

13 Jul

1:04 p.m.

[Aahz]

...

[Raymond Hettinger]

...

...
Please do give consideration to putting all of this in a single module. IMO, this is too small of an addition to warrant splitting everything in to packages (which make it more difficult to understand and maintain as a collective unit).

...

That's true. However, there has been a regular low-level discussion about creating a ``text`` package; why not simply name it ``string``?

Please do not use, as package names, identifiers that users would likely want to keep for themselves. `text' and `string' are bad ideas for package names. `stringlib' seems much less likely to hurt people.

I know that `string' and `socket' exist, even though `string' is evanescent, but they surely forced users to choose other identifiers where `string' and `socket' would have been perfect. It is very good news that, now in Python 2.3, `string' is unneeded most of the time. Let us not repeat previous mistakes, or even nail them further by trying to be compatible with them.

-- François Pinard http://www.iro.umontreal.ca/~pinard

Nick Coghlan

14 Jul

9:50 a.m.

François Pinard wrote:

...

I know that `string' and `socket' exist, even though `string' is evanescent, but they surely forced users to choose other identifiers where `string' and `socket' would have been perfect. It is very good news that, now in Python 2.3, `string' is unneeded most of the time. Let us not repeat previous mistakes, or even nail them further by trying to be compatible with them.

I would suggest that bare type names are rarely appropriate for use as variable names, except in toy examples. If I'm reading someone else's code, and they create a string or a socket, I want to know what it is _for_, rather than the mere fact that it is a string or a socket. If the type is all that is important, then prepending some simple word such as 'a_string' or 'the_string' or 'my_string' makes it clear to the maintainer that the object doesn't really have any significant semantic meaning beyond its type.

Regards, Nick.

-- Nick Coghlan | Brisbane, Australia Email: ncoghlan@email.com | Mobile: +61 409 573 268

François Pinard

12:22 p.m.

[Nick Coghlan]

...

François Pinard wrote:

...

...
I know that `string' and `socket' [modules] exist, even though `string' is evanescent, but they surely forced users to choose other identifiers where `string' and `socket' would have been perfect.

...

I would suggest that bare type names are rarely appropriate for use as variable names, except in toy examples.

Or small enough functions. Small functions are not necessarily toys.

...

If I'm reading someone else's code, and they create a string or a socket, I want to know what it is _for_, rather than the mere fact that it is a string or a socket.

If I write a function receiving a string as an argument, and the effect of the function being already documented, I see no point writing `parameter_string' or `the_argument_of_the_function' instead of `string', which is clear, clean and simple. Some people would write `s' instead, but for one, I stopped overly liking algebraic notation in programs after I left FORTRAN :-). When you speak to someone else about the argument of a simple function, don't you say "then the function takes the string, it massages the string this way, etc.". I like naming my variables the way I would speak about them! :-)

...

If the type is all that is important, then prepending some simple word such as 'a_string' or 'the_string' or 'my_string' makes it clear to the maintainer that the object doesn't really have any significant semantic meaning beyond its type.

Come on, be serious! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard

Nick Coghlan

15 Jul

1:31 p.m.

François Pinard wrote:

...

[Nick Coghlan]

...
I would suggest that bare type names are rarely appropriate for use as variable names, except in toy examples.

Or small enough functions. Small functions are not necessarily toys.

Hmm, I hadn't considered that case. I guess I tend not to write too many support functions where generic names would be appropriate (most of my Python code is very domain specific). Cheers, Nick. -- Nick Coghlan | Brisbane, Australia Email: ncoghlan@email.com | Mobile: +61 409 573 268

Raymond Hettinger

16 Jun

12:26 p.m.

...

...
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}.

Good point, I've pushed out an update.

One other thought, please reconsider the key lookup for ${identifier}. I think retaining the braces in the key is a mistake. The purpose of the braces was to allow trailing characters without intervening whitespace. Extending it to have special meaning for SafeDicts was probably not the way to go. As a result, the example in the PEP doesn't work anymore:

    >>> mapping = dict(name='Guido', country='the Netherlands')
    >>> s = dstring('${name} was born in ${country}')
    >>> print s % mapping

    Traceback (most recent call last):
      File "C:\nondist\sandbox\string\pep292.py", line 124, in -toplevel-
        print s % mapping
      File "C:\nondist\sandbox\string\pep292.py", line 108, in __mod__
        return self._modstr % other
    KeyError: '{name}'

Raymond

P.S. The PEP example is also missing the rightmost single quotation mark.
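The failure mode described here can be reproduced in isolation. The sketch below is a Python 3 illustration under stated assumptions, not the sandbox code: the subst() helper and its keep_braces flag are hypothetical, and the pattern is modelled on the one posted earlier in the thread. When the braces stay in the lookup key, a plain dict with the key 'name' no longer matches; stripping them restores the PEP example.

```python
import re

# Assumption: the braced alternative captures the name together with its braces.
PATTERN = re.compile(
    r'(\${2})|\$([_a-z][_a-z0-9]*)|\$(\{[_a-z][_a-z0-9]*\})',
    re.IGNORECASE,
)


def _convert(match, keep_braces):
    escaped, straight, bracketed = match.groups()
    if escaped is not None:
        return '$'
    if straight is not None:
        return '%(' + straight + ')s'
    # keep_braces=True looks up '{name}'; keep_braces=False looks up 'name'.
    key = bracketed if keep_braces else bracketed[1:-1]
    return '%(' + key + ')s'


def subst(fmtstr, mapping, keep_braces=False):
    """Hypothetical helper showing both key conventions."""
    converted = PATTERN.sub(lambda m: _convert(m, keep_braces), fmtstr)
    return converted % mapping


mapping = dict(name='Guido', country='the Netherlands')
template = '${name} was born in ${country}'

if __name__ == '__main__':
    print(subst(template, mapping))
    try:
        subst(template, mapping, keep_braces=True)
    except KeyError as exc:
        print('KeyError:', exc)
```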

Brett C.

23 Jun

8:47 p.m.

Barry Warsaw wrote:

...

On Tue, 2004-06-15 at 17:10, Raymond Hettinger wrote:

...
For completeness, perhaps update the PEP to specify what will happen with $ strings that do not fall under $$, $identifier, or ${identifier}.

Good point, I've pushed out an update.

...
The names dstring(), astring(), safedict(), and nsdict() could likely be improved to be more suggestive of what they do.

The 'd' is a mnemonic for 'dollar strings'. Similarly 'a' is for 'attribute path'. 'safedict' is meant to imply that it will not throw KeyError exceptions, and 'nsdict' indicates that namespace lookups are used. I'm certainly open to alternative suggestions, although sorry Tim, I'll reject 'hamstring' outright.

Ah, that's why. Perhaps we can denote this fact in the final docs if the name is kept? I personally have no issue with it now that I know what they stand for. +0.

...

[SNIP]

...

...
The inclusion of string.py breathes life into something that needs to disappear. One of the reasons for deprecating these functions is to reduce the number of things you need to learn and remember. Interspersing a handful of new functions and classes is contrary to that goal. It becomes hard to introduce simplified substitutions without talking about all the other string functions that you're better off not knowing about.

A separate module is preferable. Also, I don't see any benefit in rolling a package with safedict and nsdict in a separate module from dstring and astring.

Here's the point: we know that some of the names in the current string module will always be useful. I'd hate to see us have to come up with some contrived new module for things like string.letters to live in (e.g. 'stringlib' would suck). 'string' seems like such a useful name to keep as a place to collect future useful string-related constants, utilities, and functionality, of which PEP 292 support is perhaps just the first example.

I'd be perfectly happy splitting string.py into two parts after moving it into Lib/string. One would be named 'deprecated.py' and that would contain all the someday-to-be-deleted functions. The other might be called 'constants.py' for lack of a better name, and would contain things we know we want to keep, like letters, hexdigits, etc. string/__init__.py can hide all that and it would be a simple matter once we ever decide to actually remove the deprecated functions <wink> to do it in two steps (strawman: remove the 'from deprecated import *' from Lib/string/__init__.py but leave the module around for diehards, then eventually remove the module).

I don't think documentation is a problem. I'd propose (and would even write) splitting the current string module so that the deprecated functions are described in a subsection that doesn't appear on the main module page. That way, the documentation just describes the constants we want to keep and the new PEP 292 support (perhaps in another new subsection).

It all sounds good to me. Unless str is going to be renamed 'string' in Python 3, sticking with 'string' seems fine (but then, as Barry said, we discussed this at PyCON so I have supported it for a while =). I know Guido suggested 'strings', and short of 'strtools', 'string' is the only other reasonable name to me. Tacking on 'lib' to every package will become rather tedious quickly, especially when the stdlib is reorganized in 3.0 . And Barry's factoring out stuff that can stand to go away also works for me. Making things we don't want people to use a little harder to reach, but still easily accessible in the docs seems like a good solution. +1 -Brett
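The package layout under discussion can be tried out without touching the real string module. The following Python 3 sketch builds a throwaway package in a temporary directory; the package name strpkg, the sample constant, and the sample function are all hypothetical, and Python 3 needs explicit relative imports ('from .deprecated import *') where the thread's Python 2 strawman writes 'from deprecated import *'.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical file contents mirroring the proposed split:
# __init__.py re-exports everything, so callers see a flat namespace.
FILES = {
    '__init__.py': """
        from .constants import *
        from .deprecated import *
    """,
    'constants.py': """
        __all__ = ['hexdigits']
        hexdigits = '0123456789abcdefABCDEF'
    """,
    'deprecated.py': """
        __all__ = ['upper']

        def upper(s):
            return s.upper()
    """,
}


def build_package(root, name='strpkg'):
    """Write the sketch package under root and return its directory."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    for filename, source in FILES.items():
        with open(os.path.join(pkg_dir, filename), 'w') as handle:
            handle.write(textwrap.dedent(source))
    return pkg_dir


root = tempfile.mkdtemp()
build_package(root)
sys.path.insert(0, root)
importlib.invalidate_caches()
strpkg = importlib.import_module('strpkg')

if __name__ == '__main__':
    print(strpkg.hexdigits)      # constant that would be kept
    print(strpkg.upper('spam'))  # function that would eventually go away
```

Dropping the 'from .deprecated import *' line from __init__.py is then the first of the two removal steps described in the strawman, with the module itself still importable as strpkg.deprecated.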

Paul Prescod

19 Jun

10:38 a.m.

Barry Warsaw wrote:

...

PEP 292 is slated for inclusion in Python 2.4, according to PEP 320. At Pycon I checked in my code for this into the sandbox, which I've since updated, adding unit tests. I believe it's ready for inclusion in dist CVS, but I want to get some feedback first.

My new stuff provides two classes, dstring() as described in PEP 292 and astring() as hinted at in the PEP.

I find these names to be arbitrary and not mnemonic or suggestive. How about "template" or "format" for "dstring"? I don't know what "astring" is. Paul Prescod

"Martin v. Löwis"

20 Jun

4:19 p.m.

Barry Warsaw wrote:

...

Brett and I (I forget who else was there for this) talked about where to situate the PEP 292 support. The interesting idea came up to turn the string module into a package, providing backward support for the existing string module API, then exporting my PEP 292 modules into this namespace.

I'm fine with providing dstring inside the string module. -1 on making it a package. I see no advantage for having a package with three files, some of them containing only a single class, over having it all in a single module. Regards, Martin

Jeff Epler

23 Jun

10:53 p.m.

What is the motivation for "safedict"? I can imagine two uses. One seems like it could lead to some kind of security problem.

The "harmless" (?) use would be in debugging, so that the program would continue when a key was missing, but the programmer could see after the fact what that key was.

The harmful case would be one where the string is substituted in several stages. Just like % substitutions, $-substitutions are not safe for repeated expansion. Here's an example:

    def something(user_controlled_string):
        mypassword = "drowssap"
        bar = "1/8 x 1 inch aluminum bar"
        s = dstring("${foo} is ${bar}")
        s = s % safedict({'foo': user_controlled_string})
        s = s % nsdict()
        print s

The malicious user supplies user_controlled_string:

    http://python.example.com/something?user_controlled_string=%24mypassword

and gets back

    drowssap is 1/8 x 1 inch aluminum bar

Jeff
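The repeated-expansion hazard can be reproduced with a few lines of self-contained Python 3. This is a sketch under assumptions rather than the sandbox classes: subst() and SafeDict are hypothetical stand-ins for dstring and safedict, and an explicit dictionary replaces nsdict's lookup in the caller's namespace. Expanding in two stages lets user input be interpreted as a template; doing a single pass over the trusted template does not.

```python
import re

PATTERN = re.compile(
    r'(\${2})|\$([_a-z][_a-z0-9]*)|\$\{([_a-z][_a-z0-9]*)\}',
    re.IGNORECASE,
)


def _convert(match):
    escaped, straight, braced = match.groups()
    if escaped is not None:
        return '$'
    return '%(' + (straight or braced) + ')s'


def subst(fmtstr, mapping):
    """Hypothetical stand-in for dstring(fmtstr) % mapping."""
    return PATTERN.sub(_convert, fmtstr) % mapping


class SafeDict(dict):
    """Hypothetical stand-in for safedict: missing keys are re-expanded."""

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return '${%s}' % (key,)


# Stand-in for the local namespace that nsdict would consult.
NAMESPACE = {'mypassword': 'drowssap', 'bar': '1/8 x 1 inch aluminum bar'}
TEMPLATE = '${foo} is ${bar}'


def two_stage(user_controlled_string):
    # Stage 1 inserts user input; stage 2 then re-reads that input as a template.
    partial = subst(TEMPLATE, SafeDict(foo=user_controlled_string))
    return subst(partial, NAMESPACE)


def one_stage(user_controlled_string):
    # A single pass: substituted values are never scanned for placeholders.
    return subst(TEMPLATE, dict(NAMESPACE, foo=user_controlled_string))


if __name__ == '__main__':
    print(two_stage('$mypassword'))  # leaks the password
    print(one_stage('$mypassword'))  # keeps the input literal
```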
