Pre-PEP: Refusing to guess in string formatting operations
Beni Cherniavsky
cben at techunix.technion.ac.il
Tue Mar 11 14:43:58 EST 2003
Here is something I cooked up a week ago but only managed to post now,
because it was buried on a damaged disk :-). Please comment on it.
It is also available at:
http://www.technion.ac.il/~cben/python/pep-strfmt.txt
http://www.technion.ac.il/~cben/python/pep-strfmt.html
--
Beni Cherniavsky <cben at tx.technion.ac.il>,
happy that cdrecord works in his linux - one less reason to reboot...
PEP: XXX
Title: Refusing to Guess in the String Formatting Operation
Version: $Revision: $
Last-Modified: $Date: $
Author: Beni Cherniavsky <cben at tx.technion.ac.il>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Mar-2003
Python-Version: 2.3
Post-History: 11-Mar-2003
Abstract
========
The `string formatting operation`_ - ``format % values`` - has a wart.
When the right operand is not a tuple, the operation acts as if it
were wrapped in a singleton tuple. It's tempting to rely on this,
since singleton tuples look ugly, but code that does so breaks easily
(whenever the single object happens to be a tuple).
This PEP proposes several ways to fix this wart.
The problem has been discussed repeatedly, but the specific solutions
proposed in this PEP need more feedback.
Motivation
==========
In the face of ambiguity, refuse the temptation to guess.
-- The Zen of Python, by Tim Peters
The basic form, ``format % (val1, val2, ...)`` with a tuple, is
completely robust. However, since there is frequently only one value
and singleton tuples are awkward to write and read, a shorthand is
allowed: when the right operand is not a tuple, it is treated as if it
were wrapped in a singleton tuple.
This shorthand is bad because:
1. When you pass a single object without wrapping it in a singleton
tuple, your code will break if the object happens to be a tuple:
>>> def decorate(obj):
...     return '-> %s <-' % obj
...
>>> decorate(1)
'-> 1 <-'
>>> decorate((1,))
'-> 1 <-' # instead of '-> (1,) <-'
>>> decorate((1,2))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in decorate
TypeError: not all arguments converted during string formatting
Your program can silently emit wrong results or crash. This is the
classic example of the impossibility of creating a single robust
interface that accepts either an object or a sequence and does the
right thing; any attempt to do so invites bugs. This has bitten many
Python newbies and has been discussed many times on
comp.lang.python.
2. This temptation will exist forever under the current design. The
purpose of this PEP is to provide alternative ways to format a
single object that are as convenient as formatting several objects,
so that the temptation disappears.
3. It is not possible to pass multiple values in a sequence other
than a tuple (e.g. a list), because any such sequence is interpreted
as a single object. For example, it's useful to be able to substitute
a transparent debugging proxy that logs all accesses for an object;
if such a proxy wraps a tuple and is used on the right side of a
formatting operation, the transparency breaks. More generally, it's
unpythonic to discriminate between objects by their actual type
instead of by the interface they implement.
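Under the current design, the only robust idiom is to wrap a single value explicitly and to convert any other sequence to a real tuple before formatting. A minimal sketch (the function names ``decorate`` and ``decorate_pair`` are only illustrative; it was written for a current Python 3 interpreter, where ``%`` still guesses as described above):

```python
def decorate(obj):
    # Always wrap the single value in a singleton tuple, so a tuple
    # argument is printed as one object instead of being unpacked.
    return '-> %s <-' % (obj,)


def decorate_pair(values):
    # A list (or any other non-tuple sequence) would be formatted as one
    # object, so convert it to a real tuple before formatting.
    return '-> %s %s <-' % tuple(values)


print(decorate((1, 2)))        # prints: -> (1, 2) <-
print(decorate_pair([1, 2]))   # prints: -> 1 2 <-
print('-> %s <-' % [1, 2])     # prints: -> [1, 2] <-  (the list is one value)
```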
The formatting operator also has a third mode, in which the right
argument must be a mapping. This mode is triggered deterministically
by the "%(...)..." syntax in the format rather than by checking the
type of the right argument, so it doesn't need fixing.
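For contrast, a few checks illustrating the mapping mode, written for a current CPython (the ``Squares`` class is a made-up example; the point is that any object supporting key lookup can serve, not just a dict):

```python
class Squares:
    # Hypothetical read-only "mapping": any object supporting key lookup
    # via __getitem__ can serve; it does not have to be a dict.
    def __getitem__(self, key):
        return int(key) ** 2


# "%(name)" conversions select mapping mode and look values up by key.
print('%(a)s and %(b)s' % {'a': 1, 'b': 2})   # prints: 1 and 2
print('%(3)s, %(4)s' % Squares())             # prints: 9, 16

# Without "%(" in the format, the same dict is just one ordinary value.
print('-> %s <-' % {'a': 1})                  # prints: -> {'a': 1} <-
```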
Single-value operator
=====================
Specification
-------------
Add a new string operator, ``/``, that will always accept a single
value:
>>> '-> %s <-' / 1
'-> 1 <-'
>>> '-> %s <-' / (1,)
'-> (1,) <-'
To be available regardless of whether true division is enabled
(``from __future__ import division``), this would need to be
duplicated as both ``str.__div__`` and ``str.__truediv__``.
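The operator cannot be added to the built-in ``str`` type from Python code, but its intended behaviour can be sketched with a subclass. ``SlashStr`` is a hypothetical name, and the sketch targets Python 3, which has only true division, so ``__truediv__`` alone suffices there:

```python
class SlashStr(str):
    """Hypothetical str subclass emulating the proposed ``/`` operator.

    The built-in str type cannot be changed from Python code, so this
    subclass stands in for it.
    """

    def __truediv__(self, value):
        # The right operand is always exactly one value, even a tuple.
        return self % (value,)


print(SlashStr('-> %s <-') / 1)       # prints: -> 1 <-
print(SlashStr('-> %s <-') / (1,))    # prints: -> (1,) <-
```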
Rationale
---------
This allows unambiguous formatting with a single value, with minimal
hassle when you change the code between one and several values. It's
probably the least intrusive fix for the problem.
The ``/`` operator was chosen because it is the "closest relative" of
``%``. Mnemonic: ``/`` looks like a subset of ``%``, but with just
one component :-).
Downsides: it's arbitrary punctuation; the mnemonic isn't better than
Perl's mnemonics for ``$]`` and friends... The div/truediv
duplication is a hack that highlights the fact that this has nothing
to do with division. Some people might start to wonder what ``//``
(floordiv) does with strings... (Should it do mapping formatting? I
don't see the need - leave it to ``%``.)
Deprecation of format % single-value
====================================
Once an unambiguous method to format a single value exists, there is
little need (except for backward compatibility) for ``%`` to accept
non-tuple right arguments. Thus it makes sense to deprecate this,
eventually removing it completely. This will break a lot of existing
code, not all of which is buggy, so it's better to proceed slowly.
* As soon as a better alternative is implemented (2.3?), formatting
operations with a non-tuple right argument should issue a
``PendingDeprecationWarning`` (except for the "%(...)..." mode that
expects mappings, of course). This is harmless (ignoring
performance) and can continue until everybody is educated and no
newly written code uses the shorthand.
* At some point (2.4? 2.x? 3.0?), the warning becomes a
``DeprecationWarning``. At this point all the late adopters go over
their code, perhaps fixing some instances of the
single-value-happens-to-be-a-tuple bug in the process :-).
* At the next stage, the guessing is completely dropped - you get a
``TypeError`` for non-tuples.
* Once this has forced all uses of non-tuple values to disappear,
it's possible to relax the demands and accept sequences other than
tuples. Since formatting accesses the values only sequentially, any
iterable type can be accepted. This is a minor point in any case.
Alphabetic Method
=================
*This is not proposed as a sufficient solution but rather as an
additional feature that can optionally be implemented. Actually, the
PEP author tends to think it is not needed, but it is listed for
completeness - perhaps others will like it more.*
Specification
-------------
Add a new string method with an alphabetic (non-magic) name that would
accept a variable number of arguments (besides self) that are the
values for the formatting operation:
>>> '-> %s <-'.fmt(1)
'-> 1 <-'
>>> '-> %s %s <-'.fmt(1,2)
'-> 1 2 <-'
When the format selects values by names ("%(...)..."), there are
several possible interfaces for passing the mapping (options 2 or 3
are preferred):
1. Keyword arguments:
>>> '-> %(a)s <-'.fmt(a=1)
'-> 1 <-'
>>> '-> %(!@#$)s <-'.fmt(**{'!@#$': 1})
'-> 1 <-'
2. As a single mapping argument:
>>> '-> %(a)s <-'.fmt({'a': 1})
'-> 1 <-'
3. Support both ways.
Since this becomes very similar to the behaviour of the ``dict()``
constructor (since Python 2.3), perhaps it makes sense to complete
the similarity and also support a sequence of items as an argument:
>>> # The above two ways work; in addition:
>>> '-> %(a)s <-'.fmt([('a', 1)])
'-> 1 <-'
All this could easily be implemented by passing all arguments of the
formatting method to ``dict()``. Note, however, that when the single
argument is already a mapping (has a ``keys`` attribute), ``dict()``
should not be called on it, because that would copy the whole
mapping (and large mappings are frequently used when the format only
accesses a few keys).
Suggested names for the new method: ``fmt``, ``format``, ``sprintf``.
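A minimal sketch of such a method, assuming the name ``fmt`` and option 3 (both keyword arguments and a single mapping) plus sequences of items. ``FmtStr`` and the regular expression that detects named conversions are illustrative assumptions, since the built-in ``str`` type cannot be extended from Python code:

```python
import re

# Matches a "%(" that really starts a named conversion: it must be
# preceded by an even number of "%" (so "%%(" is a literal "%(").
_NAMED_SPEC = re.compile(r'(?:^|[^%])(?:%%)*%\(')


class FmtStr(str):
    """Hypothetical str subclass sketching the proposed ``fmt`` method."""

    def fmt(self, *args, **kwargs):
        if _NAMED_SPEC.search(self):
            # Named mode.  A single existing mapping is used as-is, so a
            # large mapping is not copied just to read a few keys.
            if len(args) == 1 and not kwargs and hasattr(args[0], 'keys'):
                return self % args[0]
            # Otherwise accept everything dict() accepts: keyword
            # arguments, a mapping plus keywords, or a sequence of items.
            return self % dict(*args, **kwargs)
        if kwargs:
            raise TypeError('keyword arguments need %(name) conversions')
        # Positional mode: the arguments always form the tuple of values.
        return self % args
```

Passing a single existing mapping through unchanged follows the note above about not copying large mappings; everything else goes through ``dict()``.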
Rationale
---------
* This notation works equally well for one and several values thanks
to Python's call syntax (at the price of some verbosity). This
benefit will only be realised if either of the following happens:
- Everybody completely switches to it and the % notation is
deprecated - this doesn't look like a good idea.
- The % notation is fixed by other means so that both notations are
applicable in any situation (regardless of the number of values),
subject to the programmer's taste.
That's why this solution is not sufficient on its own.
* ``format.fmt`` looks more readable than ``format.__mod__`` when you
need the bound method.
* The duplication of operator functionality as an alphabetic method
would not be without precedent. Some examples:
- ``list.extend`` vs. ``list.__iadd__``
- ``dict.iterkeys`` vs. ``dict.__iter__``
- ``dict.has_key`` vs. ``dict.__contains__``
Note that in this case, the signature is not identical.
* The use of keyword arguments for mapping-based formatting works in
simple cases because the constraints on the keys are basically the
same - both accept arbitrary strings. However:
- Unicode formatting operations allow unicode strings as keys for
selecting values, whereas keyword arguments are restricted to
plain strings. Does anybody use unicode strings as formatting
keys at all?
- When you already have a mapping, you need to pass it using the
``**mapping`` syntax. This doesn't work for mappings that are not
real dicts, and it apparently goes over all keys to validate that
they are strings, which slows things down unnecessarily.
Therefore supporting only the keyword arguments interface would be
bad.
* One alternative to an alphabetic method name is ``__call__``:
>>> '-> %s <-'(1)
'-> 1 <-'
>>> '-> %s %s <-'(1, 2)
'-> 1 2 <-'
This has the same benefits of unambiguity inherited from the call
syntax and is very concise. This is also its undoing: it's too
cryptic. Using strings in function call position will deeply puzzle
anybody unfamiliar with it. The idea was probably discussed and
dismissed when ``%`` was originally introduced.
Quoting PEP 234, on rejection of ``__call__()`` as alternative name
for iterators' ``.next()`` method:
... there's a danger that every special-purpose object wants to
use __call__() for its most common operation, causing more
confusion than clarity.
Backwards Compatibility
=======================
The proposed addition(s) present no compatibility problems (unless
somebody intentionally checks for the absence of such method(s), in
which case he doesn't deserve compatibility :-).
The deprecation is estimated to affect a lot of code - there is hardly
any Python programmer who has never used the single-value formatting
shorthand. It is completely optional and can be done as slowly as
needed to make the transition painless.
There are probably some APIs out there with functions that accept a
single value or a tuple and pass it to the formatting operation as-is.
The deprecation will force them either to break the API in this
respect (possibly implementing alternatives similar to this PEP),
which is a cleaner idea anyway, or to explicitly implement compatible
functionality, e.g.::

    if isinstance(values, tuple):
        do_something(format % values)
    else:
        do_something(format % (values,))

If mapping-based formats must also be supported, some more code is
needed...
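One possible shape for that extra code, as a hedged sketch: ``compat_format`` is a hypothetical helper, and named conversions are detected by scanning the format, because the format (not the value) selects mapping mode:

```python
import re

# A "%(" preceded by an even number of "%" starts a named conversion.
_NAMED_SPEC = re.compile(r'(?:^|[^%])(?:%%)*%\(')


def compat_format(fmt, values):
    """Hypothetical helper keeping an old "value or tuple" API working."""
    if _NAMED_SPEC.search(fmt):
        # Mapping-based format: the mapping is passed through unchanged.
        return fmt % values
    if isinstance(values, tuple):
        # Explicit tuple of values (this keeps the old ambiguity for a
        # single value that happens to be a tuple, as the API promised).
        return fmt % values
    # Any other single value is wrapped explicitly.
    return fmt % (values,)
```

Without the first check, a dict would be wrapped as ``(values,)`` and a format such as ``'%(a)s'`` would fail with a ``TypeError``.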
References
==========
.. _String formatting operation:
http://www.python.org/doc/current/lib/typesseq-strings.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End: