Pre-PEP: Refusing to guess in string formatting operations

Beni Cherniavsky cben at techunix.technion.ac.il
Tue Mar 11 20:43:58 CET 2003


Here is something I cooked a week ago but managed to post only now
because it was buried in a damaged disk :-).  Please comment on it.

Also available at:

http://www.technion.ac.il/~cben/python/pep-strfmt.txt
http://www.technion.ac.il/~cben/python/pep-strfmt.html

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>,
happy that cdrecord works in his linux - one less reason to reboot...


PEP: XXX
Title: Refusing to Guess in the String Formatting Operation
Version: $Revision: $
Last-Modified: $Date: $
Author: Beni Cherniavsky <cben at tx.technion.ac.il>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Mar-2003
Python-Version: 2.3
Post-History: 11-Mar-2003


Abstract
========

The `string formatting operation`_ - ``format % values`` - has a wart.
When the right operand is not a tuple, the operation treats it as if
it were wrapped in a singleton tuple.  It's tempting to rely on this,
since singleton tuples look ugly, but code that does so breaks easily
(whenever the single object happens to be a tuple).

This PEP proposes several ways to fix this wart.

The problem was discussed repeatedly but the specific solutions
proposed in this PEP need more feedback.


Motivation
==========

    In the face of ambiguity, refuse the temptation to guess.

    -- The Zen of Python, by Tim Peters

The simple use ``format % (val1, val2, ...)`` with a tuple is
completely robust.  However, since frequently there is only one value
and singleton tuples are quite awkward to write and read, a shorthand
is allowed - when the right operand is not a tuple, it's interpreted
as if it were wrapped in a singleton tuple.

This shorthand is bad because:

1. When you pass a single object without the singleton tuple around
   it, your code will break if the object happens to be a tuple:

   >>> def decorate(obj):
   ...     return '-> %s <-' % obj
   ...
   >>> decorate(1)
   '-> 1 <-'
   >>> decorate((1,))
   '-> 1 <-'           # instead of '-> (1,) <-'
   >>> decorate((1,2))
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
     File "<stdin>", line 2, in decorate
   TypeError: not all arguments converted during string formatting

   Your program can silently emit wrong results or crash.  This is the
   classic example of the impossibility of creating a single robust
   interface that will accept either an object or a sequence and do
   the right thing.  Any attempt to do so invites bugs.  This has
   bitten many Python newbies and has been discussed many times on
   comp.lang.python.

2. This temptation will exist forever under the current design.  The
   purpose of this PEP is to provide alternative ways to format a
   single object that are as convenient as formatting several objects,
   so that the temptation will disappear.

3. It is not possible to use sequences other than tuples (e.g. lists)
   for passing multiple values to the formatting operation, because
   the sequence is interpreted as a single object.  For example, it's
   useful to be able to substitute a transparent debugging proxy that
   logs all accesses for an object; if such a proxy for a tuple is
   used on the right side of a formatting operation, the transparency
   breaks.  More generally, it's unpythonic to discriminate objects by
   actual type instead of the interface they implement.
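
Problems 1 and 3, and the explicit singleton tuple that avoids the
first one today, can be shown in a short sketch.  The helper name
``decorate_safe`` is made up for illustration and is not part of the
proposal.

```python
def decorate_safe(obj):
    # Always wrap the value in a singleton tuple, so a tuple argument
    # is formatted as one object instead of being unpacked.
    return '-> %s <-' % (obj,)


print(decorate_safe(1))        # -> 1 <-
print(decorate_safe((1,)))     # -> (1,) <-
print(decorate_safe((1, 2)))   # -> (1, 2) <-

# A list is not unpacked the way a tuple is: it counts as one value,
# so a two-slot format runs out of arguments.
try:
    '%s %s' % [1, 2]
except TypeError as exc:
    print('TypeError:', exc)
```

This is robust, but it is exactly the awkward spelling that tempts
people into the shorthand.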

The formatting operator also has a third mode, in which the right
argument must be a mapping.  This mode is triggered deterministically
by the "%(...)..." syntax in the format rather than by checking the
type of the right argument, so it doesn't need fixing.
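
A small sketch of why the mapping mode is not ambiguous: the format
string, not the type of the right argument, decides.  The variable
names are only illustrative.

```python
fmt = '-> %(a)s <-'

# A mapping on the right satisfies the "%(...)..." syntax.
by_name = fmt % {'a': 1}

# A tuple does not: the format itself demands a mapping.
try:
    fmt % (1,)
    tuple_error = None
except TypeError as exc:
    tuple_error = str(exc)

# Without "%(...)" in the format, a dict is just a single value.
as_value = '-> %s <-' % {'a': 1}

print(by_name)      # -> 1 <-
print(tuple_error)  # format requires a mapping
print(as_value)     # -> {'a': 1} <-
```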


Single-value operator
=====================

Specification
-------------

Add a new string operator, ``/``, that will always accept a single
value:

>>> '-> %s <-' / 1
'-> 1 <-'
>>> '-> %s <-' / (1 ,)
'-> (1,) <-'

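To be always available, this would need to be duplicated as both
``str.__div__`` and ``str.__truediv__``, because ``/`` dispatches to
one or the other depending on whether
``from __future__ import division`` is in effect.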
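
The intended behaviour can be tried out today with a subclass.  This
is only a sketch: ``FmtStr`` is a hypothetical name, and the real
proposal would put the operator on ``str`` itself.

```python
class FmtStr(str):
    """Hypothetical str subclass emulating the proposed ``/`` operator."""

    def __truediv__(self, value):
        # Wrap unconditionally: the right operand is always ONE value,
        # whether or not it happens to be a tuple.
        return str.__mod__(self, (value,))

    # Interpreters that dispatch "/" to __div__ get the same behaviour.
    __div__ = __truediv__


print(FmtStr('-> %s <-') / 1)       # -> 1 <-
print(FmtStr('-> %s <-') / (1,))    # -> (1,) <-
print(FmtStr('-> %s <-') / (1, 2))  # -> (1, 2) <-
```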


Rationale
---------

This allows unambiguous expression of formatting with a single value,
with minimal hassle when you change the code between one and several
values.  It's probably the least intrusive fix for the problem.

The ``/`` operator was chosen for this because it is the "closest
relative" of ``%``.  Mnemonic: ``/`` looks like a subset of ``%`` but
with just one component :-).

Downsides: it's an arbitrary piece of punctuation; the mnemonic is no
better than Perl's motivations for ``$]`` and friends...  The
div/truediv duplication is a hack that highlights the fact that this
has nothing to do with division.  Some people might start to wonder
what ``//`` (floordiv) does with strings...  (Should it do mapping
formatting?  I don't see the need - leave it to ``%``.)


Deprecation of format % single-value
====================================

Once an unambiguous way to format a single value exists, there is
little need (except for backward compatibility) for ``%`` to accept
non-tuple right arguments.  Thus it makes sense to deprecate this,
eventually removing it completely.  This will break a lot of existing
code, not all of which is buggy, so it's better to proceed with this
slowly.

* As soon as a better alternative is implemented (2.3?), formatting
  operations with a non-tuple right argument should raise
  ``PendingDeprecationWarning`` (except for the "%(...)..." mode that
  expects mappings, of course).  This is harmless (ignoring
  performance) and can be continued until everybody is educated and no
  newly written code uses it.

* At some point (2.4?  2.x?  3.0?), the warning becomes
  ``DeprecationWarning``.  At this point all the late adopters go over
  their code, perhaps fixing some instances of the
  single-values-happens-to-be-tuple bug in the process :-).

* At the next stage, the guessing is completely dropped - you get a
  ``TypeError`` for non-tuples.

* After this has forced all uses of non-tuple values to disappear,
  it's possible to relax the demands and accept other sequences
  besides tuples.  Since formatting accesses the values only
  sequentially, any iterable type can be accepted.  This is a minor
  point in any case.
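
The first and third stages can be approximated in pure Python to get a
feel for their effect on existing code.  This is a sketch under stated
assumptions: the function names are hypothetical, and the check for
the "%(...)..." mode is a rough heuristic rather than a real format
parser.

```python
import warnings


def uses_mapping_keys(fmt):
    # Rough heuristic, not a full format parser: drop literal "%%"
    # escapes, then look for the "%(" that starts a mapping key.
    return '%(' in fmt.replace('%%', '')


def format_stage1(fmt, values):
    """First stage: still guess, but warn about non-tuple values."""
    if not isinstance(values, tuple) and not uses_mapping_keys(fmt):
        warnings.warn('non-tuple right argument to %',
                      PendingDeprecationWarning, stacklevel=2)
    return fmt % values


def format_stage3(fmt, values):
    """Third stage: the guessing is gone, non-tuples are an error."""
    if not isinstance(values, tuple) and not uses_mapping_keys(fmt):
        raise TypeError('right argument of % must be a tuple')
    return fmt % values
```

Running existing code through something like ``format_stage1`` with
warnings enabled gives a quick estimate of how many call sites rely on
the shorthand.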


Alphabetic Method
=================

*This is not proposed as a sufficient solution but rather as an
additional thing that can optionally be implemented.  Actually the PEP
author tends to think this is not needed but it is listed for
completeness - perhaps others will like it more.*


Specification
-------------

Add a new string method with an alphabetic (non-magic) name that would
accept a variable number of arguments (besides self) that are the
values for the formatting operation:

>>> '-> %s <-'.fmt(1)
'-> 1 <-'
>>> '-> %s %s <-'.fmt(1,2)
'-> 1 2 <-'

When the format selects values by names ("%(...)..."), there are
several possible interfaces for passing the mapping (options 2 or 3
are preferred):

1. Keyword arguments:

   >>> '-> %(a)s <-'.fmt(a=1)
   '-> 1 <-'
   >>> '-> %(!@#$)s <-'.fmt(**{'!@#$': 1})
   '-> 1 <-'

2. As a single mapping argument:

   >>> '-> %(a)s <-'.fmt({'a': 1})
   '-> 1 <-'

3. Support both ways.

   Since this becomes very similar to the behaviour of the ``dict()``
   constructor (since Python 2.3), perhaps it makes sense to complete
   the similarity and also support a sequence of items as an argument:

   >>> # The above two ways work; in addition:
   >>> '-> %(a)s <-'.fmt([('a', 1)])
   '-> 1 <-'

   All this could be easily implemented by just passing all arguments
   of the formatting method to ``dict()``.  Note however, that when
   the argument is already a mapping (has a ``keys`` attribute),
   ``dict()`` should not be called on it, because that would copy the
   whole dictionary (and frequently big mappings are used when the
   format only accesses several keys).

Suggested names for the new method: ``fmt``, ``format``, ``sprintf``.
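
Since ``str`` cannot be extended from Python code, the "support both
ways" variant is sketched here as a free function taking the format as
its first argument.  The name ``fmt`` is one of the suggestions above;
the "%(" test for named formats is a rough heuristic of mine, not part
of the specification.

```python
def fmt(format_string, *args, **kwargs):
    """Hypothetical stand-in for the proposed method (option 3)."""
    # Rough heuristic for "the format selects values by name".
    if '%(' in format_string.replace('%%', ''):
        if len(args) == 1 and not kwargs and hasattr(args[0], 'keys'):
            # Already a mapping: use it directly, do not copy it.
            mapping = args[0]
        else:
            # Keyword arguments, a sequence of items, or both.
            mapping = dict(*args, **kwargs)
        return format_string % mapping
    if kwargs:
        raise TypeError('keyword arguments need a "%(name)s" format')
    # Positional mode: args is always a tuple, so nothing is guessed.
    return format_string % args


print(fmt('-> %s <-', (1,)))             # -> (1,) <-
print(fmt('-> %(a)s <-', a=1))           # -> 1 <-
print(fmt('-> %(a)s <-', [('a', 1)]))    # -> 1 <-
```

Because ``*args`` always arrives as a tuple, a single tuple value is
formatted as one object, which is the whole point.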


Rationale
---------

* This notation works equally well for one and multiple values thanks
  to the call syntax of Python (at the price of some verbosity).  This
  benefit will only be realised if either of the following happens:

  - Everybody completely switches to it and the % notation is
    deprecated - this doesn't look like a good idea.

  - The % notation is fixed by other means so that both notations are
    applicable at any situation (without dependence on the number of
    values), subject to the programmer's taste.

  That's why this solution is not sufficient in any case.

* ``format.fmt`` looks more readable than ``format.__mod__`` when you
  need the bound method.

* The duplication of operator functionality as an alphabetic method
  would not be without precedent.  Some examples:

  - ``list.extend`` vs. ``list.__iadd__``
  - ``dict.iterkeys`` vs. ``dict.__iter__``
  - ``dict.has_key`` vs. ``dict.__contains__``

  Note that in this case, the signature is not identical.

* The use of keyword arguments for mapping-based formatting works in
  simple cases because the limitations on the keys are basically the
  same - they must be strings, but can otherwise be arbitrary.
  However:

  - Unicode formatting operations allow unicode strings as keys for
    selecting values, whereas keyword arguments are restricted to
    plain strings.  Does anybody use unicode strings as formatting
    keys at all?

  - When you already have a mapping, you need to pass it using the
    ``**mapping`` syntax.  This doesn't work for mappings that are not
    real dicts and it seems to go over all keys validating that they
    are strings, which unnecessarily slows things down.

  Therefore supporting only the keyword arguments interface would be
  bad.

* One alternative to an alphabetic method name is ``__call__``:

  >>> '-> %s <-'(1)
  '-> 1 <-'
  >>> '-> %s %s <-'(1, 2)
  '-> 1 2 <-'

  This has the same benefits of unambiguity inherited from the call
  syntax and is very concise.  This is also its undoing: it's too
  cryptic.  Using strings in function call position will deeply puzzle
  anybody unfamiliar with it.  The idea was probably discussed and
  dismissed when ``%`` was originally introduced.

  Quoting PEP 234, on rejection of ``__call__()`` as alternative name
  for iterators' ``.next()`` method:

      ... there's a danger that every special-purpose object wants to
      use __call__() for its most common operation, causing more
      confusion than clarity.


Backwards Compatibility
=======================

The proposed addition(s) present no compatibility problems (unless
somebody intentionally checks the absence of such method(s), in which
case he doesn't deserve compatibility :-).

The deprecation is estimated to affect a lot of code - there is hardly
any Python programmer who has never used the single-value formatting
shorthand.  It is completely optional and can be done as slowly as
needed to make the transition painless.

There are probably some APIs out there with functions that accept a
single value or a tuple and pass it to the formatting operation as-is.
The deprecation will force them either to break the API in this
respect (possibly implementing alternatives similar to this PEP),
which is a cleaner idea anyway, or to explicitly implement compatible
functionality, e.g.::

    if isinstance(values, tuple):
        do_something(format % values)
    else:
        do_something(format % (values ,))

If the mapping-based formats must also be supported, some more
code is needed...
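
One possible shape of that extra code, as a sketch only:
``format_compat`` is a hypothetical helper, and detecting the mapping
mode by searching for "%(" is a heuristic, not a full format parser.

```python
def format_compat(fmt, values):
    """Hypothetical helper keeping the old value-or-tuple API alive."""
    if isinstance(values, tuple):
        # Several values (or an explicit singleton tuple).
        return fmt % values
    if '%(' in fmt.replace('%%', ''):
        # Mapping mode: pass the mapping through untouched.
        return fmt % values
    # A single non-tuple value: wrap it explicitly.
    return fmt % (values,)
```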


References
==========

.. _String formatting operation:
   http://www.python.org/doc/current/lib/typesseq-strings.html


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:




