Format specification mini-language for list joining

Tobia Conforto tobia.conforto at
Sat Nov 10 11:26:28 CET 2012


Lately I have been writing a lot of list join() operations variously including (and included in) string format() operations.

For example:

  temps = [24.369, 24.550, 26.807, 27.531, 28.752]

  out = 'Temperatures: {0} Celsius'.format(
            ', '.join('{0:.1f}'.format(t) for t in temps))

  # => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

This is just a simple example; my actual code has many more join and format operations, split into local variables as needed for clarity.

Then I remembered that Ye Old Common Lisp's format operator had built-in list traversing capabilities[1]:

  (format t "Temperatures: ~{~1$~^, ~} Celsius" temps)

That format string (the part in the middle that looks like line noise) is admittedly arcane, but it's parsed like this:

  ~{	take next argument (temps) and start iterating over its contents
  ~1$	output a floating point number with 1 digit precision
  ~^	break the loop if there are no more items available
  ", "	(otherwise) output a comma and space
  ~}	end of the loop body
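
To make the control flow of those directives easier to follow, here is a rough Python rendering of the same loop. It is only an illustration of the steps listed above, not a description of how any Lisp implementation works, and the function name lisp_style_join is made up for this sketch.

```python
def lisp_style_join(items):
    """Rough Python rendering of "~{~1$~^, ~}" (illustrative only)."""
    remaining = list(items)
    pieces = []
    while remaining:                      # ~{ ... ~}  loop while items are left
        value = remaining.pop(0)
        pieces.append('{0:.1f}'.format(value))  # ~1$  one digit after the point
        if not remaining:                 # ~^  leave the loop when nothing is left
            break
        pieces.append(', ')               # literal ", " between items
    return ''.join(pieces)

print(lisp_style_join([24.369, 26.807, 27.531]))  # 24.4, 26.8, 27.5
```

The explicit loop and break are exactly the imperative pieces that end up embedded in the Lisp format string.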

Now, as much as I appreciate the heritage of Lisp, I won't deny that its format string mini-language is EVIL. As a rule, format string placeholders should not include *imperative statements* such as for, break, continue, and if. We don't need a Turing-complete language in our format strings. Still, this is the grand^n-father of Python's format strings, so it's interesting to look at how it used to approach the list joining issue.

Then I asked myself: can I take the list joining capability and port it over to Python's format(), doing away with the overall ugliness?

Here is what I came up with:

  out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

  # => 'Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius'

Here ", " is the joiner between the items and <.1f> is the format string for each item.

The way this would work is by defining a specific Format Specification Mini-Language for sequences and other iterables (such as lists and tuples).

A Format Specification Mini-Language (format_spec) is whatever follows the first colon in a curly brace placeholder, and is defined by the argument's class, so that it can vary wildly among different types.[2]
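
As a small illustration of that point, the sketch below shows two types interpreting the text after the colon in their own way. datetime.date is standard library behaviour (it treats the format_spec as a strftime() pattern); the Celsius class and its one-digit precision spec are hypothetical, invented only for this example.

```python
import datetime

class Celsius:
    """Hypothetical type whose format_spec is just an optional digit count."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        # format_spec is the raw text after the first colon, e.g. '1' in '{0:1}'.
        digits = int(format_spec) if format_spec else 0
        return '{0:.{1}f} C'.format(self.degrees, digits)

# datetime.date passes its format_spec to strftime().
print('{0:%Y-%m-%d}'.format(datetime.date(2012, 11, 10)))  # 2012-11-10
# Celsius reads the same position as a precision.
print('{0:1}'.format(Celsius(24.369)))                     # 24.4 C
```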

The root class (object) defines the generic format_spec we are accustomed to[3]:

  format_spec ::= [[fill]align][sign][#][0][width][,][.precision][type]

But that doesn't mean that more complex types should not define extensions or replacements. I propose this extended format_spec for sequences:

  seq_format_spec  ::= join_string [":" item_format_spec] | format_spec
  join_string      ::= '"' join_string_char* '"' | "'" join_string_char* "'"
  join_string_char ::= <any character except "{", "}", newline, or the quote>
  item_format_spec ::= format_spec

That is, if the format_spec for a sequence starts with ' or " it would be interpreted as a join operation (e.g. {0:", "} or {0:', '}), optionally followed by a format_spec for the single items: {0:", ":.1f}.

If the format_spec does not start with ' or ", or if the quote is not balanced (does not appear again in the format_spec), then it's assumed to be a generic format string and the implementation would call super(). This is meant for backwards compatibility with existing code that may be using the generic format_spec over various sequences.
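
A minimal sketch of how these rules might look in code, written as a list subclass so that it can be tried without touching the built-in types. The class name JoinableList is hypothetical, and the sketch makes one assumption beyond the grammar: the closing quote must either end the format_spec or be followed by a colon, otherwise the spec is treated as generic.

```python
class JoinableList(list):
    """Hypothetical list subclass sketching the proposed seq_format_spec."""

    def __format__(self, format_spec):
        quote = format_spec[:1]
        if quote in ('"', "'"):
            end = format_spec.find(quote, 1)   # position of the closing quote
            # Accept only: join_string alone, or join_string ":" item_format_spec.
            if end != -1 and format_spec[end + 1:end + 2] in ('', ':'):
                joiner = format_spec[1:end]
                item_spec = format_spec[end + 2:]
                return joiner.join(format(item, item_spec) for item in self)
        # Not a join spec (or unbalanced quote): defer to the inherited behaviour.
        # Note: on Python 3.4 and later, object.__format__ raises TypeError
        # for any non-empty format_spec.
        return super().__format__(format_spec)

temps = JoinableList([24.369, 24.550, 26.807, 27.531, 28.752])
print('Temperatures: {0:", ":.1f} Celsius'.format(temps))
# Temperatures: 24.4, 24.6, 26.8, 27.5, 28.8 Celsius
```

An empty format_spec still falls through to the inherited method, so '{0}'.format(temps) keeps printing the usual list representation.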

I do think that would be quite readable and useful. Look again at the example:

  out = 'Temperatures: {0:", ":.1f} Celsius'.format(temps)

As a bonus, it allows nested joins, albeit only for simple cases. For example we could format a dictionary's items:

  temps = {'Rome': 26, 'Paris': 21, 'New York': 18}

  out = 'Temperatures: {0:", ":" "}'.format(temps.items())

  # => 'Temperatures: Rome 26, Paris 21, New York 18'

Here the format_spec for temps.items() is <", ":" ">. Then ", " would be used as a joiner between the item tuples and <" "> would be passed over as the format_spec for each tuple. This in turn would join the tuple's items using a single space and output each item with its default format, since no item format_spec is given. (A trailing :s would not do here: the values are integers, and int rejects the "s" type code.) This could go on and on as needed, adding a colon and joiner string for each nested join operation.
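
The nesting can be sketched with a plain recursive function, which avoids having to subclass tuple or the dict view. The name join_format is hypothetical, and this is only one possible reading of the proposal: each level strips one joiner and hands the remainder down to the items. The innermost item format_spec is left empty in this sketch because the built-in int rejects the "s" type code.

```python
from collections.abc import Iterable

def join_format(value, spec):
    """Hypothetical helper applying the proposed seq_format_spec recursively."""
    quote = spec[:1]
    # Strings are iterable too, but they should be formatted, not joined.
    if quote in ('"', "'") and isinstance(value, Iterable) and not isinstance(value, str):
        end = spec.find(quote, 1)
        if end != -1 and spec[end + 1:end + 2] in ('', ':'):
            joiner, item_spec = spec[1:end], spec[end + 2:]
            return joiner.join(join_format(item, item_spec) for item in value)
    # Anything else uses the ordinary format() machinery.
    return format(value, spec)

temps = {'Rome': 26, 'Paris': 21, 'New York': 18}
out = 'Temperatures: ' + join_format(temps.items(), '", ":" "')
print(out)  # Temperatures: Rome 26, Paris 21, New York 18
```

The output order relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onwards.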

A more complicated mini-language would be needed to output dicts using different format strings for keys and values, but I think that would be veering over to unreadable territory.

What do you think?

I plan to write this as a module and propose it to Python's devs for inclusion in the main tree, but any criticism is welcome before I do that.


