```rst
Format Specification
--------------------
Most often a straight format-less ``{}`` will suffice where a more complex
format specification might have been used.
Most of `format()`'s `Format Specification Mini-Language`_ is supported:
[[fill]align][0][width][.precision][type]
The differences between `parse()` and `format()` are:
- The align operators will cause spaces (or specified fill character) to be
stripped from the parsed value. The width is not enforced; it just indicates
there may be whitespace or "0"s to strip.
- Numeric parsing will automatically handle a "0b", "0o" or "0x" prefix.
That is, the "#" format character is handled automatically by d, b, o
and x formats. For "d" any will be accepted, but for the others the correct
prefix must be present if at all.
- Numeric sign is handled automatically.
- The thousands separator is handled automatically if the "n" type is used.
- The types supported are a slightly different mix to the format() types. Some
format() types come directly over: "d", "n", "%", "f", "e", "b", "o" and "x".
In addition some regular expression character group types "D", "w", "W", "s"
and "S" are also available.
- The "e" and "g" types are case-insensitive so there is not need for
the "E" or "G" types. The "e" type handles Fortran formatted numbers (no
leading 0 before the decimal point).
===== =========================================== ========
Type Characters Matched Output
===== =========================================== ========
l Letters (ASCII) str
w Letters, numbers and underscore str
W Not letters, numbers and underscore str
s Whitespace str
S Non-whitespace str
d Digits (effectively integer numbers) int
D Non-digit str
n Numbers with thousands separators (, or .) int
% Percentage (converted to value/100.0) float
f Fixed-point numbers float
F Decimal numbers Decimal
e Floating-point numbers with exponent float
e.g. 1.1e-10, NAN (all case insensitive)
g General number format (either d, f or e) float
b Binary numbers int
o Octal numbers int
x Hexadecimal numbers (lower and upper case) int
ti ISO 8601 format date/time datetime
e.g. 1972-01-20T10:21:36Z ("T" and "Z"
optional)
te RFC2822 e-mail format date/time datetime
e.g. Mon, 20 Jan 1972 10:21:36 +1000
tg Global (day/month) format date/time datetime
e.g. 20/1/1972 10:21:36 AM +1:00
ta US (month/day) format date/time datetime
e.g. 1/20/1972 10:21:36 PM +10:30
tc ctime() format date/time datetime
e.g. Sun Sep 16 01:03:52 1973
th HTTP log format date/time datetime
e.g. 21/Nov/2011:00:07:11 +0000
ts Linux system log format date/time datetime
e.g. Nov 9 03:37:44
tt Time time
e.g. 10:21:36 PM -5:30
===== =========================================== ========Some examples of typed parsing with ``None`` returned if the typing
does not match:
.. code-block:: pycon
>>> parse('Our {:d} {:w} are...', 'Our 3 weapons are...')
<Result (3, 'weapons') {}>
>>> parse('Our {:d} {:w} are...', 'Our three weapons are...')
>>> parse('Meet at {:tg}', 'Meet at 1/2/2011 11:00 PM')
<Result (datetime.datetime(2011, 2, 1, 23, 0),) {}>
And messing about with alignment:
.. code-block:: pycon
>>> parse('with {:>} herring', 'with a herring')
<Result ('a',) {}>
>>> parse('spam {:^} spam', 'spam lovely spam')
<Result ('lovely',) {}>
Note that the "center" alignment does not test to make sure the value is
centered - it just strips leading and trailing whitespace.
Width and precision may be used to restrict the size of matched text
from the input. Width specifies a minimum size and precision specifies
a maximum. For example:
.. code-block:: pycon
>>> parse('{:.2}{:.2}', 'look') # specifying precision
<Result ('lo', 'ok') {}>
>>> parse('{:4}{:4}', 'look at that') # specifying width
<Result ('look', 'at that') {}>
>>> parse('{:4}{:.4}', 'look at that') # specifying both
<Result ('look at ', 'that') {}>
>>> parse('{:2d}{:2d}', '0440') # parsing two contiguous numbers
<Result (4, 40) {}>
Some notes for the date and time types:
- the presence of the time part is optional (including ISO 8601, starting
at the "T"). A full datetime object will always be returned; the time
will be set to 00:00:00. You may also specify a time without seconds.
- when a seconds amount is present in the input fractions will be parsed
to give microseconds.
- except in ISO 8601 the day and month digits may be 0-padded.
- the date separator for the tg and ta formats may be "-" or "/".
- named months (abbreviations or full names) may be used in the ta and tg
formats in place of numeric months.
- as per RFC 2822 the e-mail format may omit the day (and comma), and the
seconds but nothing else.
- hours greater than 12 will be happily accepted.
- the AM/PM are optional, and if PM is found then 12 hours will be added
to the datetime object's hours amount - even if the hour is greater
than 12 (for consistency.)
- in ISO 8601 the "Z" (UTC) timezone part may be a numeric offset
- timezones are specified as "+HH:MM" or "-HH:MM". The hour may be one or two
digits (0-padded is OK.) Also, the ":" is optional.
- the timezone is optional in all except the e-mail format (it defaults to
UTC.)
- named timezones are not handled yet.
Note: attempting to match too many datetime fields in a single parse() will
currently result in a resource allocation issue. A TooManyFields exception
will be raised in this instance. The current limit is about 15. It is hoped
that this limit will be removed one day.
.. _`Format String Syntax`:
http://docs.python.org/library/string.html#format-string-syntax.. _`Format Specification Mini-Language`:
http://docs.python.org/library/string.html#format-specification-mini-language