From https://github.com/r1chardj0n3s/parse/blob/master/README.rst :

```rst
Format Specification
--------------------

Most often a straight format-less ``{}`` will suffice where a more complex
format specification might have been used.

Most of `format()`'s `Format Specification Mini-Language`_ is supported:

   [[fill]align][0][width][.precision][type]

The differences between `parse()` and `format()` are:

- The align operators will cause spaces (or specified fill character) to be
  stripped from the parsed value. The width is not enforced; it just indicates
  there may be whitespace or "0"s to strip.
- Numeric parsing will automatically handle a "0b", "0o" or "0x" prefix.
  That is, the "#" format character is handled automatically by the d, b, o
  and x formats. For "d" any of these prefixes is accepted; for the others
  the prefix is optional, but if present it must be the correct one.
- Numeric sign is handled automatically.
- The thousands separator is handled automatically if the "n" type is used.
- The types supported are a slightly different mix to the format() types.  Some
  format() types come directly over: "d", "n", "%", "f", "e", "b", "o" and "x".
  In addition some regular expression character group types "D", "w", "W", "s"
  and "S" are also available.
- The "e" and "g" types are case-insensitive, so there is no need for
  the "E" or "G" types. The "e" type handles Fortran formatted numbers (no
  leading 0 before the decimal point).

===== =========================================== ========
Type  Characters Matched                          Output
===== =========================================== ========
l     Letters (ASCII)                             str
w     Letters, numbers and underscore             str
W     Not letters, numbers and underscore         str
s     Whitespace                                  str
S     Non-whitespace                              str
d     Digits (effectively integer numbers)        int
D     Non-digit                                   str
n     Numbers with thousands separators (, or .)  int
%     Percentage (converted to value/100.0)       float
f     Fixed-point numbers                         float
F     Decimal numbers                             Decimal
e     Floating-point numbers with exponent        float
      e.g. 1.1e-10, NAN (all case insensitive)
g     General number format (either d, f or e)    float
b     Binary numbers                              int
o     Octal numbers                               int
x     Hexadecimal numbers (lower and upper case)  int
ti    ISO 8601 format date/time                   datetime
      e.g. 1972-01-20T10:21:36Z ("T" and "Z"
      optional)
te    RFC2822 e-mail format date/time             datetime
      e.g. Mon, 20 Jan 1972 10:21:36 +1000
tg    Global (day/month) format date/time         datetime
      e.g. 20/1/1972 10:21:36 AM +1:00
ta    US (month/day) format date/time             datetime
      e.g. 1/20/1972 10:21:36 PM +10:30
tc    ctime() format date/time                    datetime
      e.g. Sun Sep 16 01:03:52 1973
th    HTTP log format date/time                   datetime
      e.g. 21/Nov/2011:00:07:11 +0000
ts    Linux system log format date/time           datetime
      e.g. Nov  9 03:37:44
tt    Time                                        time
      e.g. 10:21:36 PM -5:30
===== =========================================== ========


Some examples of typed parsing with ``None`` returned if the typing
does not match:

.. code-block:: pycon

    >>> parse('Our {:d} {:w} are...', 'Our 3 weapons are...')
    <Result (3, 'weapons') {}>
    >>> parse('Our {:d} {:w} are...', 'Our three weapons are...')
    >>> parse('Meet at {:tg}', 'Meet at 1/2/2011 11:00 PM')
    <Result (datetime.datetime(2011, 2, 1, 23, 0),) {}>

And messing about with alignment:

.. code-block:: pycon

    >>> parse('with {:>} herring', 'with     a herring')
    <Result ('a',) {}>
    >>> parse('spam {:^} spam', 'spam    lovely     spam')
    <Result ('lovely',) {}>

Note that the "center" alignment does not test to make sure the value is
centered - it just strips leading and trailing whitespace.

Width and precision may be used to restrict the size of matched text
from the input. Width specifies a minimum size and precision specifies
a maximum. For example:

.. code-block:: pycon

    >>> parse('{:.2}{:.2}', 'look')           # specifying precision
    <Result ('lo', 'ok') {}>
    >>> parse('{:4}{:4}', 'look at that')     # specifying width
    <Result ('look', 'at that') {}>
    >>> parse('{:4}{:.4}', 'look at that')    # specifying both
    <Result ('look at ', 'that') {}>
    >>> parse('{:2d}{:2d}', '0440')           # parsing two contiguous numbers
    <Result (4, 40) {}>

Some notes for the date and time types:

- the presence of the time part is optional (including ISO 8601, starting
  at the "T"). A full datetime object will always be returned; if the time
  part is absent, the time will be set to 00:00:00. You may also specify a
  time without seconds.
- when a seconds amount is present in the input, fractions will be parsed
  to give microseconds.
- except in ISO 8601 the day and month digits may be 0-padded.
- the date separator for the tg and ta formats may be "-" or "/".
- named months (abbreviations or full names) may be used in the ta and tg
  formats in place of numeric months.
- as per RFC 2822 the e-mail format may omit the day (and comma), and the
  seconds but nothing else.
- hours greater than 12 will be happily accepted.
- the AM/PM are optional, and if PM is found then 12 hours will be added
  to the datetime object's hours amount - even if the hour is greater
  than 12 (for consistency.)
- in ISO 8601 the "Z" (UTC) timezone part may be a numeric offset.
- timezones are specified as "+HH:MM" or "-HH:MM". The hour may be one or two
  digits (0-padded is OK.) Also, the ":" is optional.
- the timezone is optional in all except the e-mail format (it defaults to
  UTC.)
- named timezones are not handled yet.

Note: attempting to match too many datetime fields in a single parse() will
currently result in a resource allocation issue. A TooManyFields exception
will be raised in this instance. The current limit is about 15. It is hoped
that this limit will be removed one day.

.. _`Format String Syntax`:
  http://docs.python.org/library/string.html#format-string-syntax
.. _`Format Specification Mini-Language`:
  http://docs.python.org/library/string.html#format-specification-mini-language
```
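
To make the idea in that README concrete, here is a minimal stdlib-only sketch of how a typed field such as `{:d}` or `{:w}` could be mapped onto a regex group plus a converter. This is not the `parse` library's implementation; `TYPES`, `FIELD`, `compile_format` and `tiny_parse` are hypothetical names made up for illustration, and only untyped, `d` and `w` fields are handled (no names, alignment, width or precision).

```python
import re

# Hypothetical, heavily reduced type table: regex fragment plus converter.
# The real library supports many more types; this covers three cases only.
TYPES = {
    "": (r".+?", str),         # untyped {} field: any text, as short as possible
    "d": (r"[-+]?\d+", int),   # digits with an optional sign, converted to int
    "w": (r"\w+", str),        # letters, numbers and underscore
}

# Matches "{}" or "{:type}"; named fields and alignment are out of scope here.
FIELD = re.compile(r"\{(?::(\w*))?\}")


def compile_format(fmt):
    """Turn a format such as 'Our {:d} {:w} are...' into (regex, converters)."""
    pattern, converters, pos = "", [], 0
    for m in FIELD.finditer(fmt):
        pattern += re.escape(fmt[pos:m.start()])      # literal text between fields
        fragment, convert = TYPES[m.group(1) or ""]   # KeyError for unknown types
        pattern += "(" + fragment + ")"
        converters.append(convert)
        pos = m.end()
    pattern += re.escape(fmt[pos:])
    return re.compile(pattern), converters


def tiny_parse(fmt, text):
    """Return a tuple of converted values, or None if the text does not match."""
    regex, converters = compile_format(fmt)
    m = regex.fullmatch(text)
    if m is None:
        return None
    return tuple(conv(group) for conv, group in zip(converters, m.groups()))


print(tiny_parse('Our {:d} {:w} are...', 'Our 3 weapons are...'))      # (3, 'weapons')
print(tiny_parse('Our {:d} {:w} are...', 'Our three weapons are...'))  # None
```

The second call returns `None` because "three" does not satisfy the digit pattern, which mirrors the "typing does not match" behaviour described in the README.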

On Thu, Sep 17, 2020 at 7:24 PM Wes Turner <wes.turner@gmail.com> wrote:
f"It's not {regex:d{2}}"

https://github.com/r1chardj0n3s/parse

> Parse strings using a specification based on the Python format() syntax.



On Thu, Sep 17, 2020 at 7:15 PM David Mertz <mertz@gnosis.cx> wrote:
I did actually "write the book" on text processing in Python. I think it's painful and awkward, and a terrible idea.

On Thu, Sep 17, 2020, 1:00 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Sep 18, 2020 at 8:54 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
> This is a terrible idea.

Python is an excellent language for text manipulation, and text manipulation is an incredibly useful real-world operation. I don't see what you're complaining at.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JOR62XZENA4IABUEWIGL72EL75DK4FWK/