PEP 321: Date/Time Parsing and Formatting

Paul Moore pf_moore at yahoo.co.uk
Mon Nov 17 17:00:38 EST 2003


Gerrit Holl <gerrit at nl.linux.org> writes:

> Python 2.3 added a number of simple date and time types in the
> ``datetime`` module.  There's no support for parsing strings in various
> formats and returning a corresponding instance of one of the types.  
> This PEP proposes adding a family of predefined parsing functions for
> several commonly used date and time formats, and a facility for generic 
> parsing.

I assume you're aware of Gustavo Niemeyer's DateUtil module
(https://moin.conectiva.com.br/DateUtil)?

I'm not 100% sure how the parser functionality fits in with this PEP.
It seems to me that this PEP is more focused on parsing specifically
formatted data (not something I need often) whereas Gustavo's function
is about parsing highly general "human input" formats.

As most of my date parsing needs are for user input parameters and the
like, I prefer Gustavo's module :-)

[After reading through this PEP and commenting, I'd say that my
preference (which may not be Gustavo's!) would be to add dateutil to
the standard library, with the following changes/additions:

1. Add a dateutil.RFC822_FORMAT for output of RFC822-compliant dates.
2. Extend dateutil.parser.parse to handle additional (CVS-style)
   possibilities - today, tomorrow, yesterday, things like that.
3. Add dateutil.parser.strptime as a wrapper round time.strptime.

I think that's all.]

> * Formats commonly written by humans such as the American
>   "MM/DD/YYYY", the European "YYYY/MM/DD", and variants such as
>   "DD-Month-YYYY".

UK format DD/MM/YYYY is worth adding (in my UK-based opinion :-)). But
you can get all of these via strptime (wrapped to return a datetime
value).
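
As a rough illustration of "strptime wrapped to return a datetime
value", here is a minimal sketch. The helper name parse_fixed and the
FORMATS table are made up for this example; they are not part of any
existing module.

```python
import datetime
import time

# Hypothetical table of named formats; the keys are illustrative only.
FORMATS = {
    "us": "%m/%d/%Y",      # MM/DD/YYYY
    "uk": "%d/%m/%Y",      # DD/MM/YYYY
    "month": "%d-%B-%Y",   # DD-Month-YYYY (full month name, locale-dependent)
}

def parse_fixed(text, style):
    """Parse text in one named fixed format and return a datetime."""
    # time.strptime returns a struct_time; its first six fields are
    # year, month, day, hour, minute, second.
    fields = time.strptime(text, FORMATS[style])
    return datetime.datetime(*fields[:6])

us_value = parse_fixed("11/17/2003", "us")
uk_value = parse_fixed("17/11/2003", "uk")
```

Both calls describe the same day; only the format string differs. A
string that does not match the chosen format makes time.strptime raise
ValueError.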

> * CVS-style or tar-style dates ("tomorrow", "12 hours ago", etc.)

That would be nice. I assume it should be combined with a highly
flexible parser, so that the same function that handles "tomorrow"
will also handle "12-dec-2003". This would basically be like Gustavo's
parser, but with extended functionality (Gustavo's doesn't handle
things like "tomorrow").
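
To make the "same function handles both" idea concrete, here is a
small sketch. It is not Gustavo's parser and says nothing about how
his module works; parse_flexible, RELATIVE_DAYS and FALLBACK_FORMATS
are hypothetical names, and the relative vocabulary is deliberately
tiny.

```python
import datetime
import time

# Hypothetical vocabulary of relative words, mapped to day offsets.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}

# Fixed formats tried in order once the relative words are ruled out.
FALLBACK_FORMATS = ("%d-%b-%Y", "%Y-%m-%d")

def parse_flexible(text, now=None):
    """Return a datetime for a relative word or a fixed-format date."""
    if now is None:
        now = datetime.datetime.now()
    cleaned = text.strip()
    word = cleaned.lower()
    if word in RELATIVE_DAYS:
        # Relative dates resolve to midnight of the target day.
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        return midnight + datetime.timedelta(days=RELATIVE_DAYS[word])
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.datetime(*time.strptime(cleaned, fmt)[:6])
        except ValueError:
            continue  # try the next format
    raise ValueError("unrecognised date: %r" % (text,))
```

Passing "now" explicitly keeps the relative cases testable. Something
like "12 hours ago" would need a real grammar rather than a lookup
table, which this sketch does not attempt.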

> 3) Add a separate module (possible names: date, date_parse, parse_date)
>    or subpackage (possible names: datetime.parser) containing parsing 
>    functions::
>    
>    	import datetime
>    	d = datetime.parser.parse_iso8601("2003-09-15T10:34:54")

I'd go for this option. Actually, I'd support including Gustavo's
dateutil module in the standard library. This PEP then involves adding
a number of additional (specialised) parsers to the dateutil.parser
subpackage.

> * What exception to raise on errors?  ValueError, or a specialized exception?

ValueError seems perfectly adequate.

> * Should you know what type you're expecting, or should the parsing figure
>   it out?  (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a ``date`` instance,
>   but parsing "yyyy-mm-ddThh:mm:ss" returns a ``datetime``.) 

I don't think that the functions should return a type which depends on
the input (I'd push that as a general rule, but I've probably missed
an obvious counterexample - never mind, I think it applies here
regardless).

>   Should there be an option to signal an error if a time is provided
>   where none is expected, or if no time is provided?

I think that returning a datetime always, with a zero time component
when no time is specified, should be enough. You can use the date()
method of datetime instances to get just the date part if you want it. 
But this is something that should be prototyped - real-world use is
far more important here than theoretical considerations.
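
A quick prototype of that behaviour might look like the following.
The function name parse_iso8601 is borrowed from the PEP text, but the
body is only an assumption about how it could work, and it accepts
just two of the many ISO 8601 forms.

```python
import datetime
import time

def parse_iso8601(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:MM:SS'; always return a datetime."""
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.datetime(*time.strptime(text, fmt)[:6])
        except ValueError:
            continue  # try the next format
    raise ValueError("not an ISO 8601 date or datetime: %r" % (text,))

with_time = parse_iso8601("2003-09-15T10:34:54")
date_only = parse_iso8601("2003-09-15")   # time component is 00:00:00
just_the_date = date_only.date()          # caller narrows to a date explicitly
```

The return type is the same for both inputs, and a caller who only
wants the date asks for it with date(). Bad input raises a plain
ValueError.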

> * Anything special required for I18N?  For time zones?

Scary. Do we need to parse "21-janvier-2001"? Only if in a
French-speaking locale?

> Generic Input Parsing
> =======================
>
> Is a strptime() implementation that returns ``datetime`` types sufficient?
>
> XXX if yes, describe strptime here.  Can the existing pure-Python
> implementation be easily retargeted?

Not sufficient, but very useful. It effectively covers all of the
fixed-format cases (with a suitable format string). And it does I18N,
I believe (hard to tell in a UK locale...)

Options:

    * class methods on the 3 datetime classes. This might be hard,
      because datetime is a C extension, and strptime is Python.
    * Modify strptime to return a datetime value rather than a
      struct_time. But this isn't backward compatible, and so is
      probably not on. Shame, as it feels like the right answer.
    * Have a new function in the time module. Either just a wrapper
      round strptime, or a modified strptime, with strptime changed
      into a wrapper round the new function. But a good name is going
      to be hard to come up with.
    * Add a new parameter to strptime (datetime=True or something).
      Ugly, and violates my "functions shouldn't return different
      types depending on their arguments" comment above.
    * A function in a new module - something like
      dateutil.parser.strptime, as a wrapper round time.strptime. 
      (Excuse the subliminal advertising for Gustavo's module - change
      the name if you prefer :-))

> Output Formats
> =======================
>
> Not all input formats need to be supported as output formats, because it's 
> pretty trivial to get the ``strftime()`` argument right for simple things 
> such as YYYY/MM/DD.   Only complicated formats need to be supported; RFC2822
> is currently the only one I can think of.

An *output* format for RFC2822 compliant dates shouldn't be too hard,
surely? Ah, I see what you mean. It's possible, but hard to
*remember*, so it's best to define it somewhere. Good point.

> Options:
>
> 1) Provide predefined format strings, so you could write this::
>
> 	import datetime
> 	d = datetime.datetime(...)
> 	print d.strftime(d.RFC2822_FORMAT) # or datetime.RFC2822_FORMAT?

This is what I'd prefer. A module-level constant in a dateutil module
would be fine for me, too.
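
For illustration, such a constant could be as small as the sketch
below. RFC2822_FORMAT is the name suggested in the PEP text; its value
here is my assumption, and the example uses a timezone-aware value so
that %z has an offset to print.

```python
import datetime

# Hypothetical module-level constant; the value is an assumption.
# %z expands to the UTC offset only for timezone-aware datetimes.
RFC2822_FORMAT = "%a, %d %b %Y %H:%M:%S %z"

d = datetime.datetime(2003, 11, 17, 17, 0, 38,
                      tzinfo=datetime.timezone.utc)
formatted = d.strftime(RFC2822_FORMAT)
```

One caveat: %a and %b follow the current locale, while RFC 2822 wants
English day and month names, so a plain format string is only correct
in an English or C locale.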

> 2) Provide new methods on all the objects::
> 	
> 	d = datetime.datetime(...)
> 	print d.rfc822_time()

Seems overkill. And I'd rather just have strftime for all date output
formatting - one way of doing things, and all that.

Paul.
-- 
This signature intentionally left blank



