PEP 321: Date/Time Parsing and Formatting

Gerrit Holl gerrit at nl.linux.org
Mon Nov 17 12:55:20 EST 2003


Hi,

PEP 321 reads:
> Python 2.3 added a number of simple date and time types in the
> ``datetime`` module.  There's no support for parsing strings in various
> formats and returning a corresponding instance of one of the types.  
> This PEP proposes adding a family of predefined parsing function for
> several commonly used date and time formats, and a facility for generic 
> parsing.

I was recently surprised by this fact. I don't know why there isn't
such a function/method. In my opinion, it isn't a question of whether
to add them or not, but how.

> Input Formats
> =======================
> 
> Useful formats to support include:
> 
> * `ISO8601`_
> * ARPA/`RFC2822`_
> * `ctime`_
> * Formats commonly written by humans such as the American
>   "MM/DD/YYYY", the European "YYYY/MM/DD", and variants such as
>   "DD-Month-YYYY".
> * CVS-style or tar-style dates ("tomorrow", "12 hours ago", etc.)

I think there should be a 'strptime' equivalent, on which the first
three input formats are built. I think the last one (relative dates)
should be a class method of timedelta. Those examples would then be
used like this:

>>> datetime.datetime.iso8601("1985-08-13 15:03")
datetime(1985, 8, 13, 15, 3)
>>> datetime.datetime.rfc2822("Tue, 13 Aug 1985 15:03:00 +0100")
datetime(1985, 8, 13, 15, 3)
>>> datetime.date.strptime("13/08/1985", "%d/%m/%Y")
date(1985, 8, 13)
>>> datetime.timedelta.fromstring("tomorrow")
timedelta(1)

A question that arises, especially in the last case, is how to deal
with locales.
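
To make the idea a bit more concrete, here is a minimal sketch of
predefined formats layered on a single strptime-style parser. It uses
the existing time.strptime; the FORMATS table and the parse() helper are
hypothetical names of mine, not something the PEP defines. The result is
a naive datetime, so time zone offsets are not handled, and %a and %b
follow the current locale.

```python
import datetime
import time

# Hypothetical table of predefined formats (names are illustrative only).
FORMATS = {
    "iso8601": "%Y-%m-%d %H:%M",
    "ctime": "%a %b %d %H:%M:%S %Y",
}


def parse(text, kind):
    """Parse text using the named predefined format.

    Returns a naive datetime built from the first six fields of the
    struct_time produced by time.strptime.
    """
    fields = time.strptime(text, FORMATS[kind])
    return datetime.datetime(*fields[:6])


iso = parse("1985-08-13 15:03", "iso8601")
ct = parse("Tue Aug 13 15:03:00 1985", "ctime")
```

Both calls give the same datetime. The class methods above could be thin
wrappers around something like this.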

> 1) Add functions to the ``datetime`` module::

> 2) Add class methods to the various types.  There are already various 
>    class methods such as ``.now()``, so this would be pretty natural.::
> 
> 	import datetime
> 	d = datetime.date.parse_iso8601("2003-09-15T10:34:54")
> 	
> 3) Add a separate module (possible names: date, date_parse, parse_date)
>    or subpackage (possible names: datetime.parser) containing parsing 
>    functions::

I prefer solution 2. I think it is the most object-oriented way. And we
already have several date/time modules: datetime, time, calendar. I think
we should have only one, and have calendar integrated into time. I try
to avoid using the time module whenever I can. I don't like it. It doesn't
fit nicely into my brain, it isn't object-oriented, and I think it is too
low-level.

> * Naming convention to use.
> * What exception to raise on errors?  ValueError, or a specialized exception?

The current time.strptime raises a ValueError. This sounds like a good
idea. I personally like to have exceptions as specialized as reasonable,
so I would prefer a subclass of ValueError.
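
A rough sketch of what I mean, assuming a wrapper around time.strptime.
DateParseError and parse_date are made-up names, only there to show
the shape of it:

```python
import datetime
import time


class DateParseError(ValueError):
    """Hypothetical specialized error; still caught by 'except ValueError'."""


def parse_date(text, fmt):
    """Parse text with a strptime format and return a date."""
    try:
        fields = time.strptime(text, fmt)
    except ValueError:
        # Re-raise as the more specific subclass.
        raise DateParseError("cannot parse %r with format %r" % (text, fmt))
    return datetime.date(*fields[:3])


try:
    parse_date("not a date", "%d/%m/%Y")
    caught = None
except ValueError as exc:
    caught = exc
```

Existing code that catches ValueError keeps working, while new code can
catch only the parsing failure.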

> * Should you know what type you're expecting, or should the parsing figure
>   it out?  (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a ``date`` instance,
>   but parsing "yyyy-mm-ddThh:mm:ss" returns a ``datetime``.)  Should 
>   there be an option to signal an error if a time is provided where
>   none is expected, or if no time is provided?

Well... datetime is a subclass of date, so any datetime is also a legal
date. I propose that date implements a strptime with all applicable codes,
and datetime extends this implementation. Not sure about how to deal with
time, though. Shouldn't datetime be a subclass of both date and time?
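
One way this could look, as a sketch only: the caller names the type it
expects, and one shared implementation fills in as many fields as that
type takes. parse_as is a hypothetical helper of mine, and it sidesteps
the question of the time type entirely.

```python
import datetime
import time


def parse_as(cls, text, fmt):
    """Parse text with fmt and build an instance of cls.

    datetime is checked first because it is a subclass of date.
    """
    fields = time.strptime(text, fmt)
    if issubclass(cls, datetime.datetime):
        return cls(*fields[:6])   # year .. second
    return cls(*fields[:3])       # year, month, day


d = parse_as(datetime.date, "13/08/1985", "%d/%m/%Y")
dt = parse_as(datetime.datetime, "13/08/1985", "%d/%m/%Y")
```

With a date-only format, the datetime result simply has its time fields
set to zero, since time.strptime defaults the missing fields.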

> * Anything special required for I18N?  For time zones?

Relative dates are written in English words, so parsing them is a
candidate for I18N. I'm not sure about .strptime() though... I don't think
it should be localized, since 05/04/03 may yield entirely different
results in different locales, which is not true for 'tomorrow'.
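
As an illustration of the relative-date case, here is a small sketch in
which the vocabulary lives in plain tables, so a translated table could
be swapped in. Everything here (WORDS, UNITS, timedelta_fromstring) is
hypothetical and covers only a few English phrases.

```python
import datetime
import re

# Hypothetical English vocabulary; another language would supply its own.
WORDS = {"yesterday": -1, "today": 0, "tomorrow": 1}
UNITS = {"hour": "hours", "hours": "hours", "day": "days", "days": "days"}


def timedelta_fromstring(text):
    """Turn a phrase like 'tomorrow' or '12 hours ago' into a timedelta."""
    text = text.strip().lower()
    if text in WORDS:
        return datetime.timedelta(days=WORDS[text])
    match = re.match(r"^(\d+) (\w+) ago$", text)
    if match and match.group(2) in UNITS:
        amount = int(match.group(1))
        unit = UNITS[match.group(2)]
        # "ago" means the offset points into the past.
        return -datetime.timedelta(**{unit: amount})
    raise ValueError("unrecognized relative date: %r" % text)


tomorrow = timedelta_fromstring("tomorrow")
earlier = timedelta_fromstring("12 hours ago")
```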

> Generic Input Parsing
> =======================
> 
> Is a strptime() implementation that returns ``datetime`` types sufficient?

It would at least need to be in the datetime module.

> Not all input formats need to be supported as output formats, because it's 
> pretty trivial to get the ``strftime()`` argument right for simple things 
> such as YYYY/MM/DD.   Only complicated formats need to be supported; RFC2822
> is currently the only one I can think of.
> 
> Options:
> 
> 1) Provide predefined format strings, so you could write this::

> 2) Provide new methods on all the objects::
> 	
> 	d = datetime.datetime(...)
> 	print d.rfc822_time()

I prefer implementation #2. I'm never very happy with using constants
defined inside modules. I have to type 'module.CONSTANT' all the time...
I think a method is a very suitable way to do this.
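
Roughly what such a method would do, written here as a plain function.
This is a sketch under my own assumptions: rfc2822_time is a hypothetical
name, the offset is passed in as a string because a naive datetime
doesn't carry one, and the day and month names are fixed English
abbreviations rather than strftime's locale-dependent ones.

```python
import datetime

_DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc2822_time(dt, offset="+0000"):
    """Format a datetime in the RFC 2822 style.

    weekday() counts from Monday = 0, matching the _DAYS table.
    """
    return "%s, %02d %s %04d %02d:%02d:%02d %s" % (
        _DAYS[dt.weekday()], dt.day, _MONTHS[dt.month - 1], dt.year,
        dt.hour, dt.minute, dt.second, offset)


stamp = rfc2822_time(datetime.datetime(1985, 8, 13, 15, 3), "+0100")
```

Here stamp comes out as "Tue, 13 Aug 1985 15:03:00 +0100".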

yours,
Gerrit.

-- 
205. If the slave of a freed man strike the body of a freed man, his
ear shall be cut off.
          -- 1780 BC, Hammurabi, Code of Law
-- 
Asperger's Syndrome - a personal approach:
	http://people.nl.linux.org/~gerrit/
Rise up against this cabinet:
	http://www.sp.nl/




