PEP 321: Date/Time Parsing and Formatting

John Roth newsgroups at jhrothjr.com
Mon Nov 17 17:59:24 EST 2003


"Gerrit Holl" <gerrit at nl.linux.org> wrote in message
news:mailman.803.1069091744.702.python-list at python.org...
> Hi,
>
> PEP 321 reads:
> > Python 2.3 added a number of simple date and time types in the
> > ``datetime`` module.  There's no support for parsing strings in various
> > formats and returning a corresponding instance of one of the types.
> > This PEP proposes adding a family of predefined parsing functions for
> > several commonly used date and time formats, and a facility for generic
> > parsing.
>
> I was recently surprised by this fact. I don't know why there isn't
> such a function/method. In my opinion, it isn't a question of whether
> to add them or not, but how.
>
> > Input Formats
> > =======================
> >
> > Useful formats to support include:
> >
> > * `ISO8601`_
> > * ARPA/`RFC2822`_
> > * `ctime`_
> > * Formats commonly written by humans such as the American
> >   "MM/DD/YYYY", the European "YYYY/MM/DD", and variants such as
> >   "DD-Month-YYYY".

I didn't notice this going past the first time: YYYY/MM/DD is the
ISO standard format; DD/MM/YYYY is the European counterpart of
the American MM/DD/YYYY.

> > * CVS-style or tar-style dates ("tomorrow", "12 hours ago", etc.)

The question here is class responsibilities. The datetime module is
at a lower conceptual layer than the tarfile module. If the format
is specific to the tarfile module, then it should be that module's
responsibility to do the conversion. If it is a generally useful
capability, then it should be the datetime module's responsibility.
I'd like to see that discussed.

> I think there should be a 'strptime' equivalent, on which the former
> three input formats are built. I think the latter should be a class
> method of Timedelta. Then, those examples would be used as such:
>
> >>> datetime.datetime.iso8601("1985-08-13 15:03")
> datetime(1985, 8, 13, 13, 5)
> >>> datetime.datetime.rfc2822("Tue, 13 Aug 1985, 15:03:00 +0100")
> datetime(1985, 8, 13, 13, 5)
> >>> datetime.date.strptime("13/08/1985", "%d/%m/%Y")
> date(1985, 8, 13)
> >>> datetime.timedelta.fromstring("tomorrow")
> timedelta(1)

As long as it's kept simple. I have a real problem with the second
example; there are simply too many variations out there of the
alphabetic month and day of the week to cover them all.

> A rising question, especially in the latter case, is how to deal with
> locales.

In the context of a strptime implementation, there are really
two issues: numeric date format and alphabetic date format.
As far as numeric date format is concerned, a locale dependent
"mm" and "dd" that would switch around depending on whether
the locale used mm/dd/yyyy or dd/mm/yyyy dates would be
adequate. yyyy/mm/dd can always be disambiguated for
dates later than 1300 CE (the 5th and 6th characters
distinguish them adequately).

In terms of character months and days of the week, you
need a facility to supply a function. This is getting too complex
for my tastes.
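
To make the "facility to supply a function" concrete, here is a minimal
sketch. The names parse_alpha_date, month_lookup and dutch_month are
hypothetical and appear in no proposal; the point is only that the parser
stays generic while the caller supplies the language-specific part.

```python
import re
from datetime import date

def parse_alpha_date(text, month_lookup):
    """Parse 'DD-Month-YYYY'; month_lookup maps a month name to 1..12."""
    match = re.fullmatch(r"(\d{1,2})[-/ ](\w+)[-/ ](\d{4})", text.strip())
    if match is None:
        raise ValueError("not a DD-Month-YYYY date: %r" % (text,))
    day, month_name, year = match.groups()
    return date(int(year), month_lookup(month_name), int(day))

# Hypothetical lookup for Dutch month names; each language needs its own.
DUTCH_MONTHS = ["januari", "februari", "maart", "april", "mei", "juni",
                "juli", "augustus", "september", "oktober", "november",
                "december"]

def dutch_month(name):
    """Return the month number for a Dutch month name, ignoring case."""
    try:
        return DUTCH_MONTHS.index(name.lower()) + 1
    except ValueError:
        raise ValueError("unknown month name: %r" % (name,))

print(parse_alpha_date("13-augustus-1985", dutch_month))
```

Abbreviations and day-of-week names would each need another lookup
function of the same kind, which is where the complexity piles up.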


> > 1) Add functions to the ``datetime`` module::
>
> > 2) Add class methods to the various types.  There are already various
> >    class methods such as ``.now()``, so this would be pretty natural.::
> >
> > import datetime
> > d = datetime.date.parse_iso8601("2003-09-15T10:34:54")
> >
> > 3) Add a separate module (possible names: date, date_parse, parse_date)
> >    or subpackage (possible names: datetime.parser) containing parsing
> >    functions::
>
> I prefer solution 2. I think it is the most object-oriented way.

I agree. Class methods allow the system to be extended, module
functions don't.
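
A small sketch of what "extended" means here. ParsingDate and BusinessDate
are hypothetical classes, and parse_iso8601 only borrows the name from
option 2 of the PEP; the body is an assumption that handles plain
"YYYY-MM-DD" and nothing else.

```python
from datetime import date

class ParsingDate(date):
    """Hypothetical date type carrying a parsing class method."""

    @classmethod
    def parse_iso8601(cls, text):
        # Only the plain 'YYYY-MM-DD' form is handled in this sketch.
        year, month, day = (int(part) for part in text.split("-"))
        # Building through cls means subclasses get their own type back.
        return cls(year, month, day)

class BusinessDate(ParsingDate):
    """Hypothetical subclass that inherits the parser unchanged."""

    def is_weekend(self):
        return self.weekday() >= 5

d = BusinessDate.parse_iso8601("2003-09-15")
print(type(d).__name__, d.is_weekend())
```

A module-level function would have to hard-code the type it returns, so
the subclass would need a wrapper of its own.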

> And we
> already have several date/time modules: datetime, time, calendar. I think
> we should have only one, and have calendar integrated into time. I try
> to avoid using the time module whenever I can. I don't like it. It doesn't
> nicely fit into my brain, it isn't object-oriented, I think it is too much
> low-level.

I would *not* go for changing the existing calendar module. Probably
too much of an impact to existing code, and datetime is supposed to be
a replacement for time.

>
> > * Naming convention to use.
> > * What exception to raise on errors?  ValueError, or a specialized
> >   exception?

> > * Should you know what type you're expecting, or should the parsing
> >   figure it out?  (e.g. ``parse_iso8601("yyyy-mm-dd")`` returns a
> >   ``date`` instance, but parsing "yyyy-mm-ddThh:mm:ss" returns a
> >   ``datetime``.)  Should there be an option to signal an error if a
> >   time is provided where none is expected, or if no time is provided?

Explicit is better than implicit. I'd prefer a class method on the proper
class.

> > * Anything special required for I18N?  For time zones?
>
> Using relative dates as input uses English input, so this one is suitable
> for I18N. I'm not sure about .strptime() though... I don't think it
> should, since 05/04/03 may yield entirely different results in different
> locales, which is not true for 'tomorrow'.

Locale dependent format parsing is relatively easy: all it requires is
a generic parse code that maps to two or three output variables.
In other words, one parse code takes care of "mm/dd/yyyy",
"dd/mm/yyyy" and "yyyy/mm/dd". The locale disambiguates which
of the first two is required, with the ISO format (yyyy/mm/dd) being
chosen based on positions 5 and 6. Notice that there is never an
ambiguity if you have an alpha month and a 4 digit year.
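
The single parse code described above could look roughly like this. It is
a sketch under two assumptions: fields are zero-padded with a four-digit
year, and day_first is a hypothetical flag standing in for whatever the
locale would report about day/month order.

```python
import re
from datetime import date

def parse_numeric_date(text, day_first=False):
    """Parse 'yyyy/mm/dd', 'mm/dd/yyyy' or 'dd/mm/yyyy' into a date."""
    text = text.strip()
    fields = [int(field) for field in re.split(r"\D", text)]
    if len(fields) != 3:
        raise ValueError("expected three numeric fields: %r" % (text,))
    if not text[4].isdigit():
        # A separator in position 5 means the year came first (ISO order).
        year, month, day = fields
    elif day_first:
        day, month, year = fields
    else:
        month, day, year = fields
    return date(year, month, day)

print(parse_numeric_date("1985/08/13"))
print(parse_numeric_date("08/13/1985"))
print(parse_numeric_date("13/08/1985", day_first=True))
```

All three calls produce the same date; only the middle case depends on
the locale-style flag.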

>
> > Generic Input Parsing
> > =======================
> >
> > Is a strptime() implementation that returns ``datetime`` types
> > sufficient?

Should be a class method on all four classes, and return that class.
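
For illustration only, a sketch of per-class strptime methods layered on
the existing datetime.datetime.strptime. ParsedDate and ParsedTime are
hypothetical names; only date and time are shown, not all four classes.

```python
from datetime import date, datetime, time

class ParsedDate(date):
    """Hypothetical date subclass with its own strptime."""

    @classmethod
    def strptime(cls, text, fmt):
        parsed = datetime.strptime(text, fmt)
        # Keep the date fields and return the calling class.
        return cls(parsed.year, parsed.month, parsed.day)

class ParsedTime(time):
    """Hypothetical time subclass with its own strptime."""

    @classmethod
    def strptime(cls, text, fmt):
        parsed = datetime.strptime(text, fmt)
        # Keep the time fields and return the calling class.
        return cls(parsed.hour, parsed.minute, parsed.second,
                   parsed.microsecond)

print(ParsedDate.strptime("13/08/1985", "%d/%m/%Y"))
print(ParsedTime.strptime("15:03", "%H:%M"))
```
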

> It would at least need to be in the datetime module.
>
> > Not all input formats need to be supported as output formats, because
> > it's pretty trivial to get the ``strftime()`` argument right for simple
> > things such as YYYY/MM/DD.   Only complicated formats need to be
> > supported; RFC2822 is currently the only one I can think of.

If it's going to support I18N, then it should support the more common
formats directly, or strftime() should support "macro" codes that generate
a complete date with separators as one code.
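
strftime() already has one code of this kind: %x expands to the locale's
own date representation. The sketch below contrasts it with a spelled-out
format; the %x result depends on the active locale, so no output is shown
for it.

```python
from datetime import date

d = date(1985, 8, 13)

# Spelled-out format: the caller fixes field order and separators.
explicit = d.strftime("%Y/%m/%d")

# "Macro" code: the active locale decides order and separators.
macro = d.strftime("%x")

print(explicit)
print(macro)
```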

> >
> > Options:
> >
> > 1) Provide predefined format strings, so you could write this::
>
> > 2) Provide new methods on all the objects::
> >
> > d = datetime.datetime(...)
> > print d.rfc822_time()
>
> I prefer implementation #2. I'm never very happy with using constants
> defined inside modules. I have to type 'module.CONSTANT' all the time...
> I think a method is a very suitable way to do this.

I'd like to reiterate my comment from earlier: Internet data handling
modules are at a higher logical level than datetime. If the formats are
not generally useful, it should be the using module that is responsible
for them: that is, the email and news modules for RFC822. If they
are generally useful, then datetime should handle them.
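
For reference, later Python versions (3.3 onwards) do ship RFC 2822 date
helpers in the email package: email.utils.parsedate_to_datetime and
email.utils.format_datetime. A short sketch of the round trip, with an
arbitrary sample date:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Parse an RFC 2822 date header value into an aware datetime.
stamp = parsedate_to_datetime("Tue, 13 Aug 1985 15:03:00 +0100")

# Format the datetime back into RFC 2822 text.
text = format_datetime(stamp)

print(stamp.isoformat())
print(text)
```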

John Roth
>
> yours,
> Gerrit.
>





