On Thu, Jun 16, 2011 at 9:18 AM, Benjamin Root <ben.root@ou.edu> wrote:
On Wednesday, June 15, 2011, Mark Wiebe <mwwiebe@gmail.com> wrote:
> Towards a reasonable behavior with regard to local times, I've made the default repr for datetimes use the C standard library to print them in a local ISO format. Combined with the ISO8601-prescribed behavior of interpreting datetime strings with no timezone specifier to be in local times, this allows the following cases to behave reasonably:
>>>> np.datetime64('now')
> numpy.datetime64('2011-06-15T15:16:51-0500','s')
>>>> np.datetime64('2011-06-15T18:00')
> numpy.datetime64('2011-06-15T18:00-0500','m')
> As noted in another thread, there can be some extremely surprising behavior as a consequence:
>>>> np.array(['now', '2011-06-15'], dtype='M')
> array(['2011-06-15T15:18:26-0500', '2011-06-14T19:00:00-0500'], dtype='datetime64[s]')
> Having the 15th of June print out as 7pm on the 14th of June is probably not what one would generally expect, so I've come up with an approach which hopefully deals with this in a good way.
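
The surprise described above can be reproduced with the Python standard library alone. This is a minimal sketch, assuming a fixed UTC-05:00 offset (matching the -0500 in the output above) in place of the machine's local zone; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed -05:00 offset stands in for the local time zone.
local = timezone(timedelta(hours=-5))

# A bare date such as '2011-06-15' is stored as midnight UTC of that day.
stored = datetime(2011, 6, 15, tzinfo=timezone.utc)

# Printing that instant in the local zone shifts it back five hours.
shown = stored.astimezone(local)
print(shown.isoformat())  # 2011-06-14T19:00:00-05:00
```

Midnight UTC on the 15th is 7pm local time on the 14th, which is exactly the second element in the array output.
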
> One firm principle of the datetime in NumPy is that it is always stored as a POSIX time (referencing UTC), or a TAI time. There are two categories of units that can be used, which I will call date units and time units. The date units are 'Y', 'M', 'W', and 'D', while the time units are 'h', 'm', 's', ..., 'as'. Time zones are only applied to datetimes stored in time units, so there's a qualitative difference between date and time units with respect to string conversions and calendar operations.
> I would like to place an 'unsafe' casting barrier between the date units and the time units, so that the above conversion from a date into a datetime will raise an error instead of producing a confusing result. This only applies to datetimes and not timedeltas, because for timedeltas the day <-> hour case is fine, it is just the year/month <-> other units which has issues, and that is already treated with an 'unsafe' casting barrier.
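
The proposed barrier amounts to a simple check on the two units involved in a cast. The following is only a sketch of that rule in plain Python, not NumPy's casting code: the function name `crosses_date_time_barrier` is hypothetical, and the sub-second units between 's' and 'as' are filled in from the unit codes NumPy documents for datetime64.

```python
# Unit categories as described above. Note that 'M' (month) and 'm' (minute)
# differ only by case.
DATE_UNITS = ("Y", "M", "W", "D")
# Assumption: the units elided between 's' and 'as' are NumPy's documented ones.
TIME_UNITS = ("h", "m", "s", "ms", "us", "ns", "ps", "fs", "as")


def crosses_date_time_barrier(from_unit, to_unit):
    """Return True if a datetime cast would cross between date and time units."""
    for unit in (from_unit, to_unit):
        if unit not in DATE_UNITS and unit not in TIME_UNITS:
            raise ValueError(f"unknown unit: {unit!r}")
    # The cast crosses the barrier when exactly one side is a date unit.
    return (from_unit in DATE_UNITS) != (to_unit in DATE_UNITS)


print(crosses_date_time_barrier("D", "s"))  # True: day -> second would need 'unsafe'
print(crosses_date_time_barrier("h", "s"))  # False: both are time units
```

Under the proposal, the `['now', '2011-06-15']` example would fall in the first category, because promoting the date to seconds crosses from 'D' to 's'.
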
> Two new functions will facilitate the conversions between datetimes with date units and time units:
> date_as_datetime(datearray, hour, minute, second, microsecond, timezone='local', unit=None, out=None), which converts the provided dates into datetimes at the specified time, according to the specified timezone. If 'unit' is specified, it controls the output unit, otherwise it is the units in 'out' or the amount of precision specified in the function.
> datetime_as_date(datetimearray, timezone='local', out=None), which converts the provided datetimes into dates according to the specified timezone.
> In both functions, timezone can be any of 'UTC', 'TAI', 'local', '+/-####', or a datetime.tzinfo object. The latter will allow NumPy datetimes to work with the pytz library for flexible time zone support.
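
To make the intended semantics of the two conversions concrete, here is a scalar sketch using only the standard library. The helper names `date_as_datetime_sketch` and `datetime_as_date_sketch` are hypothetical stand-ins, not the proposed NumPy functions. The sketch assumes fixed-offset `tzinfo` objects and does not model arrays, the 'local' and 'TAI' options, or the `unit` and `out` parameters.

```python
from datetime import date, datetime, time, timedelta, timezone


def date_as_datetime_sketch(day, hour=0, minute=0, second=0, tz=timezone.utc):
    """Interpret `day` at the given wall-clock time in `tz`; return the UTC instant."""
    wall = datetime.combine(day, time(hour, minute, second), tzinfo=tz)
    return wall.astimezone(timezone.utc)


def datetime_as_date_sketch(instant, tz=timezone.utc):
    """Return the calendar date that the aware datetime `instant` falls on in `tz`."""
    return instant.astimezone(tz).date()


# Assumption: a fixed offset, playing the role of a '-0500' timezone argument.
central = timezone(timedelta(hours=-5))

noon = date_as_datetime_sketch(date(2011, 6, 15), hour=12, tz=central)
print(noon.isoformat())                        # 2011-06-15T17:00:00+00:00
print(datetime_as_date_sketch(noon, central))  # 2011-06-15
```

The round trip recovers the original date only when the same timezone is used in both directions, which is why both functions take an explicit timezone argument.
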
> I would also like to extend the 'today' input string parsing to accept strings like 'today 12:30' to allow a convenient way to express different local times occurring today, mostly useful for interactive usage.
> I welcome any comments on this design, particularly if you can find a case where this doesn't produce a reasonable behavior.
> Cheers,
> Mark

Is the output for the given usecase above with the mix of 'now' and a
datetime string without tz info intended to still be correct?

No, that case would fail. The resolution of 'now' is seconds, and the resolution of a date string is days, so the case would require a conversion across the date unit/time unit boundary.

I personally have misgivings about interpreting phrases like "now" and
"today" at this level.  I think it introduces a can of worms that
would be difficult to handle.

I like the convenience it gives at the interactive prompt, but maybe a datetime_from_string function, where you can selectively allow or disallow these special values and local times, can provide control over this. This is similar to the datetime_as_string function, which gives more flexibility than simple conversion to a string.

Consider some arbitrary set of inputs to the array function for
datetime objects.  If they all contain no tz info, then they are all
interpreted the same as-is.  However, if even one element has 'now',
then the inputs are interpreted entirely differently.  This will
confuse people.

The element 'now' has no effect on the other inputs, except to possibly promote the unit to a seconds level of precision. All datetimes are in UTC, and when timezone information is given, it is used only for parsing the input; it is not preserved.

Just thinking out loud here, what about a case where the inputs are
such that some do not specify tz and some others specify a mix of
timezones? Should that be any different from the case given above?

I think this has the same answer, everything gets converted to UTC. 
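
The rule that offsets are used only while parsing and are then discarded can be illustrated with the standard library. This is a sketch under stated assumptions: `parse_to_utc` is a hypothetical helper, it handles only minute-resolution strings with an explicit offset, and it does not model NumPy's own parser or the local-time default for strings without an offset.

```python
from datetime import datetime, timezone


def parse_to_utc(text):
    """Parse 'YYYY-MM-DDTHH:MM+/-hhmm' and return a naive datetime in UTC."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M%z")
    # Convert to UTC, then drop the offset: only the instant is kept.
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)


a = parse_to_utc("2011-06-15T18:00-0500")
b = parse_to_utc("2011-06-15T23:00+0000")
print(a, a == b)  # 2011-06-15 23:00:00 True
```

Two strings written with different offsets but naming the same instant end up as identical stored values, and nothing in the result records which offset the input used.
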

It has been a while for me, but how different is this from Perl's
floating tz for its datetime module?  Maybe we could combine its
approach with your "unsafe" barrier for the ambiguous situations that
Perl's datetime module mentions?

I'd rather not attach timezone information to the NumPy datetime. The pytz library appears to already support this kind of thing, and I see no reason to duplicate that effort; I would rather support the pytz timezone objects in certain datetime manipulation routines.


Ben Root
NumPy-Discussion mailing list