How Can I Increase the Speed of a Large Number of Date Conversions
Larry Bates
larry.bates at websafe.com
Fri Jun 8 00:12:37 EDT 2007
James T. Dennis wrote:
> Some Other Guy <bgates at microsoft.com> wrote:
>> vdicarlo wrote:
>>> I am a programming amateur and a Python newbie who needs to convert
>>> about 100,000,000 strings of the form "1999-12-30" into ordinal dates
>>> for sorting, comparison, and calculations. Though my script does a ton
>>> of heavy calculational lifting (for which numpy and psyco are a
>>> blessing) besides converting dates, it still seems to like to linger
>>> in the datetime and time libraries. (Maybe there's a hot module in
>>> there with a cute little function and an impressive set of
>>> attributes.)
>> ...
>>> dateTuple = time.strptime("2005-12-19", '%Y-%m-%d')
>>> dateTuple = dateTuple[:3]
>>> date = datetime.date(dateTuple[0], dateTuple[1],
>>> dateTuple[2])
>>> ratingDateOrd = date.toordinal()
>
>> There's nothing terribly wrong with that, although strptime() is overkill
>> if you already know the date format. You could get the date like this:
>
>> date = apply(datetime.date, map(int, "2005-12-19".split('-')))
>
>> But, more importantly... 100,000,000 individual dates would cover 274000
>> years! Do you really need that much?? You could just precompute a
>> dictionary that maps a date string to the ordinal for the last 50 years
>> or so. That's only 18250 entries, and can be computed in less than a second.
>> Lookups after that will be near instantaneous:
>
> For that matter why not memoize the results of each conversion
> (toss it in a dictionary and precede each conversion with a
> check like: if this_date in datecache: return datecache[this_date]
> else: ret=convert(this_date); datecache[this_date]=ret; return ret)
>
> (If you don't believe that will help, consider that a memo-ized
> implementation of a recursive Fibonacci function runs about as quickly
> as an iterative approach).
>
>
Even better, do something like this (not tested):

    try:
        dateord = datedict[cdate]
    except KeyError:
        # First time this string is seen: convert it and remember the ordinal.
        dateord = datetime.date(*[int(x) for x in cdate.split('-')]).toordinal()
        datedict[cdate] = dateord

That way you build the cache on the fly, and a key that is already in
the cache costs only a single dictionary lookup.
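
For anyone who wants to try it, here is a minimal self-contained sketch of
the same idea. It assumes every input is a well-formed "YYYY-MM-DD" string;
the names date_to_ordinal and datedict are only illustrative, and I have not
timed it against the strptime() version.

```python
import datetime

# Cache mapping "YYYY-MM-DD" strings to their ordinal day numbers.
datedict = {}

def date_to_ordinal(cdate):
    """Return the ordinal for a "YYYY-MM-DD" string, caching each result."""
    try:
        # Fast path: this string has been converted before.
        return datedict[cdate]
    except KeyError:
        # Slow path: parse the string once and remember the answer.
        year, month, day = map(int, cdate.split('-'))
        dateord = datetime.date(year, month, day).toordinal()
        datedict[cdate] = dateord
        return dateord

# Repeated strings are converted only once.
dates = ["1999-12-30", "2005-12-19", "1999-12-30"]
ordinals = [date_to_ordinal(d) for d in dates]
```

With 100,000,000 strings drawn from a few thousand distinct dates, nearly
every call should take the fast path, so the cost of the datetime call is
paid only once per distinct date.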