How Can I Increase the Speed of a Large Number of Date Conversions

Larry Bates larry.bates at websafe.com
Fri Jun 8 06:12:37 CEST 2007


James T. Dennis wrote:
> Some Other Guy <bgates at microsoft.com> wrote:
>> vdicarlo wrote:
>>> I am a programming amateur and a Python newbie who needs to convert
>>> about 100,000,000 strings of the form "1999-12-30" into ordinal dates
>>> for sorting, comparison, and calculations. Though my script does a ton
>>> of heavy calculational lifting (for which numpy and psyco are a
>>> blessing) besides converting dates, it still seems to like to linger
>>> in the datetime and time libraries.  (Maybe there's a hot module in
>>> there with a cute little function and an impressive set of
>>> attributes.)
>> ...
>>> dateTuple = time.strptime("2005-12-19", '%Y-%m-%d')
>>> dateTuple = dateTuple[:3]
>>> date = datetime.date(dateTuple[0], dateTuple[1], dateTuple[2])
>>> ratingDateOrd = date.toordinal()
> 
>> There's nothing terribly wrong with that, although strptime() is overkill
>> if you already know the date format.  You could get the date like this:
> 
>>   date = datetime.date(*map(int, "2005-12-19".split('-')))
> 
>> But, more importantly... 100,000,000 individual dates would cover 274000
>> years!  Do you really need that much??  You could just precompute a
>> dictionary that maps a date string to the ordinal for the last 50 years
>> or so. That's only 18250 entries, and can be computed in less than a second.
>> Lookups after that will be near instantaneous:
> 
>  For that matter why not memoize the results of each conversion
>  (toss it in a dictionary and precede each conversion with a
>  check like: if this_date in datecache: return datecache[this_date]
>  else: ret=convert(this_date); datecache[this_date]=ret; return ret)
> 
>  (If you don't believe that will help, consider that a memo-ized
>  implementation of a recursive Fibonacci function runs about as quickly
>  as an iterative approach).
> 
> 
Even better do something like (not tested):

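import datetime
datedict = {}  # cache: date string -> ordinal; create once, before the loop

try: dateord=datedict[cdate]  # cdate is a string such as "2005-12-19"
except KeyError:
    dateord=datetime.date(*[int(x) for x in cdate.split('-')]).toordinal()
    datedict[cdate]=dateord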


That way you build the cache on the fly, and there is no penalty when
the lookup key is already in the cache.
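
For comparison, the precomputed table suggested earlier in the thread
could look roughly like the sketch below. The function name
build_date_table and the 1957-2007 year range are made up for
illustration; pick whatever range actually covers your data.

```python
import datetime

def build_date_table(first_year, last_year):
    """Map every 'YYYY-MM-DD' string from Jan 1 of first_year through
    Dec 31 of last_year to its proleptic Gregorian ordinal."""
    start = datetime.date(first_year, 1, 1).toordinal()
    stop = datetime.date(last_year, 12, 31).toordinal()
    table = {}
    for ordinal in range(start, stop + 1):
        # isoformat() yields zero-padded 'YYYY-MM-DD', matching the input strings
        table[datetime.date.fromordinal(ordinal).isoformat()] = ordinal
    return table

# Assumed year range, for illustration only.
date_table = build_date_table(1957, 2007)
rating_date_ord = date_table["2005-12-19"]
```

A string outside the chosen range raises KeyError, so the try/except
version above is the safer choice when the range is not known up front.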


