How Can I Increase the Speed of a Large Number of Date Conversions
Josiah Carlson
josiah.carlson at sbcglobal.net
Fri Jun 8 01:29:04 EDT 2007
Some Other Guy wrote:
> vdicarlo wrote:
>> I am a programming amateur and a Python newbie who needs to convert
>> about 100,000,000 strings of the form "1999-12-30" into ordinal dates
>> for sorting, comparison, and calculations. Though my script does a ton
>> of heavy calculational lifting (for which numpy and psyco are a
>> blessing) besides converting dates, it still seems to like to linger
>> in the datetime and time libraries. (Maybe there's a hot module in
>> there with a cute little function and an impressive set of
>> attributes.)
> ...
>> dateTuple = time.strptime("2005-12-19", '%Y-%m-%d')
>> dateTuple = dateTuple[:3]
>> date = datetime.date(dateTuple[0], dateTuple[1],
>> dateTuple[2])
>> ratingDateOrd = date.toordinal()
>
> There's nothing terribly wrong with that, although strptime() is overkill
> if you already know the date format. You could get the date like this:
>
> date = apply(datetime.date, map(int, "2005-12-19".split('-')))
>
> But, more importantly... 100,000,000 individual dates would cover 274000
> years! Do you really need that much?? You could just precompute a
> dictionary that maps a date string to the ordinal for the last 50 years
> or so. That's only 18250 entries, and can be computed in less than a second.
> Lookups after that will be near instantaneous:
>
>
> import datetime
>
> days = 365*50 # about 50 years worth
> dateToOrd = {} # dict. of date string to ordinal
...
Then there's the argument of "why bother using real dates?" I mean, all
that is necessary is a mapping of date -> number for sorting. Who needs
accuracy?
    for date in inp:
        y, m, d = map(int, date.split('-'))
        ordinal = (y-1990)*372 + (m-1)*31 + d-1
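To make the trade-off concrete, here is a minimal sketch (Python 3) that wraps the formula above in a function and checks it against datetime. The names pseudo_ordinal and real_ordinal, the base_year parameter, and the sample dates are made up for illustration. It suggests that sort order is preserved while differences between values stop being day counts.

```python
import datetime

def pseudo_ordinal(date_string, base_year=1990):
    """Order-preserving stand-in for a real ordinal (hypothetical helper)."""
    y, m, d = map(int, date_string.split('-'))
    # Every month gets 31 slots and every year 12 * 31 = 372 slots,
    # so a later date always maps to a larger number.
    return (y - base_year) * 372 + (m - 1) * 31 + (d - 1)

def real_ordinal(date_string):
    """Proleptic Gregorian ordinal via datetime, for comparison."""
    y, m, d = map(int, date_string.split('-'))
    return datetime.date(y, m, d).toordinal()

# Sample dates chosen for illustration only.
dates = ["2005-12-19", "1999-12-30", "2000-02-29", "2000-03-01", "1999-12-31"]

# Sorting by either key gives the same order.
assert sorted(dates, key=pseudo_ordinal) == sorted(dates, key=real_ordinal)

# Differences are no longer day counts: the unused slots for
# Feb 30 and Feb 31 sit between these two consecutive days.
gap_pseudo = pseudo_ordinal("2000-03-01") - pseudo_ordinal("2000-02-29")
gap_real = real_ordinal("2000-03-01") - real_ordinal("2000-02-29")
```

So this is fine for sorting and comparison, but anything that needs an actual number of days between two dates still wants a real ordinal.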
Depending on the allowable range of years, one could perhaps adjust the
1990 up, and get the range of date ordinals down to about 12 bits (if
one packs netflix data properly, you can get everything in memory).
With a bit of psyco, the above is pretty speedy.
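The 12-bit figure can be checked with a little arithmetic. The sketch below (Python 3) is only an illustration: bits_needed and years_that_fit are hypothetical helpers, and the 1999 to 2005 year range is an assumed example, not something stated in this thread.

```python
SLOTS_PER_YEAR = 12 * 31  # 372 slots per year in the scheme above

def bits_needed(first_year, last_year):
    """Bits required to hold every pseudo-ordinal when the base is first_year."""
    largest = (last_year - first_year + 1) * SLOTS_PER_YEAR - 1
    return max(1, largest.bit_length())

def years_that_fit(bits):
    """Number of whole years of pseudo-ordinals that fit in the given width."""
    return (1 << bits) // SLOTS_PER_YEAR

# 12 bits hold 4096 values, which is room for 11 years of 372 slots.
fit_12 = years_that_fit(12)

# Assumed example range: with the base left at 1990, dates through 2005
# need 13 bits; moving the base up to 1999 brings that down to 12.
bits_base_1990 = bits_needed(1990, 2005)
bits_base_1999 = bits_needed(1999, 2005)
```

Under those assumptions, raising the base year is what buys the extra bit.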
- Josiah