[ python-Bugs-1065388 ] calendar day/month name lookup too slow
SourceForge.net
noreply at sourceforge.net
Wed Nov 17 02:08:02 CET 2004
Bugs item #1065388, was opened at 2004-11-12 14:12
Message generated for change (Comment added) made by gvanrossum
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1065388&group_id=5470
Category: Python Library
Group: Python 2.4
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Guido van Rossum (gvanrossum)
Assigned to: Skip Montanaro (montanaro)
Summary: calendar day/month name lookup too slow
Initial Comment:
The day and month lookups in calendar.py recompute the
entire array of names on each __getitem__ call. This
caused me some embarrassment when a colleague put a
month name lookup in a critical inner loop, and
removing the lookup sped it up by an order of magnitude
(and there was plenty going on in that loop!). More to
the point, something that *looks* like a quick lookup
operation should behave as such.
I understand this is hard to fix, because we have no
way of trapping locale changes -- perhaps adding a
mechanism to do that would be the start for fixing this.
----------------------------------------------------------------------
>Comment By: Guido van Rossum (gvanrossum)
Date: 2004-11-16 20:08
Message:
Logged In: YES
user_id=6380
Thanks for the fix, I knew there was something I forgot!
One comment:
[Tim]
> Can't imagine what "*looks* like a quick lookup" could mean in
> any objective sense. I assume your colleague was using "[]"
> notation. Doesn't necessarily look quick to me <wink>.
>
> Any inner loop that expects to run long enough that the
> month may change is necessarily not a critical inner loop <0.5
> wink>.
Well, the loop was something like (not literally):
for year, month, day in <source of many dates>:
monthname = calendar.month_name[month]
key = (year, monthname, day)
<now use key for look up in some other table>
The other table happened to be indexed by (year, monthname,
day) instead of using the numeric month, and IMO
calendar.month_name[month] is indeed the "obvious" way to do
the translation from numeric month to month name if you
don't want to have to type them yourself (never mind that
it's not documented -- but it's in __all__ and tested by the
test suite so I consider that a doc bug).
Since upgrading to Python 2.4 isn't in the cards anytime
soon and we don't want to go down the slippery path of
costum changes to Python, we fixed it by changing the way
the other table was indexed.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-11-13 11:21
Message:
Logged In: YES
user_id=31435
I checked in a variant of the patch. Since this should have
no effect on visible semantics, and is a straightforward
change, there's no reason to wait for 2.5:
Lib/calendar.py 1.35
Lib/test/test_calendar.py 1.8
Misc/NEWS 1.1188
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2004-11-13 10:26
Message:
Logged In: YES
user_id=44345
Looks good to me. I'll check in the change and close out the
report unless you think this should wait until after 2.4 is released.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-11-12 18:09
Message:
Logged In: YES
user_id=31435
Guido, see my earlier patch (which is attached). Looping
cannot be eliminated because i may be a slice object, in
which case __getitem__ needs to return a list. The patch
avoids looping when i is not a slice object. It also builds the
date objects just once. I believe it's the simplest approach
of this kind that's strong enough so that test_calendar still
passes.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2004-11-12 16:47
Message:
Logged In: YES
user_id=6380
The loop could be eliminated from __getitem__ by writing
this instead:
def __getitem__(self, i):
return datetime.date(2001, i, 1).strftime(self.format)
and similar for _localized_day.
But we could improve even upon that by creating an instance
variable of 13 date objects like this:
# in __init__
self._datelist = [None] + [datetime.date(2001, j+1, 1)
for j in range(12)]
def __getitem__(self, i):
return self._datelist[i].strftime(self.format)
(I would make this change myself but I want someone to
review this because I recall there were subtle compatibility
issues around this...)
That's still suboptimal because the strftime is done over
and over, but at least it's done only once per __getitem__
call rather than 12x.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2004-11-12 15:00
Message:
Logged In: YES
user_id=31435
The entire array is recomputed because the index may be a
slice object that references the entire array.
Can't imagine what "*looks* like a quick lookup" could mean in
any objective sense. I assume your colleague was using "[]"
notation. Doesn't necessarily look quick to me <wink>.
Any inner loop that expects to run long enough that the
month may change is necessarily not a critical inner loop <0.5
wink>.
Maybe the attached patch is good enough. It creates
appropriate datetime objects just once, caches their strftime
() methods, and recomputes only as much as a specific
__getitem__ invocation needs.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1065388&group_id=5470
More information about the Python-bugs-list
mailing list