[Python-Dev] Broken strptime in Python 2.3a1 & CV

Brett Cannon bac@OCF.Berkeley.EDU
Tue, 14 Jan 2003 17:25:53 -0800 (PST)


[Tim Peters]

> [Brett Cannon]
> > ...
> > And to comment on the speed drawback: there is already a partial solution
> > to this.  ``_strptime`` has the ability to return the regex it creates to
> > parse the data string and then subsequently have the user pass that in
> > instead of a format string::
>
> You're carrying restructured text too far <wink>::
>

=)  Need the practice; giving a lightning tutorial on it at PyCon.  But I
will cut back on the literal markup.

> I expect it would be better for strptime to maintain its own internal cache
> mapping format strings to compiled regexps (as a dict, indexed by format
> strings).  Dict lookup is cheap.  In most programs, this dict will remain
> empty.  In most of the rest, it will have one entry.  *Some* joker will feed
> it an unbounded number of distinct format strings, though, so blow the cache
> away if it gets "too big":
>
>     regexp = cache.get(fmtstring)
>     if regexp is None:
>         regexp = compile_the_regexp(fmtstring)
>         if len(cache) > 30:  # whatever
>             cache.clear()
>         cache[fmtstring] = regexp
>
> Then you're robust against all comers (it's also thread-safe).
>

Hmm.  Could do that.  Could also cache the locale information that I
discover (one copy should be enough; I don't think people swap between
locales that often).  Caching the object that stores locale info, called
TimeRE (see, no `` `` markup; fast learner I am =), would speed up value
calculations (parsed values have to be compared against it to figure out
what month it is, etc.) and would also speed up creating multiple regexes
(since the locale info won't have to be recalculated).  And then the cache
that you are suggesting, Tim, would completely replace the need to be able
to return regex objects.  Spiffy.  =)

OK, so, with the above-mentioned improvements I can rip out the
functionality for returning regex objects.  I am going to assume no one has
any issue with this design idea, so I will do another patch for this.  That
makes three: one already on SF dealing with a MacOS 9 issue; one to come
that does default values and makes the %y directive work the way most
people expect it to, along with doc changes specifying that you *can*
expect reliable behavior; and now a speed-up patch which will also remove
my one use of the string module.  Fun  =)

Now all I need is Alex to step in here and fiddle with Tim's code and then
Christian and Raymond to come in and speed up the underlying C code for
Tim's code that Alex touched and we will be in business.  =)

sometimes-I-think-I-read-too-much-python-dev-mail-ly y'rs,
Brett