[Patches] [ python-Patches-658820 ] regex fixes for _strptime

noreply@sourceforge.net noreply@sourceforge.net
Mon, 30 Dec 2002 13:57:50 -0800


Patches item #658820, was opened at 2002-12-26 17:41
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658820&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Neal Norwitz (nnorwitz)
Summary: regex fixes for _strptime

Initial Comment:
Neal Norwitz discovered that the regex for the Julian
day would catch a value of 0, which is invalid.  He
asked if this and two other values should allow 0.
The Python docs say no (according to what values the
time tuple should have), so I fixed the regexes that
should not catch 0 so that they no longer do.  I also
cleaned up the order of some of them.  I also made 'W'
just use 'U' directly instead of having it be done
using a copy-n-paste in the code.
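
As a rough illustration of the kind of change described above (a
sketch only, not the patch itself; the patterns and the ``matches``
helper are hypothetical and the patch's actual regexes may differ),
a Julian-day regex that rejects 0 and a 'W' entry derived from 'U'
might look like this:

```python
import re

# Hypothetical patterns for illustration only; the patch's regexes may differ.
patterns = {
    # Day of the year, 1-366, optionally zero-padded; 0, 00 and 000 are rejected.
    "j": r"(?P<j>36[0-6]|3[0-5]\d|[1-2]\d\d|0[1-9]\d|00[1-9]|[1-9]\d|0[1-9]|[1-9])",
    # Week of the year, 0-53; here 0 is a legal value.
    "U": r"(?P<U>5[0-3]|[0-4]\d|\d)",
}
# Derive 'W' from 'U' instead of pasting the same pattern twice;
# only the group name changes.
patterns["W"] = patterns["U"].replace("U", "W")


def matches(directive, text):
    """Return True if the whole of text is a valid value for the directive."""
    return re.fullmatch(patterns[directive], text) is not None
```

The longer alternatives come first so that a three-digit value is not
cut short by a one-digit alternative.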

One possible issue that I foresee is that 'Y' expects
exactly 4 digits for the year.  Is this reasonable,
or should it be more like ``\d+?``?  I don't know what
the valid range is, but since the docs specify that it
has the century digits, I figured it should be 4.  But
what about when we pass the year 9999?  If any
quantifier is put on 'Y', it must be non-greedy;
otherwise something like ``20021226`` would not be
parsed as 2002-12-26 as it should be.
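
The greedy versus non-greedy point can be checked with a small
sketch.  The month and day patterns and the ``parse`` helper are
hypothetical, and the sketch assumes the whole string must be
consumed (``re.fullmatch``); a match that is not anchored at the end
would let the non-greedy year stop earlier.

```python
import re

# Hypothetical month and day patterns; neither accepts 0.
MONTH = r"(?P<m>1[0-2]|0[1-9]|[1-9])"
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"


def parse(year_pattern, text):
    """Match text as year, month, day; return the three groups or None."""
    found = re.fullmatch(r"(?P<Y>%s)%s%s" % (year_pattern, MONTH, DAY), text)
    return found.group("Y", "m", "d") if found else None


fixed = parse(r"\d{4}", "20021226")   # exactly four digits
lazy = parse(r"\d+?", "20021226")     # non-greedy: as few digits as possible
greedy = parse(r"\d+", "20021226")    # greedy: as many digits as possible

print(fixed)   # ('2002', '12', '26')
print(lazy)    # ('2002', '12', '26')
print(greedy)  # ('200212', '2', '6')
```

The greedy year swallows the month digits and only gives back the
minimum needed for the month and day to match at all.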

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-12-30 16:57

Message:
Logged In: YES 
user_id=6380

Neal, is this ready for checkin?

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-12-29 20:22

Message:
Logged In: YES 
user_id=357491

Damn SF messed up again (happened on another one of my patches).

Yes, it is a real-world problem.  Part of the reason the
regexes are nice is that they not only parse the input but
also do basic bounds-checking on that input.  So yes, it is
an actual issue.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-12-29 00:12

Message:
Logged In: YES 
user_id=80475

There is no patch attached.
Also, is this a real world problem or just a theoretical 
neatness issue?

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-12-26 17:51

Message:
Logged In: YES 
user_id=357491

I forgot to pose the question as to whether the testing
suite should be changed so as to test all numeric values
for all regexes.  That would have caught these problems,
but now that they are fixed I doubt it will ever be an
issue unless _strptime is completely refactored to not use
regexes for parsing.  I only hesitate because that would
be a lot of regex comparisons (could just do edge cases or
could loop through every possible digit combination for a
number plus ones that shouldn't pass), which would take a
large amount of time.  Perhaps it could be done but only
be run with ``test_support.use_large_resources``.  This
would allow testing for every digit, which would be the most
thorough and best since the regexes are not just a bunch of
``\d\d`` regexes.
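
A minimal sketch of what such an exhaustive check could look like
follows.  The ``mismatches`` helper and both month patterns are
hypothetical and are not taken from _strptime or its test suite; the
idea is only to compare what a regex accepts against the numeric
range it is supposed to accept, for every digit string up to a given
width.

```python
import re
from itertools import product


def mismatches(pattern, valid, max_width):
    """Return the digit strings on which the regex and the range disagree."""
    regex = re.compile(pattern)
    bad = []
    for width in range(1, max_width + 1):
        for digits in product("0123456789", repeat=width):
            text = "".join(digits)
            matched = regex.fullmatch(text) is not None
            # The regex should accept text exactly when its value is valid.
            if matched != (int(text) in valid):
                bad.append(text)
    return bad


# Hypothetical month patterns: one that wrongly lets 0 through, one that does not.
loose_month = r"1[0-2]|0\d|\d"
strict_month = r"1[0-2]|0[1-9]|[1-9]"

print(mismatches(loose_month, range(1, 13), 2))   # ['0', '00']
print(mismatches(strict_month, range(1, 13), 2))  # []
```

For a three-digit field this is only 1110 strings per regex, so the
cost depends mostly on how many regexes and widths are covered.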

I am also initially assigning this patch to Neal (nnorwitz)
since he found the original problem.

----------------------------------------------------------------------
