[Patches] [ python-Patches-658820 ] regex fixes for _strptime
noreply@sourceforge.net
Sun, 29 Dec 2002 17:22:09 -0800
Patches item #658820, was opened at 2002-12-26 14:41
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=658820&group_id=5470
Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Neal Norwitz (nnorwitz)
Summary: regex fixes for _strptime
Initial Comment:
Neal Norwitz discovered that the regex for the Julian
day would catch a value of 0, which is invalid. He
asked if this and two other values should allow 0.
The Python docs say no (according to what values the
time tuple should have), so I fixed the regexes that
should not match 0 so that they reject it. I also cleaned up the order
of some of them. I also made 'W' just use 'U' directly
instead of having it be done using a copy-n-paste in
the code.
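A minimal sketch of the two changes described above. The patterns here are
illustrative assumptions, not the regexes from the patch: a loose day-of-year
pattern that accepts 0, a stricter one limited to 1-366, and a 'W' pattern
derived from 'U' rather than copied by hand.

```python
import re

# Hypothetical patterns for illustration; not taken from the actual patch.
LOOSE_JULIAN = re.compile(r"(?P<j>\d{1,3})")
STRICT_JULIAN = re.compile(
    r"(?P<j>36[0-6]|3[0-5]\d|[12]\d\d|0[1-9]\d|00[1-9]|[1-9]\d|0[1-9]|[1-9])"
)

# Week of the year (00-53). 'W' reuses the 'U' pattern and only renames
# the capture group, so the two cannot drift apart.
WEEK_U = r"(?P<U>5[0-3]|[0-4]\d|\d)"
WEEK_W = WEEK_U.replace("P<U>", "P<W>")


def accepts(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


print(accepts(LOOSE_JULIAN, "0"))    # True: the invalid day 0 slips through
print(accepts(STRICT_JULIAN, "0"))   # False
print(accepts(STRICT_JULIAN, "366")) # True
```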
One possible issue that I foresee is that 'Y' expects
exactly 4 digits for the year. Is this reasonable,
or should it be more like ``\d+?``? I don't know what
the valid range is, but since the docs specify that it
has the century digits, I figured it should be 4. But
what about when we pass the year 9999? If any
quantifier is put on to 'Y', it must be non-greedy;
otherwise something like ``20021226`` would not be
parsed as 2002-12-26 as it should be.
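The greedy versus non-greedy concern can be sketched as follows. The month and
day patterns are simplified assumptions made for this example, and the sketch
assumes the whole string must match; it is not the code from the patch.

```python
import re

# Simplified month and day patterns, assumed for this sketch only.
MONTH = r"(?P<m>1[0-2]|0[1-9]|[1-9])"
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"

GREEDY = re.compile(r"(?P<Y>\d+)" + MONTH + DAY)
NON_GREEDY = re.compile(r"(?P<Y>\d+?)" + MONTH + DAY)


def split_date(pattern, text):
    """Return (year, month, day) strings, or None if the text does not match."""
    match = pattern.fullmatch(text)
    return match.group("Y", "m", "d") if match else None


print(split_date(GREEDY, "20021226"))      # ('200212', '2', '6')
print(split_date(NON_GREEDY, "20021226"))  # ('2002', '12', '26')
```

With the greedy quantifier the year swallows as many digits as it can and
leaves single digits for the month and day; the non-greedy form stops at the
shortest year that still lets the rest of the string match.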
----------------------------------------------------------------------
>Comment By: Brett Cannon (bcannon)
Date: 2002-12-29 17:22
Message:
Logged In: YES
user_id=357491
Damn SF messed up again (happened on another one of my patches).
Yes, it is a real-world problem. Part of the reason the
regexes are nice is that they not only parse the input but also
do basic bounds-checking on that input. So yes, it is an
actual issue.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2002-12-28 21:12
Message:
Logged In: YES
user_id=80475
There is no patch attached.
Also, is this a real world problem or just a theoretical
neatness issue?
----------------------------------------------------------------------
Comment By: Brett Cannon (bcannon)
Date: 2002-12-26 14:51
Message:
Logged In: YES
user_id=357491
I forgot to ask whether the test
suite should be changed to test all numeric values
for all regexes. That would have caught these problems,
but now that they are fixed I doubt it will be an issue
ever unless _strptime is completely refactored to not use
regexes for parsing. I only hesitate because that would
be a lot of regex comparisons (could just do edge cases or
could loop through every possible digit combination for a
number plus ones that shouldn't pass) which would take a
large amount of time. Perhaps it could be done but only
be run with ``test_support.use_large_resources``. This
would allow testing for every digit which would be the most
thorough and best since the regexes are not just a bunch of
``\d\d`` regexes.
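A rough sketch of what such an exhaustive check could look like. The helper
name and the month patterns are assumptions made for illustration, not part of
the test suite: every digit string up to a given width is tried, and the regex
must accept it exactly when its value lies in the valid range.

```python
import itertools
import re


def mismatches(pattern, low, high, max_digits):
    """Return the digit strings where the regex and the valid range disagree.

    Hypothetical helper: tries every digit string of 1..max_digits
    characters and compares a full match against low <= value <= high.
    """
    compiled = re.compile(pattern)
    wrong = []
    for width in range(1, max_digits + 1):
        for digits in itertools.product("0123456789", repeat=width):
            text = "".join(digits)
            expected = low <= int(text) <= high
            actual = compiled.fullmatch(text) is not None
            if actual != expected:
                wrong.append(text)
    return wrong


# Assumed month patterns: a strict one (1-12) and a loose one for contrast.
STRICT_MONTH = r"1[0-2]|0[1-9]|[1-9]"
LOOSE_MONTH = r"[01]\d|\d"

print(mismatches(STRICT_MONTH, 1, 12, 2))  # []
print(mismatches(LOOSE_MONTH, 1, 12, 2))   # includes '0', '00' and '13'
```

For two-digit fields this is only 110 strings per regex, so the cost grows
mainly with wider fields such as the year.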
I am also initially assigning this patch to Neal (nnorwitz)
since he found the original problem.
----------------------------------------------------------------------