[Patches] [ python-Patches-474274 ] Pure Python strptime() (PEP 42)

noreply@sourceforge.net noreply@sourceforge.net
Tue, 04 Jun 2002 16:33:37 -0700


Patches item #474274, was opened at 2001-10-23 19:15
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Brett Cannon (bcannon)
Assigned to: Skip Montanaro (montanaro)
Summary: Pure Python strptime() (PEP 42)

Initial Comment:
The attached file contains a pure Python version of
strptime().  It attempts to operate as much like
time.strptime() within reason.  Where vagueness or
obvious platform dependence existed, I tried to
standardize and be reasonable.

PEP 42 makes a request for a portable, consistent
version of time.strptime():

- Add a portable implementation of time.strptime() that
works in
      clearly defined ways on all platforms.

This module attempts to close that feature request.

The code has been tested thoroughly by myself as well
as some other people who happened to have caught the
post I made to c.l.p a while back and used the module.

It is available at the Python Cookbook
(http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/56036).
 It has been approved by the editors there and thus is
listed as approved.  It is also being considered for
inclusion in the book (thanks, Alex, for encouraging
this submission).

A PyUnit testing suite for the module is available at
http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime
along with the code for the function itself.
Localization has been handled in a modular way using
regexes.  All of it is self-explanatory in the doc
strings.  It is very straight-forward to include your
own localization settings or modify the two languages
included in the module  (English and Swedish).

If the code needs to have its license changed, I am
quite happy to do it (I have already given the OK to
the Python Cookbook).

-Brett Cannon

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-04 19:33

Message:
Logged In: YES 
user_id=33168

Hopefully, I'm looking at the correct patch this time. :-)

To answer one question you had (re:  'yY' vs. ('y', 'Y')),
I'm not sure people really care.  It's not big to me.
Although 'yY' is faster than ('y', 'Y').

In order to try to reduce the lines where you raise an error
(in __init__)
you could change 'sequence of ... must be X items long' to
'... must have/contain X items'.

Generally, it would be nice to make sure none of the lines
are over 72-79 chars (see PEP 8).

Instead of doing:
    newlist = list(orig)
    newlist.append('')
    x = tuple(newlist)

you could do:
    x = tuple(orig[:])
or something like that.  Perhaps a helper function?

In __init__ do you want to check the params against 'is None'
If someone passes a non-sequence that doesn't evaluate
to False, the __init__ won't raise a TypeError which it
probably should.

What is the magic date used in __calc_weekday()?
  (1999/3/15+ 22:44:55)  is this significant, should there
be a comment?
  (magic dates are used elsewhere too, e.g., __calc_month,
__calc_am_pm, many more)

__calc_month() doesn't seem to take leap year into account?
  (not sure if this is a problem or not)
In __calc_date_time(), you use date_time[offset] repetatively,
  couldn't you start the loop with something like dto =
date_time[offset] and then use dto
  (dto is not a good name, I'm just making an example)

Are you supposed to use __init__ when deriving from
built-ins (TimeRE(dict)) or __new__?
  (sorry, I don't remember the answer)

In __tupleToRE.sorter(), instead of the last 3 lines, you
can do:
  return cmp(b_length, a_length)

Note:  you can do x = y = z = -1, instead of x = -1 ; y = -1
; z = -1

It could be problematic to compare x is -1.  You should
probably just use ==.
It would be a problem if NSMALLPOSINTS or NSMALLNEGINTS
were not defined in Objects/intobject.c.

This docstring seems backwards:
def gregToJulian(year, month, day):
    """Calculate the Gregorian date from the Julian date."""
I know a lot of these things seem like a pain.
And it's not that bad now, but the problem is maintaining
the code.  It will be easier for everyone else if the code
is similar to the rest.

BTW, protocol on python-dev is pretty loose and friendly. :-)

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 18:33

Message:
Logged In: YES 
user_id=357491

Thanks for being so prompt with your response, Skip.

I found the problem with your %c.  If you look at your
output you will notice that the day of the month is '4', but
if you look at the docs for time.strftime() you will notice
that is specifies the day of the month (%d) as being in the
range [01,31].  The regex for %d (simplified) is
'(3[0-1])|([0-2]\d)'; not being represented by 2 digits
caused the regex to fail.

Now the question becomes do we follow the spec and chaulk
this up to a non-standard strftime() implementation, or do
we adapt strptime to deal with possible improper output from
strftime()?  Changing the regexes should not be a big issue
since I could just tack on '\d' as the last option for all
numerical regexes. 

As for the test error from time.strptime(), I don't know
what is causing it.  If you look at the test you will notice
that all it basically does is parsetime(time.strftime("%Z"),
"%Z").  Now how that can fail I don't know.  The docs do say
that strptime() tends to be buggy, so perhaps this is a case
of this.

One last thing.  Should I wait until the bugs are worked out
before I post to python-dev asking to either add this as a
module to the standard library or change time to a Python
stub and rename timemodule.c?  Should I ask now to get the
ball rolling?  Since I just joined python-dev literally this
morning I don't know what the protocol is.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 01:55

Message:
Logged In: YES 
user_id=44345

Here ya go...

% ./python
Python 2.3a0 (#185, Jun  1 2002, 23:19:40) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> now = time.localtime(time.time())
>>> now
(2002, 6, 4, 0, 53, 39, 1, 155, 1)
>>> time.strftime("%c", now)
'Tue Jun  4 00:53:39 2002'
>>> time.tzname
('CST', 'CDT')
>>> time.strftime("%Z", now)
'CDT'


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-04 01:35

Message:
Logged In: YES 
user_id=357491

I have uploaded a verision 2.0.1 which fixes the %b format
bug (stupid typo on a variable name).

As for the %c directive, I pass that test.  Can you please
send the output of strftime and the time tuple used to
generate it?

As for the time.strptime() failure, I don't have
time.strptime() on any system available to me, so could you
please send me the output you have for strftime('%Z'), and
time.tzname?

I don't know how much %Z should be worried about since its
use is deprecated (according to the time module's
documentation).  Perhaps strptime() should take the
initiative and not support it?

-Brett

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2002-06-04 00:52

Message:
Logged In: YES 
user_id=44345

Brett,

Please see the drastically shortened test_strptime.py.  (Basically all I'm
interested in here is whether or not strptime.strptime and time.strptime
will pass the tests.)  Near the top are two lines, one commented out:

  parsetime = time.strptime
  #parsetime = strptime.strptime

Regardless which version of parsetime I get, I get some errors.  If 
parsetime == time.strptime I get

======================================================================
ERROR: Test timezone directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 69, in test_timezone
    strp_output = parsetime(strf_output, "%Z")
ValueError: unconverted data remains: 'CDT'

If parsetime == strptime.strptime I get

ERROR: *** Test %c directive. ***
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 75, in test_date_time
    self.helper('c', position)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 380, in strptime
    found_dict = found.groupdict()
AttributeError: NoneType object has no attribute 'groupdict'

======================================================================
ERROR: Test for month directives.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_strptime.py", line 31, in test_month
    self.helper(directive, 1)
  File "test_strptime.py", line 17, in helper
    strp_output = parsetime(strf_output, '%'+directive)
  File "strptime.py", line 393, in strptime
    month = list(locale_time.f_month).index(found_dict['b'])
ValueError: list.index(x): x not in list

This is with a very recent interpreter (updated from CVS in the past 
day) running on Mandrake Linux 8.1.

Can you reproduce either or both problems?  Got fixes for the 
strptime.strptime problems?

Thx,

Skip


----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-06-02 03:44

Message:
Logged In: YES 
user_id=357491

I'm afraid you looked at the wrong patch!  My fault since I
accidentally forgot to add a description for my patch.  So
the file with no description is the newest one and
completely supercedes the older file.  I am very sorry about
that.  Trust me, the new version is much better.

I realized the other day that since the time module is a C
extension file, would getting this accepted require getting
BDFL approval to add this as a separate module into the
standard library?  Would the time module have to have a
Python interface module where this is put and all other
methods in the module just pass directly to the extension file?

As for the suggestions, here are my replies to the ones that
still apply to the new file:
* strings are sequences, so instead of if found in ('y',
'Y') you can do if found in 'yY'
-> True, but I personally find it easier to read using the
tuple.  If it is standard practice in the standard library
to do it the suggested way, I will change it.

* daylight should use the new bools True, False (this also
applies to any other flags)
-> Oops.  Since I wrote this under Python 2.2.1 I didn't
think about it.  I will go through the code and look for
places where True and False should be used.

-Brett C.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-06-01 09:46

Message:
Logged In: YES 
user_id=33168

Overall, the patch looks pretty good.  
I didn't check for completeness or consistency, though.

 * You don't need: from exceptions import Exception
 * The comment "from strptime import * will only export
strptime()" is not correct.
 * I'm not sure what should be included for the license.
 * Why do you need success flag in CheckIntegrity, you raise
an exception?
    (You don't need to return anything, raise an exception,
else it's ok)
 * In return_time(), could you change xrange(9) to
range(len(temp_time))
    this removes a dependancy.
 * strings are sequences, so instead of if found in ('y', 'Y')
    you can do if found in 'yY'
 * daylight should use the new bools True, False
   (this also applies to any other flags) * The formatting
doesn't follow the standard (see PEP 8)
    (specifically, spaces after commas, =, binary ops,
comparisons, etc)
 * Long lines should be broken up
The test looks pretty good too.  I didn't check it for
completeness.
The URL is wrong (too high up), the test can be found here:
 http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/code/Python/Scripts/test_strptime.py
I noticed a spelling mistake in the test: anme -> name.

Also, note that PEP 42 has a comment about a python strptime.
So if this gets implemented, we need to update PEP 42.
Thanks.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-05-27 17:38

Message:
Logged In: YES 
user_id=357491

Version 2 of strptime() has now been uploaded.  This nearly
complete rewrite includes the removal of the need to input
locale-specific time info.  All need locale info is gleaned
from time.strftime().  This makes it able to behave exactly
like time.strptime().

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-24 18:15

Message:
Logged In: YES 
user_id=35752

Go ahead and reuse this item.  I'll wait for the updated
version.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2002-03-24 18:01

Message:
Logged In: YES 
user_id=357491

Oops.  I thought I had removed the clause.  Feel free to
remove it.

I am going to be cleaning up the module, though, so if you
would rather not bother reviewing this version and wait on
the cleaned-up one, go ahead.

Speaking of which, should I just reply to this bugfix when I
get around to the update, or start a new patch?

----------------------------------------------------------------------

Comment By: Neil Schemenauer (nascheme)
Date: 2002-03-23 17:41

Message:
Logged In: YES 
user_id=35752

I'm pretty sure this code needs a different license before
it can be accepted.  The current license contains the
"BSD advertising clause".  See
http://www.gnu.org/philosophy/bsd.html.



----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=474274&group_id=5470