[ python-Bugs-1039270 ] time zone tests fail on Windows

Tue Oct 5 21:12:24 CEST 2004

Bugs item #1039270, was opened at 2004-10-03 12:44
Message generated for change (Comment added) made by quiver
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1039270&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: George Yoshida (quiver)
Assigned to: Brett Cannon (bcannon)
Summary: time zone tests fail on Windows

Initial Comment:
The following tests fail on Win 2K (Japanese locale):

# test_strptime.py
test_compile (__main__.TimeRETests) ... FAIL
test_bad_timezone (__main__.StrptimeTests) ... ERROR
test_timezone (__main__.StrptimeTests) ... ERROR
test_day_of_week_calculation (__main__.CalculationTests) ... ERROR
test_gregorian_calculation (__main__.CalculationTests) ... ERROR
test_julian_calculation (__main__.CalculationTests) ... ERROR

# test_time.py
test_strptime (test.test_time.TimeTestCase) ... FAIL
===
They all stem from time zone tests and can be divided 
into two groups:

The FAIL of test_compile is basically the same as bug #883604.
 http://www.python.org/sf/883604
Local time zone names include regular expression
metacharacters, but they are not escaped.
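To illustrate the metacharacter problem, here is a minimal sketch in modern Python 3. It is not the code in _strptime.py; the zone name is the English rendering used later in this thread, and the variable names are made up for the example.

```python
import re

# Illustrative zone name containing regex metacharacters (parentheses).
tzname = "Tokyo (standard time)"

# Unescaped: the parentheses become a capture group, so the pattern
# expects "Tokyo standard time" and does not match the literal name.
unescaped = re.compile(tzname)

# Escaped: re.escape() makes the parentheses match themselves.
escaped = re.compile(re.escape(tzname))

print(unescaped.fullmatch(tzname))              # no match
print(escaped.fullmatch(tzname) is not None)    # match
```

Any fix along these lines has to escape the zone names before they are placed in the compiled pattern.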

The rest fail because strptime can't parse the
values that strftime produces.

>>> import time
>>> time.tzname
('\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)', '\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)')
>>> time.strptime(time.strftime('%Z', time.gmtime()))

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in -toplevel-
    time.strptime(time.strftime('%Z', time.gmtime()))
  File "C:\Python24\lib\_strptime.py", line 291, in strptime
    raise ValueError("time data did not match format:  data=%s  fmt=%s" %
ValueError: time data did not match format:  data=東京 (標準時)  fmt=%a %b %d %H:%M:%S %Y

The output of running test_time.py and test_strptime.py 
is attached.

----------------------------------------------------------------------

>Comment By: George Yoshida (quiver)
Date: 2004-10-06 04:12

Message:
Logged In: YES 
user_id=671362

A correction to my previous post:
there's nothing wrong with strptime on IDLE.

>>> import time
>>> time.strptime(time.strftime('%Z'), '%Z')
(1900, 1, 1, 0, 0, 0, 0, 1, 0)

Please close this bug and apply the patches.
Thanks Brett!

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2004-10-06 03:56

Message:
Logged In: YES 
user_id=671362

bcannon wrote:

> The .lower() call is intended to normalize since capitalization
> is not standard across OSs.  But if it is a Unicode string it
> should be fine.  And even if it isn't, it is all lowercased for
> comparison anyway, so as long as it is consistent, shouldn't it
> still work?
Hmm.

> As for your example of strptime not being able to parse, you have
> a bug in it; you forgot the format string.  It should have been
> ``time.strptime(time.strftime('%Z'), '%Z')``.  Give that a run
> and let me know what the output is.

Yeah, it's my fault. I forgot to specify a format. Even so,
strptime couldn't parse the timezone.

> As for this whole multi-byte issue, is it all being returned as
> Unicode strings, or is it just a regular string?  In other
> words, what is ``type(time.tzname[0])`` spitting out?  And what
> character encoding is all of this in (i.e., what should I pass
> to unicode so as to not have it raise UnicodeDecodeError)?

It returns a plain string (not unicode), and the encoding is cp932.
This is the default encoding on Japanese Windows.

  >>> unicode(time.tzname[0], 'cp932')
  u'\u6771\u4eac (\u6a19\u6e96\u6642)'

> And finally, for the regex metacharacter stuff, why the hell
> are there parentheses in a timezone?!?  Whoever decided that
> was good did it just to upset me.

Ask M$ Japan :-;

I don't regard 'Tokyo (standard time)' as an acceptable
representation of a time zone at all, but this is what Windows
returns as the time zone on my box.

> That does need to be fixed.  Apply the patch I just uploaded
> and let me know if it at least deals with that problem.

With your patch, all tests succeed without any Error or Fail, and
strftime <-> strptime conversions work well. This is a backport
candidate, so I created a new patch against Python 2.3 with
listcomps instead of genexprs.
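
For readers unfamiliar with the backport issue: generator expressions first appeared in Python 2.4, so code meant for 2.3 has to use list comprehensions. The sketch below only shows that kind of rewrite; the zone list and variable names are hypothetical and are not taken from the actual patch.

```python
import re

# Hypothetical zone names; the real patch may build its list differently.
zone_names = ["utc", "gmt", "tokyo (standard time)"]

# Generator expression: requires Python 2.4 or later.
pattern_genexp = "|".join(re.escape(name) for name in zone_names)

# List comprehension: the same result, and also valid on Python 2.3.
pattern_listcomp = "|".join([re.escape(name) for name in zone_names])

print(pattern_genexp == pattern_listcomp)
```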

But there is one problem left.

On IDLE, strptime still can't parse. I haven't looked into it in
detail, but patch #590913 probably has something to do with it.
This patch sets the locale at IDLE's startup, and this can affect
the behavior of string-related functions and constants.

  [PEP 263 support in IDLE]
  http://www.python.org/sf/590913

# patch applied
>>> time.strptime(time.strptime('%Z'), '%Z')

Traceback (most recent call last):
  File "<pyshell#93>", line 1, in -toplevel-
    time.strptime(time.strptime('%Z'), '%Z')
  File "C:\Python24\lib\_strptime.py", line 291, in strptime
    if not found:
ValueError: time data did not match format:  data=%Z  fmt=%a %b %d %H:%M:%S %Y
>>> import locale
>>> locale.getlocale()
['Japanese_Japan', '932']  # culprit?

> Have I mentioned I hate timezones?  In case I haven't, I do.

I agree with you one hundred percent.

--George

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2004-10-04 08:16

Message:
Logged In: YES 
user_id=357491

The .lower() call is intended to normalize since capitalization is not 
standard across OSs.  But if it is a Unicode string it should be fine.  And 
even if it isn't, it is all lowercased for comparison anyway, so as long as 
it is consistent, shouldn't it still work?

As for your example of strptime not being able to parse, you have a bug 
in it; you forgot the format string.  It should have been 
``time.strptime(time.strftime('%Z'), '%Z')``.  Give that a run and let me 
know what the output is.

As for this whole multi-byte issue, is it all being returned as Unicode 
strings, or is it just a regular string?  In other words, what is 
``type(time.tzname[0])`` spitting out?  And what character encoding is 
all of this in (i.e., what should I pass to unicode so as to not have it raise 
UnicodeDecodeError)?

And finally, for the regex metacharacter stuff, why the hell are there 
parentheses in a timezone?!?  Whoever decided that was good did it just 
to upset me.  That does need to be fixed.  Apply the patch I just 
uploaded and let me know if it at least deals with that problem.

Have I mentioned I hate timezones?  In case I haven't, I do.  Thanks for 
catching this all, though, George.

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2004-10-04 00:05

Message:
Logged In: YES 
user_id=671362

I've found another bug.
Lines 167 and 169 of Lib/_strptime.py contain the expression:
 time.tzname[0].lower()

I guess this is intended to normalize letter case, but for
multibyte characters this is really dangerous.

>>> import time
>>> time.tzname[0]
'\x93\x8c\x8b\x9e (\x95W\x8f\x80\x8e\x9e)'
>>> _.lower()
'\x93\x8c\x8b\x9e (\x95w\x8f\x80\x8e\x9e)'

\x95W and \x95w are not the same character.
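
The same effect can be reproduced in modern Python 3, where a bytes object stands in for the Python 2 str returned by time.tzname. This is only a sketch of the failure mode, assuming the cp932 encoding reported elsewhere in this thread; the variable names are made up for the example.

```python
# Zone name from this report, written with escapes, encoded as cp932.
zone_text = "\u6771\u4eac (\u6a19\u6e96\u6642)"
zone_bytes = zone_text.encode("cp932")

# Byte-wise lowercasing changes every byte in the ASCII range A-Z,
# including 0x57 ('W'), which here is the second byte of a
# two-byte character.
lowered_bytes = zone_bytes.lower()

# Lowercasing after decoding works on whole characters, and these
# characters have no case, so the text is unchanged.
lowered_text = zone_text.lower()

print(zone_bytes)
print(lowered_bytes)
print(lowered_text == zone_text)
```

Decoding to a text string before calling .lower() avoids the damage, since case conversion then never sees the individual bytes.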

----------------------------------------------------------------------
