[Python-bugs-list] [ python-Bugs-554916 ] test_unicode fails in wide unicode build
SourceForge.net
noreply@sourceforge.net
Sun, 19 Jan 2003 15:02:05 -0800
Bugs item #554916, was opened at 2002-05-11 18:25
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=554916&group_id=5470
Category: Unicode
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Hudson (mwh)
Assigned to: M.-A. Lemburg (lemburg)
Summary: test_unicode fails in wide unicode build
Initial Comment:
Assigned somewhat arbitrarily.
It's a roundtrip test, I think.
----------------------------------------------------------------------
>Comment By: M.-A. Lemburg (lemburg)
Date: 2003-01-20 00:02
Message:
Logged In: YES
user_id=38388
Michael, is the test still failing, or can I close this?
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2002-10-10 17:30
Message:
Logged In: YES
user_id=38388
I'm not exactly sure why things work again, but I do
know that I looked into this some time ago. Perhaps I
simply forgot to close the bug, or one of the UTF-8
codec overhauls remedied the problem.
Here's what I get with Python 2.3 UCS4:
>>> len(u'\U000d0000')
1
>>> len(u"\udb00\udc00")
2
>>> u'\U000d0000' == u"\udb00\udc00"
False
>>> len(unicode(u"\udb00\udc00".encode('utf-8'), 'utf-8'))
1
>>> len(unicode(u'\U000d0000'.encode('utf-8'), 'utf-8'))
1
This is what I get with Python 2.2.1:
>>> len(u'\U000d0000')
2
>>> len(u"\udb00\udc00")
2
>>> u'\U000d0000' == u"\udb00\udc00"
1
>>> len(unicode(u"\udb00\udc00".encode('utf-8'), 'utf-8'))
2
>>> len(unicode(u'\U000d0000'.encode('utf-8'), 'utf-8'))
2
There's still a difference there, but the UTF-8 codec behaves
consistently.
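For anyone trying to reproduce the distinction on a current interpreter, here is a rough sketch. It assumes Python 3, not the 2.2/2.3 builds shown above, so it only illustrates the underlying difference between a surrogate pair and the single code point U+D0000; it says nothing about what the 2.x codec did internally. The variable names are made up for the example.

```python
# Assumes Python 3; the 2.x sessions above behave differently.
pair = "\udb00\udc00"      # two separate surrogate code points
single = "\U000d0000"      # the one code point the pair stands for in UTF-16

# The strict UTF-8 codec refuses to encode surrogates at all.
try:
    pair.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" encodes each surrogate on its own as three bytes,
# while U+D0000 gets the regular four-byte form.
pair_bytes = pair.encode("utf-8", "surrogatepass")
single_bytes = single.encode("utf-8")

# Decode again and compare lengths with the originals.
pair_back = pair_bytes.decode("utf-8", "surrogatepass")
single_back = single_bytes.decode("utf-8")

print(strict_ok)
print(len(pair_bytes), len(single_bytes))
print(len(pair_back), len(single_back))
print(pair_back == pair, single_back == single)
```

Under those assumptions each string round-trips to itself, and the two never compare equal, which is the same split the UCS4 session shows for the literals.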
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2002-10-09 14:57
Message:
Logged In: YES
user_id=6656
Hmm. The test has stopped failing, so maybe we can close this.
I'd be happier if I knew why, though.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2002-05-13 16:06
Message:
Logged In: YES
user_id=6656
Even better:
$ ./python
Adding parser accelerators ...
Done.
Python 2.2.1 (#1, May 13 2002, 15:02:01)
[GCC 2.96 20000731 (Red Hat Linux 7.1 2.96-98)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8") == u"\udb00\udc00"
0
[18762 refs]
but the test passes. And there was me thinking that it
wasn't a problem on the release22-maint branch.
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2002-05-13 15:58
Message:
Logged In: YES
user_id=6656
>>> a = u"\udb00\udc00"
[20811 refs]
>>> b = unicode(a.encode("utf-8"), "utf-8")
[21061 refs]
>>> a, b
(u'\U000d0000', u'\U000d0000')
[21063 refs]
>>> len(a), len(b)
(2, 1)
[21063 refs]
Erm...?
----------------------------------------------------------------------
Comment By: Walter Dörwald (doerwalter)
Date: 2002-05-13 15:38
Message:
Logged In: YES
user_id=89016
The minimal failing testcase is:
>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8") == u"\udb00\udc00"
False
which is strange, because they *seem* to be the same:
>>> u"\udb00\udc00"
u'\U000d0000'
>>> unicode(u"\udb00\udc00".encode("utf-8"), "utf-8")
u'\U000d0000'
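The reason both values print as u'\U000d0000' is presumably that \udb00\udc00 is the UTF-16 surrogate pair for that code point. A minimal sketch of the standard UTF-16 arithmetic follows; combine_surrogates is a hypothetical helper written for this note, not something from the codec source.

```python
def combine_surrogates(high, low):
    """Map a UTF-16 high/low surrogate pair to the code point it represents.

    Hypothetical helper for illustration only.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Ten bits come from each surrogate; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xDB00, 0xDC00)
print(hex(code_point))  # 0xd0000
```

So the two strings can share a repr while one holds two code units and the other holds a single code point.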
----------------------------------------------------------------------