[Python-Dev] FW: Unicode character name hashing

Favas, Mark (EM, Floreat) Mark.Favas@per.dem.csiro.au
Fri, 14 Jul 2000 13:18:42 +0800


Forgot to copy Python-Dev...

-----Original Message-----
From: Mark Favas [mailto:m.favas@per.dem.csiro.au]
Sent: Friday, 14 July 2000 7:00 AM
To: Bill Tutt
Subject: Re: Unicode character name hashing


Just tried it, and got the same message:

test_ucn
test test_ucn crashed -- exceptions.UnicodeError : Unicode-Escape
decoding error: Invalid Unicode Character Name

Cheers,
	Mark

Bill Tutt wrote:
> 
> Does this patch happen to fix it?
> I'm afraid my skills relating to signed overflow are a bit rusty... :(
> 
> Bill
> 
> ===================================================================
> RCS file: /cvsroot/python/python/dist/src/Modules/ucnhash.c,v
> retrieving revision 1.2
> diff -u -r1.2 ucnhash.c
> --- ucnhash.c   2000/06/29 00:06:39     1.2
> +++ ucnhash.c   2000/07/13 21:41:07
> @@ -30,12 +30,12 @@
> 
>      len = cch;
>      p = (unsigned char *) key;
> -    x = 1694245428;
> +    x = (long)0x64fc2234;
>      while (--len >= 0)
> -        x = (1000003*x) ^ toupper(*(p++));
> +        x = ((0xf4243 * x) & 0xFFFFFFFF) ^ toupper(*(p++));
>      x ^= cch + 10;
> -    if (x == -1)
> -        x = -2;
> +    if (x == (long)0xFFFFFFFF)
> +        x = (long)0xfffffffe;
>      x %= k_cHashElements;
>      /* ensure the returned value is positive so we mimic Python's % operator */
>      if (x < 0)
> @@ -52,12 +52,12 @@
> 
>      len = cch;
>      p = (unsigned char *) key;
> -    x = -1917331657;
> +    x = (long)0x8db7d737;
>      while (--len >= 0)
> -        x = (1000003*x) ^ toupper(*(p++));
> +        x = ((0xf4243 * x) & 0xFFFFFFFF) ^ toupper(*(p++));
>      x ^= cch + 10;
> -    if (x == -1)
> -        x = -2;
> +    if (x == (long)0xFFFFFFFF)
> +        x = (long)0xfffffffe;
>      x %= k_cHashElements;
>      /* ensure the returned value is positive so we mimic Python's % operator */
>      if (x < 0)
> 
>  -----Original Message-----
> From:   Mark Favas [mailto:m.favas@per.dem.csiro.au]
> Sent:   Thursday, July 13, 2000 1:16 PM
> To:     python-dev@python.org; Bill Tutt
> Subject:        Unicode character name hashing
> 
> [Bill has epiphany]
> >I just had a rather unhappy epiphany this morning.
> >f1 and f2 in ucnhash.c might not work on machines where sizeof(long) !=
> >32 bits.
> 
> I get the following from test_ucn on an Alpha running Tru64 Unix:
> 
> python Lib/test/test_ucn.py
> UnicodeError: Unicode-Escape decoding error: Invalid Unicode Character
> Name
> 
> This is with the current CVS - and it's been failing this test for some
> time now. I'm happy to test any fixes...
> 
> --
> Email  - m.favas@per.dem.csiro.au        Mark C Favas
> Phone  - +61 8 9333 6268, 0418 926 074   CSIRO Exploration & Mining
> Fax    - +61 8 9383 9891                 Private Bag No 5, Wembley
> WGS84  - 31.95 S, 115.80 E               Western Australia 6913

-- 
Email  - m.favas@per.dem.csiro.au        Mark C Favas
Phone  - +61 8 9333 6268, 0418 926 074   CSIRO Exploration & Mining
Fax    - +61 8 9383 9891                 Private Bag No 5, Wembley
WGS84  - 31.95 S, 115.80 E               Western Australia 6913
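
One plausible reason the quoted patch does not help where long is 64 bits:
masking with 0xFFFFFFFF keeps the low 32 bits but never sign-extends them, so
a value that would be negative on a 32-bit machine stays positive, and
x % k_cHashElements then lands on a different slot. The sketch below is only
an illustration under stated assumptions, not the fix that was applied. It
does the arithmetic in uint32_t and converts to a signed value explicitly.
The names wrap32 and hash32 and the n_elements parameter are hypothetical,
and the final "h += n_elements" adjustment is inferred from the comment in
the diff, which does not show that line.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a signed value,
 * the way a 32-bit two's complement long would hold it, without relying
 * on implementation-defined narrowing conversions. */
static int32_t wrap32(uint32_t v)
{
    if (v <= (uint32_t)INT32_MAX)
        return (int32_t)v;
    return (int32_t)(v - 0x80000000u) + INT32_MIN;
}

/* Hypothetical width-independent version of the hash loop in the diff.
 * seed is the 32-bit pattern of the starting value (for example
 * 0x64fc2234u), n_elements stands in for k_cHashElements. */
static long hash32(const char *key, size_t cch, uint32_t seed, long n_elements)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t x = seed;
    long h;
    size_t i;

    assert(n_elements > 0);

    /* Unsigned arithmetic wraps modulo 2^32 regardless of sizeof(long). */
    for (i = 0; i < cch; i++)
        x = (1000003u * x) ^ (uint32_t)toupper(p[i]);
    x ^= (uint32_t)(cch + 10);

    /* Recover the sign a 32-bit long would have had. */
    h = wrap32(x);
    if (h == -1)
        h = -2;
    h %= n_elements;
    /* Assumed adjustment: make the result non-negative, like Python's %. */
    if (h < 0)
        h += n_elements;
    return h;
}
```

For instance, wrap32(0x8db7d737u) gives -1917331657, the original starting
value of the second function, whereas (long)0x8db7d737 on a 64-bit long is
the positive number 2377635639.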