[Patches] [ python-Patches-438013 ] Remove 2-byte Py_UCS2 assumptions

noreply@sourceforge.net noreply@sourceforge.net
Thu, 09 Aug 2001 12:46:50 -0700


Patches item #438013, was opened at 2001-07-02 12:43
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=438013&group_id=5470

Category: core (C code)
Group: None
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Tim Peters (tim_one)
>Assigned to: Tim Peters (tim_one)
Summary: Remove 2-byte Py_UCS2 assumptions

Initial Comment:
The patch changes PyUnicode_EncodeUTF16 and 
PyUnicode_DecodeUTF16 to work without assuming the 
existence of a (exactly) 2-byte type.

There are no more references remaining in the code 
base to Py_UCS2, except for what looks to be a now-
pointless complaint in unicodeobject.h.
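To make the idea concrete, here is a minimal sketch of a UTF-16 encoder that never needs an exactly-2-byte type. Each code unit is held in an unsigned int, which C guarantees is at least 16 bits, and is written out one unsigned char at a time. STORECHAR is the macro name MAL mentions further down the thread, but its body here, the function name encode_utf16 and the offsets ihi/ilo are illustrative assumptions, not the code from the attached patch. The input is also assumed to be already split into 16-bit code units, so there is no surrogate handling. The macro sits at file scope before the function and is #undef'd right after it.

```c
#include <stddef.h>

/* Hypothetical macro: write code unit ch as two octets, high octet at
   offset ihi and low octet at offset ilo, then advance p by 2.  It relies
   on variables named p, ihi and ilo being in scope where it is used. */
#define STORECHAR(ch)                                  \
    do {                                               \
        p[ihi] = (unsigned char)(((ch) >> 8) & 0xFF);  \
        p[ilo] = (unsigned char)((ch) & 0xFF);         \
        p += 2;                                        \
    } while (0)

/* Encode n UTF-16 code units (each <= 0xFFFF, already split into
   surrogates if needed) into 2*n bytes at out.  big_endian != 0 selects
   big-endian output.  Returns the number of bytes written. */
size_t encode_utf16(const unsigned int *units, size_t n,
                    unsigned char *out, int big_endian)
{
    /* Choose the byte order once; the loop itself never tests it. */
    const int ihi = big_endian ? 0 : 1;
    const int ilo = big_endian ? 1 : 0;
    unsigned char *p = out;
    size_t i;

    for (i = 0; i < n; i++)
        STORECHAR(units[i]);
    return (size_t)(p - out);
}
#undef STORECHAR
```

Because the code only shifts and masks values, it behaves the same whatever the host's native byte order is.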


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2001-08-09 12:46

Message:
Logged In: YES 
user_id=31435

Changed to Accepted and assigned back to me.  I'll check it 
in later tonight.  Thanks!

Macros are defined and expanded in the preprocessor stage; 
they don't care about function boundaries, in part because 
the preprocessor has no idea what anything "means" (i.e., 
the source code is just a giant string of characters to the 
preprocessor).
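Here is a small illustration of that point, assuming a standard C preprocessor. The macro below is #defined inside one function body, yet it stays in effect for the next function. The preprocessor substitutes text before the compiler ever sees a function boundary. The function names are hypothetical.

```c
/* SQUARE is defined inside sum_of_squares, but to the preprocessor that
   is just "from this line of the file onward"; braces mean nothing to it. */
int sum_of_squares(const int *v, int n)
{
#define SQUARE(x) ((x) * (x))
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += SQUARE(v[i]);
    return total;
}

/* SQUARE is still defined here, even though the function above has closed. */
int cube(int x)
{
    return SQUARE(x) * x;
}
#undef SQUARE
```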

I haven't timed it, but wouldn't be surprised if it were 
actually faster now:  it removes all need for runtime 
endianness tests and branches in the inner loops.
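The following sketch shows the "no endianness branch in the inner loop" idea on the decoding side. It is not the patch's actual code. The decoder picks the byte offsets ihi and ilo once, before the loop, so the loop body does only index arithmetic and shifts. The function name decode_utf16 is hypothetical. The choice to silently ignore a trailing odd byte is also an assumption; a real codec would report that as an error.

```c
#include <stddef.h>

/* Decode nbytes of UTF-16 from in into 16-bit code units stored in
   unsigned ints.  big_endian != 0 means the input is big-endian.
   A trailing odd byte is ignored.  Returns the number of code units. */
size_t decode_utf16(const unsigned char *in, size_t nbytes,
                    unsigned int *units, int big_endian)
{
    /* Resolve byte order once, outside the loop. */
    const int ihi = big_endian ? 0 : 1;
    const int ilo = big_endian ? 1 : 0;
    const unsigned char *q = in;
    const unsigned char *end = in + (nbytes & ~(size_t)1);
    size_t n = 0;

    while (q < end) {
        units[n++] = ((unsigned int)q[ihi] << 8) | q[ilo];
        q += 2;
    }
    return n;
}
```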


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-09 12:26

Message:
Logged In: YES 
user_id=38388

Sorry, it turned out that I was looking at the wrong part of the patch... what worried me were these comparisons, but 
they are part of the original, not your patched version:

! 	if (*q == 0xFEFF) {
! 	    q++;
  	    bo = -1;
! 	} else if (*q == 0xFFFE) {
! 	    q++;
  	    bo = 1;
  	}

Given my mistake, I'd say, the patch looks OK :-) 
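For comparison, the BOM check above can also be done one byte at a time, with no 2-byte type. This is a hypothetical sketch. It follows the convention of the Python C API's byteorder argument (-1 for little endian, 1 for big endian) and adds 0 for "no BOM". The function name detect_utf16_bom and the skip out-parameter are assumptions.

```c
#include <stddef.h>

/* Look for a byte order mark (U+FEFF) in the first two bytes.
   Returns -1 for little-endian (FF FE), 1 for big-endian (FE FF),
   or 0 if there is no BOM.  *skip is set to the number of bytes the
   caller should step over (2 on a BOM, else 0). */
int detect_utf16_bom(const unsigned char *q, size_t nbytes, size_t *skip)
{
    *skip = 0;
    if (nbytes < 2)
        return 0;
    if (q[0] == 0xFF && q[1] == 0xFE) {
        *skip = 2;
        return -1;
    }
    if (q[0] == 0xFE && q[1] == 0xFF) {
        *skip = 2;
        return 1;
    }
    return 0;
}
```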

Just two nits: 
1. #defining macros inside functions isn't portable AFAIK. Better to put the STORECHAR #define before the function and then 
#undef it right after it. 
2. The hiliho thingie is likely going to slow down the codec; we'll leave that for the next generation, if you don't mind 
;-)

Please check it in. Thanks.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-09 11:22

Message:
Logged In: YES 
user_id=31435

Marc-Andre, if there's some specific line or section of 
code here you're worried about, please try to explain the 
hangup in detail.  The code looks "almost obviously 
correct" to me, and I assume it did to /F too.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-08 09:25

Message:
Logged In: YES 
user_id=31435

The switch to unsigned char is coupled with code to read 
and write one byte at a time; it should work fine on any 
box where an unsigned char is 8 bits (incl. Palm Pilots and 
Crays); and on any box where it's larger than 8 bits (I 
don't know of any such box) provided input routines arrange to 
store only 8 bits per native machine byte.
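Here is a minimal sketch of the byte-at-a-time reading described above. The & 0xFF masks cost nothing when CHAR_BIT is 8. On a machine with wider chars, they keep only the low 8 bits of each byte, which matches the stated requirement that input routines store one octet per native byte. The name read_be16 is hypothetical.

```c
/* Assemble one big-endian 16-bit code unit from two bytes.
   The masks are no-ops when CHAR_BIT == 8; with wider chars they
   discard anything above the low octet of each byte. */
unsigned int read_be16(const unsigned char *p)
{
    return ((unsigned int)(p[0] & 0xFF) << 8) | (unsigned int)(p[1] & 0xFF);
}
```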

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-08 01:58

Message:
Logged In: YES 
user_id=38388

It seems that you have replaced Py_UCS2 with "unsigned char" in the latest patch. This won't work for obvious 
reasons (maybe on Crays, don't know ;-). 

Shouldn't this have been "unsigned int" ?!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-03 16:56

Message:
Logged In: YES 
user_id=31435

New patch attached, and back to MAL.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-08-02 22:59

Message:
Logged In: YES 
user_id=31435

Marked Out of Date as per MAL's remark, and assigned back 
to me.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-02 09:39

Message:
Logged In: YES 
user_id=38388

Tim, please resubmit the patch -- it no longer applies to
the current CVS tree. Thanks.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-07-02 13:27

Message:
Logged In: YES 
user_id=31435

Isn't Py_UNICODE always big enough to hold a UCS-2 code 
point?  If the latter is 16 bits (which I assume), C 
guarantees an unsigned short is big enough to hold it (and 
doesn't guarantee an int is bigger than that -- although 
Python would be pretty useless if an int weren't bigger!).
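The guarantees mentioned here can be spelled out with <limits.h>. C requires USHRT_MAX and UINT_MAX to be at least 65535, but only requires INT_MAX to be at least 32767. The checks below are a sketch that can never fire on a conforming compiler; they only document the assumption. The typedef name ucs2_cp and the helper fits_in_ucs2 are hypothetical. They follow the "large enough to hold a UCS-2 code point" reading of Py_UCS2 rather than "exactly 2 bytes".

```c
#include <limits.h>

/* Both limits are required by C to be at least 65535. */
#if USHRT_MAX < 0xFFFF
#error "unsigned short is too small for a UCS-2 code point"
#endif
#if UINT_MAX < 0xFFFF
#error "unsigned int is too small for a UCS-2 code point"
#endif

/* Hypothetical type: at least 16 bits, not necessarily exactly 2 bytes. */
typedef unsigned short ucs2_cp;

/* Nonzero if cp is within the UCS-2 range 0..0xFFFF. */
int fits_in_ucs2(unsigned long cp)
{
    return cp <= 0xFFFFUL;
}
```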

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2001-07-02 13:09

Message:
Logged In: YES 
user_id=38376

+1 from here.

Py_UCS2 should either go away, or be redefined as "large 
enough to hold a UCS-2 code point" (maybe there's some 
codec that may want to use such a data type?  in real 
life, "unsigned int" is probably a decent approximation...)

</F>

----------------------------------------------------------------------
