[Patches] [ python-Patches-438013 ] Remove 2-byte Py_UCS2 assumptions
noreply@sourceforge.net
Thu, 09 Aug 2001 12:46:50 -0700
Patches item #438013, was opened at 2001-07-02 12:43
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=438013&group_id=5470
Category: core (C code)
Group: None
Status: Open
>Resolution: Accepted
Priority: 5
Submitted By: Tim Peters (tim_one)
>Assigned to: Tim Peters (tim_one)
Summary: Remove 2-byte Py_UCS2 assumptions
Initial Comment:
The patch changes PyUnicode_EncodeUTF16 and
PyUnicode_DecodeUTF16 to work without assuming the
existence of a (exactly) 2-byte type.
There are no more references remaining in the code
base to Py_UCS2, except for what looks to be a now-
pointless complaint in unicodeobject.h.
----------------------------------------------------------------------
>Comment By: Tim Peters (tim_one)
Date: 2001-08-09 12:46
Message:
Logged In: YES
user_id=31435
Changed to Accepted and assigned back to me. I'll check it
in later tonight. Thanks!
Macros are defined and expanded in the preprocessor stage;
they don't care about function boundaries, in part because
the preprocessor has no idea what anything "means" (i.e.,
the source code is just a giant string of characters to the
preprocessor).
I haven't timed it, but wouldn't be surprised if it were
actually faster now: it removes all need for runtime
endianness tests and branches in the inner loops.
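A minimal sketch of both points, not the code from the patch: the
STORECHAR name comes from the discussion below, but its argument list,
the ihi/ilo index names, and the encode_utf16_units helper are
assumptions made for illustration. The macro is defined just before the
function and #undef'd just after it; to the preprocessor that is
equivalent to defining it inside the function body. The byte order is
turned into two array indices once, so the loop itself has no
endianness branch.

```c
#include <assert.h>
#include <stddef.h>

/* Store one 16-bit code unit as two bytes and advance the output
   pointer. ihi/ilo are the offsets (0 or 1) of the high and low byte. */
#define STORECHAR(p, ch, ihi, ilo)                          \
    do {                                                    \
        (p)[ihi] = (unsigned char)(((ch) >> 8) & 0xFF);     \
        (p)[ilo] = (unsigned char)((ch) & 0xFF);            \
        (p) += 2;                                           \
    } while (0)

/* Hypothetical helper: byteorder < 0 selects little-endian output,
   anything else big-endian (an assumption of this sketch).
   Returns the number of bytes written to out. */
static size_t
encode_utf16_units(const unsigned int *units, size_t n,
                   unsigned char *out, int byteorder)
{
    unsigned char *p = out;
    /* Decide the byte positions once, outside the loop. */
    int ihi = (byteorder < 0) ? 1 : 0;
    int ilo = 1 - ihi;
    size_t i;

    assert(units != NULL && out != NULL);
    for (i = 0; i < n; i++)
        STORECHAR(p, units[i], ihi, ilo);
    return (size_t)(p - out);
}

#undef STORECHAR
```

Because only unsigned char stores are used, the sketch does not depend
on the host's byte order or on the existence of an exactly 2-byte type.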
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-09 12:26
Message:
Logged In: YES
user_id=38388
Sorry, it turned out that I was looking at the wrong part of the patch... what worried me was these compares, but
they are part of the original, not your patched version:
! if (*q == 0xFEFF) {
! q++;
bo = -1;
! } else if (*q == 0xFFFE) {
! q++;
bo = 1;
}
Given my mistake, I'd say, the patch looks OK :-)
Just two nits:
1. #defining macros inside functions isn't portable AFAIK. Better put the STORECHAR before the function and then
#undef it just behind it.
2. The hiliho thingie is likely going to slow down the codec; we'll leave that for the next generation, if you don't mind
;-)
Please check it in. Thanks.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-08-09 11:22
Message:
Logged In: YES
user_id=31435
Marc-Andre, if there's some specific line or section of
code here you're worried about, please try to explain the
hangup in detail. The code looks "almost obviously
correct" to me, and I assume it did to /F too.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-08-08 09:25
Message:
Logged In: YES
user_id=31435
The switch to unsigned char is coupled with code to read
and write one byte at a time; it should work fine on any
box where an unsigned char is 8 bits (incl. Palm Pilots and
Crays); and on any box where it's larger than 8 bits (I
don't know of any such box) provided input routines arrange to
store only 8 bits per native machine byte.
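To illustrate the byte-at-a-time approach, here is a rough sketch of a
decoder that never reads a 16-bit unit directly. It is not the patched
PyUnicode_DecodeUTF16; the decode_utf16_units name, its signature, the
big-endian default when no byte order mark is present, and the silent
dropping of a trailing odd byte are all assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode UTF-16 code units from a byte buffer.
   *byteorder is -1 (little-endian), 1 (big-endian) or 0 (detect from
   a leading byte order mark; big-endian if there is none, which is an
   assumption of this sketch). Returns the number of code units stored
   in out and reports the byte order actually used. */
static size_t
decode_utf16_units(const unsigned char *q, size_t nbytes,
                   unsigned int *out, int *byteorder)
{
    int bo = *byteorder;
    int ihi, ilo;
    size_t n = 0;

    assert(q != NULL && out != NULL && byteorder != NULL);

    /* The BOM is checked as two separate bytes, so the test gives the
       same answer on big-endian and little-endian hosts. */
    if (bo == 0 && nbytes >= 2) {
        if (q[0] == 0xFE && q[1] == 0xFF) {
            bo = 1;
            q += 2;
            nbytes -= 2;
        }
        else if (q[0] == 0xFF && q[1] == 0xFE) {
            bo = -1;
            q += 2;
            nbytes -= 2;
        }
    }

    /* Offsets of the high and low byte within each 2-byte unit. */
    ihi = (bo < 0) ? 1 : 0;
    ilo = 1 - ihi;

    while (nbytes >= 2) {
        out[n++] = ((unsigned int)q[ihi] << 8) | q[ilo];
        q += 2;
        nbytes -= 2;
    }
    *byteorder = bo;
    return n;
}
```

On a machine whose unsigned char is wider than 8 bits, this still works
as long as each input byte holds a value in the range 0..255, which is
the condition stated above.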
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-08 01:58
Message:
Logged In: YES
user_id=38388
It seems that you have replaced Py_UCS2 with "unsigned char" in the latest patch. This won't work for obvious
reasons (maybe on Crays, don't know ;-).
Shouldn't this have been "unsigned int" ?!
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-08-03 16:56
Message:
Logged In: YES
user_id=31435
New patch attached, and back to MAL.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-08-02 22:59
Message:
Logged In: YES
user_id=31435
Marked Out of Date as per MAL's remark, and assigned back
to me.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-08-02 09:39
Message:
Logged In: YES
user_id=38388
Tim, please resubmit the patch -- it no longer applies to
the current CVS tree. Thanks.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-07-02 13:27
Message:
Logged In: YES
user_id=31435
Isn't Py_UNICODE always big enough to hold a UCS-2 code
point? If the latter is 16 bits (which I assume), C
guarantees an unsigned short is big enough to hold it (and
doesn't guarantee an int is bigger than that -- although
Python would be pretty useless if an int weren't bigger!).
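The guarantee can be spelled out in code. The sketch below relies only
on <limits.h>; the typedef names and the two small helpers are
hypothetical and are not part of Python's headers. The array-size trick
makes the compile fail if either type were too small, which the C
standard rules out since it requires USHRT_MAX and UINT_MAX to be at
least 65535.

```c
#include <assert.h>
#include <limits.h>

/* Compile-time checks: a negative array size is a compile error. */
typedef char ushort_holds_ucs2[(USHRT_MAX >= 0xFFFF) ? 1 : -1];
typedef char uint_holds_ucs2[(UINT_MAX >= 0xFFFF) ? 1 : -1];

/* Hypothetical helper: nonzero if cp fits in 16 bits. */
static int
fits_in_ucs2(unsigned long cp)
{
    return cp <= 0xFFFFUL;
}

/* Hypothetical helper: narrow a checked value to unsigned short. */
static unsigned short
as_ucs2(unsigned long cp)
{
    assert(fits_in_ucs2(cp));
    return (unsigned short)cp;
}
```

Note that the standard gives the same minimum range for unsigned int
and unsigned short, so neither is guaranteed to be wider than the other.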
----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot)
Date: 2001-07-02 13:09
Message:
Logged In: YES
user_id=38376
+1 from here.
Py_UCS2 should either go away, or be redefined as "large
enough to hold a UCS-2 code point" (maybe there's some
codec that may want to use such a data type? in real
life, "unsigned int" is probably a decent approximation...)
</F>
----------------------------------------------------------------------