[issue8941] utf-32be codec failing on UCS-2 python build for 32-bit value
Marc-Andre Lemburg
report at bugs.python.org
Wed Jun 9 18:36:56 CEST 2010
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Antoine Pitrou wrote:
>
> Antoine Pitrou <pitrou at free.fr> added the comment:
>
> Here is a new patch with tests.
>
>> I wonder whether it wouldn't be better to preallocate
>> a Unicode object with size of e.g. size/4 + 16 and
>> then resize the object as necessary in case a surrogate
>> pair needs to be created (won't happen that often in
>> practice).
>>
>> The extra scan for pairs can take long depending on
>> how much data you have to decode and likely doesn't
>> go down well with CPU caches.
>
> Perhaps, but I think this should be measured and be the target of a separate issue. We're in rc phase and we should probably minimize potential disruption.
Fair enough.
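For reference, the two allocation strategies discussed above can be sketched roughly as follows. This is only an illustration in Python, not the C code from the patch: the function names, the `slack` parameter, and the use of a list of UTF-16 code units in place of a Unicode object are assumptions made for the example, and the input is assumed to be well-formed little-endian UTF-32 with a length that is a multiple of 4.

```python
import struct


def utf16_units_two_pass(data):
    """Scan once to count surrogate pairs, then allocate the exact size."""
    cps = struct.unpack("<%dI" % (len(data) // 4), data)
    extra = sum(1 for cp in cps if cp > 0xFFFF)  # the extra scan for pairs
    out = [0] * (len(cps) + extra)
    i = 0
    for cp in cps:
        if cp > 0xFFFF:
            cp -= 0x10000
            out[i] = 0xD800 | (cp >> 10)        # high surrogate
            out[i + 1] = 0xDC00 | (cp & 0x3FF)  # low surrogate
            i += 2
        else:
            out[i] = cp
            i += 1
    return out


def utf16_units_prealloc(data, slack=16):
    """Allocate size/4 + slack up front; grow only when a pair does not fit."""
    out = [0] * (len(data) // 4 + slack)
    i = 0
    for (cp,) in struct.iter_unpack("<I", data):
        need = 2 if cp > 0xFFFF else 1
        if i + need > len(out):
            out.extend([0] * max(slack, need))  # resize as necessary
        if need == 2:
            cp -= 0x10000
            out[i] = 0xD800 | (cp >> 10)
            out[i + 1] = 0xDC00 | (cp & 0x3FF)
        else:
            out[i] = cp
        i += need
    del out[i:]  # trim the unused tail
    return out
```

Both functions should produce the same code units. Which one is faster in C on real data is exactly what would need to be measured.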
Here's a little optimization:
- if (qq[iorder[3]] != 0 || qq[iorder[2]] != 0)
+ if (qq[iorder[2]] != 0 || qq[iorder[3]] != 0)
For non-BMP code points, it's more likely that byte 2
will be non-zero.
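A quick way to check this claim from Python, as a sketch only: it assumes that `iorder[2]` and `iorder[3]` index the bytes holding bits 16-23 and 24-31 of the code unit, and `is_non_bmp` is a hypothetical helper that mirrors the reordered test rather than a function in the patch.

```python
import struct


def is_non_bmp(q, iorder=(0, 1, 2, 3)):
    """Mirror of the reordered test: look at byte 2 first, then byte 3."""
    return q[iorder[2]] != 0 or q[iorder[3]] != 0


# Valid non-BMP code points run from U+10000 to U+10FFFF.
non_bmp = range(0x10000, 0x110000)

# Pack each one as little-endian UTF-32 and inspect the two high bytes.
byte2_nonzero = all(struct.pack("<I", cp)[2] != 0 for cp in non_bmp)
byte3_zero = all(struct.pack("<I", cp)[3] == 0 for cp in non_bmp)

print(byte2_nonzero, byte3_zero)
```

If both flags come out true, then for every valid non-BMP code point the first operand of the reordered condition already decides the result, and byte 3 only matters for values above U+10FFFF.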
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8941>
_______________________________________