[I18n-sig] How does Python Unicode treat surrogates?
Paul Prescod
paulp@ActiveState.com
Mon, 25 Jun 2001 17:43:05 -0700
"Martin v. Loewis" wrote:
>
> > I agree. But I'd add that if different people really need different
> > performance/simplicity trade-offs then maybe we need multiple variants
> > of the Unicode object.
>
> The question really is: Those people that require a 16-bit Py_UNICODE,
> would they ever need characters outside the BMP?
Hard to tell. People usually want to have their cake and eat it too,
i.e., "I want the performance of 16-bit Py_UNICODE, but I also want to
support the occasional non-BMP character that happens to show up in a
document."
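To put rough numbers on that trade-off, here is a minimal sketch in present-day Python. The helper name storage_cost is hypothetical, and the utf-16-le and utf-32-le codecs are only stand-ins for 16-bit and 32-bit Py_UNICODE storage, not the interpreter's actual internals.

```python
def storage_cost(text):
    """Return (bytes using 16-bit units, bytes using 32-bit units) for text."""
    # utf-16-le / utf-32-le are assumed stand-ins for 16-bit / 32-bit storage.
    return len(text.encode("utf-16-le")), len(text.encode("utf-32-le"))

# Five BMP characters followed by one non-BMP character (U+10400).
mostly_bmp = "caf\u00e9 \U00010400"
narrow, wide = storage_cost(mostly_bmp)
print(narrow, wide)
```

With 16-bit units the one non-BMP character costs two units (a surrogate pair) and the string takes 14 bytes; with 32-bit units every character costs four bytes and the string takes 24.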
> My guess is no, so Fredrik's proposal sounds good to me.
I'm not clear on what Fredrik's proposal is. He says: "let's use either
UCS-2 or UCS-4 for the internal storage". Is he saying:
1. let's choose one or the other today,
2. let's make it a compile-time switch, or
3. let's make it a runtime option?
I could live with 1. for a while longer; I haven't heard of a real user
complaint about our current model. The longer we put it off, the more
acceptable UCS-4 becomes.
I wouldn't be thrilled with 2., because it makes Python code harder to
move between machines (its behavior depends on your build options!).
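A sketch of the portability worry, assuming a 16-bit build would expose code units to the programmer. The function length_as_seen_by is hypothetical and only simulates the two storage widths with codecs; it does not inspect any real build.

```python
def length_as_seen_by(text, unit_bits):
    """Length of text counted in code units of the given width (16 or 32)."""
    # Simulation only: pick the codec that matches the assumed unit width.
    codec = {16: "utf-16-le", 32: "utf-32-le"}[unit_bits]
    return len(text.encode(codec)) // (unit_bits // 8)

s = "\U00010400"  # a single non-BMP character
print(length_as_seen_by(s, 16), length_as_seen_by(s, 32))
```

The same one-character string reports a length of 2 under the 16-bit assumption and 1 under the 32-bit one, so any code that indexes or slices it could behave differently from one build to the next.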
3. would be okay if it is handled intelligently.
Any of these is better to me than exposing the details of UTF-16 to the
Python programmer in our Unicode type!
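For a sense of what "the details of UTF-16" means in practice, this is roughly the arithmetic a programmer would have to carry around if surrogates leaked into the Unicode type. The function name combine_surrogates is hypothetical; the constants are the standard UTF-16 surrogate ranges.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits of the code point's offset above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(hex(combine_surrogates(0xD801, 0xDC00)))
```

Here the pair 0xD801, 0xDC00 combines to 0x10400. Every indexing, slicing, and length operation would need to be aware of this pairing to avoid cutting a character in half.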
--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook