[I18n-sig] Re: How does Python Unicode treat surrogates?
Guido van Rossum
guido@digicool.com
Tue, 26 Jun 2001 17:08:19 -0400
> Mark Davis wrote:
> >
> > That is an interesting approach; one that basically amounts to some
> > convenience functions. For example, instead of writing:
> >
> > myString.substring(myString.cpToIndex(3), myString.cpToIndex(5));
> >
> > you could write:
> >
> > myString.substring(3, 5, myString.CODEPOINT);
> >
> > This hides some of the work, when someone is working in code points. The
> > performance cost is still there, of course; using code point indexes
> > requires each operation to examine every code unit up to that point, which
> > is much more expensive.
>
> Good idea!
>
> > For a general programming language or string library, I'm not sure about
> > implementing this pattern throughout. I know in the ICU library, for
> > example, we have a significant number of functions that take offsets into
> > strings. Having such a parameter on all of them would be clumsy, when most
> > of the time people are simply working in code units.
>
> In Python this would certainly be an elegant way to add the
> code point indexing functionality (Python supports optional arguments
> with default values).
>
> --
> Marc-Andre Lemburg
I still think this should be an add-on module, to emphasize we're not
eager to do a whole lot of surrogate support.
--Guido van Rossum (home page: http://www.python.org/~guido/)
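The idea in this thread, code point indexing through an optional argument that defaults to code units, could be prototyped as the kind of add-on module suggested above. The following is a minimal sketch only. It assumes modern Python 3, where `str` already indexes by code point, so UTF-16 code units are simulated as a list of ints. The names `to_utf16_units`, `cp_to_index`, `substring`, `CODEUNIT` and `CODEPOINT` are hypothetical, loosely modeled on the Java-style example quoted above. They are not an existing Python or ICU API. `cp_to_index` shows the cost Mark Davis describes: it scans every code unit up to the requested code point.

```python
# Hypothetical add-on module sketch: names are illustrative, not a real API.
from typing import List

CODEUNIT = "codeunit"    # default: offsets count UTF-16 code units
CODEPOINT = "codepoint"  # optional: offsets count code points


def to_utf16_units(text: str) -> List[int]:
    """Encode text as UTF-16 code units (non-BMP chars become surrogate pairs)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_utf16_units(units: List[int]) -> str:
    """Decode code units back to str; 'surrogatepass' tolerates lone surrogates."""
    data = b"".join(u.to_bytes(2, "little") for u in units)
    return data.decode("utf-16-le", "surrogatepass")


def _is_high(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def _is_low(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def cp_to_index(units: List[int], cp_index: int) -> int:
    """Return the code-unit offset of code point number cp_index.

    This is a linear scan: every code unit before the target is examined.
    """
    offset = 0
    for _ in range(cp_index):
        if offset >= len(units):
            raise IndexError("code point index out of range")
        # A high surrogate followed by a low surrogate is a single code point.
        if _is_high(units[offset]) and offset + 1 < len(units) and _is_low(units[offset + 1]):
            offset += 2
        else:
            offset += 1
    return offset


def substring(units: List[int], start: int, end: int, unit: str = CODEUNIT) -> List[int]:
    """Slice by code units by default, or by code points when unit=CODEPOINT."""
    if unit == CODEPOINT:
        start, end = cp_to_index(units, start), cp_to_index(units, end)
    elif unit != CODEUNIT:
        raise ValueError("unit must be CODEUNIT or CODEPOINT")
    return units[start:end]
```

For example, with `units = to_utf16_units("a\U0001D11Eb")` (U+1D11E is stored as the pair 0xD834 0xDD1E), `substring(units, 1, 2, CODEPOINT)` returns both halves of the pair. The default `substring(units, 1, 2)` returns only the lone high surrogate. Keeping code units as the default mirrors the concern above that most callers work in code units, and code point offsets are paid for only when requested.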