Re: [Python-Dev] PEP 393 Summer of Code Project

Sept. 9, 2011


I, for one, am very interested. It sounds like the 'unicode' datatype
in Jython does not in fact have O(1) indexing characteristics if the
string contains any characters in the astral plane. Interesting. I
wonder if you have heard from anyone about this affecting their app's
performance?

--Guido

On Fri, Sep 9, 2011 at 12:58 PM, fwierzbicki@gmail.com
<fwierzbicki@gmail.com> wrote:
On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I am curious how you index by code point rather than code unit with 16-bit
code units and how it compares with the method I posted. Is there anything I
can read? Reply off list if you want.

I'll post on-list until someone complains, just in case there are
interested onlookers :)

There aren't docs, but the code is here:
https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyU...

Here are (I think) the most relevant bits for random access -- note
that getString() returns the internal representation of the PyUnicode,
which is a java.lang.String:

   @Override
   protected PyObject pyget(int i) {
       if (isBasicPlane()) {
           return Py.makeCharacter(getString().charAt(i), true);
       }
       int k = 0;
       while (i > 0) {
           int W1 = getString().charAt(k);
           if (W1 >= 0xD800 && W1 < 0xDC00) {
               k += 2;
           } else {
               k += 1;
           }
           i--;
       }
       int codepoint = getString().codePointAt(k);
       return Py.makeCharacter(codepoint, true);
   }
   public boolean isBasicPlane() {
       if (plane == Plane.BASIC) {
           return true;
       } else if (plane == Plane.UNKNOWN) {
           plane = (getString().length() == getCodePointCount())
                   ? Plane.BASIC : Plane.ASTRAL;
       }
       return plane == Plane.BASIC;
   }
   public int getCodePointCount() {
       if (codePointCount >= 0) {
           return codePointCount;
       }
       codePointCount = getString().codePointCount(0, getString().length());
       return codePointCount;
   }
-Frank
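
For onlookers who want to try the idea outside Jython, here is a minimal standalone sketch of the same strategy: cache the code point count, use charAt() directly when the string has no surrogate pairs, and otherwise walk from the start. The class name CodePointString and its methods are hypothetical and are not part of Jython; the sketch also uses java.lang.String.offsetByCodePoints() in place of the hand-written surrogate loop quoted above, which is a substitution and not what PyUnicode does.

```java
// Hypothetical standalone sketch; this is not Jython's PyUnicode.
final class CodePointString {
    private final String s;
    private int codePointCount = -1;   // -1 means "not computed yet"

    CodePointString(String s) {
        this.s = s;
    }

    // Number of code points, computed once and then cached.
    int length() {
        if (codePointCount < 0) {
            codePointCount = s.codePointCount(0, s.length());
        }
        return codePointCount;
    }

    // True when every code point occupies exactly one UTF-16 code unit.
    boolean isBasicPlane() {
        return s.length() == length();
    }

    // Code point at code point index i.
    int codePointAt(int i) {
        if (i < 0 || i >= length()) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        if (isBasicPlane()) {
            return s.charAt(i);                 // O(1): indices coincide
        }
        int k = s.offsetByCodePoints(0, i);     // O(i): skips surrogate pairs
        return s.codePointAt(k);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (one surrogate pair), 'b': 4 code units, 3 code points
        CodePointString t = new CodePointString("a\uD83D\uDE00b");
        System.out.println(t.length());
        System.out.println(Integer.toHexString(t.codePointAt(1)));
        System.out.println((char) t.codePointAt(2));
    }
}
```

As in the quoted code, indexing stays O(1) only while the string is entirely in the basic plane; a single astral character makes each lookup linear in the index.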
-- 
--Guido van Rossum (python.org/~guido)