[Python-Dev] PEP 393 Summer of Code Project

fwierzbicki at gmail.com fwierzbicki at gmail.com
Fri Sep 9 21:58:41 CEST 2011


On Fri, Sep 9, 2011 at 10:16 AM, Terry Reedy <tjreedy at udel.edu> wrote:

> I am curious how you index by code point rather than code unit with 16-bit
> code units and how it compares with the method I posted. Is there anything I
> can read? Reply off list if you want.
I'll post on-list until someone complains, just in case there are
interested onlookers :)

There aren't docs, but the code is here:
https://bitbucket.org/jython/jython/src/8a8642e45433/src/org/python/core/PyUnicode.java

Here are (I think) the most relevant bits for random access. Note
that getString() returns the internal representation of the PyUnicode,
which is a java.lang.String:

    @Override
    protected PyObject pyget(int i) {
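        // Fast path: no surrogate pairs, so code unit index == code point index.
        if (isBasicPlane()) {
            return Py.makeCharacter(getString().charAt(i), true);
        }

        // Slow path: walk from the start, skipping i code points (O(i)).
        int k = 0;
        while (i > 0) {
            int W1 = getString().charAt(k);
            // A high (lead) surrogate starts a two-unit pair.
            if (W1 >= 0xD800 && W1 < 0xDC00) {
                k += 2;
            } else {
                k += 1;
            }
            i--;
        }
        // k is now the code unit offset of the i-th code point.
        int codepoint = getString().codePointAt(k);
        return Py.makeCharacter(codepoint, true);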
    }

    public boolean isBasicPlane() {
        if (plane == Plane.BASIC) {
            return true;
        } else if (plane == Plane.UNKNOWN) {
            plane = (getString().length() == getCodePointCount())
                    ? Plane.BASIC : Plane.ASTRAL;
        }
        return plane == Plane.BASIC;
    }

    public int getCodePointCount() {
        if (codePointCount >= 0) {
            return codePointCount;
        }
        codePointCount = getString().codePointCount(0, getString().length());
        return codePointCount;
    }
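
For onlookers who want to try the idea outside Jython, here is a
minimal standalone sketch of the same approach. The class name
CodePointIndexSketch and its method names are made up for illustration
and are not part of Jython. It uses the standard
String.offsetByCodePoints in place of the hand-written loop, which I
would expect to behave the same for well-formed strings:

```java
// Hypothetical standalone sketch, not Jython code.
final class CodePointIndexSketch {
    private final String s;
    // Cached lazily; -1 means "not computed yet".
    private int codePointCount = -1;

    CodePointIndexSketch(String s) {
        this.s = s;
    }

    int codePointCount() {
        if (codePointCount < 0) {
            codePointCount = s.codePointCount(0, s.length());
        }
        return codePointCount;
    }

    // True when the string has no surrogate pairs, so every
    // code point occupies exactly one 16-bit code unit.
    boolean isBasicPlane() {
        return s.length() == codePointCount();
    }

    // Returns the code point at code point index i.
    int codePointAt(int i) {
        if (i < 0 || i >= codePointCount()) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        if (isBasicPlane()) {
            // O(1): code unit index equals code point index.
            return s.charAt(i);
        }
        // O(i): scan from the start to find the code unit offset.
        int k = s.offsetByCodePoints(0, i);
        return s.codePointAt(k);
    }
}
```

The trade-off is the same as above: strings without surrogate pairs
get constant-time indexing, and only strings with astral characters
pay for the linear scan.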

-Frank

