I was directed to post this request to the general Python development community so hopefully this is on topic.

One of the weaknesses of the PyUnicode implementation is that the type is concrete and there is no option for an abstract proxy string to a foreign source. This is an issue for an API like JPype, in which java.lang.String objects are passed back from Java. Ideally these would be a type derived from the Unicode type str, but that requires transferring the memory immediately from Java to Python, even when the string is large and will never be accessed from within Python. For certain operations like XML parsing this can be prohibitive, so instead of returning a str we return a JString. (There is a separate issue that Java method names and Python method names conflict, so direct inheritance creates some problems.)

The JString type can of course be transferred to Python space at any time, as both Python Unicode and Java string objects are immutable. However, the CPython API functions that take strings only accept Unicode type objects, which have a concrete implementation. It is possible to extend strings, but those extensions do not allow for proxying as far as I can tell. Thus there is currently no option to proxy to a string representation in another language. The duck-typed ``__str__`` method is insufficient, as it indicates that an object can become a string, rather than “this object is effectively a string” for the purposes of the CPython API.
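
To make the ``__str__`` point concrete, here is a minimal pure-Python sketch. ForeignString is a hypothetical stand-in for a proxy type like JString, not JPype's actual class, and the three calls are just examples of APIs that insist on a real str.

```python
class ForeignString:
    """Hypothetical stand-in for a proxy such as JString (not JPype's real class)."""

    def __init__(self, text):
        # Pretend this text lives on the foreign (Java) side.
        self._text = text

    def __str__(self):
        return self._text


def accepts(func):
    """Return True if func() runs, False if it raises TypeError."""
    try:
        func()
        return True
    except TypeError:
        return False


proxy = ForeignString("hello")

# Explicit conversion works: the object can *become* a string.
converted = str(proxy)

# APIs that require an actual str do not call __str__ and reject the proxy.
join_ok = accepts(lambda: ",".join([proxy]))
startswith_ok = accepts(lambda: "hello world".startswith(proxy))
concat_ok = accepts(lambda: "say " + proxy)
```

Here str(proxy) succeeds while the other three calls raise TypeError, which is the gap between "can become a string" and "is effectively a string".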

One way to address this is to use the now-deprecated concept of READY to extend Unicode objects to other languages. A class like JString would be an unready Unicode object which, when READY is called, transfers the memory from Java, sets up the flags, and sets up a pointer to the code point representation. Unfortunately the READY concept is scheduled for removal, and thus the chance to address the need for proxying a Unicode object to another language's representation may be limited. There may be other methods to accomplish this without using the concept of READY. So long as access to the code points goes through the Unicode API, and the Unicode object can be extended such that the actual code points may be located outside of the Unicode object, a proxy can still be achieved if there are hooks in the API to decide when a transfer should be performed. Generally the transfer only needs to happen once, but the key issue is that neither the number of code points nor their kind will be known until the memory is transferred.
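
The transfer-once behaviour described above can be sketched in pure Python. This is only an illustration of the idea under my own assumptions: LazyProxyString, its fetch callable and the transfers counter are hypothetical names, and a real solution would need the equivalent hook inside the C-level Unicode object.

```python
class LazyProxyString:
    """Hypothetical proxy that copies the foreign string across at most once."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable returning the foreign contents as a str
        self._value = None    # filled in on first access
        self.transfers = 0    # counts copies, for illustration only

    def _ready(self):
        # Plays the role READY would play: transfer on demand, then cache.
        if self._value is None:
            self._value = self._fetch()
            self.transfers += 1
        return self._value

    def __str__(self):
        return self._ready()

    def __len__(self):
        # The length is only known once the contents have been transferred.
        return len(self._ready())


proxy = LazyProxyString(lambda: "<root/>" * 3)
before = proxy.transfers   # nothing has been copied yet
length = len(proxy)        # first access triggers the single transfer
text = str(proxy)          # later accesses reuse the cached copy
after = proxy.transfers
```

A proxy that is never touched from Python never pays for the copy, which is the case that matters for large XML documents.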

Java has much the same problem. Although it defines an interface “java.lang.CharSequence”, the actual “java.lang.String” class is concrete, and almost all API methods take a String rather than the base interface, even when the base interface would have been adequate. Thus, just as Python has difficulty treating a foreign string class as it would a native one, Java cannot treat a Python string as a native one either. So Python strings get represented as the CharSequence type, which greatly limits their use.

Summary:

Are there any plans currently to address the concept of a proxy string in the PyUnicode API?