[Python-ideas] Alternative Unicode implementations (NSString/NSMutableString)

Ronald Oussoren ronaldoussoren at mac.com
Wed Jul 19 02:07:07 EDT 2017


> On 19 Jul 2017, at 00:35, Jim J. Jewett <jimjjewett at gmail.com> wrote:
> 
> Ronald Oussoren came up with a concrete use case for wanting the
> interpreter to consider something a string, even if it isn't
> implemented with the default datastructure.
> 
> In https://mail.python.org/pipermail/python-ideas/2017-July/046407.html
> he writes:
> 
>   The reason I need to subclass str: in PyObjC I use
>   a subclass of str to represent Objective-C strings
>   (NSString/NSMutableString), and I need to keep track
>   of the original value; mostly because there are some
>   Objective-C APIs that use object identity. The worst
>   part is that fully initialising the PyUnicodeObject fields
>   often isn’t necessary as a lot of Objective-C strings
>   aren’t used as strings in Python code.
> 
> The PyUnicodeObject (via its leading PyASCIIObject member) currently
> uses 7 flag bits including 2 for kind.  Would it be worth adding an
> 8th bit to indicate that the string is a virtual subclass, and that the
> internals should not be touched directly?  (This would require
> changing some of the macros; at the time of PEP 393 Martin ruled
> YAGNI ... but is this something that might reasonably be reconsidered,
> if someone did the work?  Which I am considering, but not committing
> to.)

The reason I subclass str is primarily that the C API will not otherwise accept the object as string-like (that is, PyArg_Parse and the like require a PyUnicode_Type instance when the caller asks for a string).  Adding a string equivalent of __index__ would most likely solve my use case[1].
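
To illustrate the idea, here is a minimal sketch in Python rather than C.  The Offset class shows how the existing __index__ protocol lets a non-int object be used as an index.  Everything else is an assumption made for illustration: the method name __string__, the helper as_string and the ProxyString class are hypothetical, and CPython defines no such hook.

```python
import operator


class Offset:
    """Not an int subclass, yet accepted wherever an integer index is needed."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Real protocol: consulted by operator.index() and by sequence indexing.
        return self._value


class ProxyString:
    """Hypothetical stand-in for an NSString proxy that does NOT subclass str."""

    def __init__(self, text):
        self._text = text  # stands in for the wrapped Objective-C object

    def __string__(self):
        # Hypothetical hook, named by analogy with __index__.
        return self._text


def as_string(obj):
    """Hypothetical string analogue of operator.index()."""
    if isinstance(obj, str):
        return obj
    hook = getattr(type(obj), "__string__", None)
    if hook is None:
        raise TypeError(
            f"{type(obj).__name__!r} object cannot be interpreted as a string"
        )
    result = hook(obj)
    if not isinstance(result, str):
        raise TypeError("__string__ returned a non-str object")
    return result


# Existing behaviour: the index protocol converts Offset(2) to 2.
index_result = "abcdef"[Offset(2)]

# Hypothetical behaviour: a consumer asks for a real str via the hook.
string_result = as_string(ProxyString("hello"))
```

In this sketch the conversion happens only at the point where a consumer actually asks for a string, which is the role PyArg_Parse would have to play on the C side.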

Without such a hook, it would be nice to be able to postpone the transition to the PyUnicode_IS_READY state for as long as possible, with a callback to provide the character buffer when the transition happens.  That would make it possible to avoid duplicating the string buffer until it is truly needed.
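
The deferred transition can be sketched in Python as follows.  This is only an analogy for the C-level behaviour, not a proposal for an actual API: the class LazyText, its fetch_buffer callback and the is_ready property are all hypothetical names, with is_ready loosely standing in for PyUnicode_IS_READY.

```python
class LazyText:
    """Hypothetical proxy that copies the foreign buffer only on first use."""

    def __init__(self, fetch_buffer):
        # fetch_buffer is a callable standing in for the buffer-providing hook.
        self._fetch_buffer = fetch_buffer
        self._text = None  # no copy has been made yet

    @property
    def is_ready(self):
        # Loosely analogous to the "ready" state of a unicode object.
        return self._text is not None

    def _ensure_ready(self):
        # The transition: call the hook once and cache the result.
        if self._text is None:
            self._text = self._fetch_buffer()
        return self._text

    def __str__(self):
        return self._ensure_ready()

    def __len__(self):
        return len(self._ensure_ready())


calls = []


def fetch():
    # Records each invocation so the laziness can be observed.
    calls.append("fetch")
    return "NSString contents"


proxy = LazyText(fetch)
ready_before = proxy.is_ready   # nothing copied while the object is only passed around
length = len(proxy)             # first string-like use triggers the copy
text = str(proxy)               # later uses reuse the cached copy
```

An object that is only passed back to Objective-C never triggers the callback, so its buffer is never duplicated.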

Ronald

[1] Ignoring backward compatibility concerns on my side and without having fully thought through the consequences.

> 
> -jJ


