[Python-ideas] Keyword/Symbol literals

Chris Kaynor ckaynor at zindagigames.com
Wed Jan 21 22:17:18 CET 2015


On Wed, Jan 21, 2015 at 12:58 PM, Chris Angelico <rosuav at gmail.com> wrote:
> On Thu, Jan 22, 2015 at 6:18 AM, Alexander Belopolsky
> <alexander.belopolsky at gmail.com> wrote:
> > Why can't we make interned strings a subclass of strings like bool is a
> > subclass of int?
>
> My gut feeling is that an interned string is exactly the same thing as
> a non-interned string, and that it should be possible for sys.intern()
> to return the original string unchanged.

I would tend to agree that there is no real difference between an
interned and non-interned string in Python. If Python had mutable
strings as the base type, that would be different; then having an
immutable version of string would be helpful for various reasons.

> But on the flip side, if you
> *know* that two strings are interned, you can compare their identities
> instead of their values; currently, identical interned strings compare
> equal quickly, but distinct ones still have to be compared
> character-for-character:
>
> >>> timeit.repeat("x==y","x='a'*65536+'q'; y='a'*65536+'q'")
> [2.535494999960065, 2.2198410001583397, 2.2210810002870858]
> >>> timeit.repeat("x==y","import sys; x=sys.intern('a'*65536+'q'); y=sys.intern('a'*65536+'q')")
> [0.056215000338852406, 0.05387900024652481, 0.04834900004789233]
> >>> timeit.repeat("x==y","x='a'*65536+'x'; y='a'*65536+'y'")
> [2.53101500030607, 2.2251750002615154, 2.228022000286728]
> >>> timeit.repeat("x==y","import sys; x=sys.intern('a'*65536+'x'); y=sys.intern('a'*65536+'y')")
> [2.583013000432402, 2.2194420001469553, 2.224679999984801]
>
> If, in the fourth example, the interpreter knew that both those string
> objects were interned, it could know that they can't possibly be the
> same string. So is that distinction worth coding into the class
> hierarchy?

If such an optimization were desirable, it would be better implemented
as a flag that is accessible on the string object, but would only be
writable internally (no public API for setting*). This would then
allow the string compare to add a check to see if both are interned
after checking the pointer values and early exit with a false result.
There is not real need (that I know of) to have a class distinction to
indicate interning vs a non-mutable string.

Chris K

*Note that hacks similar to the ones that can be used to change the
value of small integers in Python may still function. Depending on
internal implementation, it could also be viable to instead have the
flag actually perform a lookup into the intern table, but that may
negative (much of) the performance benefits.


More information about the Python-ideas mailing list