why () is () and [] is [] work in other way?

Tim Delaney timothy.c.delaney at gmail.com
Mon Apr 23 18:26:53 EDT 2012


On 24 April 2012 06:40, Devin Jeanpierre <jeanpierreda at gmail.com> wrote:

> On Mon, Apr 23, 2012 at 4:27 PM, Devin Jeanpierre
> <jeanpierreda at gmail.com> wrote:
> > Well, no. Immutable objects could always compare equal, for example.
> > This is more expensive though. is as-it-stands is very quick to
> > execute, which is probably attractive to some people (especially for
> > its use in detecting special constants).
>
> I don't know what made me write that so wrong. I meant "immutable
> objects that are equal could always compare the same via is".
>

And doing that would make zero sense, because it directly contradicts the
whole *point* of "is". The point of "is" is to tell you whether or not two
references are to the same object. This is a *useful* property.
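
A minimal illustration of that distinction, using lists because they are never shared implicitly (the variable names are only for this sketch):

```python
# Two lists built separately are equal but are distinct objects.
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # a second reference to the same list as `a`

equal_but_distinct = (a == b) and (a is not b)
same_object = c is a

# A mutation made through one reference is visible through another
# only when both references name the same object.
c.append(4)
a_changed = a == [1, 2, 3, 4]
b_unchanged = b == [1, 2, 3]
```

Here "is" answers a question that == cannot: whether a change made through one name will show up under the other.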

I'll leave aside the question of how you determine if an object is
immutable, and restrict the discussion to a few built-in types that are
known to be immutable.

If two objects are not the same object, then lying and saying they are
would remove the opportunity for various programming techniques, such as
interning. Of course, you could say that all immutable objects should be
interned automatically. There are a couple of problems with this that I can
think of off the top of my head.

The first problem is memory. If every immutable object is interned then
welcome to the world of ever-expanding memory usage. Ah - but Python has
got around this for interned strings! They're ejected from the intern cache
when there are no more references. Surely we could do the same for integers
and other immutables?

That brings us to performance. You do not want computations involving
immutable objects to suffer severe performance degradation just to make
equal immutable objects have the same identity. But if every single step of
a numerical calculation involved the following sequence of possible steps,
that's exactly what you would be doing:

1. Calculate result;

2. Look up result in integer intern cache (involves hash() and ==);
- unavoidable

3. Add result to integer intern cache (involves hash() and ==, and maybe
resizing the cache);
- necessary if your result is not currently referenced anywhere else in the
Python VM

4. Look up previous intermediate result in integer intern cache (involves
hash() and ==);
- necessary if you have a previous intermediate result

5. Eject previous intermediate result from integer intern cache (involves
hash() and ==);
- necessary if you have a previous intermediate result that is not
currently referenced anywhere else in the Python VM
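
The steps above can be sketched as a toy model. This is not how CPython works: InternCache and checksum are hypothetical names, the reference counts are kept by hand, and the "lookups" counter simply tallies one unit of work for each of the four extra steps.

```python
class InternCache:
    """Toy intern table with manual reference counts (hypothetical)."""

    def __init__(self):
        self._refs = {}   # value -> number of live references
        self.lookups = 0  # each unit stands for a hash() and possibly an ==

    def acquire(self, value):
        self.lookups += 1          # step 2: look up the new result
        if value not in self._refs:
            self.lookups += 1      # step 3: add it to the cache
            self._refs[value] = 0
        self._refs[value] += 1
        return value

    def release(self, value):
        self.lookups += 1          # step 4: look up the previous result
        self._refs[value] -= 1
        if self._refs[value] == 0:
            self.lookups += 1      # step 5: eject it from the cache
            del self._refs[value]


def checksum(data, cache):
    """Simple multiplicative checksum that routes every result through the cache."""
    total = cache.acquire(0)
    for byte in data:
        new_total = cache.acquire((total * 31 + byte) % 2**32)  # step 1, then 2-3
        cache.release(total)                                    # steps 4-5
        total = new_total
    return total


cache = InternCache()
digest = checksum(b"hello", cache)
# cache.lookups is now 22: 2 for the initial 0, then 4 for each of the 5 bytes.
```

Every intermediate total here is unique, so each byte of input pays for all four cache operations on top of the one multiplication and addition that does the real work.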

Now think of the Python implementation of any checksum algorithm. Nearly
every intermediate result (a reasonably-large hash) is not going to be used
anywhere else in the VM, and will require all 4 extra steps. Ouch.

Instead, CPython makes the (sensible) choice to intern heavily-used
integers permanently (-5 through 256 inclusive, IIRC) and leaves the rest
up to the programmer.
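
A quick way to observe this, assuming CPython (it is an implementation detail, so other interpreters and future versions may behave differently). The helper name shares_identity is made up for this sketch, and the integers are built at run time so the compiler cannot merge equal constants:

```python
import platform


def shares_identity(n):
    """Build the same integer twice at run time and compare identities."""
    return int(str(n)) is int(str(n))


if platform.python_implementation() == "CPython":
    small_is_shared = shares_identity(256)     # inside the cached range
    large_is_shared = shares_identity(10**20)  # well outside it
```

On CPython the first check finds one shared object and the second finds two distinct ones, which is why code should compare integers with == rather than "is".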

Strings are a different question. Unlike integers, where == is cheap, ==
for strings can be prohibitively expensive. Consider the case where, for
whatever reason, you create a 1GB string. Now imagine that creating or
deleting a reference to *any* string potentially involves calling == on the
1GB string. Ouch again.

Instead, CPython makes the (sensible) choice to automatically intern short
strings that look like names (in the Python sense) and leave everything
else up to the programmer. It's possible for the programmer to manually
intern their 1GB string, but they've then got to deal with the consequences
of doing so.
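
Manual interning is done with sys.intern(). A small sketch, with the strings built at run time so they are not merged at compile time (the variable names are only for illustration):

```python
import sys

# Two equal strings that do not look like names, built separately.
a = " ".join(["not", "a", "name"])
b = " ".join(["not", "a", "name"])

equal = a == b  # compares contents; cost grows with the string length

# sys.intern() returns the canonical copy for a given string value,
# so equal strings map to one object once both have been interned.
interned_a = sys.intern(a)
interned_b = sys.intern(b)
same_after_intern = interned_a is interned_b  # identity check, constant time
```

The price is the one the steps above describe: each sys.intern() call has to hash the string and possibly compare it against the cached copy.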

Tim Delaney