On 20/09/13 14:15, Antoine Pitrou wrote:
> From http://docs.python.org/3.3/library/sys.html#sys.intern
> """sys.intern(string)
> Enter string in the table of “interned” strings and return the 
> interned string [...]"""
> In Python 3 context, "string" means "str".

I read that, Antoine. In fact, I had read the manual and thought it was
a mistake carried over from the 2.x documentation, so I tried it just in
case before reporting the "documentation mistake", and I was surprised
to find it was actually true :-).
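For anyone who wants to repeat that check, here is a minimal sketch (Python 3; the helper name `try_intern` is made up for illustration) showing that `sys.intern` accepts a `str` but raises `TypeError` for `bytes`:

```python
import sys

def try_intern(value):
    """Hypothetical helper: return the interned value, or None if rejected."""
    try:
        return sys.intern(value)
    except TypeError:
        # sys.intern only accepts str; bytes, int, etc. raise TypeError
        return None

print(try_intern("spam"))   # a str is accepted and returned
print(try_intern(b"spam"))  # bytes are rejected, so this prints None
```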

I know that intern is used internally by the interpreter for
performance reasons, but I am thinking about memory usage
optimizations. For instance, I have a pickle that is 14MB in size;
after "interning" the strings in it (there is a lot of redundancy), the
new size is only 3MB and it loads faster. I can do this because most of
the data in the pickle are strings; I could NOT do it if I used bytes.

I could do a manual "intern" for any hashable object using an
"object:object" dictionary (that would work for integers too), but I
wonder whether extending "sys.intern" itself would be worth considering.
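One way the "object:object" dictionary idea could look, as a hypothetical sketch (the `Interner` class is not an existing API). It keys the table on `(type(obj), obj)` rather than on `obj` alone, because `1 == 1.0 == True` and they hash alike, so a plain dictionary would hand back `1` when asked to intern `True`. Like any dict-based table, it only accepts hashable objects and keeps every entry alive for as long as the table exists.

```python
class Interner:
    """Hypothetical dict-based interner for arbitrary hashable objects."""

    def __init__(self):
        # Keyed on (type, value) so that equal values of different types,
        # such as 1, 1.0 and True, are not merged into one object.
        self._table = {}

    def __call__(self, obj):
        # Return the first instance seen for this (type, value) pair.
        return self._table.setdefault((type(obj), obj), obj)

    def __len__(self):
        return len(self._table)


intern_any = Interner()
a = intern_any(bytes([1, 2, 3]))
b = intern_any(bytes([1, 2, 3]))
print(a is b)                    # True: the first bytes object is reused
print(intern_any(True) is True)  # True: not collapsed into the int 1
print(intern_any(1) is True)     # False: int 1 gets its own entry
```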

Anyway, this pattern is easy enough:

Instead of

  obj = sys.intern(obj)

I could do

  interned = {}  # maps each value to the first equal instance seen
  obj = interned.setdefault(obj, obj)
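Since the dictionary pattern only needs hashability, it also covers the bytes case mentioned above. A minimal sketch with made-up data (the `dedupe` helper is hypothetical) showing that deduplicating equal `bytes` objects this way shrinks their pickle, again because pickle memoizes objects by identity:

```python
import pickle

def dedupe(values):
    """Hypothetical helper: replace equal values by the first instance seen."""
    interned = {}
    return [interned.setdefault(v, v) for v in values]

# bytes([...]) builds a new object on every call, so these 1000 values are
# equal to each other but are 1000 distinct objects.
raw = [bytes([115, 112, 97, 109]) for _ in range(1000)]  # each equals b"spam"
shared = dedupe(raw)

print(len(pickle.dumps(raw)), len(pickle.dumps(shared)))  # second is smaller
print(all(v is shared[0] for v in shared))  # one object, 1000 references
```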

