[Python-Dev] sys.intern should work on bytes

Antoine Pitrou solipsis at pitrou.net
Fri Sep 20 15:44:15 CEST 2013


On Fri, 20 Sep 2013 15:33:05 +0200,
Jesus Cea <jcea at jcea.es> wrote:
> On 20/09/13 14:15, Antoine Pitrou wrote:
> > From http://docs.python.org/3.3/library/sys.html#sys.intern
> > 
> > """sys.intern(string)
> > 
> > Enter string in the table of “interned” strings and return the 
> > interned string [...]"""
> > 
> > 
> > In Python 3 context, "string" means "str".
> 
> I read that, Antoine. In fact I read the manual and thought it was a
> mistake carried over from the 2.x documentation. I tried it just in case
> before reporting the "documentation mistake", and I was surprised it
> was actually true :-).
> 
> I know that intern is used internally by the interpreter for
> performance reasons. But I am thinking about memory usage optimizations.
> For instance, I have a pickle that is 14MB in size; when "interning" the
> strings in it (there is a lot of redundancy), the new size is only
> 3MB and it loads faster. I can do that because most data in the pickle
> are strings; I could NOT do it if I used bytes.
> 
> I could "intern" hashable objects by hand using an
> "object:object" dictionary (that would work for integers too), but I
> wonder whether extending the builtin "sys.intern" would be worth
> considering.
> 
> Anyway, this pattern is easy enough:
> 
> Instead of
> 
>   object = sys.intern(object)
> 
> I could do
> 
>   interned = dict()
>   ...
>   object = interned.setdefault(object, object)

Yes. The main difference is that sys.intern() removes an interned
string once every external reference to it vanishes. That requires either
weakref'ability (which both str and bytes lack) or special cooperation
from the object destructor (which is why sys.intern() is restricted to
str instead of working with arbitrary objects).
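
A minimal sketch of that difference, assuming CPython. The names
intern_any, supports_weakref and _interned are illustrative only and
are not part of the standard library. It shows that the dict-based
pattern does deduplicate bytes, that the table keeps its entries alive
after the caller drops them, and that neither bytes nor str can be
weakly referenced:

```python
import weakref

# Hypothetical module-level table; it holds STRONG references,
# so entries are never dropped automatically.
_interned = {}


def intern_any(obj):
    """Return the canonical object equal to obj (obj must be hashable)."""
    return _interned.setdefault(obj, obj)


def supports_weakref(obj):
    """Report whether a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


# Build two equal bytes objects at runtime so they start out distinct.
first = intern_any(bytes([115, 112, 97, 109]))
second = intern_any(bytes(bytearray(b"spam")))
same_object = first is second  # both names now point at one object

# Dropping the external references does not empty the table.
del first, second
still_cached = len(_interned)

bytes_weakrefable = supports_weakref(b"spam")
str_weakrefable = supports_weakref("spam")
```

With a plain dict the caller has to clear the table explicitly
(for example with _interned.clear()) once the deduplicated data is no
longer needed.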

Regards

Antoine.



