[Python-Dev] sys.intern should work on bytes

Jesus Cea jcea at jcea.es
Fri Sep 20 15:33:05 CEST 2013

On 20/09/13 14:15, Antoine Pitrou wrote:
> From http://docs.python.org/3.3/library/sys.html#sys.intern
> """sys.intern(string)
> Enter string in the table of “interned” strings and return the 
> interned string [...]"""
> In Python 3 context, "string" means "str".

I read that, Antoine. In fact, I read the manual and assumed it was a
mistake carried over from the 2.x documentation. I tried it just in case
before reporting the "documentation mistake", and I was surprised to find
that it was actually true :-).

I know that intern is used internally by the interpreter for performance
reasons, but I am thinking about memory usage optimizations. For
instance, I have a pickle that is 14MB in size; after "interning" the
strings in it (there is a lot of redundancy), the new size is only
3MB and it loads faster. I can do this because most of the data in the
pickle are strings. I could NOT do it if I used bytes.
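
The size effect described above can be reproduced on a small scale. The
following is only a sketch with made-up sample data (the "records" list is
hypothetical, not the 14MB pickle mentioned here). It relies on pickle
writing an object only once per dump and emitting a short back-reference
when the very same object appears again, so equal strings must also be
the same object to benefit.

```python
import pickle
import sys

# Hypothetical sample data: 1000 strings with only 10 distinct values.
# "".join() builds a fresh str object each time, so equal strings here
# are separate objects, as they typically are after parsing input.
records = ["".join(["user-", str(i % 10)]) for i in range(1000)]

# Without interning, every str object is written out in full.
plain = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)

# After interning, equal strings are one shared object, so pickle's memo
# replaces the repeats with short references.
deduped = [sys.intern(s) for s in records]
compact = pickle.dumps(deduped, protocol=pickle.HIGHEST_PROTOCOL)

print(len(plain), len(compact))
```

The exact byte counts depend on the pickle protocol and the data, so none
are quoted here; the point is only that the second dump is smaller while
loading back to an equal list.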

I could do a manual "intern" for hashable objects using an
"object:object" dictionary (that would work for integers too), but I
wonder if extending the builtin "sys.intern" would be something to consider.

Anyway, this pattern is easy enough:

Instead of

  object = sys.intern(object)

I could do

  interned = dict()
  object = interned.setdefault(object, object)
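
As a slightly fuller sketch of the dictionary pattern above, applied to
bytes: the names "_interned" and "intern_any" are hypothetical helpers
invented for this illustration, not part of the standard library.

```python
import sys

# Hypothetical module-level table mapping each value to its canonical
# instance. It holds strong references, so entries live as long as the dict.
_interned = {}

def intern_any(obj):
    """Return the first-seen object equal to obj (obj must be hashable)."""
    return _interned.setdefault(obj, obj)

# Two equal bytes objects built at runtime.
a = bytes([112, 121])  # b"py"
b = bytes([112, 121])

canonical_a = intern_any(a)
canonical_b = intern_any(b)
print(canonical_a is canonical_b)

# sys.intern itself accepts only str and rejects bytes.
try:
    sys.intern(b"py")
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

One caveat with a single shared table: values that compare equal across
types share an entry, so intern_any(1.0) returns the int 1 if 1 was stored
first. Keying on (type(obj), obj) would avoid that, at the cost of an
extra tuple per entry.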

-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz