[Python-Dev] sys.intern should work on bytes
Jesus Cea
jcea at jcea.es
Fri Sep 20 15:33:05 CEST 2013
On 20/09/13 14:15, Antoine Pitrou wrote:
> From http://docs.python.org/3.3/library/sys.html#sys.intern
>
> """sys.intern(string)
>
> Enter string in the table of “interned” strings and return the
> interned string [...]"""
>
>
> In Python 3 context, "string" means "str".
I read that, Antoine. In fact, when I read the manual I thought it was a
mistake carried over from the 2.x documentation. I tried it just in case
before reporting the "documentation mistake", and I was surprised to find
that it was actually true :-).
I know that intern is used internally by the interpreter for performance
reasons. But I am thinking about memory usage optimizations. For
instance, I have a pickle that is 14MB in size. When I "intern" the
strings in it (there is a lot of redundancy), the new size is only
3MB and it loads faster. I can do this because most of the data in the
pickle are strings. I could NOT do it if I used bytes.
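
A minimal sketch of why interning can shrink a pickle, under stated assumptions: the `records` list below is hypothetical data invented for illustration, not the 14MB pickle mentioned above, so the sizes it prints say nothing about that case. It relies on the fact that pickle writes an object only once when the same object (same identity) appears several times, and emits a short back-reference for the repeats.

```python
import pickle
import sys

# Hypothetical data: many equal strings that are distinct objects,
# as happens when values are built at runtime.
records = ["category-" + str(i % 10) for i in range(10_000)]

# Every string is a separate object, so each one is written in full.
plain = pickle.dumps(records)

# After interning, equal strings are the very same object, so pickle
# writes each distinct value once and back-references the repeats.
interned_records = [sys.intern(s) for s in records]
shrunk = pickle.dumps(interned_records)

print(len(plain), len(shrunk))
```

The same trick is not available for a list of bytes objects, because sys.intern only accepts str.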
I could intern hashable objects by hand using an "object:object"
dictionary (that would work for integers too), but I wonder whether
extending the built-in "sys.intern" would be worth considering.
Anyway, this pattern is easy enough:
Instead of

    object = sys.intern(object)

I could do

    interned = dict()
    ...
    object = interned.setdefault(object, object)
-- 
Jesús Cea Avión
jcea at jcea.es - http://www.jcea.es/