[Python-Dev] dict.setdefault(object, object) instead of "sys.intern()" (was Re: sys.intern should work on bytes)
Jesus Cea
jcea at jcea.es
Fri Sep 20 15:46:41 CEST 2013
On 20/09/13 15:33, Benjamin Peterson wrote:
> Well, the pickler should memoize bytes objects if you have lots of
> the same one in a pickle...
Only if they are the very same object, not different bytes objects with
the same value. Pickle doesn't check "a==b" but "id(a)==id(b)".
Yes, I know that "a==b" would break mutable objects. It is just an
example.
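To illustrate the identity-based memoization, here is a minimal sketch.
The variable names and sizes are made up for the example, and
bytes(bytearray(...)) is used only as a way to force distinct objects
with equal values:

```python
import pickle

payload = b"x" * 1000

# 100 references to the very same bytes object.
same = [payload] * 100

# 100 separate bytes objects that merely compare equal to each other.
# Going through bytearray forces a fresh object on every iteration.
distinct = [bytes(bytearray(payload)) for _ in range(100)]

size_same = len(pickle.dumps(same))
size_distinct = len(pickle.dumps(distinct))

# The first pickle stores the payload once plus short back-references;
# the second stores the full payload once per list element.
print(size_same, size_distinct)
```

The two lists compare equal, yet the second pickle should come out
roughly a hundred times larger, because the memo is keyed on object
identity and not on value.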
I don't want to pursue that path. Performance of pickle is already
appallingly slow.
In my project, I will do the redundancy removal in my own way, as
explained in another message in this thread.
Example:
* Original pickle: 14416284 bytes
* Pickle with "interned" strings: 3004880 bytes
(quite an improvement, but this is particular to my case, I have a lot
of string duplications here. The pickle also loads a bit faster)
* Pickle including an extra dictionary of "interned" strings, created
using the "interned.setdefault(object,object)" pattern: 5126587 bytes.
Sniff.
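For reference, a minimal sketch of the "interned.setdefault(object,object)"
pattern mentioned above. The helper name intern_value and the sample
data are hypothetical, not taken from my project:

```python
import pickle

def intern_value(table, value):
    """Return the canonical object equal to `value`.

    The first time a value is seen it is stored as both key and value;
    later lookups of an equal value return that first object.
    """
    return table.setdefault(value, value)

interned = {}

# Nine distinct bytes objects carrying only three different values.
records = [("key-%d" % (i % 3)).encode("ascii") * 10 for i in range(9)]

# Replace every record by its canonical representative.
canonical = [intern_value(interned, r) for r in records]

size_plain = len(pickle.dumps(records))
size_interned = len(pickle.dumps(canonical))

# Same values either way, but the interned list shares objects, so
# pickle can emit back-references instead of repeating the data.
print(size_plain, size_interned)
```

This only works for hashable values, and it only helps pickle because
the duplicates end up being the very same object.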
Could I do this more compactly?
--
Jesús Cea Avión
jcea at jcea.es - http://www.jcea.es/
Twitter: @jcea
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz