On Tuesday, 30 November 2010 at 09:32 -0500, Alexander Belopolsky wrote:
On Tue, Nov 30, 2010 at 8:38 AM, Antoine Pitrou wrote:
On Mon, 29 Nov 2010 22:46:33 -0500, Alexander Belopolsky wrote:
In practical terms, UCD comes at a price. The unicodedata module size is over 700K on my machine. This is almost half the size of the python executable and by far the largest extension module. (Only the CJK encodings come close.) Making builtins depend on the largest extension module for operation does not strike me as sound design.
Well, do they depend on it? _PyUnicode_EncodeDecimal seems to depend only on Objects/unicodectype.c.
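For readers who want to check this from the Python side, here is a minimal sketch. It assumes Python 3 and only shows the observable behaviour: int() accepts non-ASCII decimal digits without the script importing unicodedata itself. It does not prove which C file provides the digit table.

```python
# Sketch, assuming Python 3: int() parses non-ASCII decimal digits
# without this script ever importing unicodedata.
arabic_indic = "\u0661\u0662\u0663"  # ARABIC-INDIC DIGIT ONE, TWO, THREE

value = int(arabic_indic)
print(value)

# str.isdecimal() answers a closely related per-character question.
print(arabic_indic.isdecimal())
```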
My mistake. That was a late night post. I wonder why unicodedata.so is so big then.
It must be character names:
$ python -v
>>> '\N{DIGIT ONE}'
dlopen("/.../unicodedata.so", 2);
import unicodedata # dynamically loaded from /.../unicodedata.so
'1'
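The same name database can also be queried explicitly. The sketch below, assuming Python 3, is only an illustration of the name lookup that the \N{...} escape relies on, using the documented unicodedata.lookup() and unicodedata.name() functions.

```python
import unicodedata

# A \N{...} escape is resolved by character name when the literal is compiled.
one = "\N{DIGIT ONE}"

# The same name-to-character mapping is available at run time.
looked_up = unicodedata.lookup("DIGIT ONE")

# And the reverse direction: character to name.
name = unicodedata.name("1")

print(one, looked_up, name)
```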
From a quick peek using hexdump, character names seem to only account for 1/4 of the module size.

That said, I don't think the size is very important. For any non-trivial Python application, the size of unicodedata will be negligible compared to the size of Python objects.

Regards

Antoine.