individually updating unicodedata db?

MRAB python at mrabarnett.plus.com
Mon Mar 22 21:27:28 EDT 2010


Vlastimil Brom wrote:
> Hi all,
> I just tried to find some information about the unicodedata database
> and the possibilities of updating it to the latest version of the
> Unicode standard (currently 5.2, while Python supports 5.1 in the
> latest versions).
> An option to update this database individually might be useful, as
> Unicode standard updates seem to be more frequent than official
> Python releases (and not every release is updated to the latest
> available Unicode database version either).
> Am I right that this is not possible without recompiling Python from source?
> I eventually found the promising file
> ...Python-src--2.6.5\Python-2.6.5\Tools\unicode\makeunicodedata.py
> which required the following files from the unicode database to be in
> the same folder:
> EastAsianWidth-3.2.0.txt
> UnicodeData-3.2.0.txt
> CompositionExclusions-3.2.0.txt
> UnicodeData.txt
> EastAsianWidth.txt
> CompositionExclusions.txt
> 
> and also
> Modules/unicodedata_db.h
> Modules/unicodename_db.h,
> Objects/unicodetype_db.h
> 
> After a minor correction - adding the missing "import re" - the script
> was able to run and recreate the above .h files.
> I guess I am stuck here, as I use the precompiled version supplied in
> the Windows installer and can't compile Python from source to obtain
> the needed unicodedata.pyd.
> Or are there any possibilities I missed to individually upgrade the
> unicodedata database? (Using Python 2.6.5, Win XPh SP3)
> 
> Thanks in advance for any hints,
>    vbr
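
As an aside, the Unicode version that a given interpreter was built with
can be checked from Python itself through unicodedata.unidata_version.
The "-3.2.0" input files listed above correspond to the frozen 3.2.0
database that the module also exposes as unicodedata.ucd_3_2_0. A minimal
sketch follows; the two helper names are made up for illustration:

```python
import unicodedata


def unicode_db_version():
    """Return the compiled-in Unicode database version as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def has_name(ch):
    """True if the compiled-in database knows a character name for ch.

    Unassigned code points, noncharacters and control characters have no
    name, so this is only a rough test for "known to this database".
    """
    return unicodedata.name(ch, None) is not None


# Version of the main database, e.g. "5.1.0" on Python 2.6.
print(unicodedata.unidata_version)
# Version of the frozen legacy database kept alongside it.
print(unicodedata.ucd_3_2_0.unidata_version)
```

Calling has_name() on a character introduced in a newer Unicode version
is a quick way to see whether the installed database covers it.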

From the look of it, the Unicode data is compiled into the DLL. I don't
see any reason, other than speed, why the DLL couldn't read preprocessed
data from a file at startup instead. Provided that the data format hasn't
changed (e.g. through new fields being added), that could be done without
affecting the DLL's interface to the rest of Python.
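
To illustrate the idea of reading the data from a file at startup, here
is a rough pure-Python sketch. It is not how unicodedata.pyd works, and
it is no replacement for it. It only assumes the semicolon-separated
layout of UnicodeData.txt (code point, name, general category, ...). The
function name load_unicode_data and the two sample lines are there for
illustration only; a real run would pass an open UnicodeData.txt file
instead of the sample.

```python
import io

# Two lines in the UnicodeData.txt layout, used here in place of the real file.
SAMPLE = u"""\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
"""


def load_unicode_data(lines):
    """Build {code point: (name, general category)} from UnicodeData-style lines.

    Only the first three fields are used. Range entries such as
    "<CJK Ideograph, First>" are stored as-is and not expanded.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        codepoint = int(fields[0], 16)  # field 0 is the code point in hex
        table[codepoint] = (fields[1], fields[2])
    return table


# Any iterable of lines works: an open file, a list, or a StringIO object.
table = load_unicode_data(io.StringIO(SAMPLE))
name, category = table[ord(u"A")]
print(name, category)
```

A lookup in a plain dict like this is much slower and larger than the
compiled tables, which is the speed trade-off mentioned above.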


