I've just checked on Python 2.7.5 and Python 3.3.2 (Win32 versions).

In Python 3.3.2 unicodedata.unidata_version is set to '6.1.0'.

In Python 2.7.5 it is set to '5.2.0', so it looks as though the Unicode data in the 2.7 series is no longer being updated.
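For anyone who wants to repeat the check on their own build, here is a minimal sketch. The helper name safe_lookup is my own, not part of unicodedata; the only library calls used are unicodedata.unidata_version and unicodedata.lookup.

```python
import unicodedata

# Unicode version of the database bundled with this interpreter.
print(unicodedata.unidata_version)


def safe_lookup(name):
    """Return the character with the given Unicode name, or None if the
    bundled database does not know that name (hypothetical helper)."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # unicodedata.lookup raises KeyError for names it does not have.
        return None


# A name that has been in Unicode for a long time.
print(safe_lookup("LATIN SMALL LETTER A"))

# A name added after 5.2; this gives None on builds whose data predates it.
print(safe_lookup("TURKISH LIRA SIGN"))
```

Whether the second lookup succeeds depends entirely on the unidata_version reported by the interpreter it is run on.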

Since my initial post I've downloaded the Python 2.7.5 source and have found the makeunicodedata.py script which creates this module.

Are there plans to add the extra data from the other UCD files to this module? At the moment I am using a module from https://gist.github.com/anonymous/2204527 to obtain the script of a character, but it would be nice if this were available from the standard library.
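To illustrate the kind of lookup I mean, here is a rough sketch of reading script data in the format of the UCD file Scripts.txt. This is not the code from the gist and not anything in the standard library; parse_scripts, script_of and the SAMPLE text are my own names, and the sample holds only a handful of illustrative ranges rather than the full file.

```python
# A few lines in the style of the UCD file Scripts.txt (illustrative
# sample only; the real file is much longer and its comments are fuller).
SAMPLE = """\
# code point or range ; script name # comment
0041..005A    ; Latin # LATIN CAPITAL LETTER A..Z
0061..007A    ; Latin # LATIN SMALL LETTER A..Z
00AA          ; Latin # FEMININE ORDINAL INDICATOR
03A3..03E1    ; Greek # GREEK CAPITAL LETTER SIGMA..GREEK SMALL LETTER SAMPI
"""


def parse_scripts(text):
    """Parse Scripts.txt-style text into a list of (start, end, script)."""
    ranges = []
    for line in text.splitlines():
        # Drop the trailing comment, then skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        codepoints, script = [part.strip() for part in line.split(";")]
        # A field is either a single code point or "first..last".
        first, _, last = codepoints.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        ranges.append((start, end, script))
    return ranges


def script_of(char, ranges, default="Unknown"):
    """Return the script of a single character, or default if unlisted."""
    cp = ord(char)
    for start, end, script in ranges:
        if start <= cp <= end:
            return script
    return default


RANGES = parse_scripts(SAMPLE)
print(script_of(u"A", RANGES))
print(script_of(u"\u03b1", RANGES))
```

With only the sample data loaded, any character outside those four ranges comes back as "Unknown"; loading the complete Scripts.txt from the Unicode site would be needed for real use, and a linear scan like this would want replacing with a binary search over sorted ranges.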


On 6 September 2013 16:38, MRAB <python@mrabarnett.plus.com> wrote:
> On 06/09/2013 10:54, Andrew Miller wrote:
>> The unicodedata module only contains data up to Unicode 5.2 (October
>> 2009), so attempting to reference any character from a later version e.g:
>>
>> unicodedata.lookup("TURKISH LIRA SIGN")
>>
>> results in a KeyError.
>>
>> Also, it seems to be limited to properties in the UnicodeData.txt file
>> and does not contain any data from the other files from the Unicode
>> Character Database (the perl library Unicode::UCD is far more complete).
>>
>> Are there any plans to update this module to the latest Unicode version
>> (6.2, with 6.3 being released shortly), or is there another module that
>> provides more up to date information?
>
> Which version of Python are you talking about? Python 3.3 uses Unicode version 6.1.