
The unicodedata module only contains data up to Unicode 5.2 (October 2009), so attempting to reference any character from a later version, e.g. unicodedata.lookup("TURKISH LIRA SIGN"), results in a KeyError.

Also, it seems to be limited to properties in the UnicodeData.txt file and does not contain any data from the other files in the Unicode Character Database (the Perl library Unicode::UCD is far more complete).

Are there any plans to update this module to the latest Unicode version (6.2, with 6.3 being released shortly), or is there another module that provides more up-to-date information?

Thanks, Andrew
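A minimal sketch of how the behaviour described above can be checked on a given interpreter. The helper name lookup_or_none is hypothetical and only for illustration; unicodedata.lookup and unicodedata.unidata_version are the standard library names mentioned in this thread.

```python
import unicodedata


def lookup_or_none(name):
    """Return the character with this Unicode name, or None if the
    bundled database does not know the name (hypothetical helper)."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        # Raised when the name is not in the bundled UCD version.
        return None


# Which UCD version this interpreter was built with.
print("UCD version:", unicodedata.unidata_version)

# None on interpreters whose database predates the character.
print("TURKISH LIRA SIGN ->", lookup_or_none("TURKISH LIRA SIGN"))
```

Returning None instead of letting the KeyError propagate is just one possible design choice; code that must distinguish "unknown name" from other failures may prefer to catch the exception at the call site.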


Andrew Miller, 06.09.2013 11:54:
The unicodedata module only contains data up to Unicode 5.2 (October 2009), so attempting to reference any character from a later version e.g:
unicodedata.lookup("TURKISH LIRA SIGN")
results in a KeyError.
Also, it seems to be limited to properties in the UnicodeData.txt file and does not contain any data from the other files from the Unicode Character Database (the perl library Unicode::UCD is far more complete).
Are there any plans to update this module to the latest Unicode version (6.2, with 6.3 being released shortly)
It was updated to 6.2 almost a year ago, so Python 3.3 should have that. I don't think 6.3 support will be added before Python 3.4, assuming it's final by then. You should open a ticket so that it won't be forgotten before the release: http://bugs.python.org/

That being said, the module is (mostly) generated, so you might be able to fix it up yourself in a local installation if you need it sooner.

Stefan

On Fri, 06 Sep 2013 17:33:47 +0200, Stefan Behnel <stefan_ml@behnel.de> wrote:
Andrew Miller, 06.09.2013 11:54:
The unicodedata module only contains data up to Unicode 5.2 (October 2009), so attempting to reference any character from a later version e.g:
unicodedata.lookup("TURKISH LIRA SIGN")
results in a KeyError.
Also, it seems to be limited to properties in the UnicodeData.txt file and does not contain any data from the other files from the Unicode Character Database (the perl library Unicode::UCD is far more complete).
Are there any plans to update this module to the latest Unicode version (6.2, with 6.3 being released shortly)
It's been updated to 6.2 almost a year ago, so Python 3.3 should have that.
3.3 shipped with 6.1. --David

On Fri, 06 Sep 2013 10:54:45 +0100, Andrew Miller <A.J.Miller@bcs.org.uk> wrote:
Are there any plans to update this module to the latest Unicode version (6.2, with 6.3 being released shortly), or is there another module that provides more up to date information?
Python 3.4 currently has 6.2. If 6.3 gets released before the first RC, I'm guessing we will probably upgrade to it. --David

On 06/09/2013 10:54, Andrew Miller wrote:
The unicodedata module only contains data up to Unicode 5.2 (October 2009), so attempting to reference any character from a later version e.g:
unicodedata.lookup("TURKISH LIRA SIGN")
results in a KeyError.
Also, it seems to be limited to properties in the UnicodeData.txt file and does not contain any data from the other files from the Unicode Character Database (the perl library Unicode::UCD is far more complete).
Are there any plans to update this module to the latest Unicode version (6.2, with 6.3 being released shortly), or is there another module that provides more up to date information?
Which version of Python are you talking about? Python 3.3 uses Unicode version 6.1.

I've just checked on Python 2.7.5 and Python 3.3.2 (Win32 versions). In Python 3.3.2 unicodedata.unidata_version is set to '6.1.0'. In Python 2.7.5 it is set to '5.2.0', so it looks as though this version is no longer being updated.

Since my initial post I've downloaded the Python 2.7.5 source and have found the makeunicodedata.py script which creates this module.

Are there plans to add the extra data from the other UCD files to this module? At the moment I am using a module from https://gist.github.com/anonymous/2204527 to obtain the script of a character, but it would be nice if this was available from the standard library.

On 6 September 2013 16:38, MRAB <python@mrabarnett.plus.com> wrote:
On 06/09/2013 10:54, Andrew Miller wrote:
The unicodedata module only contains data up to Unicode 5.2 (October 2009), so attempting to reference any character from a later version e.g:
unicodedata.lookup("TURKISH LIRA SIGN")
results in a KeyError.
Also, it seems to be limited to properties in the UnicodeData.txt file and does not contain any data from the other files from the Unicode Character Database (the perl library Unicode::UCD is far more complete).
Are there any plans to update this module to the latest Unicode version (6.2, with 6.3 being released shortly), or is there another module that provides more up to date information?
Which version of Python are you talking about? Python 3.3 uses Unicode version 6.1.
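For readers wondering what "obtaining the script of a character" from the other UCD files can look like, here is a rough sketch. It is not the code from the gist linked above. It assumes the data comes from lines in the style of the UCD's Scripts.txt (a code point or range, a semicolon, the script name, then a # comment); SAMPLE is a small hand-copied excerpt for illustration, and the names parse_scripts and script_of are hypothetical.

```python
import bisect

# Small excerpt in the style of Scripts.txt, for illustration only.
SAMPLE = """\
# Excerpt, not the full file.
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Latin # Lo       FEMININE ORDINAL INDICATOR
0391..03A1    ; Greek # L&  [17] GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LETTER RHO
"""


def parse_scripts(text):
    """Parse Scripts.txt-style lines into sorted (start, end, script) tuples."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        codepoints, script = [field.strip() for field in line.split(";")]
        first, _, last = codepoints.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point
        ranges.append((start, end, script))
    ranges.sort()
    return ranges


def script_of(ch, ranges, default="Unknown"):
    """Return the script of ch, or default if no range covers it."""
    cp = ord(ch)
    # Index of the last range whose start is <= cp.
    i = bisect.bisect_right(ranges, (cp, float("inf"), "")) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return ranges[i][2]
    return default


RANGES = parse_scripts(SAMPLE)
print(script_of("A", RANGES))
print(script_of("\u0391", RANGES))
```

With the full Scripts.txt in place of SAMPLE, the same two functions would cover every assigned code point; characters outside every range fall back to the default value.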

On 9/6/2013 11:55 AM, Andrew Miller wrote:
I've just checked on Python 2.7.5 and Python 3.3.2 (Win32 versions).
In Python 3.3.2 unicodedata.unidata_version is set to '6.1.0'.
In Python 2.7.5 it is set to '5.2.0' so it looks as though this version is no longer being updated.
In general, new features do not go into bugfix releases (x.y.z, where z >= 1). Updating the unidata_version adds new features to the unicodedata module.
-- Terry Jan Reedy

Am 06.09.13 20:24, schrieb Terry Reedy:
In Python 2.7.5 it is set to '5.2.0' so it looks as though this version is no longer being updated.
In general, new features do not go into bugfix releases (x.y.z, where z >= 1). Updating the unidata_version adds new features to the unicodedata module.
One might argue that an update of the UCD data is within the scope of 2.7, since it's just data, not code, that is being changed. I'd argue against that, since this specific change has a chance of breaking existing tests that people might have. Of course, it is up to the release manager of 2.7 to decide on that if such a change were proposed.

Regards, Martin
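To illustrate how a data-only update can change observable behaviour, here is a small sketch for Python 3. It relies on TURKISH LIRA SIGN (U+20BA) having been added in Unicode 6.2: before that the code point is unassigned (category "Cn"), afterwards it is a currency symbol ("Sc"). The helper version_tuple is hypothetical.

```python
import unicodedata


def version_tuple(version):
    """Turn a version string such as '6.1.0' into (6, 1, 0) (hypothetical helper)."""
    return tuple(int(part) for part in version.split("."))


LIRA = "\u20ba"  # TURKISH LIRA SIGN, added in Unicode 6.2
category = unicodedata.category(LIRA)

if version_tuple(unicodedata.unidata_version) >= (6, 2, 0):
    # Databases that know the character report a currency symbol.
    assert category == "Sc"
else:
    # Older databases treat the code point as unassigned.
    assert category == "Cn"

print(unicodedata.unidata_version, category)
```

A test that hard-codes either category without checking unidata_version would pass on one side of the update and fail on the other, which is the kind of breakage described above.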

Am 06.09.13 17:55, schrieb Andrew Miller:
Are there plans to add the extra data from the other UCD files to this module? At the moment I am using a module from https://gist.github.com/anonymous/2204527 to obtain the script of a character but it would be nice if this was available from the standard library.
Well, it is available, and new versions of the UCD are added to new Python releases. Please consider Python 2 as dead wrt. Unicode support. Regards, Martin

2013/9/6 Andrew Miller <A.J.Miller@bcs.org.uk>:
The unicodedata module only contains data up to Unicode 5.2 (October 2009), so attempting to reference any character from a later version e.g:
unicodedata.lookup("TURKISH LIRA SIGN")
results in a KeyError.
Also, it seems to be limited to properties in the UnicodeData.txt file and does not contain any data from the other files from the Unicode Character Database (the perl library Unicode::UCD is far more complete).
Are there any plans to update this module to the latest Unicode version (6.2, with 6.3 being released shortly), or is there another module that provides more up to date information?
I usually keep the latest Python version up to date with the latest Unicode version, so 3.4 will have Unicode 6.2. -- Regards, Benjamin
participants (7)
- "Martin v. Löwis"
- Andrew Miller
- Benjamin Peterson
- MRAB
- R. David Murray
- Stefan Behnel
- Terry Reedy