[Python-Dev] cc: "Martin v. Löwis" <martin at v.loewis.de>

Nick Maclaren nmm1 at cus.cam.ac.uk
Wed Aug 8 21:31:49 CEST 2007

Re: [Python-Dev] Regular expressions, Unicode etc.
"Martin v. Löwis" <martin at v.loewis.de> wrote:
> I recommend you use the 4.1 version of the database; this should
> work out of the box, with no change to the build environment at
> all.

I tried that, of course.  See below.

> As for updating it - that has to wait until the next release
> of Python. At that point, 5.1 might be released, so 5.0 might
> get skipped altogether.

Very true.

> I would likely close such a report as "works for me" (after testing
> it does - it did when I last ran it, which was before the release
> of Python 2.5).

I think that you will find that you are using a non-standard
environment and set of Python sources.  I started off with the
standard distribution.

> It did not suffer from bit-rot - it still works just fine for
> the version of the database that is supported.

Really?  I have just checked 2.5.1, and the same defects are there.

> As for the need for redesigning - I don't see that need. What specific
> aspect do you think needs redesigning? If you merely meant to say
> "I don't understand the code" - this is not enough reason, I
> remember it took me some time to understand it as well, but now
> I see that it does precisely what it needs to do, and precisely
> in the way it needs to do that.

Well, here is a selection of the issues that I found:

The Makefile includes the command:
    ncftpget -R ftp.unicode.org . Public/MAPPINGS
Not merely is ncftpget not a standard utility, the current mappings
are no longer at that location.  Indeed, I can see nothing useful in
that directory at present, though I haven't searched it in depth!
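
As a rough sketch of what a fetch step using only the Python standard
library might look like, in place of ncftpget: the function names, the
base URL argument and the "<version>/ucd/<filename>" layout below are
all assumptions for illustration, not something taken from the Makefile
or confirmed against the server. It is written for current Python
(urllib.request); the 2.5-era equivalent would be urllib.urlretrieve.

```python
import os
import urllib.request


def ucd_url(base_url, version, filename):
    """Build the URL of one data file.

    Assumption: files live under <base_url>/<version>/ucd/<filename>.
    Adjust the layout to whatever the server actually uses.
    """
    return "%s/%s/ucd/%s" % (base_url.rstrip("/"), version, filename)


def fetch(base_url, version, filename, dest_dir="."):
    """Download one file into dest_dir and return the local path."""
    url = ucd_url(base_url, version, filename)
    dest = os.path.join(dest_dir, filename)
    # urlretrieve is part of the standard library, so no extra tool is needed.
    urllib.request.urlretrieve(url, dest)
    return dest
```

A Makefile rule could then call fetch() once per required file, with the
base URL and version passed in rather than hard-coded.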

Looking through www.unicode.org, I could find the relevant files
for 5.0.0, but for no other version.  No, I am NOT going to type
in over a megabyte of data from the PDF!

makeunicodedata.py has a reference to the Unicode 3.2 files, but
they are not present in the standard distribution, the Makefile
doesn't fetch them, and I can't find them.

makeunicodedata.py refers to (for example) UnicodeData.txt and
Modules/unicodedata_db.h as such, which rather requires it to be
run in a particular directory.  I can find nothing in any file
even referring to this.
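
One way to remove the dependence on the working directory, as a minimal
sketch: resolve both names against the location of the script itself.
The helper name resolve_paths, the levels_up parameter, and the choice
to look for UnicodeData.txt next to the script are hypothetical; none of
this is what makeunicodedata.py actually does.

```python
import os


def resolve_paths(script_file, levels_up=2):
    """Return (input_path, output_path) independent of the current directory.

    Assumptions (illustrative only): the source root is `levels_up`
    directories above the script, and UnicodeData.txt sits beside it.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    root = script_dir
    for _ in range(levels_up):
        root = os.path.dirname(root)
    input_path = os.path.join(script_dir, "UnicodeData.txt")
    output_path = os.path.join(root, "Modules", "unicodedata_db.h")
    return input_path, output_path
```

Inside the script this would be called as resolve_paths(__file__), so
that it could be run from any directory.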

Having run it, running 'make all' does not rebuild Python correctly.
I couldn't be bothered to work out why, so I hit it with the usual
trick, 'make distclean'.

And, of course, it SHOULD be possible to upgrade the Unicode data
without having to change version of Python!

Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
