[Python-Dev] Split unicodeobject.c into subfiles
Stephen J. Turnbull
stephen at xemacs.org
Fri Oct 26 04:35:38 CEST 2012
Antoine Pitrou writes:
> Well, "tangled monolithic mess" is quite true about unicodeobject.c,
> IMO.
s/object.c// and your point remains valid. Just reading the table of
contents for UTR#17 (http://www.unicode.org/reports/tr17/) should
convince you that it's not going to be easy to produce an elegant
implementation!
> Seriously, I agree with Victor: navigating around unicodeobject.c is a
> PITA. Perhaps it isn't if you are using emacs, or you have 35 fingers,
> or just a lot of spare time, but in my experience it's painful.
Sure, but I don't know of a Unicode implementation which isn't.

I don't think that a unicode/ directory with a dozen *.[ch] files
(plus a README, etc.) in it is going to be much more navigable. If
there are too many files, it's going to be a PITA to maintain because
there won't be an obvious place to put certain functions. E.g., I've
already mentioned my suspicions about the charmap code (I apologize
for not reading Victor's code to confirm them).
I don't object in principle to splitting unicodeobject.c. At the
very least, with all due respect to MAL, XEmacs experience with coding
systems (the Emacs equivalent of Python codecs) suggests that there is
very little to be lost by moving the codec implementations to a
separate file from the Unicode object implementation. (Here I'm
talking about codecs in the narrow sense of wire-format to Python 3 str
and back, not the more general Python 2 sense that included zip and
base64 and so on. I.e., PyUnicode_Translate is not a codec in the
relevant sense.)
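
As a rough illustration of that distinction, here is a minimal Python 3
sketch. It is not taken from unicodeobject.c; the sample strings are
made up, and it assumes a Python 3 version in which the "base64"
bytes-to-bytes codec alias is available through the codecs module.

```python
import codecs

text = "na\u00efve"  # made-up sample string containing one non-ASCII character

# Narrow sense: a codec maps str to a wire format (bytes) and back.
wire = text.encode("utf-8")
round_tripped = wire.decode("utf-8")

# Broader (Python 2 style) sense: a bytes-to-bytes transform such as base64,
# which Python 3 only exposes through codecs.encode/codecs.decode.
b64 = codecs.encode(b"hello", "base64")
restored = codecs.decode(b64, "base64")

# str.translate maps str to str; no wire format is involved, so it is not
# a codec in the narrow sense.
table = str.maketrans({"\u00ef": "i"})
translated = text.translate(table)
```

Only the first pair of calls crosses the str/bytes boundary, which is
the part a separate codec file would plausibly hold.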
On the other hand, I wouldn't be surprised if (despite my earlier
suggestion) codecs and Unicode object internals need a close
relationship. (My intuition and sense of style say splitting codecs
from the low-level memory management and PEP 393 stuff is a good idea,
but I'm not confident it would have no impact on performance.)