[Python-Dev] Split unicodeobject.c into subfiles
Benjamin Peterson
benjamin at python.org
Tue Oct 23 10:22:10 CEST 2012
2012/10/22 Victor Stinner <victor.stinner at gmail.com>:
> Hi,
>
> I forked CPython repository to work on my "split unicodeobject.c" project:
> http://hg.python.org/sandbox/split-unicodeobject.c
>
> The result is 10 files (including the existing unicodeobject.c):
>
>   1176 Objects/unicodecharmap.c
>   1678 Objects/unicodecodecs.c
>   1362 Objects/unicodeformat.c
>    253 Objects/unicodeimpl.h
>    733 Objects/unicodelegacy.c
>   1836 Objects/unicodenew.c
>   2777 Objects/unicodeobject.c
>   2421 Objects/unicodeoperators.c
>   1235 Objects/unicodeoscodecs.c
>   1288 Objects/unicodeutfcodecs.c
>  14759 total
>
> This is just a proposal (and a work in progress). Everything can be changed :-)
>
> "unicodenew.c" is not a good name. Content of this file may be moved
> somewhere else.
>
> Some files may be merged again if the separation is not justified.
>
> I don't like the "unicode" prefix for filenames; I would prefer a new directory.
>
> --
>
> Shorter files are easier to review and maintain. The compilation is
> faster if only one file is modified.
>
> The MBCS codec requires windows.h. The whole unicodeobject.c includes
> it just for this codec. With the split, only unicodeoscodecs.c
> includes this file.
>
> The MBCS codec also needs a "winver" variable. This variable is
> defined between the BLOOM filter and the unicode_result_unchanged()
> function. How can you explain how these things are sorted? Where
> should I add a new function or variable? With the split, the variable
> is now defined very close to where it is used. You don't have to
> scroll 7000 lines to see where it is used.
>
> If you would like to work on a specific function, you don't have to
> use the search function of your editor to skip thousands of lines. For
> example, the 18 functions and 2 types related to the charmap codec are
> now grouped into a single, short C file.
>
> It was already possible to extend and maintain unicodeobject.c (some
> people proved it!), but it should now be much simpler with shorter
> files.
I would like to repeat my opposition to splitting unicodeobject.c. I
don't think the benefits of such a split have been well justified,
certainly not to the point that the claim about "much simpler"
maintenance is true.
--
Regards,
Benjamin