[Python-Dev] Split unicodeobject.c into subfiles

Tue Oct 23 10:22:10 CEST 2012

2012/10/22 Victor Stinner <victor.stinner at gmail.com>:
> Hi,
>
> I forked CPython repository to work on my "split unicodeobject.c" project:
> http://hg.python.org/sandbox/split-unicodeobject.c
>
> The result is 10 files (including the existing unicodeobject.c):
>
>   1176 Objects/unicodecharmap.c
>   1678 Objects/unicodecodecs.c
>   1362 Objects/unicodeformat.c
>    253 Objects/unicodeimpl.h
>    733 Objects/unicodelegacy.c
>   1836 Objects/unicodenew.c
>   2777 Objects/unicodeobject.c
>   2421 Objects/unicodeoperators.c
>   1235 Objects/unicodeoscodecs.c
>   1288 Objects/unicodeutfcodecs.c
>  14759 total
>
> This is just a proposal (and a work in progress). Everything can be changed :-)
>
> "unicodenew.c" is not a good name. Content of this file may be moved
> somewhere else.
>
> Some files may be merged again if the separation is not justified.
>
> I don't like the "unicode" prefix for filenames; I would prefer a new directory.
>
> --
>
> Shorter files are easier to review and maintain. The compilation is
> faster if only one file is modified.
>
> The MBCS codec requires windows.h. The whole unicodeobject.c includes
> it just for this codec. With the split, only unicodeoscodecs.c
> includes this file.
>
> The MBCS codec also needs a "winver" variable. This variable is
> defined between the BLOOM filter and the unicode_result_unchanged()
> function. How can you explain how these things are sorted? Where
> should I add a new function or variable? With the split, the variable
> is now defined very close to where it is used. You don't have to
> scroll 7000 lines to see where it is used.
>
> If you would like to work on a specific function, you don't have to
> use the search function of your editor to skip thousands of lines. For
> example, the 18 functions and 2 types related to the charmap codec are
> now grouped into a single short C file.
>
> It was already possible to extend and maintain unicodeobject.c (some
> people proved it!), but it should now be much simpler with shorter
> files.

I would like to repeat my opposition to splitting unicodeobject.c. I
don't think the benefits of such a split have been well justified,
certainly not to the point that the claim about "much simpler"
maintenance is true.

-- 
Regards,
Benjamin