[Python-Dev] Split unicodeobject.c into subfiles

Tue Oct 23 18:29:53 CEST 2012

On 10/23/2012 10:22 AM, Benjamin Peterson wrote:
> 2012/10/22 Victor Stinner <victor.stinner at gmail.com>:
>> Hi,
>>
>> I forked CPython repository to work on my "split unicodeobject.c" project:
>> http://hg.python.org/sandbox/split-unicodeobject.c
>>
>> The result is 10 files (including the existing unicodeobject.c):
>>
>>   1176 Objects/unicodecharmap.c
>>   1678 Objects/unicodecodecs.c
>>   1362 Objects/unicodeformat.c
>>    253 Objects/unicodeimpl.h
>>    733 Objects/unicodelegacy.c
>>   1836 Objects/unicodenew.c
>>   2777 Objects/unicodeobject.c
>>   2421 Objects/unicodeoperators.c
>>   1235 Objects/unicodeoscodecs.c
>>   1288 Objects/unicodeutfcodecs.c
>>  14759 total
>>
>> This is just a proposal (and a work in progress). Everything can be changed :-)
>>
>> "unicodenew.c" is not a good name. Content of this file may be moved
>> somewhere else.
>>
>> Some files may be merged again if the separation is not justified.
>>
>> I don't like the "unicode" prefix for filenames; I would prefer a new directory.
>>
>> --
>>
>> Shorter files are easier to review and maintain. The compilation is
>> faster if only one file is modified.
>>
>> The MBCS codec requires windows.h. The whole unicodeobject.c includes
>> it just for this codec. With the split, only unicodeoscodecs.c
>> includes this file.
>>
>> The MBCS codec also needs a "winver" variable. This variable is
>> defined between the BLOOM filter and the unicode_result_unchanged()
>> function. How can you explain how these things are sorted? Where
>> should I add a new function or variable? With the split, the variable
>> is now defined very close to where it is used. You don't have to
>> scroll 7000 lines to see where it is used.
>>
>> If you would like to work on a specific function, you don't have to
>> use the search function of your editor to skip thousands of lines. For
>> example, the 18 functions and 2 types related to the charmap codec are
>> now grouped into one unique and short C file.
>>
>> It was already possible to extend and maintain unicodeobject.c (some
>> people proved it!), but it should now be much simpler with shorter
>> files.
> 
> I would like to repeat my opposition to splitting unicodeobject.c. I
> don't think the benefits of such a split have been well justified,
> certainly not to the point that the claim about "much simpler"
> maintenance is true.

I agree.  I haven't edited much in unicodeobject.c lately, so this is
just an expression of my preference in general to keep things together.

We tell new Python programmers to stop worrying about using indentation
for grouping because editors are meant to make this easy.  A similar
argument applies to navigating large files: with a decent editor there is
no real problem with large files.

I agree completely with suggestions to improve sectioning and/or comments
within the file.

But once you make any split, people will look for things in the wrong file.
It happens to me every time I look for something in either object.c or
abstract.c -- that's an instance where the function name prefix doesn't imply
the implementation file name, which is otherwise very clear and easy in the
Python sources.

Especially since you're suggesting a huge number of new files, I question the
argument of better navigability.

Georg

BTW:

> If you would like to work on a specific function, you don't have to
> use the search function of your editor to skip thousands of lines. For
> example, the 18 functions and 2 types related to the charmap codec are
> now grouped into one unique and short C file.

After opening the right file, I *still* use the search function to get to
the function I want to edit.  Don't tell me using a scroll bar to scan
for the right place is faster...