[Python-Dev] Split unicodeobject.c into subfiles

M.-A. Lemburg mal at egenix.com
Tue Oct 23 11:28:39 CEST 2012


On 23.10.2012 10:22, Benjamin Peterson wrote:
> 2012/10/22 Victor Stinner <victor.stinner at gmail.com>:
>> Hi,
>>
>> I forked CPython repository to work on my "split unicodeobject.c" project:
>> http://hg.python.org/sandbox/split-unicodeobject.c
>>
>> The result is 10 files (including the existing unicodeobject.c):
>>
>>   1176 Objects/unicodecharmap.c
>>   1678 Objects/unicodecodecs.c
>>   1362 Objects/unicodeformat.c
>>    253 Objects/unicodeimpl.h
>>    733 Objects/unicodelegacy.c
>>   1836 Objects/unicodenew.c
>>   2777 Objects/unicodeobject.c
>>   2421 Objects/unicodeoperators.c
>>   1235 Objects/unicodeoscodecs.c
>>   1288 Objects/unicodeutfcodecs.c
>>  14759 total
>>
>> This is just a proposal (and a work in progress). Everything can be changed :-)
>>
>> "unicodenew.c" is not a good name. Content of this file may be moved
>> somewhere else.
>>
>> Some files may be merged again if the separation is not justified.
>>
>> I don't like the "unicode" prefix for filenames, I would prefer a new directory.
>>
>> --
>>
>> Shorter files are easier to review and maintain. The compilation is
>> faster if only one file is modified.
>>
>> The MBCS codec requires windows.h. The whole unicodeobject.c includes
>> it just for this codec. With the split, only unicodeoscodecs.c
>> includes this file.
>>
>> The MBCS codec also needs a "winver" variable. This variable is
>> defined between the BLOOM filter and the unicode_result_unchanged()
>> function. How can you explain how these things are sorted? Where
>> should I add a new function or variable? With the split, the variable
>> is now defined very close to where it is used. You don't have to
>> scroll 7000 lines to see where it is used.
>>
>> If you would like to work on a specific function, you don't have to
>> use the search function of your editor to skip thousands of lines. For
>> example, the 18 functions and 2 types related to the charmap codec are
>> now grouped into a single short C file.
>>
>> It was already possible to extend and maintain unicodeobject.c (some
>> people proved it!), but it should now be much simpler with shorter
>> files.
> 
> I would like to repeat my opposition to splitting unicodeobject.c. I
> don't think the benefits of such a split have been well justified,
> certainly not to the point that the claim about "much simpler"
> maintenance is true.

Same feelings here.

If you do go ahead with such a split, please only split the source
files and keep the unicodeobject.c file which then includes all
the other files. Such a restructuring should not result in compilers
no longer being able to optimize code by inlining functions
in one of the most important basic types we have in Python 3.
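
To make the idea concrete, here is a minimal sketch of such an umbrella
file. It is only an illustration under stated assumptions: all demo_*
names are hypothetical and do not exist in CPython, and the helper is
pasted inline where a real tree would use an #include of one of the
split-out files.

```c
/* Sketch of an umbrella unicodeobject.c. All demo_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>

/* In a real tree this section would be pulled in with a line such as
 *     #include "unicodecharmap.c"
 * (a file name taken from the proposal). It is pasted inline here so the
 * sketch compiles on its own. */
static size_t
demo_count_ascii(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Count only bytes in the 7-bit ASCII range. */
        if ((unsigned char)*s < 128)
            n++;
    }
    return n;
}

/* Code that stays in the umbrella file. demo_count_ascii() is defined in
 * the same translation unit, so the compiler is free to inline the calls. */
size_t
demo_count_ascii_pair(const char *a, const char *b)
{
    return demo_count_ascii(a) + demo_count_ascii(b);
}
```

With separately compiled .c files, the helper would need external
linkage, and inlining across files would depend on link-time
optimization being available.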

Also note that splitting the file into multiple smaller ones will
actually create more maintenance overhead, since patches will
likely no longer be easy to merge from 3.3 to 3.4.

BTW: The positive effect of having everything in one file is
that you no longer have to figure out which file to look in when
trying to find a piece of logic... it's just a ctrl-f or
ctrl-s away :-)

-- 
Marc-Andre Lemburg
eGenix.com


