On 10/24/2012 03:15 PM, Nick Coghlan wrote:

> Breaking such files up into separately compiled modules serves two purposes:
>
> 1. It proves that the code *isn't* a tangled monolithic mess;
> 2. It enlists the compilation toolchain's assistance in ensuring that remains the case in the future.

Either the code is a "tangled monolithic mess" or it isn't. If it is, then let's fix that, regardless of the size of the file. If it isn't, I don't see breaking up the code among multiple files as providing any benefit. And I see no need for the toolchain's assistance to help us do something without benefit. The line count of the file is essentially unrelated to its inherent quality / maintainability.

> We are not special snow flakes - good software engineering practice is advisable for us as well, so a big +1 from me for breaking up the monstrosity that is unicodeobject.c and lowering the barrier to entry for hacking on the individual pieces. This should come with a large block comment in unicodeobject.c explaining how the pieces are put back together again.

I'm all for good software engineering practice. But can you cite objective reasons why large source files are provably bad? Not "tangled monolithic messes", not poorly factored code. I agree that those are bad--but so far nobody has claimed that either description applies to unicodeobject.c (unless you are implicitly doing so above), nor has anyone proposed credible remedies. All I've seen is that unicodeobject.c is a large file, and some people want to break it up into smaller files. I have yet to see anything but handwaving as justification. For example, what is this barrier to entry to hacking on the str object that you suggest exists, and that will apparently be dispelled simply by splitting one file into multiple files?

Someone proposed breaking up unicodeobject.c into three distinct subsystems and putting those in separate files. I still don't agree. It seems natural to me to have everything associated with the str object in one file, just as we do with every other object I can think of. If this were a genuinely good idea, we should consider doing it with every similar object. But nobody is proposing that. My guess is that this is because the other files in CPython are "small enough", at which point we're right back to the primary motivation simply being the line count of unicodeobject.c, which is a purely aesthetic and subjective judgment.

/arry