[Tutor] refactoring book and function size

Magnus Lycka magnus@thinkware.se
Wed, 18 Sep 2002 16:29:13 +0200


At 00:06 2002-09-18 -0400, Anthony Barker wrote:
>I have been reading the book "Refactoring: Improving the Design of
>Existing Code" it makes for a good read - particularly if you have
>refactored before on a medium sized project.
>
>One thing I found odd is how they pull out very small bits of code and add
>them to new routines.
>
>I am reading "Code Complete", by Steve McConnell in parallel. - he
>mentions some research that larger subroutines statistically have fewer
>bugs than very small ones. The research shows that routines of up to 200
>lines are not bad.
>
>Personally I find readability drops if you have subroutines for less than
>2-3 statements. Larger routines, with purpose, clean names and simple
>interfaces make understanding a program or module easier.

There is a big span between 2-3 statements and 200 lines...

You should realize that these people come from different
backgrounds. McConnell and Code Complete refer mainly to
C, and I think his research refers to typical procedural
languages as well.

The XP people have their roots in Smalltalk to a large
degree (although I'm not sure about Martin Fowler). Both
Smalltalk syntax and Object-Oriented programming in general
lead to different optima.

Some friends of mine listened to Fowler talking about this
in Bergen two years ago, and the thing they mentioned, and
that he himself even seemed a little puzzled over, was that
he typically made very small methods in Java these days.

On one hand, it makes each method trivial, but on the other
hand, it might lead to a situation where you are a bit lost
with all these methods. In Python there is also the time
involved in function call overhead to consider.
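
A rough way to get a feel for that overhead is to time the
same loop with and without a trivial helper function. This
is only a sketch: the function names are made up for the
illustration, and the numbers will vary with machine and
Python version.

```python
import timeit


def add_one(x):
    # Trivial helper: the call costs more than the body.
    return x + 1


def total_with_calls(n):
    # Each step goes through a function call.
    total = 0
    for i in range(n):
        total += add_one(i)
    return total


def total_inlined(n):
    # Same arithmetic, written directly in the loop.
    total = 0
    for i in range(n):
        total += i + 1
    return total


if __name__ == "__main__":
    n = 100_000
    t_calls = timeit.timeit(lambda: total_with_calls(n), number=20)
    t_inline = timeit.timeit(lambda: total_inlined(n), number=20)
    print(f"with calls: {t_calls:.3f}s  inlined: {t_inline:.3f}s")
```

Whether the difference matters depends on how hot the loop
is; for most code, readability is likely the better guide.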

I suppose it might be a bit like with mathematical proofs,
that you can dissect a problem until each piece is trivial,
solve each trivial piece, assemble the pieces, and the
problem is solved without you really understanding the
whole code. You're still confident in the result as you
are sure that each piece is correct in itself, and correctly
used. I'm sure detailed unit testing plays a big part here
as well.

If you are capable of simplifying the problems very much,
you will probably end up with shorter routines, and a
smaller program in all. You will also get fewer bugs.

If you try to split large coherent functions in an
artificial way without reducing complexity, just to get
the line count down, I'm guessing things will just get worse.

I'm sure you understand that optimal routine sizes differ
with language. Obviously, you must be able to make
some kind of point in a routine, and a very verbose language
is going to require larger routines to be coherent.

I've never programmed a lot in functional languages, but
as far as I understand, functions in ML are typically
much smaller than functions in C for instance.

Actually, I had a look at an O'Caml application called
GeneWeb, and out of almost 1500 routines, about 40% were
no more than 10 lines, and 2.5% were more than 100 lines,
the largest being 263 lines. Median length was 13 lines,
and average was 22 LoC.

(Disclaimer: I don't know O'Caml. What I did was to look
at a few files, decide that the routines in these files
always seem to start with a line beginning with "value",
and end with a ';' in the first column of a line. Then I
made a quick Python hack to count lines based on that
presumption. I might be wrong, but it looks like it fits.)
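
The heuristic described above could look roughly like this.
It is a reconstruction under the stated presumption, not the
original script: the function name and the sample text are
invented for the illustration.

```python
def routine_lengths(lines):
    """Return the line count of each routine found in `lines`.

    Assumed heuristic: a routine opens on a line beginning with
    "value" and closes on the next line whose first character
    is ';'. Both the opening and closing lines are counted.
    """
    lengths = []
    start = None  # index of the line that opened the current routine
    for i, line in enumerate(lines):
        if start is None:
            if line.startswith("value"):
                start = i
        elif line.startswith(";"):
            lengths.append(i - start + 1)
            start = None
    return lengths


# Made-up sample text, not taken from GeneWeb.
sample = """\
value add x y =
  x + y
;

value rec fact n =
  if n <= 1 then 1
  else n * fact (n - 1)
;
""".splitlines()

print(routine_lengths(sample))  # [3, 4]
```

Note that a routine written entirely on one line never hits
the closing rule, so this simple version does not count it
correctly.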

Actually, on closer inspection, it seems I missed 265
one-line functions with my scan... Here we go again:

number of functions = 1729
LoCmin = 1
LoCmax = 263
LoCmedian = 10
LoCaverage = 19
LoC<10 = 50%
LoC>100 = 2.1%
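
Given a list of routine lengths, figures like these can be
computed along the following lines. This is a sketch, not
the script that produced the numbers above: the function
name is made up, and treating the two thresholds as strict
inequalities is an assumption read off the labels.

```python
import statistics


def summarize(lengths):
    """Summarize a non-empty list of routine lengths (in lines)."""
    n = len(lengths)
    return {
        "number of functions": n,
        "LoCmin": min(lengths),
        "LoCmax": max(lengths),
        "LoCmedian": statistics.median(lengths),
        "LoCaverage": statistics.mean(lengths),
        # Assumption: strict comparisons, as the labels suggest.
        "LoC<10 (%)": 100 * sum(1 for x in lengths if x < 10) / n,
        "LoC>100 (%)": 100 * sum(1 for x in lengths if x > 100) / n,
    }


if __name__ == "__main__":
    # Invented lengths, only to show the shape of the output.
    for key, value in summarize([1, 5, 10, 20, 150]).items():
        print(f"{key} = {value}")
```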

I would also like to claim that the kind of problem you
are solving will lead to different routine sizes. I guess
that advanced cryptography might be more complicated,
and for that reason require larger routines, than say a
business administration package.


-- 
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/  mailto:magnus@thinkware.se