Python3.0 has more duplication in source code than Python2.5

Terry terry.yinzhe at gmail.com
Sat Feb 7 00:50:12 EST 2009


I used CPD (the copy/paste detector in PMD) to analyze code
duplication in the Python source code. I found that Python3.0 contains
more duplicated code than previous versions. The CPD tool is far
from perfect, but I still think the analysis makes some sense.
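
For readers who have not used a token-based detector, here is a minimal
sketch of the general idea: slide a fixed-size window over the token
stream and report the lines covered by any window that occurs more than
once. It is not PMD's implementation. The function names are made up for
illustration, and it tokenizes Python source only because the standard
library ships a Python tokenizer.

```python
import io
import tokenize
from collections import defaultdict

# Token types that carry layout rather than code; ignored when comparing.
SKIP = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_stream(source):
    """Return (token_text, line_number) pairs, ignoring comments and layout."""
    readline = io.StringIO(source).readline
    return [
        (tok.string, tok.start[0])
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in SKIP
    ]


def duplicated_lines(source, window=30):
    """Return the set of line numbers covered by a repeated token window.

    Hypothetical helper: a naive stand-in for what a copy/paste detector
    measures, with `window` playing the role of the 30 or 60 token minimum.
    """
    tokens = token_stream(source)
    seen = defaultdict(list)  # window contents -> list of start indexes
    for i in range(len(tokens) - window + 1):
        key = tuple(text for text, _ in tokens[i:i + window])
        seen[key].append(i)

    lines = set()
    for starts in seen.values():
        if len(starts) > 1:  # this window appears in more than one place
            for start in starts:
                lines.update(line for _, line in tokens[start:start + window])
    return lines


if __name__ == "__main__":
    block = "x = a + b * c\ny = x - d / e\n"
    sample = block + "z = 1\n" + block
    print(sorted(duplicated_lines(sample, window=14)))
```

Every copy of a repeated window contributes its lines, so a block pasted
once shows up as two sets of duplicated lines.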

Source Code        NLOC    Dup60   Dup30   Rate60   Rate30
Python1.5(Core)    19418   1072    3023    6%       16%
Python2.5(Core)    35797   1656    6441    5%       18%
Python3.0(Core)    40737   3460    9076    8%       22%
Apache(server)     18693   1114    2553    6%       14%

NLOC: Net lines of code
Dup60: Lines of code that contain a run of 60 consecutive tokens
duplicated elsewhere in the code (counted twice or more, once per copy)
Dup30: The same, with a run of 30 consecutive tokens
Rate60: Dup60/NLOC
Rate30: Dup30/NLOC
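
The two rates can be recomputed from the raw counts. The short script
below is only a sketch for checking the arithmetic; the numbers are
copied from the table, and the names MEASUREMENTS and duplication_rates
are made up for illustration.

```python
# (NLOC, Dup60, Dup30) for each code base, copied from the table.
MEASUREMENTS = {
    "Python1.5(Core)": (19418, 1072, 3023),
    "Python2.5(Core)": (35797, 1656, 6441),
    "Python3.0(Core)": (40737, 3460, 9076),
    "Apache(server)": (18693, 1114, 2553),
}


def duplication_rates(nloc, dup60, dup30):
    """Return (Rate60, Rate30) as fractions of the net lines of code."""
    return dup60 / nloc, dup30 / nloc


if __name__ == "__main__":
    for name, (nloc, dup60, dup30) in MEASUREMENTS.items():
        rate60, rate30 = duplication_rates(nloc, dup60, dup30)
        print(f"{name:<18}{rate60:7.1%}{rate30:7.1%}")
```

Rounded to whole percentages, the output matches the Rate60 and Rate30
columns.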

We can see that the duplication rate tends to be stable across these
code bases, but Python3.0's rate is somewhat higher. Considering the
small increase in NLOC, the duplication rate of Python3.0 might be too high.
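
To put the NLOC increase next to the increase in duplicated lines, the
relative growth from Python2.5 to Python3.0 can be worked out from the
table. This is just a sketch of the arithmetic; relative_growth is a
made-up helper name.

```python
# (NLOC, Dup60, Dup30) for the two releases, copied from the table.
PYTHON_25 = (35797, 1656, 6441)
PYTHON_30 = (40737, 3460, 9076)


def relative_growth(old, new):
    """Return the growth from old to new as a fraction of old."""
    return (new - old) / old


if __name__ == "__main__":
    for label, old, new in zip(("NLOC", "Dup60", "Dup30"), PYTHON_25, PYTHON_30):
        print(f"{label:<6}{old:>7} -> {new:>6}  {relative_growth(old, new):+.1%}")
```

NLOC grows by roughly 14%, while Dup60 roughly doubles and Dup30 grows
by roughly 41%.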

Does that say something about the code quality of Python3.0?


