Python3.0 has more duplication in source code than Python2.5
Terry
terry.yinzhe at gmail.com
Sat Feb 7 00:50:12 EST 2009
I used CPD (the copy/paste detector in PMD) to analyze code
duplication in the Python source code. I found that Python3.0 contains
more duplicated code than previous versions. The CPD tool is far
from perfect, but I still think the analysis makes some sense.
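| Source Code     |  NLOC | Dup60 | Dup30 | Rate60 | Rate30 |
|-----------------|-------|-------|-------|--------|--------|
| Python1.5(Core) | 19418 |  1072 |  3023 |     6% |    16% |
| Python2.5(Core) | 35797 |  1656 |  6441 |     5% |    18% |
| Python3.0(Core) | 40737 |  3460 |  9076 |     8% |    22% |
| Apache(server)  | 18693 |  1114 |  2553 |     6% |    14% |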
NLOC: net lines of code
Dup60: lines of code that contain a run of 60 consecutive tokens
duplicated elsewhere in the code (counted twice or more)
Dup30: the same, with a run of 30 tokens
Rate60: Dup60/NLOC
Rate30: Dup30/NLOC
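
As a quick check of the two rate definitions, the following sketch recomputes Rate60 and Rate30 from the counts in the table. The names `counts` and `rate` are only for illustration, and rounding to the nearest whole percent is an assumption about how the table was produced.

```python
# Counts copied from the table in this post: (NLOC, Dup60, Dup30).
counts = {
    "Python1.5(Core)": (19418, 1072, 3023),
    "Python2.5(Core)": (35797, 1656, 6441),
    "Python3.0(Core)": (40737, 3460, 9076),
    "Apache(server)": (18693, 1114, 2553),
}


def rate(dup, nloc):
    """Return dup/nloc as a percentage, rounded to the nearest integer.

    Rounding to a whole percent is an assumption, not something the
    post states.
    """
    return round(100 * dup / nloc)


# Map each code base to its (Rate60, Rate30) pair.
rates = {
    name: (rate(dup60, nloc), rate(dup30, nloc))
    for name, (nloc, dup60, dup30) in counts.items()
}

for name, (rate60, rate30) in rates.items():
    print(f"{name:<16} Rate60={rate60}% Rate30={rate30}%")
```

Under that rounding assumption, the recomputed percentages agree with the Rate60 and Rate30 columns above.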
We can see that the duplication rate tends to be fairly stable across
these code bases, but Python3.0 is somewhat higher. Considering the
small increase in NLOC over Python2.5, the duplication rate of
Python3.0 might be too high.
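
To put the comparison with Python2.5 in numbers, here is a small sketch that computes the relative growth of each count from Python2.5 to Python3.0, using only the figures in the table. The helper `growth_percent` and the dictionary names are made up for this example.

```python
# Counts for the two releases, copied from the table in this post.
python25 = {"NLOC": 35797, "Dup60": 1656, "Dup30": 6441}
python30 = {"NLOC": 40737, "Dup60": 3460, "Dup30": 9076}


def growth_percent(old, new):
    """Relative increase from old to new, expressed as a percentage."""
    return 100 * (new - old) / old


# Growth of each count between Python2.5 and Python3.0.
growth = {
    key: growth_percent(python25[key], python30[key])
    for key in python25
}

for key, value in growth.items():
    print(f"{key}: {value:+.1f}%")
```

By this measure NLOC grows by roughly 14%, while Dup60 roughly doubles and Dup30 grows by about 41%, so the duplicated lines increase much faster than the code base itself.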
Does that say something about the code quality of Python3.0?