[Python-ideas] Objectively Quantifying Readability
Steven D'Aprano
steve at pearwood.info
Mon Apr 30 20:42:53 EDT 2018
On Mon, Apr 30, 2018 at 11:28:17AM -0700, Matt Arcidy wrote:
> A study has been done regarding readability in code which may serve as
> insight into this issue. Please see page 8, fig 9 for a nice chart of
> the results, note the negative/positive coloring of the correlations,
> grey/black respectively.
Indeed. It seems that nearly nothing is positively correlated with
increased readability, aside from comments, blank lines, and (very
weakly) arithmetic operators. Everything else hurts readability.
The conclusion here is that if you want readable source code, you should
remove the source code. *wink*
> https://web.eecs.umich.edu/~weimerw/p/weimer-tse2010-readability-preprint.pdf
>
> The criteria in the paper can be applied to assess an increase or
> decrease in readability between current and proposed changes. Perhaps
> even an automated tool could be implemented based on agreed upon
> criteria.
That's a really nice study, and thank you for posting it. There are some
interesting observations here, e.g.:
- line length is negatively correlated with readability;
(a point against those who insist that 79 character line
limits are irrelevant since we have wide screens now)
- conventional measures of complexity do not correlate well
with readability;
- length of identifiers was strongly negatively correlated
with readability: long, descriptive identifier names hurt
readability while short variable names appeared to make
no difference;
(going against the common wisdom that one character names
hurt readability -- maybe mathematicians got it right
after all?)
- people are not good judges of readability;
but I think the practical relevance here is very slim. Aside from
questions about the validity of the study (it is only one study, can the
results be replicated, do they generalise beyond the narrowly self-
selected set of university students they tested?) I don't think that it
gives us much guidance here. For example:
1. The study is based on Java, not Python.
2. It looks at a set of pre-existing source code features.
3. It gives us little or no help in deciding whether new syntax will or
won't affect readability: the problem of *extrapolation* remains.
(If we know that, let's say, really_long_descriptive_identifier_names
hurt readability, how does that help us judge whether adding a new kind
of expression will hurt or help readability?)
4. The authors themselves warn that it is descriptive, not prescriptive;
for example, replacing long identifier names with randomly selected
two-character names is unlikely to be helpful.
5. The unfamiliarity effect: any unfamiliar syntax is going to be less
readable than a corresponding familiar syntax.
It's a great start to the scientific study of readability, but I don't
think it gives us any guidance with respect to adding new features.
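As a rough illustration of why such metrics describe existing code rather than predict the effect of new syntax, here is a minimal sketch that counts a few of the surface features the study looked at (line length, identifier length, comments, blank lines) in a Python snippet, using the standard library's tokenize module. The helper name surface_features and the particular feature set are hypothetical, chosen only for illustration. This is not the paper's model, whose weights were fitted to Java snippets and are not reproduced here. It can only measure code that already exists, so it says nothing about how readable a proposed construct would be once people are used to it.

```python
import io
import keyword
import tokenize
from statistics import mean


def surface_features(source: str) -> dict:
    """Count a few surface features of the kind the study measured.

    Hypothetical helper: the feature set is illustrative only and is
    NOT the paper's model (which was fitted to Java snippets).
    """
    lines = source.splitlines()
    identifiers = []
    comments = 0
    # Walk the token stream; keep non-keyword names and count comments.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            identifiers.append(tok.string)
        elif tok.type == tokenize.COMMENT:
            comments += 1
    return {
        "max_line_length": max((len(line) for line in lines), default=0),
        "mean_line_length": mean(len(line) for line in lines) if lines else 0.0,
        "blank_lines": sum(1 for line in lines if not line.strip()),
        "comments": comments,
        "max_identifier_length": max((len(name) for name in identifiers), default=0),
    }


if __name__ == "__main__":
    snippet = "# add two numbers\ndef add(x, y):\n\n    return x + y\n"
    print(surface_features(snippet))
```

Turning counts like these into a single "readability score" would still require choosing weights, and any weights fitted to existing code carry exactly the extrapolation problem described in point 3.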
> Opinions about readability can be shifted from:
> - "Is it more or less readable?"
> to
> - "This change exceeds a tolerance for levels of readability given
> the scope of the change."
One unreplicated(?) study of the readability of Java snippets does not give
us a metric for predicting the readability of new Python syntax. While
it would certainly be useful to study the possible impact of adding new
features to a language, the authors themselves state that this study is
just "a framework for conducting such experiments".
Despite the limitations of the study, it was an interesting read, thank
you for posting it.
--
Steve