[Python-ideas] Objectively Quantifying Readability
marcidy at gmail.com
Tue May 1 05:55:05 EDT 2018
On Tue, May 1, 2018 at 1:29 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Mon, Apr 30, 2018 at 8:46 PM, Matt Arcidy <marcidy at gmail.com> wrote:
>> On Mon, Apr 30, 2018 at 5:42 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>>> (If we know that, let's say, really_long_descriptive_identifier_names
>>> hurt readability, how does that help us judge whether adding a new kind
>>> of expression will hurt or help readability?)
>> A new feature can remove symbols or add them. It can increase density
>> on a line, or remove it. It can be a policy of variable naming, or it
>> can specifically note that variable naming has no bearing on a new
>> feature. This is not limited in application. It's just scoring.
>> When anyone complains about readability, break out the scoring
>> criteria and assess how good the _comparative_ readability claim is:
>> 2 vs 10? 4 vs 5? The arguments will no longer be singularly about
"readability," nor will they be about the question of a single score for
>> a specific statement. The comparative scores of applying the same
>> function over two inputs gives a relative difference. This is what
>> measures do in the mathematical sense.
> Unfortunately, the kind of study they did here can't support this
> kind of argument at all; it's the wrong kind of design. (I'm totally
> in favor of making more evidence-based decisions about language design,
> but interpreting evidence is tricky!) Technically speaking, the issue
> is that this is an observational/correlational study, so you can't use
> it to infer causality. Or put another way: just because they found
> that unreadable code tended to have a high max variable length,
> doesn't mean that taking those variables and making them shorter would
> make the code more readable.
I think you are right about the study, but are tangential to what I am
trying to say.
I am not inferring causality when creating a measure. In the most
tangible example, there is no inference that the Euclidean measure
_creates_ a distance, or that _anything_ creates a distance at all; it
merely generates a number based on coordinates in space. That
generation has specific properties which make it a measure, or a
metric, as the case may be.
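As a concrete sketch of that point (the function and points here are purely
illustrative): the distance function assigns a number to a pair of points and
says nothing about why the points are where they are.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The measure produces a consistent, objective ordering of pairs of
# points; it carries no claim about causality.
d1 = euclidean((0, 0), (3, 4))   # 5.0
d2 = euclidean((0, 0), (6, 8))   # 10.0
assert d1 < d2
```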
The average/mean is another such object: a measure of central tendency
or location. It does not imply causality; it is merely an algorithm
by which things can be compared. Even misapplied, it provides a
consistent ranking of one mean higher than another in an objective way.
Even if not a single person agrees that line length is a correct
measure for an application, it is a measure. I can feed two lines
into "len" and get consistent results out. This result will be the
same value for all strings of length n, and for a string with length
m > n, the measure will always report a higher measured value for the
string of length m than for the string of length n. This is straight
out of measure theory: the results are a distance between the two
objects, not a reason why.
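For instance, in Python (the two lines compared are just illustrative):

```python
# Hypothetical lines of code to compare; the contents are arbitrary.
line_a = "x = 1"
line_b = "really_long_descriptive_identifier_name = compute_value(x)"

# len is consistent and objective: equal lengths always score equally,
# and a longer string always scores strictly higher.
assert len(line_a) < len(line_b)
assert len("abcde") == len("vwxyz")  # same length n, same measured value
```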
The same goes for unique symbols. I can count the unique symbols in
two lines, and state which is higher. This does not imply causality,
nor does it matter _which_ symbols appear in this example; only that I
can count them, and that if count_1 == count_2, the ranks are equal,
aka no distance between them, and if count_1 > count_2, count 1 is
ranked higher.
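For example, counting unique symbols in two hypothetical lines (the
definition of "symbol" as a distinct non-whitespace character is an
assumption chosen purely for illustration):

```python
def unique_symbols(line):
    """Count distinct non-whitespace characters in a line.

    One possible definition of 'symbol'; the choice is illustrative."""
    return len(set(line) - set(" \t"))

a = unique_symbols("x = x + 1")
b = unique_symbols("result = {k: v**2 for k, v in data.items()}")

# The counts give a consistent ranking without saying *why* one line
# is denser than the other.
assert a < b
```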
The cause of complexity can be a number of things, but stating a set
of criteria to measure is not about inference. Measuring the
temperature of a steak doesn't explain why people like it medium
rare; it just quantifies it.
> This sounds like a finicky technical complaint, but it's actually a
> *huge* issue in this kind of study. Maybe the reason long variable
> length was correlated with unreadability was that there was one
> project in their sample that had terrible style *and* super long
> variable names, so the two were correlated even though they might not
> otherwise be related. Maybe if you looked at Perl, then the worst
> coders would tend to be the ones who never ever used long variables
> names. Maybe long lines on their own are actually fine, but in this
> sample, the only people who used long lines were ones who didn't read
> the style guide, so their code is also less readable in other ways.
> (In fact they note that their features are highly correlated, so they
> can't tell which ones are driving the effect.) We just don't know.
Your points here are dead on. It's not like a single metric will be
the deciding factor. Nor will a single rank end all disagreements.
It's a tool. Consider the 79-character line limit: that's an explicit
statement about readability, "hard coded" into the style guide.
Disagreement with the value 79, or even with line length as a metric,
doesn't mean it's not a measure. Length is the Euclidean measure in
one dimension.
The measure will be a set of filters and metrics that combine to a
value or set of values in a reliable way. It's not about any sense of
correctness or even being better; that is, at a minimum, an entirely
separate discussion.
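One minimal sketch of such a combination (the metric choices here are
illustrative assumptions, not a proposal for the "right" readability
measure):

```python
def metrics(line):
    """Score a line on several simple, independently computable metrics."""
    return {
        "length": len(line),
        "unique_symbols": len(set(line) - set(" \t")),
        "max_identifier": max(
            (len(tok) for tok in line.replace("(", " ").split()
             if tok.isidentifier()),
            default=0,
        ),
    }

def compare(line_a, line_b):
    """Return, per metric, which line scores higher: +1, -1, or 0.

    The output is a vector of relative differences, not a verdict on
    which line is 'more readable'."""
    ma, mb = metrics(line_a), metrics(line_b)
    return {k: (ma[k] > mb[k]) - (ma[k] < mb[k]) for k in ma}
```

Two people can then disagree about a specific metric or weight while
still agreeing on what was measured.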
> And yeah, it doesn't help that they're only looking at 3 line blocks
> of code and asking random students to judge readability – hard to say
> how that generalizes to real code being read by working developers.
Respectfully, this is a practical application and not a PhD defense,
so it will be generated by practical coding. People can argue about
the chosen metrics, but it is a more informative debate than just the
label "readability". If 10 people state that a change badly violates
one criterion, perhaps that can be easily addressed. If many people
make multiple claims based on many criteria, there is a real
readability problem (assuming the metric survived SOME vetting, of
course).
> Nathaniel J. Smith -- https://vorpus.org