On Tue, May 1, 2018 at 1:29 AM, Nathaniel Smith wrote:
On Mon, Apr 30, 2018 at 8:46 PM, Matt Arcidy wrote:
On Mon, Apr 30, 2018 at 5:42 PM, Steven D'Aprano wrote:
(If we know that, let's say, really_long_descriptive_identifier_names hurt readability, how does that help us judge whether adding a new kind of expression will hurt or help readability?)
A new feature can remove symbols or add them. It can increase the density of a line or reduce it. It can be a policy about variable naming, or it can explicitly note that variable naming has no bearing on the feature. The application isn't limited; it's just scoring. When anyone complains about readability, break out the scoring criteria and assess how strong the _comparative_ readability claim is: 2 vs 10? 4 vs 5? The arguments will no longer be singularly about "readability," nor will they be about a single score for a specific statement. Applying the same function to two inputs and comparing the scores gives a relative difference. This is what measures do in the mathematical sense.
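To make that concrete, here is a minimal sketch of what I mean by scoring; the particular function and weights are invented for illustration, not a proposal:

    def readability_score(line):
        # Toy criteria: length plus a weighted count of non-alphanumeric
        # symbols. The absolute value is meaningless on its own.
        symbols = sum(not c.isalnum() and not c.isspace() for c in line)
        return len(line) + 5 * symbols

    old = "y = f(x) if x else y"
    new = "y = (f(x) if x else y)"
    # Only the comparison carries information: the same function applied
    # to two inputs yields a relative difference.
    print(readability_score(old), readability_score(new))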
Unfortunately, the kind of study they did here can't support this kind of argument at all; it's the wrong kind of design. (I'm totally in favor of making language-design decisions more evidence-based, but interpreting evidence is tricky!) Technically speaking, the issue is that this is an observational/correlational study, so you can't use it to infer causality. Or put another way: just because they found that unreadable code tended to have a high maximum variable length doesn't mean that taking those variables and making them shorter would make the code more readable.
I think you are right about the study, but that's tangential to what I am trying to say. I am not inferring causality when creating a measure. In the most tangible example, there is no inference that the Euclidean measure _creates_ a distance, or that _anything_ creates a distance at all; it merely generates a number from coordinates in space. That generation has specific properties which make it a measure, or a metric, what have you. The average/mean is another such object: a measure of central tendency or location. It does not infer causality; it is merely an algorithm by which things can be compared. Even misapplied, it provides a consistent ranking of one mean higher than another in an objective sense. Even if not a single person agrees that line length is a correct measure for an application, it is still a measure. I can feed two lines into "len" and get consistent results out. The result will be the same value for all strings of length n, and for a string of length m > n, the measure will always report a higher value for the string of length m than for the string of length n. This is straight out of measure theory: the result is a distance between the two objects, not a reason why.
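As a trivial sketch of that consistency (the strings are arbitrary):

    s, t = "short", "a much longer line"
    # len() induces a consistent ordering: every call agrees on the rank,
    # and no causal claim about readability is being made.
    assert len(t) > len(s)
    # The comparison yields a distance-like quantity between the two objects:
    print(abs(len(t) - len(s)))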
The same goes for unique symbols. I can count the unique symbols in two lines and state which count is higher. This does not infer causality, nor does it matter _which_ symbols appear in this example, only that I can count them, and that if count_1 == count_2 the ranks are equal (no distance between them), and if count_1 > count_2, line 1 is ranked higher. The cause of complexity can be any number of things, but stating a set of criteria to measure is not about inference. Measuring the temperature of a steak doesn't explain why people like it medium rare. It just quantifies it.
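Again as a sketch (the example lines are arbitrary):

    def unique_symbols(line):
        return len(set(line))  # count of distinct characters in the line

    l1 = "x = a + b"
    l2 = "x = {k: v**2 for k, v in d.items() if v}"
    c1, c2 = unique_symbols(l1), unique_symbols(l2)
    # Equal counts rank equal; a higher count ranks higher. Which symbols
    # they are, and why the line is complex, never enters into it.
    print(c1, c2, c2 > c1)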
This sounds like a finicky technical complaint, but it's actually a *huge* issue in this kind of study. Maybe the reason long variable length was correlated with unreadability was that there was one project in their sample that had terrible style *and* super long variable names, so the two were correlated even though they might not otherwise be related. Maybe if you looked at Perl, then the worst coders would tend to be the ones who never ever used long variable names. Maybe long lines on their own are actually fine, but in this sample, the only people who used long lines were ones who didn't read the style guide, so their code is also less readable in other ways. (In fact they note that their features are highly correlated, so they can't tell which ones are driving the effect.) We just don't know.
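To illustrate with purely made-up numbers (this is a toy simulation, not their data):

    import statistics

    # Hypothetical: project A has short names and readable code, project B
    # has long names and unreadable code. Within each project, name length
    # has zero correlation with the ratings; pooled, it looks "predictive".
    name_len    = [8, 9, 10, 9,   30, 31, 32, 31]   # project A, then B
    readability = [7, 6, 7, 6,    2, 3, 2, 3]       # reader ratings, 1-10
    print(statistics.correlation(name_len, readability))  # strongly negative

    # ...yet renaming project B's variables wouldn't import project A's
    # style along with them.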
Your points here are dead on. It's not that a single metric will be the deciding factor, nor will a single rank end all disagreements; it's a tool. Consider the line-length limit of 79: that's an explicit statement about readability, "hard coded" into PEP 8. Disagreement with the value 79, or even with line length as a metric, doesn't mean it's not a measure. Length is the Euclidean measure in one dimension. The eventual measure will be a set of filters and metrics that combine into a value, or set of values, in a reliable way. It's not about any sense of correctness or even of being better; that is, at a minimum, an interpretation.
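A sketch of what such a combined measure might look like; the three criteria here are placeholders I made up, not a proposal:

    def measure(line):
        # Each axis is an independent criterion; PEP 8's 79 is a cutoff
        # on the first one. The tuple is the "set of values".
        return (
            len(line),                          # line length
            len(set(line)),                     # unique symbols
            line.count("(") + line.count("["),  # rough nesting proxy
        )

    a = "total = sum(xs)"
    b = "total = functools.reduce(operator.add, xs, 0)"
    # Comparing two lines gives a reliable per-axis difference:
    print([mb - ma for ma, mb in zip(measure(a), measure(b))])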
And yeah, it doesn't help that they're only looking at 3-line blocks of code and asking random students to judge readability – hard to say how that generalizes to real code being read by working developers.
Respectfully, this is a practical application, not a PhD defense, so it will be generated by practical coding. People can argue about the chosen metrics, but that is a more informative debate than the bare label "readability". If 10 people state that a change badly violates one criterion, perhaps that can be easily addressed. If many people make multiple claims based on many criteria, there is a real readability problem (assuming the metric survived SOME vetting, of course).
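As a toy sketch of the tallying I have in mind (the criteria names and reports are invented):

    from collections import Counter

    # Hypothetical reviewer reports: which criteria does the change violate?
    reports = [
        {"line_length"},
        {"line_length"},
        {"line_length", "unique_symbols", "nesting_depth"},
    ]
    tally = Counter(criterion for report in reports for criterion in report)
    # One criterion flagged by everyone may be easy to address; many flags
    # spread across many criteria signal a real readability problem.
    print(tally)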
-n
-- Nathaniel J. Smith -- https://vorpus.org