On Mon, Apr 30, 2018 at 5:42 PM, Steven D'Aprano
On Mon, Apr 30, 2018 at 11:28:17AM -0700, Matt Arcidy wrote:
A study has been done regarding readability in code which may serve as insight into this issue. Please see page 8, fig 9 for a nice chart of the results, note the negative/positive coloring of the correlations, grey/black respectively.
Indeed. It seems that nearly nothing is positively correlated to increased readability, aside from comments, blank lines, and (very weakly) arithmetic operators. Everything else hurts readability.
The conclusion here is that if you want readable source code, you should remove the source code. *wink*
https://web.eecs.umich.edu/~weimerw/p/weimer-tse2010-readability-preprint.pd...
The criteria in the paper can be applied to assess an increase or decrease in readability between current and proposed changes. Perhaps even an automated tool could be implemented based on agreed upon criteria.
That's a really nice study, and thank you for posting it. There are some interesting observations here, e.g.:
- line length is negatively correlated with readability;
(a point against those who insist that 79 character line limits are irrelevant since we have wide screens now)
- conventional measures of complexity do not correlate well with readability;
- length of identifiers was strongly negatively correlated with readability: long, descriptive identifier names hurt readability while short variable names appeared to make no difference;
(going against the common wisdom that one character names hurt readability -- maybe mathematicians got it right after all?)
- people are not good judges of readability;
but I think the practical relevance here is very slim. Aside from questions about the validity of the study (it is only one study, can the results be replicated, do they generalise beyond the narrowly self-selected set of university students they tested?) I don't think that it gives us much guidance here. For example:
I don't propose to replicate correlations. I don't see these "standard" terminal conclusions as foregone when looking at the idea as a whole, as opposed to the paper itself, for which they may be. The authors crafted a method and used that method to do a study, and I like the method. I think I can agree with your point about the study without validating or invalidating the method.
1. The study is based on Java, not Python.
An objective measure can be created, whether or not based on the paper's parameters, but it clearly would need to be adjusted to a specific language; good point. Here "objective" does not mean "with absolute correctness" but "applied the same way, such that a 5 is always a 5, and a 5 is always greater than a 4." I think I unfortunately presented the paper as "The Answer" in my initial email, but I didn't intend to say "each detail must be implemented as is" but rather "this is a thing which can be done." Poor job on my part.
2. It looks at a set of pre-existing source code features.
3. It gives us little or no help in deciding whether new syntax will or won't affect readability: the problem of *extrapolation* remains.
(If we know that, let's say, really_long_descriptive_identifier_names hurt readability, how does that help us judge whether adding a new kind of expression will hurt or help readability?)
A new feature can remove symbols or add them. It can increase density on a line, or reduce it. It can be a policy of variable naming, or it can specifically note that variable naming has no bearing on a new feature. This is not limited in application. It's just scoring.

When anyone complains about readability, break out the scoring criteria and assess how good the _comparative_ readability claim is: 2 vs 10? 4 vs 5? The arguments will no longer be singularly about "readability," nor will they be about a single score for a specific statement. The comparative scores of applying the same function to two inputs give a relative difference. This is what measures do in the mathematical sense.

Maybe the "readability" debate then shifts to arguing criteria: "79? Too long in your opinion!" A measure will at least break "readability" up and give some structure to that argument. Right now "readability" comes up and starts a semi-polite flame war. Creating _any_ criteria will help narrow the scope of the argument. Even when someone writes perfectly logical statements about it, the statements can always be dismantled because they're based in opinion. By creating a measure, objectivity is forced. While each criterion is more or less subjective, the measure will be applied objectively to each instance, the same way, to get a score.
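To make the "it's just scoring" point concrete, here is a minimal sketch of what such a comparative scorer could look like. The criteria and weights are purely illustrative stand-ins (a long-line penalty and a comment/blank-line bonus, loosely echoing the paper's correlations), not the paper's actual model:

```python
# Hypothetical sketch of a comparative readability scorer.  The two
# criteria below are illustrative placeholders, not the paper's model.
def readability_score(source):
    """Score a snippet on mechanical criteria; higher is better."""
    lines = source.splitlines() or [""]
    score = 0.0
    for line in lines:
        # Penalize lines past 79 characters (line length was negatively
        # correlated with readability in the study).
        if len(line) > 79:
            score -= 1.0
        # Reward comments and blank lines (positively correlated).
        stripped = line.strip()
        if stripped.startswith("#") or not stripped:
            score += 0.5
    return score / len(lines)

def compare(before, after):
    """Relative difference: positive means 'after' scores higher."""
    return readability_score(after) - readability_score(before)
```

The point is not these particular weights but that `compare` applies the same function to both inputs, so any two proposed spellings of a feature get a relative difference rather than a shouting match.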
4. The authors themselves warn that it is descriptive, not prescriptive, for example replacing long identifier names with randomly selected two character names is unlikely to be helpful.
Of course, which is why it's a score, not a single criterion. For example, if you hit the Shannon limit, no one will be able to read it anyway. "Shorter is better" doesn't mean "shortest is best".
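The Shannon-limit point can be illustrated with a toy redundancy check: compression ratio as a crude proxy for how close text sits to the limit. The snippet and the near-random hex string below are made-up examples, chosen only to show the two extremes:

```python
import hashlib
import zlib

def density(text):
    """Compressed size over raw size: ratios near 1.0 mean the text is
    already close to the Shannon limit and carries little redundancy."""
    raw = text.encode()
    return len(zlib.compress(raw, 9)) / len(raw)

# Readable source is redundant, so it compresses well; near-random hex
# (standing in for bzip'd output) barely compresses at all.
readable = "total = total + increment  # accumulate the running total\n" * 5
dense = "".join(hashlib.sha256(str(i).encode()).hexdigest() for i in range(5))
```

Here `density(readable)` comes out far below `density(dense)`: the redundancy that compression removes is exactly what human readers lean on, which is why "shortest" is not "best".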
5. The unfamiliarity affect: any unfamiliar syntax is going to be less readable than a corresponding familiar syntax.
Definitely, let me respond specifically, but as an example of how to apply a measure flexibly. A criterion can be turned on/off based on the target of the new feature. Do you want beginners to understand this? Is this for core developers? If there exists one measure, another can be created by adding/subtracting criteria. I'm not saying do it, I'm saying it can be done. It's a matter of conditioning, like a marginal distribution. Core developers seem fairly indifferent to symbolic density on a line, but many are concerned about beginners. Heck, run both measures and see how dramatically the numbers change.
It's a great start to the scientific study of readability, but I don't think it gives us any guidance with respect to adding new features.
Opinions about readability can be shifted from:

- "Is it more or less readable?" to
- "This change exceeds a tolerance for levels of readability given the scope of the change."
One unreplicated(?) study of the readability of Java snippets does not give us a metric for predicting the readability of new Python syntax. While it would certainly be useful to study the possible impact of adding new features to a language, the authors themselves state that this study is just "a framework for conducting such experiments".
It's an example of a measure. I presented it poorly, but even poor presentation should not prevent acknowledging that objective measures exist today. "In English" is a good one for me for sure; I'm barely monolingual. Perhaps agreement on the criteria will be a war of attrition, perhaps impossible, but the "pure opinion" argument is definitely not true. This should be clearly noted, specifically because there is _so much upon which is already agreed_, which is a real tragedy here. Even as information theory is useful in this pursuit, it cannot be applied to the limit, or we'd be trying to read and write bzip'd hex files.

I think what you have mentioned enhances the point that rules exist, and your points can be formalized into rules and then incorporated into a model. Using your names example again, single-letter names are not very meaningful, and 5 random alphanumerics are no better, perhaps even worse if 'e' is Exception and now it's 'et82c'. However, 5 letters that trigger a spell checker to propose the correct concept pointed to by the name clearly have _value_ over the random 5 alphanumerics; i.e., a Hamming-distance-type measure would capture this improvement perfectly.

As for predictability, every possible statement has a score that just needs to be computed by the measure; measures are not predictive. Running a string of symbols through it will produce a number based on patterns, as it's not a semantic analysis. If the scoring is garbage, the measure is garbage and needs to be redone, but not because it fails at predicting. Each symbol string has a score, precisely as two points have a distance in the Euclidean measure.

In lieu of a wide statistical study, assumptions will be made, argued, set, used, argued again, etc. This is life. But the rhetoric of "readability: therefore my opinion is right, or your statement is just an opinion" will be tempered. A criterion can be theorized or accepted as a de-facto tool. Or not. But it can exist.
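The name-distance idea above can be sketched directly. Since 'excpt' and 'except' differ in length, Levenshtein edit distance stands in for the Hamming-style measure; the identifiers and the target word are the examples from the discussion, not any real tool's vocabulary:

```python
# Illustrative sketch: rank identifier names by edit distance to the
# concept word they are meant to evoke ("except" here).  Levenshtein
# distance is used as the Hamming-style measure because the candidate
# names and the target word differ in length.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# 'excpt' is one edit from the concept it names; the random 'et82c'
# is much further away, so the measure ranks it worse.
```

Here `levenshtein("excpt", "except")` is 1 while `levenshtein("et82c", "except")` is 5, which is exactly the improvement the spell-checker intuition predicts.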
Despite the limitations of the study, it was an interesting read, thank you for posting it.
I think I presented the paper as "The Answer" as opposed to "this is an approach." I agree that some, perhaps many, of the paper's specifics are wholly irrelevant. Crafting a meaningful measure of readability is doable, however. Obtaining agreement is still hard, and perhaps unfortunately impossible, but a measure exists. I suppose I'll build something and see where it goes; oddly enough, I am very enamored with my own idea! I appreciate your feedback and will incorporate it, and if you have any more, I am interested to hear it.
--
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/