[Tutor] Long Lines techniques

Thu Dec 13 19:27:14 EST 2018

On Thu, Dec 13, 2018 at 12:36:27PM -0500, Avi Gross wrote:

> Simple question:
> 
> When lines get long, what points does splitting them make sense and what
> methods are preferred?

Good question!

First, some background:

Long lines are a potential code smell: a possible sign of excessively 
terse code. A long line may be a sign that you're doing too much in one 
line.

https://martinfowler.com/bliki/CodeSmell.html
http://wiki.c2.com/?CodeSmell
https://blog.codinghorror.com/code-smells/

Related: 
https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/

Note that merely splitting a logical line over two or more physical 
lines may still be a code-smell. Sure, your eyes don't get as tired 
reading fifteen lines of 50 characters each, compared to a single 750 
character line, but there's just as much processing going on in what is 
essentially a single operation.

Long lines are harder to read: your eyes have to scan across a long 
line, and beyond 60 or 70 characters, it becomes physically more 
difficult to scan across the line, and the error rate increases. 
[Citation required.]

But short lines don't include enough information, so the traditional 
compromise is 80 characters, the character width of the old-school 
green-screen terminals. The Python standard library uses 79 characters. 
(The odd number is to allow for scripts which count the newline at the 
end of the line as one of the 80.)

https://www.python.org/dev/peps/pep-0008/

Okay, so we have a style-guide that sets a maximum line length, whether 
it is 72 or 79 or 90 or 100 characters. What do you do when a line 
exceeds that length?

The only firm rule is that you must treat each case on its own merits. 
There is no one right or wrong answer. Every long line of code is 
different, and the solution will depend on the line itself. There is no 
getting away from human judgement.

(1) Long names. Do you really need to call the variable 
"number_of_characters" when "numchars" or even "n" would do?

The same applies to long function names: "get_data_from_database" is 
probably redundant, "get_data" will probably do.

Especially watch out for long dotted names that you use over and over 
again. Unlike in static languages like Java, each dot in Python 
represents a runtime lookup. A long name like:

    package.subpackage.module.object.method

requires four attribute lookups. Look for opportunities to make an alias 
for a long name and avoid long chains of dots:

    for item in sequence:
        do_something_with(package.subpackage.module.object.method(arg, item))

can be refactored to:

    method = package.subpackage.module.object.method
    for item in sequence:
        do_something_with(method(arg, item))

and is both easier to read and more efficient. A double win!
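
Here is a minimal runnable sketch of the same refactoring. The choice of 
os.path.join and the file names are just my illustration, not anything 
from your code:

```python
import os.path

names = ["alpha", "beta", "gamma"]

# Without an alias: the dotted name os.path.join is resolved
# again on every pass through the loop.
paths_dotted = []
for name in names:
    paths_dotted.append(os.path.join("data", name + ".txt"))

# With an alias: the attribute lookups happen once, before the loop.
join = os.path.join
paths_alias = []
for name in names:
    paths_alias.append(join("data", name + ".txt"))

# Both versions build exactly the same list.
assert paths_dotted == paths_alias
```

For a chain this short the speed difference is likely to be tiny; the 
readability gain is usually the better reason to do it.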

(2) Temporary constants: sometimes it is good enough to just introduce a 
simple named constant used once. The cognitive load is low if it is 
defined immediately before it is used. Instead of the long line:

    raise ValueError("expected a list, string, dict or None, but instead got '%s'" % type(value).__name__)

I write:

    errmsg = "expected a list, string, dict or None, but instead got '%s'"
    raise ValueError(errmsg % type(value).__name__)

(3) Code refactoring. Maybe that long line is a sign that you need to 
add a method or function? Especially if you are using that line, or 
something similar, in multiple places. But refactoring is justified even 
if you use the line *once*, if it is complicated enough.

Likewise, sometimes it is helpful to factor out separate sub-expressions 
onto their own lines, using their own variables, rather than doing 
everything in a single, complicated, expression.

Psychologists, educators and linguists call this "chunking", and it is 
often very helpful for simplifying complicated ideas, sentences and 
expressions.

The lack of chunks is why long Perl one-liners are so impenetrable.
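
As a rough sketch of what I mean by factoring out sub-expressions, here 
is a made-up example (the quadratic formula, with function names I 
invented for this post) written first as one long expression and then in 
chunks:

```python
import math

def roots_one_line(a, b, c):
    # Everything in a single expression: correct, but hard to scan.
    return ((-b + math.sqrt(b * b - 4 * a * c)) / (2 * a), (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a))

def roots_chunked(a, b, c):
    # Each sub-expression gets its own line and its own name.
    discriminant = b * b - 4 * a * c
    root = math.sqrt(discriminant)
    denominator = 2 * a
    return ((-b + root) / denominator, (-b - root) / denominator)

# Both functions compute the same thing.
assert roots_one_line(1, -3, 2) == roots_chunked(1, -3, 2)
```

The chunked version is longer in lines but every line is short, and the 
names tell the reader what each piece is for.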

(4) Split the long logical line over multiple physical lines. This does 
nothing to reduce the inherent complexity of the line, but if that's 
fairly low to start with, it is often helpful.

Python gives us two ways to split a logical line over multiple physical 
lines: a backslash at the end of the line, and brackets of any sort.

The preferred way is to use round brackets for grouping:

    result = (some very long expression
              which can be split over
              many lines)

This is especially useful with function calls:

    result = function(first_argument, second_argument,
                      third_argument, fourth_argument)

If you are building a list or dict literal, there is no need for the 
parentheses, as square and curly brackets have the same effect. That's 
especially useful with two-dimensional nested lists:

    data = [[row, one, with, many, items],
            [row, two, with, many, items],
            [row, three, with, many, items]]

For long strings, I like to use *implicit string concatenation*. String 
literals which are separated by nothing except whitespace are 
concatenated at compile-time. So I can write a long string like this:

    long_string = ("this is a very long string which doesn't"
                   " fit on a single line but isn't appropriate"
                   " for a triple-quoted string")

Notice that I split the string at word breaks, and move the space to the 
beginning of the physical line rather than the end. I find that I'm less 
likely to forget the space if I put it at the start of the line rather 
than the end.
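
To make that concrete, here is a small sketch showing that the split 
string is identical to the one-line version, and what happens when the 
space is forgotten. The variable names "single_literal" and "broken" are 
just mine for this example:

```python
long_string = ("this is a very long string which doesn't"
               " fit on a single line but isn't appropriate"
               " for a triple-quoted string")

single_literal = "this is a very long string which doesn't fit on a single line but isn't appropriate for a triple-quoted string"

# Adjacent literals are joined into exactly the same string.
assert long_string == single_literal

# Forgetting the space silently glues two words together.
broken = ("fit on a single line"
          "but lost a space")
assert broken == "fit on a single linebut lost a space"
```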

Not preferred, but allowed for backwards compatibility and still very 
occasionally useful, is to end the line with a bare backslash. I find it 
helpful in conjunction with triple quoted strings:

    text = """\
    body of the string
    is aligned
    including the first line
    """

but otherwise the backslash is problematic and error-prone. It must be 
*immediately* followed by a newline. If you accidentally add a space 
after the backslash, you get a syntax error.
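
Since a trailing space is invisible in an email, here is a sketch that 
demonstrates the problem by compiling two source strings. The names 
"good_source" and "bad_source" are invented for this example:

```python
# A backslash immediately followed by a newline continues the line.
good_source = "total = 1 + \\\n    2\n"

# The same code with a single space after the backslash.
bad_source = "total = 1 + \\ \n    2\n"

namespace = {}
exec(compile(good_source, "<good>", "exec"), namespace)
assert namespace["total"] == 3

try:
    compile(bad_source, "<bad>", "exec")
except SyntaxError:
    failed = True
else:
    failed = False

# The stray space turns valid code into a syntax error.
assert failed
```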

And finally:

(5) It's just a style guide, not a law of physics. As Douglas Bader once 
said, "Rules are for the guidance of the wise and the obedience of 
fools." See also Raymond Hettinger's talk "Beyond PEP 8":

https://twitter.com/raymondh/status/589849947408703488

https://medium.com/@drb/pep-8-beautiful-code-and-the-tyranny-of-guidelines-f96499f5ac17

Better to go two or three characters beyond the maximum length than to 
make the code ugly.

[...]
> There are places you can break lines as in a comprehension such as this set
> comprehension:
> 
>     letter_set = { letter
>                    for word in (left_list + right_list)
>                    for letter in word }
> 
> The above is an example where I know I can break because the {} is holding
> it together. I know I can break at each "for" or "if" but can I break at
> random places?

Not quite random: you can't break in the middle of a word (a name, a 
keyword, a number or a string literal), but inside brackets you can 
break between any two words.
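
Here is a sketch using your set comprehension, deliberately broken in 
ugly places to show what the interpreter will accept. The contents of 
left_list and right_list are made up:

```python
left_list = ["ab", "cd"]
right_list = ["de"]

# Legal, if unpleasant: every break falls between two tokens
# and the braces hold the whole expression together.
letter_set = {
    letter for
    word in (left_list
             + right_list) for letter
    in word
}
assert letter_set == set("abcde")

# Not legal: a break in the middle of the name "letter".
try:
    compile("{let\nter for letter in 'ab'}", "<bad>", "eval")
except SyntaxError:
    split_name_fails = True
else:
    split_name_fails = False
assert split_name_fails
```

Just because you can break there doesn't mean you should. Your original 
layout, with one clause per line, is the readable one.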

[...]
> I will stop here with saying that unlike many languages, parentheses must be
> used with care in python as they may create a tuple or even generator
> expression.

But not by accident. You can't create a generator expression, or turn an 
expression into a tuple, just by wrapping an arbitrary expression in 
round brackets. 

Remember, it isn't the parentheses which make tuples, it's the commas. 
Except for the empty tuple special case, (), the parens are ALWAYS 
just there to either group the tuple so as to avoid ambiguity, or to 
visually emphasize that it is a tuple even if the interpreter doesn't 
need the hint.
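
A quick sketch you can paste into the interpreter to convince yourself. 
The variable names are arbitrary:

```python
import types

grouped = (1 + 2)       # parentheses only group: this is the int 3
trailing = 1 + 2,       # the comma makes a one-item tuple
pair = 1, 2             # no parentheses needed for a tuple
empty = ()              # the one special case

assert grouped == 3 and isinstance(grouped, int)
assert trailing == (3,)
assert pair == (1, 2)
assert empty == tuple()

# A generator expression needs an explicit "for" inside the
# parentheses, so you won't write one without meaning to.
gen = (x * 2 for x in range(3))
assert isinstance(gen, types.GeneratorType)
assert list(gen) == [0, 2, 4]
```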

-- 
Steve