Best practices to overcome python's dynamic data type nature

Chris Angelico rosuav at gmail.com
Fri Feb 14 20:42:41 CET 2014


On Sat, Feb 15, 2014 at 3:54 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Sam <lightaiyee at gmail.com>:
>
>> Dynamic data type has pros and cons. It is easier to program but also
>> easier to create bugs. What are the best practices to reduce bugs
>> caused by Python's dynamic data-type characteristic? Can the
>> experienced Python programmers here advise?
>
> Here's some advice from a very experienced programmer: become a very
> experienced programmer.

Definitely.

> I like Java a lot, but boy does the boilerplate get in your way. When
> you start writing a feature, you have to produce 2000 lines of code
> *before writing a single statement*! That's why experienced Java
> programmers tend to resort to code generators.

Also this. It's been shown that, for any given programmer, the number
of bugs per thousand lines of code is approximately stable. This means
that, all other things being equal, a more expressive language will
help you write less buggy code. When you have to write 2000 lines of
boilerplate around 200 lines of your own code, that's 2200 lines that
might be wrong; if, instead, you use a language that requires just 20
lines of boilerplate for the same 200 lines of your own code, that
gives you one tenth the total program size and, in very rough figures,
probably about a tenth the number of bugs.
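
To put rough numbers on that, here's a quick sketch in Python. The
rate of 15 bugs per thousand lines is a made-up figure purely for
illustration, and expected_bugs is a hypothetical helper, not anything
standard; the only thing that matters is the ratio between the two
results.

```python
# Hypothetical defect rate; the real figure varies by programmer and project.
BUGS_PER_KLOC = 15


def expected_bugs(own_lines, boilerplate_lines, bugs_per_kloc=BUGS_PER_KLOC):
    """Estimate bug count, assuming a constant rate per thousand lines written."""
    total_lines = own_lines + boilerplate_lines
    return total_lines * bugs_per_kloc / 1000


# 200 lines of real work wrapped in 2000 lines of boilerplate...
verbose = expected_bugs(own_lines=200, boilerplate_lines=2000)
# ...versus the same 200 lines with only 20 lines of boilerplate.
concise = expected_bugs(own_lines=200, boilerplate_lines=20)

print("verbose language:", verbose)
print("concise language:", concise)
print("ratio:", round(concise / verbose, 3))
```

Whatever rate you plug in, it cancels out of the ratio, so the "one
tenth" figure depends only on the line counts.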

It gets even better than that, though. The more expressive the
language, the easier it is to notice bugs when they do occur. Tell me,
can you see a couple of bugs in this code?

make_combined_color:
push bp
mov bp,sp
mov bx,[bp+8]
cmp bx,16
ja .err
lea ax,[color_data+bx*4+8]
pop bp
ret 8
.err:
; Handle error by returning 0
mov ax,0
pop bp
ret 8

color_data dw 110000, 001011
dw  000000h, 00007Fh, 007F00h, 007F7Fh, 7F0000h, 7F007Fh, 7F7F00h, C0C0C0h
dw 7F7F7Fh, 0000FFh, 00FF00h, 00FFFFh, FF0000h, FF00FFh, FFFF00h, FFFFFFh

No? Isn't it obvious? Hmm, okay. Well, here's a C version:

int color_data[] =
{0x000000,0x00007F,0x007F00,0x007F7F,0x7F0000,0x7F007F,0x7F7F00,0xC0C0C0,
0x7F7F7F,0x0000FF,0x00FF00,0x00FFFF,0xFF0000,0xFF00FF,0xFFFF00,0xFFFFFF};

int make_combined_color(int fg, int bg)
{
    if (fg>ARRAY_SIZE(color_data)) return 0;
    return color_data[fg];
}

If you know C, you should be able to spot a couple of errors here.
(Assume that ARRAY_SIZE gives you the number of elements in an array.
It's an easy enough macro to define, using sizeof.) Now here's the
high level equivalent:

array color_data=({0x000000,0x00007F,0x007F00,0x007F7F,0x7F0000,0x7F007F,0x7F7F00,0xC0C0C0,
0x7F7F7F,0x0000FF,0x00FF00,0x00FFFF,0xFF0000,0xFF00FF,0xFFFF00,0xFFFFFF});

int make_combined_color(int fg, int bg)
{
    return color_data[fg];
}

One of the bugs doesn't even exist now! It's an off-by-one error in
array bounds checking. With the high level code, I'm not checking my
own array bounds, so it's impossible for me to get that wrong. As to
the other bug, it's now patently obvious that it's taking two
arguments and ignoring one of them. That's not necessarily a problem,
but the name make_combined_color suggests that it should be, well,
combining something.
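
For comparison, here's roughly how the same function might look in
Python. This is only a sketch: COLOR_DATA is an assumed name, and the
bg argument is left ignored on purpose so the second bug stays
visible. Python checks the upper bound for you and raises IndexError
when the index is too large, though note that negative indices count
from the end of the list rather than raising.

```python
# Same sixteen palette entries as the examples above, as a Python list.
COLOR_DATA = [
    0x000000, 0x00007F, 0x007F00, 0x007F7F, 0x7F0000, 0x7F007F, 0x7F7F00, 0xC0C0C0,
    0x7F7F7F, 0x0000FF, 0x00FF00, 0x00FFFF, 0xFF0000, 0xFF00FF, 0xFFFF00, 0xFFFFFF,
]


def make_combined_color(fg, bg):
    # bg is still ignored, exactly as in the versions above; that bug is
    # deliberately left in so it remains easy to see.
    return COLOR_DATA[fg]


# No hand-written bounds check, so no off-by-one to get wrong:
try:
    make_combined_color(16, 0)
except IndexError as exc:
    print("out of range:", exc)
```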

Every line of code you write could potentially have a bug in it. If
you can write less code to achieve your goals, then - all other things
being equal, which of course they never quite are - you'll generally
have fewer bugs. And of course, when you have something that looks
weird but is intentional (maybe the above function is _supposed_ to
ignore its bg argument, for some reason), you use a code comment.
Here's one of my favourites, just for its brevity:

gc->set_foreground(bg); //(sic)

It's clear from the context this code is in that it's correct to set
the display's foreground color to bg, but since it looks wrong at
first glance, it merits a comment :)

So use a language that lets you say things succinctly and readably
(sorry APL). You'll still make bugs, but you'll be able to spot them
in subsequent editing.

Also: Use source control. Get familiar with a DVCS (I usually
recommend either git or hg for all new projects) and get used to
checking back on the origin of the code. This gives you two benefits:
Firstly, you can know exactly what was written when and why, which
helps hugely when you're trying to figure out whether something's
correct or not. And secondly - more subtly but perhaps more
importantly - it frees you from the need to write all that sort of
thing in code comments. You don't need to retain the history of a
block of code by commenting out the failed attempts; go back to source
control for that. You don't need to predict in advance which bits of
code you'll, in six months' time, wonder about. (I guarantee you'll
predict wrong.) When you come to something that seems odd, you look it
up, and *then* add the comments. Consider it a YAGNI policy for
verbiage, if you like. Saves you a huge amount of trouble, and keeps
your source code lean and clean - which, see above, will tend to
reduce your bug count.

Huh. My primary point is "keep your code as short as possible"... and
look how long this post is. That's a textbook example of irony, right
there...

ChrisA


