[Tutor] More or less final Chutes & Ladders

Steven D'Aprano steve at pearwood.info
Sat Jan 4 14:03:27 CET 2014


On Sat, Jan 04, 2014 at 07:19:43AM -0500, Keith Winston wrote:

> I understand that Python doesn't have composite objects, but neither does
> it disallow my list of lists of ints and lists... which is, I imagine,
> very space efficient.

I'm afraid I have no idea what you mean by Python not having "composite 
objects". What's a composite object?


> I think what you are in essence saying is that it's a
> mistake for me to worry about space at the expense of clarity...

What Denis and I are trying to say is that when you are using a 
high-level language like Python, you should aim to optimize programmer 
time and effort, not computer time and effort. There are applications 
where you have to optimize what the computer does, where every byte 
counts, where microseconds matter. Python is not the right programming 
language for those applications. Python is a language for when you have 
the luxury of tens of megabytes of memory, not kilobytes, and when you 
don't care whether something takes a millisecond to calculate instead 
of a tenth of a millisecond.

(Actually, there are versions of Python for embedded devices, such as 
Android, where memory does matter -- at least, where every kilobyte 
counts.)

Everything is a tradeoff. In programming, one common set of tradeoffs is 
space (memory) versus time: use more memory to run faster, or save 
memory but run slower. Python generally trades off memory for speed. 
There is very little point in trying to save bytes, because behind the 
scenes Python is using and re-using memory like it's going out of 
fashion, just to ensure that it can be as fast as possible.
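
You can even catch it re-using memory. Here's a small illustration 
(this relies on a CPython implementation detail, namely that small 
integers are cached and shared rather than allocated fresh each time):

py> a = 42
py> b = 42
py> a is b  # both names refer to the very same cached object
True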

py> import sys
py> sys.getsizeof(42)
14

Fourteen bytes just for a tiny little integer like that??? How wasteful! 
I remember when numbers like 42 would only require *two* bytes. Of 
course, back then the biggest number you could deal with was 65535, and 
a computer with 64K of memory was considered unbelievably luxurious. 
Meanwhile, Python lets me handle numbers with thousands of digits with 
ease:

py> n = 2**10000
py> len(str(n))
3011
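
(A quick sanity check of that digit count, using the standard identity 
that 2**n has floor(n*log10(2)) + 1 decimal digits:)

py> import math
py> int(10000 * math.log10(2)) + 1  # expected digits in 2**10000
3011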

Another tradeoff is between programmer effort (which equals time, and 
cost) and speed. Programs written in C are typically between 10 and 
10000 times faster than the same program written in Python, but they 
typically take between 20 and 200 times as long, and as much effort, 
to write. You should ask whether your program is *fast enough*, rather 
than whether it is fast. Often, Python is fast enough. When it's not, 
there are ways to make it faster.
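
For example (a sketch, not a recipe): pushing a loop into a built-in 
that iterates in C usually beats the equivalent pure-Python loop, and 
timeit lets you measure the difference yourself:

py> from timeit import timeit
py> t_loop = timeit("total = 0\nfor i in range(1000): total += i", number=10000)
py> t_builtin = timeit("sum(range(1000))", number=10000)

On a typical CPython, t_builtin comes out several times smaller than 
t_loop, because sum() does its looping in C rather than in Python 
bytecode.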

So don't sweat the small stuff. If you ever have to write an operating 
system kernel or a graphics card driver, then you will need to care 
about optimizing every little thing. Until then, write the most natural 
code you can, and only if it actually is too slow should you worry 
about it.


> I'm not meaning to argue, but to understand. Especially in light of an
> upcoming project with, perhaps, larger and more complex structures. I am
> increasingly curious about whether namedtuples are a good strategy for some
> of this: they store their field names, as I understand it, and I can live
> with an immutable type in all these cases: I wonder if they are as
> efficient in named-field (key) lookup as dictionaries?

Pretty close to it. Not quite, but within a factor of about 3. Let's do 
some micro-benchmarks!


First, let's create some data objects with three fields, using three 
different techniques: a dict x, a regular class with named fields y, and 
a namedtuple z.

py> x = {'a': 23, 'b': 42, 'c': 57}
py> class Record:
...     def __init__(self, a, b, c):
...             self.a = a
...             self.b = b
...             self.c = c
...
py> y = Record(23, 42, 57)
py> from collections import namedtuple
py> recordnt = namedtuple('recordnt', 'a b c')
py> z = recordnt(23, 42, 57)


Now let's set up some timing code, where we extract all three fields in 
reverse order:

py> from timeit import Timer
py> setup = "from __main__ import x, y, z"
py> t1 = Timer("x['c'], x['b'], x['a']", setup)
py> t2 = Timer("y.c, y.b, y.a", setup)
py> t3 = Timer("z.c, z.b, z.a", setup)


And now let's time them:

py> min(t1.repeat())
0.2941127344965935
py> min(t2.repeat())
0.34186235070228577
py> min(t3.repeat())
0.7729006875306368


That's not too shabby. Even the slowest of the three (test t3, the one 
using the namedtuple) takes only about 0.26 microseconds per field 
lookup. Times shown are seconds for one million repeats of the test 
code, which works out to microseconds per single run. There are three 
field lookups per test, so 0.77 microseconds divided by 3 is about 
0.26 microseconds per lookup.

If you ever find a piece of code where the difference between 0.1 and 
0.26 microseconds per field lookup is meaningful, I'd like to see it.
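
(For completeness, one more variant you could time in exactly the same 
way: a class using __slots__, which stores its fields in fixed slots 
instead of a per-instance dict. This is just a sketch, not one of the 
measurements above:)

py> class SlotRecord:
...     __slots__ = ('a', 'b', 'c')  # fixed attributes, no per-instance __dict__
...     def __init__(self, a, b, c):
...             self.a = a
...             self.b = b
...             self.c = c
...
py> w = SlotRecord(23, 42, 57)
py> t4 = Timer("w.c, w.b, w.a", "from __main__ import w")

min(t4.repeat()) should land close to the plain class above, while 
each instance uses noticeably less memory.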


> I'm also a bit confused here: obviously tuples are immutable, but one can
> use lists in them... I think that makes those lists' contents immutable?

Nope. The tuple is immutable in the sense that its *direct* contents 
cannot be replaced, but any mutable object stored inside the tuple can 
still be mutated.

py> t = (1, 2, [])
py> t[2] = ["hello"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t[2].append("hello")
py> t
(1, 2, ['hello'])
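
(A related consequence worth knowing: because a tuple that contains a 
mutable object is not fully immutable, it is not hashable either, so 
it can't be used as a dict key or set member:)

py> hash(t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'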


> And could one define a namedtuple that included lists that were different
> lengths for different instantiations (like my game results, for example)? 

Naturally. The namedtuple doesn't care what you put inside it.
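
A quick demonstration, using a made-up 'game' record rather than your 
actual code:

py> game = namedtuple('game', 'player rolls')
py> g1 = game('Alice', [3, 6, 2])
py> g2 = game('Bob', [5, 1, 4, 6, 2])
py> g2.rolls
[5, 1, 4, 6, 2]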


-- 
Steven

