[Tutor] More or less final Chutes & Ladders
Steven D'Aprano
steve at pearwood.info
Sat Jan 4 14:03:27 CET 2014
On Sat, Jan 04, 2014 at 07:19:43AM -0500, Keith Winston wrote:
> I understand that Python doesn't have composite objects, but neither does
> it disallow my list of lists of ints and lists... which is, I imagine,
> very space efficient.
I'm afraid I have no idea what you mean by Python not having "composite
objects". What's a composite object?
> I think what you are in essence saying is that it's a
> mistake for me to worry about space at the expense of clarity...
What Denis and I are trying to say is that when you are using a
high-level language like Python, you should aim to optimize programmer
time and effort, not computer time and effort. There are applications
where you have to optimize what the computer does, where every byte
counts, where microseconds matter. Python is not the right programming
language for those applications. Python is a language for when you have
the luxury of tens of megabytes of memory, not kilobytes, when you don't
care whether something takes a millisecond to calculate instead of a
tenth of a millisecond.
(Actually, there are versions of Python for embedded devices, such as
Android, where memory does matter -- at least, where every kilobyte
counts.)
Everything is a tradeoff. In programming, one common set of tradeoffs is
space (memory) versus time: use more memory to run faster, or save
memory but run slower. Python generally trades off memory for speed.
There is very little point in trying to save bytes, because behind the
scenes Python is using and re-using memory like it's going out of
fashion, just to ensure that it can be as fast as possible.
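As one illustration of that re-use: CPython (the standard interpreter) caches small integers and hands out the same objects everywhere, trading bookkeeping for speed. A quick sketch (the exact byte counts vary by Python version and platform):

```python
import sys

a = 42
b = 42

# CPython caches small integers (-5 through 256), so both names
# refer to the very same object rather than to two copies.
print(a is b)             # True on CPython

# Even so, a "tiny" int is not tiny in memory terms:
print(sys.getsizeof(42))  # e.g. 28 bytes on a 64-bit Python 3
```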
py> import sys
py> sys.getsizeof(42)
14
Fourteen bytes just for a tiny little integer like that??? How wasteful!
I remember when numbers like 42 would only require *two* bytes. Of
course, back then the biggest number you could deal with was 65535, and
a computer with 64K of memory was considered unbelievably luxurious.
Meanwhile, Python lets me handle numbers with thousands of digits with
ease:
py> n = 2**10000
py> len(str(n))
3011
Another tradeoff is between programmer effort (which equals time, and
cost) versus speed. Programs written in C are typically between
10 and 10000 times faster than the same program written in Python, but
they typically take between 20 and 200 times longer and more effort to
write. You should ask "Is my program *fast enough*?" rather than "Is it
fast?". Often, Python is fast enough. When it's not, there are ways
to make it faster.
So don't sweat the small stuff. If you ever have to write an operating
system kernel or a graphics card driver, then you will need to care about
optimizing every little thing. Until then, write the most natural code
you can, and only if it actually is too slow should you worry about it.
> I'm not meaning to argue, but to understand. Especially in lieu of an
> upcoming project with, perhaps, larger and more complex structures. I am
> increasingly curious about whether namedtuples are a good strategy for some
> of this: they store their field names, as I understand it, and I can live
> with an immutable type in all these cases: I wonder if they are as
> efficient in named-field (key) lookup as dictionaries?
Pretty close to it. Not quite, but within a factor of about 3. Let's do
some micro-benchmarks!
First, let's create some data objects with three fields, using three
different techniques: a dict x, a regular class with named fields y, and
a namedtuple z.
py> x = {'a': 23, 'b': 42, 'c': 57}
py> class Record:
... def __init__(self, a, b, c):
... self.a = a
... self.b = b
... self.c = c
...
py> y = Record(23, 42, 57)
py> from collections import namedtuple
py> recordnt = namedtuple('recordnt', 'a b c')
py> z = recordnt(23, 42, 57)
Now let's set up some timing code, where we extract all three fields in
reverse order:
py> from timeit import Timer
py> setup = "from __main__ import x, y, z"
py> t1 = Timer("x['c'], x['b'], x['a']", setup)
py> t2 = Timer("y.c, y.b, y.a", setup)
py> t3 = Timer("z.c, z.b, z.a", setup)
And now let's time them:
py> min(t1.repeat())
0.2941127344965935
py> min(t2.repeat())
0.34186235070228577
py> min(t3.repeat())
0.7729006875306368
That's not too shabby. Even the slowest of the three (test t3, the one
using the namedtuples) takes only about 0.26 microseconds per field
lookup. Times shown are seconds for one million repeats of the test
code, or microseconds per single run. There are three field lookups per
test, so 0.77 microseconds/3 is about 0.26 microseconds.
If you ever find a piece of code where the difference between 0.1 and
0.26 microseconds per field lookup is meaningful, I'd like to see it.
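To spell that arithmetic out, here is roughly the same benchmark with the per-lookup time computed explicitly. (This sketch passes the namespace to Timer via its `globals` parameter instead of importing from `__main__`, so it runs anywhere; timeit's default loop count is 1,000,000.)

```python
from timeit import Timer

x = {'a': 23, 'b': 42, 'c': 57}

t1 = Timer("x['c'], x['b'], x['a']", globals={'x': x})

total = min(t1.repeat())      # seconds for 1,000,000 runs of the statement
per_run = total / 1_000_000   # seconds per single run of the statement
per_lookup = per_run / 3      # three field lookups per statement
print(f"{per_lookup * 1e6:.3f} microseconds per dict lookup")
```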
> I'm also a bit confused here: obviously tuples are immutable, but one can
> use lists in them... I think that makes those lists' contents immutable?
Nope. It makes the tuple immutable in the sense that its *direct*
contents cannot be changed (you cannot replace one item with another),
but any mutable items inside the tuple can still be mutated.
py> t = (1, 2, [])
py> t[2] = ["hello"]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t[2].append("hello")
py> t
(1, 2, ['hello'])
> And could one define a namedtuple that included lists that were different
> lengths for different instantiations (like my game results, for example)?
Naturally. The namedtuple doesn't care what you put inside it.
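For instance, something like this works fine. (The field names here are made up for illustration; they are not from the original game code.)

```python
from collections import namedtuple

# Hypothetical record for game results.
Game = namedtuple('Game', 'player moves')

g1 = Game('Alice', [3, 5, 2])
g2 = Game('Bob', [6, 1, 4, 4, 2])    # a longer list is fine

print(len(g1.moves), len(g2.moves))  # 3 5

# The namedtuple itself is immutable, but a list inside it is not:
g1.moves.append(6)
print(g1.moves)                      # [3, 5, 2, 6]
```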
--
Steven