[Tutor] global variables/constants versus volatile variables/constants

Steven D'Aprano steve at pearwood.info
Fri Jun 13 15:08:40 CEST 2014


On Fri, Jun 13, 2014 at 05:10:28AM -0700, Albert-Jan Roskam wrote:

> The other day I used collections.namedtuple and I re-initialized 
> Record (see below) with every function*) call. Bad idea! It looks 
> nicer because I did not need a global (and globals are baaad, mkay?), 
> but it was *much* slower. I processed a log of a few million lines, I 
> think.
> 
> # bad --> time-consuming
> import collections
> 
> def do_something_with(raw_record):
>    Record = collections.namedtuple("_", " ".join("v%03d" % i for i in range(100)))
>    return Record(*raw_record.split())

Look at how much work you do here. First, you create a long string of 
the form:

    "v000 v001 v002 v003 ... v099"

representing 100 v-digit names. Then you create a brand new Record 
class that takes those 100 v-digit names as arguments. Creating that 
class requires building a string, parsing it as Python code, and then 
running it. (You're not expected to know that, but if you read the 
source code for namedtuple you will see that's how it works.) So 
creating that class is slow. Every time you call the function, it builds 
a new "v000 ... v099" string, from scratch, then builds a new class, 
also from scratch, and finally populates an instance of that class with 
100 values from the raw_record.

Only that last step needs to be done inside the function.
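Here is a rough sketch of the difference, assuming Python 3 and the standard timeit module. The names slow_parse and fast_parse and the sample line are made up for the demonstration. Notice that every call to the slow version hands back an instance of a *different* class:

```python
import collections
import timeit

def slow_parse(raw_record):
    # Rebuilds the "v000 ... v099" string AND a brand new class on every call.
    names = " ".join("v%03d" % i for i in range(100))
    Record = collections.namedtuple("Record", names)
    return Record(*raw_record.split())

# Built once, when the module is imported.
Record = collections.namedtuple("Record", " ".join("v%03d" % i for i in range(100)))

def fast_parse(raw_record):
    # Only populating an instance happens per call.
    return Record(*raw_record.split())

raw = " ".join(str(i) for i in range(100))  # hypothetical 100-field log line

print(type(slow_parse(raw)) is type(slow_parse(raw)))  # False: two distinct classes
print(type(fast_parse(raw)) is type(fast_parse(raw)))  # True: one shared class

# Timings vary by machine and Python version; expect slow_parse to be much slower.
print("slow:", timeit.timeit(lambda: slow_parse(raw), number=200))
print("fast:", timeit.timeit(lambda: fast_parse(raw), number=200))
```

Both functions produce the same values; the only difference is how much work is repeated per call.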


> # better --> even though it uses a global variable
> import collections
> 
> Record = collections.namedtuple("_", " ".join("v%03d" % i for i in range(100)))

[Aside: you may find it easier to debug problems with this if you give 
the namedtuple class a sensible name, like "Record", rather than "_".]
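A quick illustration of why the name matters: the first argument to namedtuple becomes the class's __name__, which shows up in every repr. The variable names Anon and Named are just for this example:

```python
import collections

Anon = collections.namedtuple("_", "v000 v001")
Named = collections.namedtuple("Record", "v000 v001")

print(Anon("a", "b"))   # _(v000='a', v001='b')
print(Named("a", "b"))  # Record(v000='a', v001='b')
```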

How is that a global *variable*? It's a global name, "Record", but it is 
no more a "variable" than it would be if you did:

class Record(tuple):
    def __new__(cls, v000, v001, v002, ... , v099):
        # code goes here

    @property
    def v000(self):
        return self[0]

    # likewise for v001, v002, ... v099
    # plus additional methods
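The sketch above elides the 100 arguments, so here is a runnable version cut down to three fields. It is only an illustration: the class namedtuple really generates has more to it (for example _make, _asdict and _replace), and the name NT is made up for the comparison:

```python
import collections

class Record(tuple):
    """Hand-written stand-in for a namedtuple with fields v000, v001, v002."""
    __slots__ = ()  # like namedtuple: no per-instance __dict__

    def __new__(cls, v000, v001, v002):
        # Store the values in the underlying tuple.
        return super().__new__(cls, (v000, v001, v002))

    @property
    def v000(self):
        return self[0]

    @property
    def v001(self):
        return self[1]

    @property
    def v002(self):
        return self[2]

    def __repr__(self):
        return "Record(v000=%r, v001=%r, v002=%r)" % self

r = Record("a", "b", "c")
print(r.v001)  # b
print(r)       # Record(v000='a', v001='b', v002='c')

# The namedtuple version holds the same values in the same order.
NT = collections.namedtuple("Record", "v000 v001 v002")
print(tuple(r) == tuple(NT("a", "b", "c")))  # True
```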


namedtuple is a factory function which creates a class. Buried deep 
within it is a class statement, just as if you had written the class 
yourself. Normally, when you create a class, you don't treat it as a 
variable; you treat it as a constant, the same as you do with functions. 
A class returned by namedtuple is no different from one you create with 
the class keyword. So "global variables are bad" doesn't apply, because 
it's not a variable.
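To make the "it's just a class" point concrete: since Python 3.6 the standard library also lets you write an equivalent namedtuple class with an ordinary class statement via typing.NamedTuple. This is an alternative spelling, not what the original poster used, and the names RecordA and RecordB are only for the example. Either way you get a class object bound to a module-level name, which you use but never reassign, exactly like a function defined with def:

```python
import collections
import typing

# Created by calling a factory function...
RecordA = collections.namedtuple("RecordA", "v000 v001 v002")

# ...or with a class statement; typing.NamedTuple builds an equivalent namedtuple class.
class RecordB(typing.NamedTuple):
    v000: str
    v001: str
    v002: str

print(isinstance(RecordA, type), isinstance(RecordB, type))  # True True
print(RecordA._fields == RecordB._fields)                    # True
print(RecordA("a", "b", "c") == RecordB("a", "b", "c"))      # True (compared as tuples)
```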

Even if it were a variable, what really matters is not that it gets 
stored in the global scope, but whether it gets explicitly passed 
to functions as arguments, or is implicitly read and modified behind 
the scenes. For example:

# Not bad
data = ["put", "stuff", "here"]
process(data)
do_things_with(data)


# Bad, for various reasons
data = ["put", "stuff", "here"]
process()  # process what?
do_things_with()  # What are we doing things with?



In the first case, "data" may be stored in the global scope, but inside 
each function it is treated as a regular local variable. Let's contrast 
how one might operate on a second set of data in each case:

# Too easy
process(some_other_data)


# Ouch, this is painful
save_data = data
data = some_other_data
process()
data = save_data  # restore the previous value
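Done properly, the global version is even more painful than it looks, because you should restore the old value even if process() raises. A minimal sketch, with a hypothetical process() that just counts the items in the global data:

```python
def process():
    # Hypothetical function that only knows how to work on the global "data".
    return len(data)

data = ["put", "stuff", "here"]
some_other_data = ["a", "b"]

# The explicit style would simply be: result = process(some_other_data)
save_data = data
data = some_other_data
try:
    result = process()
finally:
    data = save_data  # restore the previous value, even on error

print(result)  # 2
print(data)    # ['put', 'stuff', 'here']
```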


Global variables aren't bad because Moses came down from the mountain 
with a stone tablet declaring that they are bad. They're bad because 
they cause excessive coupling, they operate by side-effect, they spoil 
idempotent code, and they are implicit instead of explicit.
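One way side effects on a global spoil idempotence (doing something twice should leave things the same as doing it once), sketched with hypothetical functions add_header and with_header:

```python
lines = ["a", "b"]

def add_header():
    # Works by side effect on the global "lines": each call adds another header.
    lines.insert(0, "HEADER")

def with_header(text_lines):
    # Works only on its argument and never mutates it; applying it
    # twice gives the same result as applying it once.
    if text_lines[:1] == ["HEADER"]:
        return text_lines
    return ["HEADER"] + text_lines

add_header()
add_header()
print(lines)  # ['HEADER', 'HEADER', 'a', 'b']

once = with_header(["a", "b"])
twice = with_header(with_header(["a", "b"]))
print(once == twice)  # True
```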




> def do_something_with(raw_record):
>    return Record(*raw_record.split())

Much more sensible!
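Putting it together with a readable class name, a sketch might look like this. The sample input line is made up. namedtuple classes also provide a documented _make() classmethod that accepts any iterable, which is equivalent to Record(*iterable); either form raises TypeError if the line has the wrong number of fields:

```python
import collections

# Built once, at module level, with a name that shows up in reprs and tracebacks.
Record = collections.namedtuple("Record", " ".join("v%03d" % i for i in range(100)))

def do_something_with(raw_record):
    return Record(*raw_record.split())

line = " ".join(str(i) for i in range(100))  # hypothetical 100-field input line
rec = do_something_with(line)
print(rec.v000, rec.v099)                      # 0 99
print(Record._make(line.split()) == rec)       # True
```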



-- 
Steven

