[Tutor] Python Variables Changing in Other Functions

Wed May 25 14:16:22 CEST 2011

On Tue, May 24, 2011 at 11:14 PM, Rachel-Mikel ArceJaeger <
arcejaeger at gmail.com> wrote:

> Hello,
>
> I am having trouble with determining when python is passing by reference
> and by value and how to fix it to do what I want:
>

Andre already mentioned that you shouldn't think of Python as
'pass-by-reference' or 'pass-by-value', but this link should help you
understand why not: http://effbot.org/zone/call-by-object.htm
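
Here is a minimal sketch of the distinction, in case it helps while you read
that page. The function names are made up for illustration and are not from
your program. Mutating the object a name refers to is visible through every
other name bound to that object; rebinding a local name is not:

```python
import random

def shuffle_in_place(titles):
    # Mutates the one list object that the caller's name also refers to.
    random.shuffle(titles)

def rebind_local_name(titles):
    # Only points the local name at a different list; the caller's list
    # is untouched.
    titles = ["something", "else"]

def shuffle_then_restore(titles):
    original_order = list(titles)   # shallow copy, made before shuffling
    random.shuffle(titles)
    # Slice assignment replaces the contents of the existing list object,
    # so the caller sees the restored order.
    titles[:] = original_order
```

A plain assignment such as titles = origTitles behaves like
rebind_local_name(): it changes what the local name points at and leaves the
caller's (already shuffled) list alone.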

> The issue is this: random.shuffle() mutates the list in place, rather than
> creating a new copy. This is fine, but rather than modifying just the local
> copy of my titles, it is modifying it in the other functions, too. For
> instance, rankRandom() is called by main(), which passes it listOfTitles.
> When rankRandom() returns, listOfTitles has been changed to the randomized
> version of titles.
>
> To fix this, I tried copying the original title list and then assigning it
> to the mutated version right before the rankRandom() function returns. The
> local version of titles in rankRandom() does indeed regain its original
> value, but listOfTitles in main() is still being assigned to the randomized
> version, and not to the original version. This boggles me, since it seems
> like shuffle is mutating the titles as if it were a global variable, but
> assignment is treating it only as a local.
>
> What exactly is going on here, and how do I avoid this problem?
>

Well, you also have some non-idiomatic code, so I'll go through and comment
on the whole thing.

> ----------------------------
>
> import xlwt as excel
> import random
> import copy
>
> def getTitleList():
>    """ Makes a list of all the lines in a file """
>
>    filename = raw_input("Name and Extension of File: ")
>
>    myFile = open( filename )
>
>    titles = []
>

You can eliminate from here...

>    title = "none"
>
>    while title != "":
>
>        title = myFile.readline()
>

to here, by iterating over your file object directly. The for loop in Python
is quite powerful, and in this case you can simply say:

for title in myFile:

This is easier to read, especially for other Python programmers. You could
call it a design pattern - when you want to read a file line by line you
can say:

for line in filehandle:
    # do something with line

>
>        if title not in ["\n",""]:
>            titles.append(title)
>
>    return titles
>

It also turns out that you can replace almost this entire function with a
list comprehension. Because doing this sort of thing:

collection = []
for item in another_collection:
    if item == criteria:
        collection.append(item)

is so common, there is a shortcut - you basically put the loop inside the
square brackets and you get the same result:

collection = [item for item in another_collection if item == criteria]

The syntax is much more terse, but once you understand what's happening,
list comprehensions are much cleaner to both read and write (IMO - some
people disagree).

So for your program you could say:

titles = [title for title in myFile if title not in ["\n",""]]

Much shorter than your original function!
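
Putting those pieces together, the whole function could look roughly like
this. It's a sketch rather than a drop-in replacement: I've made the file name
a parameter instead of asking for it with raw_input(), the function names are
my own, and I've used a with block so the file gets closed when the function
is finished with it.

```python
def non_blank_lines(lines):
    """ Keeps every line except the empty ones; takes any iterable of lines """
    return [line for line in lines if line not in ["\n", ""]]

def get_title_list(filename):
    """ Makes a list of all the non-blank lines in a file """
    # The with block closes the file even if reading raises an error.
    with open(filename) as title_file:
        return non_blank_lines(title_file)
```

As in your original, each title keeps its trailing newline.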

>
> def rank( titles ):
>    """ Gets a user-input ranking for each line of text.
>        Returns those rankings
>    """
>
>    ranks = []
>
>    for t in titles:
>
>        rank = raw_input(t + " ")
>        ranks.append(rank)
>

What happens if someone inputs "Crunchy Frog" for their rank? What would you
expect to happen?
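
One way to guard against that, sketched under the assumption that a rank
should be a whole number between 1 and the number of titles (your program
doesn't say, so adjust to taste). parse_rank and ask_for_rank are names I made
up, and read_line stands in for raw_input so the example is easy to try
without typing anything:

```python
def parse_rank(text, highest):
    """ Returns text as an int from 1 to highest, or None if it isn't one """
    try:
        value = int(text.strip())
    except ValueError:
        return None          # not a whole number at all, e.g. "Crunchy Frog"
    if 1 <= value <= highest:
        return value
    return None              # a number, but outside the allowed range

def ask_for_rank(title, highest, read_line):
    """ Keeps asking until the answer is a valid rank, then returns it """
    while True:
        rank = parse_rank(read_line(title.strip() + " "), highest)
        if rank is not None:
            return rank
        print("Please enter a whole number from 1 to %d." % highest)
```

In your program you would call it as ask_for_rank(t, len(titles), raw_input).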

>
>    return ranks
>
> def rankRandom( titles ):
>    """ Takes a list of titles, puts them in random order, gets ranks, and
> then
>        returns the ranks to their original order (so that the rankings
> always
>        match the correct titles).
>    """
>
>    finalRanks = [0]*len(titles)
>
>    origTitles = copy.copy(titles)
>    #print "Orign: ", origTitles
>
>    random.shuffle(titles)      # Shuffle works in-place
>    ranks = rank(titles)
>
>    i = 0
>    for t in titles:
>
>        finalRanks[ origTitles.index(t) ] = ranks[i]
>        i += 1
>

This code looks clunky. And usually when code looks clunky, that means
there's a better way to do it!

In your case, I would strongly recommend using a dictionary. As a matter of
fact, your problem just begs to use a dictionary, which is a collection of
related items, such as a title and a rank. So in your case, you could have
something like:

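finalRanks = {}
# You really should avoid 1-letter variables, except maybe in the case of
# axes in a graph. Also, rank is already the name of your function, so the
# loop variable needs a different name inside rankRandom().
for title, title_rank in zip(titles, ranks):
    finalRanks[title] = title_rank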

Which would give you something like:

{'Gone With the Wind': 2, 'Dune': 1, 'Harold and the Purple Crayon': 3}
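
If you shuffle a copy instead of the list you were handed, the dictionary
also gives you a way to put the ranks back in the original order without
index(). A sketch, assuming every title is unique (your index() version
assumes that too); rank_random and get_ranks are illustrative names, with
get_ranks standing in for your rank() function:

```python
import random

def rank_random(titles, get_ranks):
    """ Collects ranks in random order; returns them in the original order """
    shuffled = list(titles)            # copy, so the caller's list is untouched
    random.shuffle(shuffled)
    ranks = get_ranks(shuffled)        # one rank per title, in shuffled order
    rank_by_title = dict(zip(shuffled, ranks))
    # Walk the titles in their original order to line the ranks back up.
    return [rank_by_title[title] for title in titles]
```

Because the caller's list is never modified, there is nothing to restore at
the end.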

>
>    titles = origTitles # Must restore, since python passes by reference,
> not
>                        # value, and the original structure was changed by
>                        # shuffle
>    #print "t: ", titles
>
>    return finalRanks
>
> def writeToExcel(titles, allRanks):
>
>    # Open new workbook
>    mydoc = excel.Workbook()
>
>    # Add a worksheet
>    mysheet = mydoc.add_sheet("Ranks")
>
>    # Write headers
>    header_font = excel.Font() # Make a font object
>    header_font.bold = True
>    header_font.underline = True
>
>    # Header font needs to be style, actually
>    header_style = excel.XFStyle(); header_style.font = header_font
>

You should probably break that up onto two lines. You don't gain anything by
keeping it on one line, but you lose a lot of readability.

>
>    # Write Headers: write( row, col, data, style )
>    row = 0
>    col = 0
>    for t in titles:
>        # Write data. Indexing is zero based, row then column
>        mysheet.write(row, col, t, header_style)
>        col += 1
>
>    # Write Data
>    row += 1
>    for ranks in allRanks:
>        col = 0
>        for r in ranks:
>            mysheet.write(row, col, r)
>            col += 1
>        row += 1
>

Remember the part about looking clunky? Having lots of += hanging around is
a perfect example of a code smell (i.e. something in this code stinks, and
we should change it). Part of being a good programmer is learning to
recognize those bad smells and getting rid of them. Turns out, Python has a
lot of nice built-in functions for the elimination of code smells.  In this
case, it's the enumerate function:

>>> help(enumerate)
Help on class enumerate in module __builtin__:

class enumerate(object)
 |  enumerate(iterable[, start]) -> iterator for index, value of iterable
 |
 |  Return an enumerate object.  iterable must be another object that supports
 |  iteration.  The enumerate object yields pairs containing a count (from
 |  start, which defaults to zero) and a value yielded by the iterable argument.
 |  enumerate is useful for obtaining an indexed list:
 |      (0, seq[0]), (1, seq[1]), (2, seq[2]), ...
. . .

# Since apparently you're starting from row + 1
for row, ranks in enumerate(allRanks, row+1):
    # Again eliminate those 1-letter variables
    for col, rank in enumerate(ranks):
        mysheet.write(row, col, rank)

Three lines to replace 7. And it's much clearer what's going on here, even
if you don't know exactly what enumerate does - because from the loop
variables (row, ranks) we can infer that it does something useful, like
produce the row and ranks. And since we know there's lots of useful
documentation both through using the help() function, and through the web,
we can easily look up "python enumerate" or help(enumerate) and find all
sorts of useful information and examples.
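
If you want to try enumerate without a spreadsheet handy, here is a small
sketch that collects the (row, col, value) triples that nested loops like the
ones above would write. cells_to_write is a made-up name, and the start
argument to enumerate needs Python 2.6 or later:

```python
def cells_to_write(all_ranks, first_row=1):
    """ Lists (row, col, value) for every rank, one row per set of ranks """
    cells = []
    for row, ranks in enumerate(all_ranks, first_row):
        for col, rank in enumerate(ranks):
            cells.append((row, col, rank))
    return cells
```

For example, cells_to_write([[1, 2], [3, 4]]) gives
[(1, 0, 1), (1, 1, 2), (2, 0, 3), (2, 1, 4)].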

>    # Save file. You don't have to close it like you do with a file object
>    mydoc.save("r.xls")
>
> def main():
>
>    listOfTitles = getTitleList()
>
>    allRanks = []
>
>    done = raw_input("Done?: ")
>

It's usually a good idea to give users an indication of what sort of values
you'll accept, such as "Done? (y/[n]): " - this tells the user that you're
expecting a y or an n (technically this is a lie, since anything not 'y'
will continue, but most people won't care so much. They just need to know
that entering 'yes' will not finish the program like they expect!)

>
>    while done != "y":
>        allRanks.append( rankRandom( listOfTitles ) )
>        #print listOfTitles
>        done = raw_input("Done?: ")
>

And here is another opportunity to look for code smells. Notice one?

Any time you have to write identical code twice, that's probably a bad
thing. In this case, one option is to initialize done to an empty string, so
that the prompt only has to appear once, inside the loop:

done = ''
while done.lower() != 'y':   # Let the user input 'Y' as well
    # do stuff
    done = raw_input("Done? (y/[n]): ")

>    writeToExcel(listOfTitles, allRanks )
>
>
> if __name__ == "__main__" : main()
>

Again, here you don't gain anything by keeping main() on the same line, but
you lose some readability.