[Tutor] What's going on in this Python code from Programming Collective Intelligence?

Dave Angel davea at davea.name
Wed Jul 10 14:05:37 CEST 2013


On 07/06/2013 11:18 PM, bluepresley wrote:
> I'm reading the book Programming Collective Intelligence by Toby Segaran.
> I'm having a lot of difficulty understanding some of the code from
> chapter four (the code for this chapter is available online at
> https://github.com/cataska/programming-collective-intelligence-code/blob/master/chapter4/searchengine.py
> and starts at line 172), specifically starting with this function:
>
> def getscoredlist(self,rows,wordids):
>      totalscores=dict([(row[0],0) for row in rows])
>
>      # This is where you'll later put the scoring functions
>      weights=[]
>
>      for (weight,scores) in weights:
>        for url in totalscores:
>          totalscores[url]+=weight*scores[url]
>
>      return totalscores
>
> What does this mean?
> totalscores=dict([(row[0],0) for row in rows])

Two different things might be confusing you.  It helps to split the line 
into two steps, so you can say which one isn't clear.

mylist = [(row[0],0) for row in rows]
totalscores = dict(mylist)
del mylist

I'll guess it's the list comprehension that confuses you.  That line is 
roughly equivalent to:
mylist = []
for row in rows:
     mylist.append( (row[0], 0) )

So you end up with a list of tuples, each consisting of row[0] and 0.

The call to dict simply processes that list and makes a dict out of it. 
  The first item in each tuple is the key, and the second item is the value.
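
To make that concrete, here is a small sketch.  The rows below are made 
up for illustration (the real ones come from the database query, which I 
haven't looked at), so treat the values as an assumption.  It shows the 
comprehension, the explicit loop, and the dict call side by side:

```python
# Hypothetical sample data: the first item of each row is the key we want.
rows = [(1, 10, 20), (2, 5, 7), (3, 0, 4)]

# The list comprehension from the book.
pairs = [(row[0], 0) for row in rows]

# The same thing written as an explicit loop.
pairs_loop = []
for row in rows:
    pairs_loop.append((row[0], 0))

# dict() turns each (key, value) tuple into one dictionary entry.
totalscores = dict(pairs)

# Looping over a dict visits its keys, so these are the row[0] values.
keys = sorted(key for key in totalscores)

print(pairs == pairs_loop)
print(totalscores)
print(keys)
```

Every value starts at 0, which is what lets the later loop add weighted 
scores onto each entry.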

>
> I have a lot of experience with PHP and Objective C.  If anyone is familiar
> with PHP or another language could you please provide the equivalent? I
> think that would really help me understand better.
>
>
> The function just before that, getmatchingrows, provides the arguments for
> getscoredlist. "rows" is rows from a database query; "wordids" is a list of
> word ids searched for that generated rows result set. For that function
> (getmatchingrows) it returns 2 variables simultaneously. I'm unfamiliar
> with this. What's going on there?

I didn't download the link, so I'll just make up a function as an example:

def myfunc():
     return 12, 42

That comma between the two values creates a tuple out of them.  The 
tuple is normally written  (12, 42), and is an immutable version of the 
list [12, 42].

If somebody calls the function like:
     a = myfunc()

then a will be that same tuple.

But if somebody calls the function like:
      c, d = myfunc()

then a symmetric trick called "tuple unpacking" goes into effect.  When 
you have several variables on the left side of an assignment, Python 
unpacks whatever iterable is on the right side, one item per variable.

So in this case, c will be 12, and d will be 42.  This same syntax works 
in any other context, so you can do:

      a, b = b, a

to swap two values.
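
Putting those pieces together in one runnable sketch (myfunc is still my 
made-up function, not anything from the book):

```python
def myfunc():
    # The comma builds the tuple; the parentheses are optional here.
    return 12, 42

a = myfunc()       # a is the whole tuple
c, d = myfunc()    # the tuple is unpacked into two names

# Unpacking works on any iterable of the right length, not just tuples.
x, y = [1, 2]

# Swap: the right side is packed into a tuple first, then unpacked.
x, y = y, x

print(a)
print(c, d)
print(x, y)
```

If the number of names on the left doesn't match the number of items on 
the right, Python raises a ValueError.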

>
> Also, as far as I can tell from the getmatchingrows code, it returns a
> multidimensional array of database results with row[0] being the urlid (NOT
> the url), and other indices correspond to the word id location.
>
> In getscoredlist, totalscores[url] doesn't make sense. Where is [url]
> coming from? could they have meant to say urlid here?
>
> This chapter is also available online for free from O'reilly.  Here is the
> page that talks specifically about this part.
>
> Any help understanding what this code in this part of this book is doing
> would be greatly appreciated.
>
> Thanks,
> Blue
>
> http://my.safaribooksonline.com/book/web-development/9780596529321/4dot-searching-and-ranking/querying#X2ludGVybmFsX0h0bWxWaWV3P3htbGlkPTk3ODA1OTY1MjkzMjElMkZjb250ZW50YmFzZWRfcmFua2luZyZxdWVyeT0=


-- 
DaveA


