[Tutor] What's going on in this Python code from Programming Collective Intelligence?
Dave Angel
davea at davea.name
Wed Jul 10 14:05:37 CEST 2013
On 07/06/2013 11:18 PM, bluepresley wrote:
> I'm reading the book Programming Collective Intelligence by Toby Segaran.
> I'm having a lot of difficulty understanding some of the code from
> chapter four (the code for this chapter is available online at
> https://github.com/cataska/programming-collective-intelligence-code/blob/master/chapter4/searchengine.py and
> starts at line 172), specifically starting with this function:
>
> def getscoredlist(self,rows,wordids):
>     totalscores=dict([(row[0],0) for row in rows])
>
>     # This is where you'll later put the scoring functions
>     weights=[]
>
>     for (weight,scores) in weights:
>         for url in totalscores:
>             totalscores[url]+=weight*scores[url]
>
>     return totalscores
>
> What does this mean?
> totalscores=dict([(row[0],0) for row in rows])
Two different things might be confusing you. It helps to split the
statement in two, so you can say which line isn't clear:
    mylist = [(row[0],0) for row in rows]
    totalscores = dict(mylist)
    del mylist
I'll guess it's the list comprehension that confuses you. That line is
roughly equivalent to:
    mylist = []
    for row in rows:
        mylist.append( (row[0], 0) )
So you end up with a list of tuples, each consisting of row[0] and 0.
The call to dict simply processes that list and makes a dict out of it.
The first item in each tuple is the key, and the second item is the value.
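To make that concrete, here is a small sketch. The rows are made up for illustration (I'm assuming the first element of each row is the id and the rest are word locations); they are not taken from the book's database. It just shows that the one-liner and the spelled-out loop build the same dict:

```python
# Hypothetical rows: the first element stands in for the url id,
# the remaining elements for word locations.
rows = [(101, 4, 9), (102, 1, 7), (103, 3, 3)]

# The one-liner from the book
totalscores = dict([(row[0], 0) for row in rows])

# The same thing spelled out with an explicit loop
mylist = []
for row in rows:
    mylist.append((row[0], 0))
spelled_out = dict(mylist)

print(totalscores)
print(totalscores == spelled_out)
```

Every key is a row[0] value and every value starts at 0, ready to have scores added to it later.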
>
> I have a lot of experience with PHP and Objective C. If anyone is familiar
> with PHP or another language could you please provide the equivalent? I
> think that would really help me understand better.
>
>
> The function just before that, getmatchingrows, provides the arguments for
> getscoredlist. "rows" is rows from a database query; "wordids" is a list of
> word ids searched for that generated rows result set. For that function
> (getmatchingrows) it returns 2 variables simultaneously. I'm unfamiliar
> with this. What's going on there?
I didn't download the link, so I'll just make up a function as an example:
    def myfunc():
        return 12, 42
That comma between the two values creates a tuple out of them. The
tuple is normally written (12, 42), and is an immutable version of the
list [12, 42].
If somebody calls the function like:
    a = myfunc()
then a will be that same tuple.
But if somebody calls the function like:
    c, d = myfunc()
then a symmetric trick called "tuple unpacking" goes into effect. When
you have a bunch of variables on the left side of an assignment, it'll
unpack whatever iterable is on the right side, one item per variable.
So in this case, c will be 12, and d will be 42. The same syntax works
in any assignment, so you can do:
    a, b = b, a
to swap two values.
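Here is the whole thing as one runnable sketch. myfunc is still my made-up example, not the book's getmatchingrows, and x and y are just illustrative names:

```python
def myfunc():
    # The comma builds a tuple; the parentheses are optional here.
    return 12, 42

a = myfunc()        # a holds the whole tuple
c, d = myfunc()     # tuple unpacking: one item per name

x, y = 1, 2
x, y = y, x         # the right side is packed first, then unpacked

print(a)
print(c)
print(d)
print(x)
print(y)
```

If the number of names on the left doesn't match the number of items on the right, Python raises a ValueError instead of silently dropping anything.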
>
> Also, as far as I can tell from the getmatchingrows code, it returns a
> multidimensional array of database results with row[0] being the urlid (NOT
> the url), and other indices correspond to the word id location.
>
> In getscoredlist, totalscores[url] doesn't make sense. Where is [url]
> coming from? could they have meant to say urlid here?
>
> This chapter is also available online for free from O'Reilly. Here is the
> page that talks specifically about this part.
>
> Any help understanding what this code in this part of this book is doing
> would be greatly appreciated.
>
> Thanks,
> Blue
>
> http://my.safaribooksonline.com/book/web-development/9780596529321/4dot-searching-and-ranking/querying#X2ludGVybmFsX0h0bWxWaWV3P3htbGlkPTk3ODA1OTY1MjkzMjElMkZjb250ZW50YmFzZWRfcmFua2luZyZxdWVyeT0=
--
DaveA