[Chicago] Chicago Digest, Vol 123, Issue 8

Len Wanger len_wanger at hotmail.com
Tue Nov 10 15:15:39 EST 2015


Doing it in the dunder method __next__ means you can drive the object with the built-in next() function, and if the class also defines __iter__ (returning self) you can use it directly in a for loop as well. You have to watch out for infinite loops in your case, but it's nice. E.g.:

c_list = circular_list( ('a', 'b', 'c') )
for i in range(10):
    print( next(c_list) )

-or-

for i in circular_list( ('a', 'b', 'c') ):
    # warning: I'm in an infinite loop!
    print(i)
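
For reference, here is a minimal sketch of the kind of class I have in mind. I'm guessing at the details of the attached class, so treat the attribute names (array, counter) and the __iter__ method as assumptions rather than the original code:

```python
class circular_list:
    """Hypothetical reconstruction; attribute names are assumptions."""

    def __init__(self, array):
        self.array = array
        self.counter = -1

    def __iter__(self):
        # Returning self is what lets the object be used directly in a for loop.
        return self

    def __next__(self):
        if not self.array:
            raise StopIteration  # nothing to cycle over
        # Advance by one, wrapping back to index 0 after the last element.
        self.counter = (self.counter + 1) % len(self.array)
        return self.array[self.counter]


c_list = circular_list(('a', 'b', 'c'))
first_five = [next(c_list) for _ in range(5)]
print(first_five)  # ['a', 'b', 'c', 'a', 'b']
```

Note that __next__ returns a single element per call; there is no yield in it.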

Note: This is also already in the standard library. Look at cycle in the itertools module.
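
A quick sketch of the standard-library route. The islice call is just my way of taking a finite number of items so the example terminates:

```python
from itertools import cycle, islice

letters = cycle(('a', 'b', 'c'))

# cycle() never raises StopIteration on a non-empty input,
# so slice off a fixed number of items instead of looping forever.
first_seven = list(islice(letters, 7))
print(first_seven)  # ['a', 'b', 'c', 'a', 'b', 'c', 'a']
```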

One more note: be careful to pass an immutable sequence (like a tuple) rather than a list to circular_list, or you'll open yourself up to nasty side effects and a long night of debugging if the list gets modified while you're iterating!
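
To make the side effect concrete, here is a small sketch. It uses a generator-function version of circular_list (an assumption on my part; a class that keeps a reference to the caller's list should behave similarly) and a hypothetical safe_circular_list that takes a private tuple copy:

```python
def circular_list(array):
    # Indexes into the caller's sequence on every step (no copy is made).
    counter = -1
    while True:
        counter = (counter + 1) % len(array)
        yield array[counter]


def safe_circular_list(array):
    # Hypothetical variant: iterate over a private, immutable snapshot.
    return circular_list(tuple(array))


data = ['a', 'b', 'c']
gen = circular_list(data)
seen = [next(gen), next(gen)]   # 'a', 'b'
data.remove('a')                # the list changes underneath the generator
seen.append(next(gen))          # yields 'b' again; 'c' is skipped for now
print(seen)                     # ['a', 'b', 'b']

data = ['a', 'b', 'c']
safe_gen = safe_circular_list(data)
safe_seen = [next(safe_gen), next(safe_gen)]
data.remove('a')                # no effect on the snapshot
safe_seen.append(next(safe_gen))
print(safe_seen)                # ['a', 'b', 'c']
```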

Len

> From: chicago-request at python.org
> Subject: Chicago Digest, Vol 123, Issue 8
> To: chicago at python.org
> Date: Tue, 10 Nov 2015 12:26:05 -0500
> 
> Send Chicago mailing list submissions to
> 	chicago at python.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	https://mail.python.org/mailman/listinfo/chicago
> or, via email, send a message with subject or body 'help' to
> 	chicago-request at python.org
> 
> You can reach the person managing the list at
> 	chicago-owner at python.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Chicago digest..."
> 
> 
> Today's Topics:
> 
>    1. Re: Can this be done with a yield statement and generator
>       object? (Lewit, Douglas)
>    2. Re: Chicago Digest, Vol 123, Issue 6 (Lewit, Douglas)
>    3. Re: Can this be done with a yield statement and generator
>       object? (Tim Ottinger)
>    4. Re: Need advice on this project. (Lewit, Douglas)
>    5. Re: Can this be done with a yield statement and generator
>       object? (Lewit, Douglas)
>    6. Re: Need advice on this project. (Lewit, Douglas)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Tue, 10 Nov 2015 11:09:27 -0600
> From: "Lewit, Douglas" <d-lewit at neiu.edu>
> To: The Chicago Python Users Group <chicago at python.org>
> Subject: Re: [Chicago] Can this be done with a yield statement and
> 	generator object?
> Message-ID:
> 	<CAPdZZGwCMRKdwnBhYFWuk2zRBrEDqNf3ezb3givGyVinD9OFrw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> That's very logical, Will, thanks!   :-)
> 
> On Tue, Nov 10, 2015 at 9:02 AM, William E. S. Clemens <wesclemens at gmail.com
> > wrote:
> 
> > I didn't test, but something like this should do the same.
> >
> > def circular_list(array):
> >       counter = -1  # set once, before the loop, so the position is remembered
> >       while True:
> >             if counter == len(array) - 1:
> >                   counter = -1
> >             counter+=1
> >             yield array[counter]
> >
> >
> > On Tue, Nov 10, 2015 at 2:10 AM, Lewit, Douglas <d-lewit at neiu.edu> wrote:
> >
> >> Hey guys,
> >>
> >> I'm attaching a simple class that I created in Python.... Python 3 to be
> >> specific, but I think it should work in Python 2 as well, maybe.  Anyhow,
> >> is there a way to implement the same concept using a *yield statement*
> >> in a function to create a generator object?  Just wondering.  Let me know,
> >> thanks!
> >>
> >> Best,
> >>
> >> Douglas Lewit
> >>
> >> P.S.  Obviously if you use a generator object to do this then the
> >> generator object would never produce the StopIteration error.  But I'm kind
> >> of confused about how to create and define a generator object that would
> >> produce this cyclical behavior in an array or list.
> >>
> >>
> >>
> >> _______________________________________________
> >> Chicago mailing list
> >> Chicago at python.org
> >> https://mail.python.org/mailman/listinfo/chicago
> >>
> >>
> >
> >
> > --
> > William Clemens
> > Phone: 847.485.9455
> > E-mail: wesclemens at gmail.com
> >
> 
> ------------------------------
> 
> Message: 2
> Date: Tue, 10 Nov 2015 11:11:30 -0600
> From: "Lewit, Douglas" <d-lewit at neiu.edu>
> To: The Chicago Python Users Group <chicago at python.org>
> Subject: Re: [Chicago] Chicago Digest, Vol 123, Issue 6
> Message-ID:
> 	<CAPdZZGyfPuVVbBT9dWFM-3XgPqjMpF7o_1fyrR8+Ydcb+dq2ig at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Thanks Len, pretty simple and straightforward!  I'm still getting the feel
> for the "yield" statement.
> 
> On Tue, Nov 10, 2015 at 9:51 AM, Len Wanger <len_wanger at hotmail.com> wrote:
> 
> > Try changing your next routine to this:
> >
> > def __next__(self):
> >       # __next__ must return one element per call (a yield here would return a generator instead)
> >       if self.counter == len(self.array) - 1:
> >             self.counter = -1
> >       self.counter+=1
> >       return self.array[self.counter]
> >
> >
> > Len
> >
> > > From: chicago-request at python.org
> > > Subject: Chicago Digest, Vol 123, Issue 6
> > > To: chicago at python.org
> > > Date: Tue, 10 Nov 2015 09:54:39 -0500
> > >
> > >
> > > Message: 1
> > > Date: Tue, 10 Nov 2015 02:10:12 -0600
> > > From: "Lewit, Douglas" <d-lewit at neiu.edu>
> > > To: The Chicago Python Users Group <chicago at python.org>
> > > Subject: [Chicago] Can this be done with a yield statement and
> > > generator object?
> > > Message-ID:
> > > <CAPdZZGxoZRe=L7LyVQ30QBaqy_NeRWrisOdraQpcu-wFXEhqAw at mail.gmail.com>
> > > Content-Type: text/plain; charset="utf-8"
> > >
> > > -------------- next part --------------
> > > A non-text attachment was scrubbed...
> > > Name: Circular_List_in_Python.png
> > > Type: image/png
> > > Size: 277517 bytes
> > > Desc: not available
> > > URL: <
> > http://mail.python.org/pipermail/chicago/attachments/20151110/b012ba59/attachment.png
> > >
> > > -------------- next part --------------
> > > A non-text attachment was scrubbed...
> > > Name: CircularList.py
> > > Type: text/x-python
> > > Size: 373 bytes
> > > Desc: not available
> > > URL: <
> > http://mail.python.org/pipermail/chicago/attachments/20151110/b012ba59/attachment.py
> > >
> > >
> 
> ------------------------------
> 
> Message: 3
> Date: Tue, 10 Nov 2015 11:14:47 -0600
> From: Tim Ottinger <tottinge at gmail.com>
> To: The Chicago Python Users Group <chicago at python.org>
> Subject: Re: [Chicago] Can this be done with a yield statement and
> 	generator object?
> Message-ID:
> 	<CAN2NStnXER8dw5dwKWHUA4pprUJQ8KQ9XdcCSoWez8FKoc6MnQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> You mean, like itertools.cycle?
> 
> 
> 
> 
> -- 
> Tim Ottinger, Anzeneer, Industrial Logic
> -------------------------------------
> http://www.industriallogic.com/
> http://agileotter.blogspot.com/
> 
> ------------------------------
> 
> Message: 4
> Date: Tue, 10 Nov 2015 11:20:33 -0600
> From: "Lewit, Douglas" <d-lewit at neiu.edu>
> To: The Chicago Python Users Group <chicago at python.org>
> Subject: Re: [Chicago] Need advice on this project.
> Message-ID:
> 	<CAPdZZGzFHp4VRknpBPhtzdugYDOC7WOSskogxes_PqadXUig-w at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> This was amazingly helpful, thanks!  I'll check the denominator of my
> correlation, but I'm pretty sure that's correct.  But it won't hurt to
> double check it.  When I took a slice of my similarity matrix all the
> correlations were floats in between -1 and +1, so that's a good sign that
> my computation was correct albeit very time consuming.  The reason I didn't
> use Numpy arrays is because the professor for this class doesn't know a lot
> of Python, and he uses Microsoft Visual Studio to run my Python programs.
> I don't know if Numpy is a part of that installation.  Numpy is not part of
> the standard Python installation, so if I submit a program that contains
> anything from the Numpy library then he won't be able to run my code.  I
> emailed him and asked him about his Python installation, but he didn't get
> back to me.
> 
> Thanks for the feedback!  Very much appreciated!!!
> 
> Best,
> 
> Douglas.
> 
> 
> On Tue, Nov 10, 2015 at 8:52 AM, Sunhwan Jo <sunhwanj at gmail.com> wrote:
> 
> > 1. Your "Correlation" function takes most of the execution time.
> >
> > def Correlation(p, q):
> >       global PQ_Ratings
> >       sum1 = 0
> >       sum2 = 0
> >       numeratorProduct = 1
> >       denominatorProduct1 = 1
> >       denominatorProduct2 = 1
> >       for key in filter( lambda x: x[0] == p or x[0] == q, PQ_Ratings.keys( ) ):
> >             if key[0] == p:
> >                   sum1+= PQ_Ratings[key] - AverageRatingsOfItems[key[1]]
> >             else:
> >                   sum2+= PQ_Ratings[key] - AverageRatingsOfItems[key[1]]
> >             numeratorProduct+= sum1*sum2
> >             denominatorProduct1+= sum1**2
> >             denominatorProduct2+= sum2**2
> >       return numeratorProduct/(math.sqrt(denominatorProduct1)*math.sqrt(denominatorProduct2))
> >
> >
> > Rewriting sum1 and sum2 as list comprehensions can speed up execution
> > by about 10x (rough estimate using your code). In addition, the
> > denominator is also wrong. It should be the *sum of squared differences*, not
> > the *square of the sum of differences*, but I'm not concerned about this yet.
> >
> > def Correlation(p, q):
> >       global PQ_Ratings
> >       sum1 = 0
> >       sum2 = 0
> >       numeratorProduct = 1
> >       denominatorProduct1 = 1
> >       denominatorProduct2 = 1
> >       keys = [key for key in PQ_Ratings.keys() if key[0] == p or key[0] == q]
> >       sum1 = sum([PQ_Ratings[key] - AverageRatingsOfItems[key[1]] for key in keys if key[0] == p])
> >       sum2 = sum([PQ_Ratings[key] - AverageRatingsOfItems[key[1]] for key in keys if key[0] == q])
> >       numeratorProduct+= sum1*sum2
> >       denominatorProduct1+= sum1**2
> >       denominatorProduct2+= sum2**2
> >       return numeratorProduct/(math.sqrt(denominatorProduct1)*math.sqrt(denominatorProduct2))
> >
> >
> > 2. You don't have to re-calculate sum1 each time. "sum1" only depends on
> > "p". So, you can calculate it once in the outer loop and reuse it.
> >
> > keys = PQ_Ratings.keys()
> > for i in range(1, len(SimilarityMatrix)):
> >       sum1 = sum([PQ_Ratings[key] - AverageRatingsOfItems[key[1]] for key in keys if key[0] == i])
> >
> >       for j in range(i + 1, len(SimilarityMatrix)):
> >             sum2 = sum([PQ_Ratings[key] - AverageRatingsOfItems[key[1]] for key in keys if key[0] == j])
> >             numeratorProduct = sum1*sum2 + 1
> >             denominatorProduct1 = sum1**2 + 1
> >             denominatorProduct2 = sum2**2 + 1
> >             SimilarityMatrix[i][j] = numeratorProduct/(math.sqrt(denominatorProduct1)*math.sqrt(denominatorProduct2))
> >
> >
> > This speeds things up again, but the total execution time is still about
> > 200 minutes with 900+ users.
> >
> > 3. Is there any reason not to use NumPy arrays? Using NumPy, it finishes
> > in a fraction of a minute. Notice I also fixed the bug in the
> > numerator and the denominator.
> >
> > import math
> > import time
> > import numpy as np
> > nitems = max(AverageRatingsOfItems.keys())
> > nusers = max([key[0] for key in PQ_Ratings.keys()])
> > avg_rating = np.zeros(nitems)
> > pq_rating = np.zeros((nusers, nitems))
> > keys = PQ_Ratings.keys()
> > for key in keys:
> >       pq_rating[key[0]-1, key[1]-1] = PQ_Ratings[key]
> > keys = AverageRatingsOfItems.keys()
> > for key in keys:
> >       avg_rating[key-1] = AverageRatingsOfItems[key]
> >
> > startTime = time.time( )
> >
> > #### Let's finish building up our similarity matrix for this problem.
> > keys = PQ_Ratings.keys()
> > for i in range(1, len(SimilarityMatrix)):
> >       #sum1 = sum([PQ_Ratings[key] - AverageRatingsOfItems[key[1]] for key in keys if key[0] == i])
> >       # keep the per-item differences as a vector so the sums below are sums of products/squares
> >       diff1 = pq_rating[i-1] - avg_rating
> >
> >       for j in range(i + 1, len(SimilarityMatrix)):
> >             #sum2 = sum([PQ_Ratings[key] - AverageRatingsOfItems[key[1]] for key in keys if key[0] == j])
> >             diff2 = pq_rating[j-1] - avg_rating
> >             numeratorProduct = np.sum(diff1*diff2)
> >             denominatorProduct1 = np.sum(diff1**2)
> >             denominatorProduct2 = np.sum(diff2**2)
> >             SimilarityMatrix[i][j] = numeratorProduct/(math.sqrt(denominatorProduct1)*math.sqrt(denominatorProduct2))
> >
> >
> >
> >
> > On Nov 9, 2015, at 7:44 PM, Lewit, Douglas <d-lewit at neiu.edu> wrote:
> >
> > Hey guys,
> >
> > I need some advice on this one.  I'm attaching the homework assignment so
> > that you understand what I'm trying to do.  I went as far as the
> > construction of the Similarity Matrix, which is a matrix of Pearson
> > correlation coefficients.
> >
> > My problem is this.  u1.base (which is also attached) contains Users
> > (first column), Items (second column), Ratings (third column) and finally
> > the time stamp in the 4th and final column.  (Just discard the 4th column.
> > We're not using it for anything. )
> >
> > It's taking HOURS for Python to build the similarity matrix.  So what I
> > did was:
> >
> > *head -n 5000 u1.base > practice.base*
> >
> > and I also downloaded the PyPy interpreter for Python 3.  Then using PyPy
> > (or pypy or whatever) I ran my program on the first ten thousand lines of
> > data from u1.base stored in the new text file, practice.base.  Not a
> > problem!!!  I still had to wait a couple minutes, but not a couple hours!!!
> >
> >
> > Is there a way to make this program work for such a large set of data?  I
> > know my program successfully constructs the Similarity Matrix (i.e.
> > similarity between users) for 5,000, 10,000, 20,000 and even 25,000 lines
> > of data.  But for 80,000 lines of data the program becomes very slow and
> > overtaxes my CPU.  (The fan turns on and the bottom of my laptop starts to
> > get very hot.... a bad sign! )
> >
> > Does anyone have any recommendations?  ( I'm supposed to meet with my prof
> > on Tuesday.  I may just explain the problem to him and request a smaller
> > data set to work with.  And unfortunately he knows very little about
> > Python.  He's primarily a C++ and Java programmer. )
> >
> > I appreciate the feedback.  Thank you!!!
> >
> > Best,
> >
> > Douglas Lewit
> >
> >
> > <Homework3_Revision2.py><u1.base><practice2.base><HW3.pdf>
> 
> ------------------------------
> 
> Message: 5
> Date: Tue, 10 Nov 2015 11:21:16 -0600
> From: "Lewit, Douglas" <d-lewit at neiu.edu>
> To: The Chicago Python Users Group <chicago at python.org>
> Subject: Re: [Chicago] Can this be done with a yield statement and
> 	generator object?
> Message-ID:
> 	<CAPdZZGy8GbXGvB6tqOpNnqBLBm04UB0Ucw_kZx623W-K_nCpcQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Well that just makes it too easy!!!   ;-)
> 
> On Tue, Nov 10, 2015 at 11:14 AM, Tim Ottinger <tottinge at gmail.com> wrote:
> 
> > You mean, like itertools.cycle?
> >
> >
> 
> ------------------------------
> 
> Message: 6
> Date: Tue, 10 Nov 2015 11:25:57 -0600
> From: "Lewit, Douglas" <d-lewit at neiu.edu>
> To: The Chicago Python Users Group <chicago at python.org>
> Subject: Re: [Chicago] Need advice on this project.
> Message-ID:
> 	<CAPdZZGxTJB9At=xzjf6t_N6=sQ+DMUHiuX0W78dHYVsdQZqrXQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> 10 seconds???!!!  Wow!!!  Okay then, I'll buy you dinner if you finish my
> homework for me!!!   ;-)
> 
> Argument unpacking?  What's that?  As for lambdas, I just LOVE them!  They
> are so cool, and make certain procedures so much easier.  What is PEP8?  It
> sounds like a nutritional supplement or an energy drink!   ;-)
> 
> On Tue, Nov 10, 2015 at 9:46 AM, Adam Forsyth <adam at adamforsyth.net> wrote:
> 
> > Hi Douglas,
> >
> > You seem to post interesting homework assignments when I'm looking for a
> > fun problem, thanks.
> >
> > The issue definitely isn't the performance of either Python (the language)
> > or CPython (the implementation). I did the assignment last night, and
> > calculating the matrix for "u1.base" took my code less than 10 seconds.
> >
> > For readability in your Correlation function, try to avoid: globals;
> > creating lambdas inside loops; and indexing with constant keys rather than
> > using argument unpacking (i.e. key[0]). It also helps to follow PEP8 if you
> > want other Python programmers to be able to read your code easily.
> >
> > You probably have an algorithmic error in there somewhere -- it's hard for
> > me to tell for sure because your code is difficult to follow. Read the
> > assignment carefully, and only do what it tells you. For performance, are
> > there different data structures you could use? Are there "batteries
> > included" in Python that could combine some of those individual arithmetic
> > operations? I don't want to be too specific here because implementing the
> > algorithm is the point of the assignment.
> >
> > It looks like you still have two weeks to complete the project, so I'd
> > recommend taking your time, and don't be afraid to start a new version --
> > it can help you break out of bad patterns you've started in your existing
> > code.
> >
> > Best,
> > Adam
> >
> >
> >
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> _______________________________________________
> Chicago mailing list
> Chicago at python.org
> https://mail.python.org/mailman/listinfo/chicago
> 
> 
> ------------------------------
> 
> End of Chicago Digest, Vol 123, Issue 8
> ***************************************
 		 	   		  