[Chicago] Need advice on this project.

Ross Heflin heflin.rosst at gmail.com
Tue Nov 10 09:57:28 EST 2015


Might be time to profile.
Run your similarity matrix builder on the large dataset under cProfile
(or whatever works on PyPy) for a while (say 30 minutes) and see where it's
spending the majority of its time.
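
A minimal sketch of what that could look like, using the standard library's
cProfile and pstats modules. I haven't seen the actual program, so
build_similarity_matrix, pearson, profile_call and the toy ratings dict below
are hypothetical stand-ins; swap in the real builder and the real data.

```python
import cProfile
import io
import pstats


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n == 0:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def build_similarity_matrix(ratings):
    """Hypothetical stand-in for the real builder (an assumption).

    ratings maps user -> {item: rating}; the result maps (user, user)
    to the Pearson correlation over the items both users rated.
    """
    users = sorted(ratings)
    sim = {}
    for u in users:
        for v in users:
            common = sorted(set(ratings[u]) & set(ratings[v]))
            sim[u, v] = pearson([ratings[u][i] for i in common],
                                [ratings[v][i] for i in common])
    return sim


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    # Toy data only; the real input would be parsed from u1.base.
    toy = {1: {10: 5, 11: 3, 12: 4},
           2: {10: 4, 11: 2, 12: 3},
           3: {10: 1, 11: 5}}
    sim, report = profile_call(build_similarity_matrix, toy)
    print(report)
```

To profile a whole script without touching its code, something like
"python -m cProfile -s cumulative your_script.py" should also do it
(your_script.py being a placeholder for the real file name).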

-Ross

On Mon, Nov 9, 2015 at 7:44 PM, Lewit, Douglas <d-lewit at neiu.edu> wrote:

> Hey guys,
>
> I need some advice on this one.  I'm attaching the homework assignment so
> that you understand what I'm trying to do.  I went as far as the
> construction of the Similarity Matrix, which is a matrix of Pearson
> correlation coefficients.
>
> My problem is this.  u1.base (which is also attached) contains Users
> (first column), Items (second column), Ratings (third column) and finally
> the time stamp in the 4th and final column.  (Just discard the 4th column.
> We're not using it for anything. )
>
> It's taking HOURS for Python to build the similarity matrix.  So what I
> did was:
>
> head -n 5000 u1.base > practice.base
>
> and I also downloaded the PyPy interpreter for Python 3.  Then using PyPy
> (or pypy or whatever) I ran my program on the first ten thousand lines of
> data from u1.base stored in the new text file, practice.base.  Not a
> problem!!!  I still had to wait a couple minutes, but not a couple hours!!!
>
>
> Is there a way to make this program work for such a large set of data?  I
> know my program successfully constructs the Similarity Matrix (i.e.
> similarity between users) for 5,000, 10,000, 20,000 and even 25,000 lines
> of data.  But for 80,000 lines of data the program becomes very slow and
> overtaxes my CPU.  (The fan turns on and the bottom of my laptop starts to
> get very hot.... a bad sign! )
>
> Does anyone have any recommendations?  ( I'm supposed to meet with my prof
> on Tuesday.  I may just explain the problem to him and request a smaller
> data set to work with.  And unfortunately he knows very little about
> Python.  He's primarily a C++ and Java programmer. )
>
> I appreciate the feedback.  Thank you!!!
>
> Best,
>
> Douglas Lewit
>


-- 
From the "desk" of Ross Heflin
phone number: (847) <23,504,826th decimal place of pi>