[Chicago] Need advice on this project.

Lewit, Douglas d-lewit at neiu.edu
Wed Nov 11 07:10:57 EST 2015

Hey Mark,

Please don't sweat it too much!  I ran my program overnight.  With PyPy it
took slightly more than 2 hours.  Then I wrote my matrix to a file and read
it back in just to make sure it worked--and it did!  My prof said that the
data won't change, so the important thing is to save the matrix (which
comes from the training set) to a file once and then reuse it to make some
"guesses" about the test set.  In other words, I don't have to keep
regenerating the matrix over and over, thank God!  Basically it's a
one-shot deal: I store the results in a text file and read them back into a
list of lists for future use.
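For anyone curious, a minimal sketch of that save-and-reload step, assuming
the matrix is a plain list of lists of floats (the file name and toy values
are made up, not from my actual program):

```python
import json

def save_matrix(matrix, path):
    """Write a list-of-lists similarity matrix to a text file as JSON."""
    with open(path, "w") as f:
        json.dump(matrix, f)

def load_matrix(path):
    """Read the matrix back into a list of lists."""
    with open(path) as f:
        return json.load(f)

# toy 2x2 matrix, just to show the round trip
sim = [[1.0, 0.5], [0.5, 1.0]]
save_matrix(sim, "similarity.json")
assert load_matrix("similarity.json") == sim
```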

But thanks for the help!  I'll check out what you did on Github.... but
first a little sleep!

Stay warm,


On Tue, Nov 10, 2015 at 9:37 AM, Mark Graves <mgraves87 at gmail.com> wrote:

> I think I must have screwed this up; can someone point out my errors?
> I worked from Doug's code, then attempted to dictify the results to
> minimize lookup times in that filter function.
> Full disclosure: I was only aiming for code that runs without errors; I
> have no knowledge of the algorithm implementation.
> code:
> https://gist.github.com/gravesmedical/58a6b665b553c1294b56
> On Tue, Nov 10, 2015 at 8:57 AM, Ross Heflin <heflin.rosst at gmail.com>
> wrote:
>> Might be time to profile.
>> Run your similarity matrix builder with the large dataset under
>> cProfile (or whatever works on PyPy) for a while (30 min) and see where
>> it's spending the majority of its time.
>> -Ross
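A minimal sketch of that profiling step, assuming the matrix builder is
wrapped in a function (the names below are placeholders, not the real code):

```python
import cProfile
import pstats

def build_similarity_matrix():
    # stand-in for the real matrix-building code
    return sum(i * i for i in range(100000))

# collect profiling data around the expensive call
profiler = cProfile.Profile()
profiler.enable()
build_similarity_matrix()
profiler.disable()

# show the 10 functions with the most cumulative time
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)
```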
>> On Mon, Nov 9, 2015 at 7:44 PM, Lewit, Douglas <d-lewit at neiu.edu> wrote:
>>> Hey guys,
>>> I need some advice on this one.  I'm attaching the homework assignment
>>> so that you understand what I'm trying to do.  I went as far as the
>>> construction of the Similarity Matrix, which is a matrix of Pearson
>>> correlation coefficients.
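For reference, a toy sketch of what each matrix entry is (made-up ratings,
not the assignment's actual code):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a user who rates everything the same: treat as uncorrelated
    return cov / (sd_x * sd_y)

# made-up ratings: each user rated the same three items
ratings = {1: [4, 3, 5], 2: [5, 2, 4], 3: [1, 5, 2]}
users = sorted(ratings)
similarity = [[pearson(ratings[u], ratings[v]) for v in users] for u in users]
```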
>>> My problem is this.  u1.base (which is also attached) contains Users
>>> (first column), Items (second column), Ratings (third column) and finally
>>> the time stamp in the 4th and final column.  (Just discard the 4th column.
>>> We're not using it for anything. )
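A minimal sketch of reading that format, assuming whitespace-separated
columns (the sample values and file names are made up):

```python
def load_ratings(path):
    """Parse a ratings file with columns: user, item, rating, timestamp.
    The timestamp column is read and discarded."""
    ratings = {}
    with open(path) as f:
        for line in f:
            user, item, rating, _timestamp = line.split()
            ratings.setdefault(int(user), {})[int(item)] = float(rating)
    return ratings

# tiny made-up sample in the same four-column shape
with open("sample.base", "w") as f:
    f.write("1\t10\t4\t874965758\n1\t20\t3\t874965759\n2\t10\t5\t874965760\n")

by_user = load_ratings("sample.base")
```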
>>> It's taking HOURS for Python to build the similarity matrix.  So what I
>>> did was:
>>> head -n 5000 u1.base > practice.base
>>> and I also downloaded the PyPy interpreter for Python 3.  Then using
>>> PyPy I ran my program on the first five thousand lines of data from
>>> u1.base stored in the new text file, practice.base.  Not a problem!  I
>>> still had to wait a couple of minutes, but not a couple of hours!
>>> Is there a way to make this program work for such a large set of data?
>>> I know my program successfully constructs the Similarity Matrix (i.e.
>>> similarity between users) for 5,000, 10,000, 20,000 and even 25,000 lines
>>> of data.  But for 80,000 lines of data the program becomes very slow and
>>> overtaxes my CPU.  (The fan turns on and the bottom of my laptop starts to
>>> get very hot.... a bad sign! )
>>> Does anyone have any recommendations?  ( I'm supposed to meet with my
>>> prof on Tuesday.  I may just explain the problem to him and request a
>>> smaller data set to work with.  And unfortunately he knows very little
>>> about Python.  He's primarily a C++ and Java programmer. )
>>> I appreciate the feedback.  Thank you!!!
>>> Best,
>>> Douglas Lewit
>>> _______________________________________________
>>> Chicago mailing list
>>> Chicago at python.org
>>> https://mail.python.org/mailman/listinfo/chicago
>> --
>> From the "desk" of Ross Heflin
>> phone number: (847) <23,504,826th decimal place of pi>