[Chicago] Need advice on this project.

Adam Forsyth adam at adamforsyth.net
Tue Nov 10 12:54:34 EST 2015

I can't tell if you're joking or not (which is a problem when you're asking
for help), but if you're actually not familiar with one or more of those
terms, I suggest you Google them. Both are important to understand as a
Python programmer.
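
For anyone else reading along who hits the same terms: PEP 8 is Python's official style guide, and the snippet below is a minimal sketch of unpacking versus indexing with constant keys, plus one common surprise with lambdas created in a loop. The data and the names `ratings`, `describe`, `late` and `bound` are invented for illustration; none of this is taken from the code posted to the list.

```python
# Hypothetical data: (user, item, rating) tuples, invented for illustration.
ratings = [(1, 10, 5), (1, 20, 3), (2, 10, 4)]

# Indexing with constant keys: the reader must remember what [0] and [2] mean.
total_by_index = 0
for row in ratings:
    if row[0] == 1:
        total_by_index += row[2]

# Unpacking in the for statement names each field as it is read.
total_by_name = 0
for user, item, rating in ratings:
    if user == 1:
        total_by_name += rating


# Argument unpacking: * spreads a sequence over a function's parameters.
def describe(user, item, rating):
    return "user {} rated item {} as {}".format(user, item, rating)


first = describe(*ratings[0])

# Lambdas created in a loop all look up the loop variable when called,
# so they all see its final value...
late = [lambda: n * 2 for n in range(3)]
# ...unless the current value is bound as a default argument.
bound = [lambda n=n: n * 2 for n in range(3)]
```

Both loops compute the same total; the second one simply says what each field is. Whether the late-binding behaviour is the reason for the advice about lambdas is a guess; defining a named function once, outside the loop, avoids the question entirely.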

On Tue, Nov 10, 2015 at 11:25 AM, Lewit, Douglas <d-lewit at neiu.edu> wrote:

> 10 seconds???!!!  Wow!!!  Okay then, I'll buy you dinner if you finish my
> homework for me!!!   ;-)
> Argument unpacking?  What's that?  As for lambdas, I just LOVE them!  They
> are so cool, and make certain procedures so much easier.  What is PEP8?  It
> sounds like a nutritional supplement or an energy drink!   ;-)
>
> On Tue, Nov 10, 2015 at 9:46 AM, Adam Forsyth <adam at adamforsyth.net>
> wrote:
>
>> Hi Douglas,
>> You seem to post interesting homework assignments when I'm looking for a
>> fun problem, thanks.
>> The issue definitely isn't the performance of either Python (the
>> language) or CPython (the implementation). I did the assignment last night,
>> and calculating the matrix for "u1.base" took my code less than 10 seconds.
>> For readability in your Correlation function, try to avoid: globals;
>> creating lambdas inside loops; and indexing with constant keys (e.g. key[0])
>> rather than using argument unpacking. It also helps to follow PEP8 if you
>> want other Python programmers to be able to read your code easily.
>> You probably have an algorithmic error in there somewhere -- it's hard
>> for me to tell for sure because your code is difficult to follow. Read the
>> assignment carefully, and only do what it tells you. For performance, are
>> there different data structures you could use? Are there "batteries
>> included" in Python that could combine some of those individual arithmetic
>> operations? I don't want to be too specific here because implementing the
>> algorithm is the point of the assignment.
>> It looks like you still have two weeks to complete the project, so I'd
>> recommend taking your time, and don't be afraid to start a new version --
>> it can help you break out of bad patterns you've started in your existing
>> code.
>> Best,
>> Adam
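
As one illustration of the data-structure and "batteries included" questions in the message above, and not the method the assignment requires, the sketch below keeps each user's ratings in a dict keyed by item, finds co-rated items with a set intersection, and leans on the built-in `sum` and `zip`. The function name `pearson`, the sample data, taking the means over co-rated items only, and returning 0.0 when the coefficient is undefined are all assumptions; the assignment may define these details differently.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated.

    Each argument is a dict mapping item -> rating (a hypothetical layout).
    Returns 0.0 when the coefficient is undefined (an assumption).
    """
    common = ratings_a.keys() & ratings_b.keys()  # items rated by both users
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # one user gave identical ratings to every common item
    return cov / math.sqrt(var_x * var_y)


# Invented sample data.
alice = {10: 5, 20: 3, 30: 1}
bob = {10: 4, 20: 2, 30: 0, 40: 5}
carol = {10: 1, 20: 3, 30: 5}
```

With dict lookups, finding the items two users share costs time proportional to the smaller dict, rather than requiring a scan of every line of the data file for each pair of users.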
>>
>> On Mon, Nov 9, 2015 at 7:44 PM, Lewit, Douglas <d-lewit at neiu.edu> wrote:
>>
>>> Hey guys,
>>> I need some advice on this one.  I'm attaching the homework assignment
>>> so that you understand what I'm trying to do.  I went as far as the
>>> construction of the Similarity Matrix, which is a matrix of Pearson
>>> correlation coefficients.
>>> My problem is this.  u1.base (which is also attached) contains Users
>>> (first column), Items (second column), Ratings (third column) and finally
>>> the time stamp in the 4th and final column.  (Just discard the 4th column.
>>> We're not using it for anything. )
>>> It's taking HOURS for Python to build the similarity matrix.  So what I
>>> did was:
>>> *head -n 5000 u1.base > practice.base*
>>> and I also downloaded the PyPy interpreter for Python 3.  Then using
>>> PyPy (or pypy or whatever) I ran my program on the first ten thousand lines
>>> of data from u1.base stored in the new text file, practice.base.  Not a
>>> problem!!!  I still had to wait a couple minutes, but not a couple hours!!!
>>> Is there a way to make this program work for such a large set of data?
>>> I know my program successfully constructs the Similarity Matrix (i.e.
>>> similarity between users) for 5,000, 10,000, 20,000 and even 25,000 lines
>>> of data.  But for 80,000 lines of data the program becomes very slow and
>>> overtaxes my CPU.  (The fan turns on and the bottom of my laptop starts to
>>> get very hot.... a bad sign! )
>>> Does anyone have any recommendations?  ( I'm supposed to meet with my
>>> prof on Tuesday.  I may just explain the problem to him and request a
>>> smaller data set to work with.  And unfortunately he knows very little
>>> about Python.  He's primarily a C++ and Java programmer. )
>>> I appreciate the feedback.  Thank you!!!
>>> Best,
>>> Douglas Lewit
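
The file layout described above (user, item, rating, timestamp) could be read in a single pass into a dict of dicts, dropping the fourth column along the way. This is only a sketch: the function name `load_ratings` is hypothetical, the sample rows are invented, and whitespace-separated columns with integer ratings are assumptions about the file.

```python
import io
from collections import defaultdict


def load_ratings(lines):
    """Build {user: {item: rating}} from lines of 'user item rating timestamp'."""
    ratings = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Unpack the four columns; the timestamp is read and then ignored.
        user, item, rating, _timestamp = line.split()
        ratings[int(user)][int(item)] = int(rating)
    return dict(ratings)


# Invented rows standing in for the data file; a real run would pass an open file.
sample = io.StringIO("1 1 5 874965758\n1 2 3 876893171\n2 1 4 888550871\n")
by_user = load_ratings(sample)
```

For the real data, the same function could be called as `load_ratings(f)` inside `with open("u1.base") as f:`, since an open text file yields its lines one at a time.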
>>> _______________________________________________
>>> Chicago mailing list
>>> Chicago at python.org
>>> https://mail.python.org/mailman/listinfo/chicago