Dear Luellen,<br>Parallel computing, big data, and OCaml are beyond the scope of the talk.<br>I know about functional programming and OCaml in general. Can you email me personally about these topics? I would like to help you understand them, but it would be best to do this off the list.<br>Sincerely,<br>Joshua Herman<br><div class="gmail_quote"><div dir="ltr">On Tue, Sep 8, 2015 at 2:55 PM Lewit, Douglas &lt;<a href="mailto:d-lewit@neiu.edu">d-lewit@neiu.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><font size="4">Thanks Tanya! Yes, someone at Northeastern told me the three keys to success in big data are 1) Python, 2) R and finally 3) Hadoop, which he said is really an extension of SQL. I'm sure that next week the three keys to success will be something else! Technology is great, but it's changing faster than I can keep pace with it. I wonder how many people feel that way?<br><br></font></div><font size="4">What time does the presentation officially begin on Thursday evening? 6 pm? What time does it end? Is it structured like a classroom or is it more open-ended? Just one group? Or does it get split up into many different sub-groups?<br><br></font></div><font size="4">I don't know a lot about "parallel computing" other than that this is what Java programmers call "Threads". I think (not too sure really) that when you "thread" an algorithm, you allocate one core for one part of the algorithm and then another core for another part of the algorithm, and so on and so forth, and then at the end it all has to get magically pieced back together. (Sounds like a merge sort problem to me!) What about Python? Does Python support this multicore approach or "threading"? I really don't know.
Jon Harrop, the author of "OCaml for Scientists", told me that the OCaml programming language became much less popular in the early 2000s because its main developer underestimated the impact of multicore technology on modern programming.<br><br></font></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 8, 2015 at 8:28 AM, Tanya Schlusser <span dir="ltr">&lt;<a href="mailto:tanya@tickel.net" target="_blank">tanya@tickel.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Where is the ChiPy meeting this Thursday evening? I checked the website,<br>
but the location of the meeting had not yet been decided.<br></blockquote><div><br></div><div>It's at Braintree again (8th floor, Merchandise Mart) -- sorry for taking so long to get it up!</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
When I think of "big data analysis" I think of something like, "Okay, read<br>
all these data from an Excel spreadsheet into a huge Python array or<br>
matrix, and then construct various Q-Q plots to see if the data are<br>
normally distributed, exponentially distributed or something else, and then<br>
determine the parameters of the distribution". In other words, when I hear<br>
"big data" I'm really thinking of a mixture of statistics and computer<br>
programming. Is that correct or is my "definition" a little too narrow?<br></blockquote><div><br></div><div><br></div><div>It's pretty much correct. The 'analysis' part is correct -- it's still statistics / machine learning. The 'big data' part really was a catchall phrase for "anything that can't be done right now in a standard database". So 'big data' can mean a handful of things:</div><div><ul><li>Workarounds to key generation because inserts are happening faster than a standard database can deal with them. This was the 'Velocity' part of the big data marketers' advertising campaigns. <a href="https://blog.twitter.com/2010/announcing-snowflake" target="_blank">Twitter's Snowflake</a> is a good example of working around this.<br><br></li><li>NoSQL (Not Only SQL) -- storing and doing computation over images, MRIs, genetic data, PDFs, entire log files, et cetera. This is the 'Variety' in the big data marketing. Hadoop's Distributed File System and MongoDB are good examples of storage systems that can hold these sorts of files.<br><br></li><li>Parallel computation on a(n inexpensive) cluster, because the job would take too long, or the data would not fit, on one computer. This means the algorithms had to be rewritten for parallel execution. This was the 'Volume' part of big data marketing.<br></li><ul><li><a href="https://mahout.apache.org/users/basics/algorithms.html" target="_blank">Apache Mahout</a> (in Java) was, I think, one of the first open-source implementations of parallelized machine learning algorithms.</li><li>The hottest thing for this now is the <a href="http://spark.apache.org/docs/latest/mllib-guide.html" target="_blank">Spark Machine Learning library</a> (see the <a href="http://pyvideo.org/video/3407/introduction-to-spark-with-python" target="_blank">PyCon 2015 presentation on Spark + Python</a>).
There is also a <a href="http://www.meetup.com/Chicago-Spark-Users/" target="_blank">Chicago Spark Meetup</a>.</li><li>And the newcomer Apache Flink, also in Java, bypasses Java's garbage collection for speed, optimizes SQL queries (unlike Hive), and claims to provide a truly streaming analytics option without some of the hangups of Storm. <a href="http://mvnrepository.com/artifact/org.apache.flink/flink-python" target="_blank">It also has Python bindings</a>. There is a <a href="http://www.meetup.com/Chicago-Apache-Flink-Meetup/" target="_blank">Chicago Flink meetup</a> -- I think it's the 3rd Flink user group in North America.<br><br></li></ul></ul></div><div>hope it was useful...see you @ Braintree Thursday!</div></div></div></div>
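<div>The "rewritten for parallel execution" point above, and the earlier question about whether Python supports a multicore approach, can be sketched on a single machine with the standard library's <code>multiprocessing</code> module. This is only a minimal sketch, not how Mahout, Spark, or Flink work internally: the function names and the sum-of-squares task are made up for illustration. The data is split into chunks, each worker process handles one chunk, and the partial results are merged at the end.</div>
<pre>
```python
from multiprocessing import Pool


def split_into_chunks(values, n_chunks):
    """Deal the values into n_chunks strided slices (hypothetical helper)."""
    return [values[i::n_chunks] for i in range(n_chunks)]


def partial_sum_of_squares(chunk):
    """Work done independently by one worker process."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Split, compute the pieces in separate processes, then merge."""
    chunks = split_into_chunks(values, workers)
    with Pool(processes=workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunks)
    # The merge step: here it is just adding the partial sums together.
    return sum(partials)


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    data = list(range(100_000))
    print(parallel_sum_of_squares(data))
```
</pre>
<div>Processes are used here rather than threads because, in the standard CPython interpreter, the global interpreter lock keeps threads from running Python bytecode on several cores at once. The <code>threading</code> module is still useful for I/O-bound work.</div>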
<br>_______________________________________________<br>
Chicago mailing list<br>
<a href="mailto:Chicago@python.org" target="_blank">Chicago@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/chicago" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/chicago</a><br>
<br></blockquote></div><br></div>
</blockquote></div>
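<div>The Q-Q check described in the original question in this thread can be sketched with Python's standard library alone. This is a hedged illustration under stated assumptions: the helper names <code>normal_qq_points</code> and <code>qq_correlation</code> are hypothetical, the data is assumed to already be a list of numbers that are not all identical (reading the spreadsheet is left out), and only the normal distribution is checked. The mean and standard deviation estimated from the sample play the role of the "parameters of the distribution".</div>
<pre>
```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Pair each sorted observation with the matching normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    # Fit the normal distribution by estimating its two parameters.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n stay strictly between 0 and 1.
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; near 1 suggests a good fit."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Synthetic data for illustration only, not real measurements.
    sample = NormalDist(10, 2).samples(500, seed=1)
    print(qq_correlation(normal_qq_points(sample)))
```
</pre>
<div>A correlation close to 1 means the points fall near a straight line, which is what a Q-Q plot shows visually. Passing the pairs to a charting library would give the plot itself.</div>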