[Chicago] Chicago Digest, Vol 121, Issue 6

Osman Siddique osiddique at gmail.com
Wed Sep 9 16:46:33 CEST 2015

'Inside Luellen Douglas' was a great movie.

On Tue, Sep 8, 2015 at 3:04 PM, Lewit, Douglas <d-lewit at neiu.edu> wrote:

> Luellen???  Not even close.  First name is Douglas.  Last name is Lewit.
> I think Luellen is the lady who owns the nail salon down the street!   :-)
> On Tue, Sep 8, 2015 at 2:57 PM, Joshua Herman <zitterbewegung at gmail.com>
> wrote:
>> Dear Luellen
>> Parallel computing, big data, and OCaml are beyond the scope
>> of the talk.
>> I know about functional programming and OCaml in general. Can you email me
>> personally about these topics? I would like to help you understand them,
>> but it would be best to do this off the list.
>> Sincerely,
>> Joshua Herman
>> On Tue, Sep 8, 2015 at 2:55 PM Lewit, Douglas <d-lewit at neiu.edu> wrote:
>>> Thanks Tanya!  Yes, someone at Northeastern told me the three keys to
>>> success in big data are 1) Python, 2) R and finally 3) Hadoop, which he
>>> said is really an extension of SQL.  I'm sure that next week the three keys
>>> to success will be something else!  Technology is great, but it's changing
>>> faster than I can keep pace with it.  I wonder how many people feel that
>>> way???
>>> What time does the presentation officially begin on Thursday evening?  6
>>> pm?  What time does it end?  Is it structured like a classroom or is it
>>> more open-ended?  Just one group?  Or does it get split up into many
>>> different sub-groups?
>>> I don't know a lot about "parallel computing" other than this is what
>>> Java programmers call "Threads".  I think (not too sure really) that when
>>> you "thread" an algorithm, you allocate one core for one part of the
>>> algorithm and then another core for another part of the algorithm, and so
>>> on and so forth, and then at the end it all has to get magically pieced
>>> back together.  ( Sounds like a merge sort problem to me! )  What about
>>> Python?  Does Python support this multicore approach or "threading"?  I
>>> really don't know.  Jon Harrop, the author of "OCaml for Scientists", told
>>> me that the OCaml programming language became much less popular in the
>>> early 2000s because its main developer underestimated the impact of
>>> multicore technology on modern programming.
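On the question of whether Python supports this: it does. CPython has a threading module, although its global interpreter lock means threads do not speed up CPU-bound pure-Python code; the standard library's multiprocessing and concurrent.futures modules can spread work across cores using processes instead. Below is a minimal sketch of the split, work, and merge idea described above. The names chunked and parallel_sort are made up for illustration, and the default of threads is only chosen so the sketch runs anywhere; ProcessPoolExecutor can be passed in when real multicore speedup is wanted.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into at most n_chunks contiguous slices (hypothetical helper)."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sort(data, workers=4, executor_cls=ThreadPoolExecutor):
    """Sort each chunk in its own worker, then merge the sorted chunks."""
    chunks = chunked(list(data), workers)
    with executor_cls(max_workers=workers) as pool:
        # Each chunk is sorted independently; map() preserves chunk order.
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge combines already-sorted inputs, like the final
    # "piece it back together" step of a merge sort.
    return list(heapq.merge(*sorted_chunks))
```

For example, parallel_sort([5, 3, 1, 4, 2], workers=2) sorts two chunks separately and then merges them. When passing ProcessPoolExecutor, call the function from under an if __name__ == "__main__": guard, which some platforms require for process-based pools.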
>>> On Tue, Sep 8, 2015 at 8:28 AM, Tanya Schlusser <tanya at tickel.net>
>>> wrote:
>>>>> Where is the ChiPy meeting this Thursday evening?  I checked the
>>>>> website, but the location of the meeting had not yet been decided.
>>>> It's at Braintree again (8th floor, Merchandise Mart) -- sorry for
>>>> taking so long to get it up!
>>>>> When I think of "big data analysis" I think of something like, "Okay,
>>>>> read all these data from an Excel spreadsheet into a huge Python array
>>>>> or matrix, and then construct various Q-Q plots to see if the data are
>>>>> normally distributed, exponentially distributed or something else, and
>>>>> then determine the parameters of the distribution".  In other words,
>>>>> when I hear "big data" I'm really thinking of a mixture of statistics
>>>>> and computer programming.  Is that correct or is my "definition" a
>>>>> little too narrow?
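The workflow described above can be sketched with nothing but the standard library (Python 3.8 or later for statistics.NormalDist). This is only an illustration under assumptions: the helper names are invented, the input is any list of numbers rather than a spreadsheet, and the plotting positions (i + 0.5) / n are one common convention among several. Instead of drawing the Q-Q plot, it reports the correlation of the Q-Q pairs, which is close to 1 when the points fall on a straight line.

```python
import math
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    observed = sorted(sample)
    n = len(observed)
    # Estimate the distribution parameters from the data itself.
    fitted = NormalDist(mean(observed), stdev(observed))
    # (i + 0.5) / n keeps every probability strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))


def qq_correlation(points):
    """Pearson correlation of the Q-Q pairs; near 1.0 suggests a good fit."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Calling qq_correlation(normal_qq_points(data)) on normally distributed data should give a value very close to 1, while clearly skewed data (for instance, exponentially distributed) should score noticeably lower.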
>>>> It's pretty correct. The 'analysis' part is correct -- it's still
>>>> statistics / machine learning. The 'big data' part really was a catchall
>>>> phrase for "anything that can't be done right now in a standard database".
>>>> So 'big data' can mean a handful of things:
>>>>    - Workarounds to key generation because inserts are happening
>>>>    faster than a standard database can deal with them. This was the 'Velocity'
>>>>    part of the big data marketers' advertising campaigns. Twitter's
>>>>    Snowflake <https://blog.twitter.com/2010/announcing-snowflake> is a
>>>>    good example of working around this.
>>>>    - NoSQL (Not Only SQL) -- storing and doing computation over
>>>>    images, MRIs, genetic data, PDFs, entire log files, et cetera. This is
>>>>    the 'Variety' in the big data marketing. Hadoop's Distributed File System
>>>>    and MongoDB are good examples of systems that can store these sorts of
>>>>    files.
>>>>    - Parallel computation on a(n inexpensive) cluster because it would
>>>>    take too long or the data would not fit on one computer. This means the
>>>>    algorithms had to be rewritten for parallel execution. This was the
>>>>    'Volume' part of big data marketing.
>>>>       - Apache Mahout
>>>>       <https://mahout.apache.org/users/basics/algorithms.html> (in
>>>>       Java) was, I think, one of the first open-source implementations of
>>>>       parallelized machine learning algorithms.
>>>>       - The hottest thing for this now is the Spark Machine Learning
>>>>       library <http://spark.apache.org/docs/latest/mllib-guide.html>
>>>>       (see the PyCon 2015 presentation on Spark with Python
>>>>       <http://pyvideo.org/video/3407/introduction-to-spark-with-python>).
>>>>       There is also a Chicago Spark Meetup
>>>>       <http://www.meetup.com/Chicago-Spark-Users/>.
>>>>       - And the newcomer Apache Flink, also in Java, bypasses Java's
>>>>       garbage collection for speed, optimizes SQL queries (unlike Hive), and
>>>>       claims to provide a truly streaming analytics option without some of the
>>>>       hangups of Storm. It also has Python bindings
>>>>       <http://mvnrepository.com/artifact/org.apache.flink/flink-python>.
>>>>       There is a Chicago Flink meetup
>>>>       <http://www.meetup.com/Chicago-Apache-Flink-Meetup/> -- I think
>>>>       it's the 3rd Flink user group in North America.
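To make the key-generation point in the list above concrete, here is a rough sketch of a Snowflake-style ID generator. It is not Twitter's code: the bit layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) follows the layout commonly described for Snowflake, and the epoch, the class name IdGenerator, and the clock parameter are assumptions chosen for illustration. The idea is that each worker can mint unique, roughly time-sortable keys without asking a central database for the next value.

```python
import threading
import time

EPOCH_MS = 1420070400000  # 2015-01-01 UTC in ms; an arbitrary, hypothetical epoch
WORKER_BITS = 10          # assumed layout: up to 1024 workers
SEQ_BITS = 12             # assumed layout: up to 4096 IDs per worker per ms
SEQ_MASK = (1 << SEQ_BITS) - 1


class IdGenerator:
    """Illustrative Snowflake-style generator (hypothetical class)."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock = clock           # injectable so the sketch can be tested
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & SEQ_MASK
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                    | (self.worker_id << SEQ_BITS)
                    | self._seq)
```

Two generators with different worker_id values never collide, because the worker bits differ even when the timestamp and sequence match. That is what lets inserts proceed faster than a single database-issued counter would allow.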
>>>> hope it was useful...see you @ Braintree Thursday!
>>>> _______________________________________________
>>>> Chicago mailing list
>>>> Chicago at python.org
>>>> https://mail.python.org/mailman/listinfo/chicago