[Chicago] Chicago Digest, Vol 121, Issue 6

Lewit, Douglas d-lewit at neiu.edu
Tue Sep 8 22:04:26 CEST 2015

Luellen???  Not even close.  First name is Douglas.  Last name is Lewit.  I
think Luellen is the lady who owns the nail salon down the street!   :-)

On Tue, Sep 8, 2015 at 2:57 PM, Joshua Herman <zitterbewegung at gmail.com> wrote:

> Dear Luellen,
> Parallel computing, big data, and a talk about OCaml are beyond the scope
> of the talk.
> I know about functional programming and OCaml in general. Can you email me
> personally about these topics? I would like to help you understand them,
> but it would be best to do this off the list.
> Sincerely,
> Joshua Herman
> On Tue, Sep 8, 2015 at 2:55 PM Lewit, Douglas <d-lewit at neiu.edu> wrote:
>> Thanks Tanya!  Yes, someone at Northeastern told me the three keys to
>> success in big data are 1) Python, 2) R and finally 3) Hadoop, which he
>> said is really an extension of SQL.  I'm sure that next week the three keys
>> to success will be something else!  Technology is great, but it's changing
>> faster than I can keep pace with it.  I wonder how many people feel that
>> way???
>> What time does the presentation officially begin on Thursday evening?  6
>> pm?  What time does it end?  Is it structured like a classroom or is it
>> more open-ended?  Just one group?  Or does it get split up into many
>> different sub-groups?
>> I don't know a lot about "parallel computing" other than this is what
>> Java programmers call "Threads".  I think (not too sure really) that when
>> you "thread" an algorithm, you allocate one core for one part of the
>> algorithm and then another core for another part of the algorithm, and so
>> on and so forth, and then at the end it all has to get magically pieced
>> back together.  ( Sounds like a merge sort problem to me! )  What about
>> Python?  Does Python support this multicore approach or "threading"?  I
>> really don't know.  Jon Harrop, the author of "OCaml for Scientists", told
>> me that the OCaml programming language became much less popular in the
>> early 2000s because its main developer underestimated the impact of
>> multicore technology on modern programming.
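On the "does Python support this multicore approach" question above, here is a minimal sketch, assuming CPython 3 and only the standard library. In CPython, threads share one interpreter lock, so CPU-bound pure-Python code generally needs separate processes to occupy several cores; `concurrent.futures` offers both. The names `chunk_sum`, `parallel_sum`, and `executor_cls` are made up for this illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def chunk_sum(chunk):
    """Work done independently on one slice of the data."""
    return sum(x * x for x in chunk)


def parallel_sum(data, workers=4, executor_cls=ProcessPoolExecutor):
    """Split data into slices, hand each to a worker, then combine the results."""
    size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with executor_cls(max_workers=workers) as pool:
        partials = list(pool.map(chunk_sum, chunks))
    return sum(partials)  # the "piecing back together" step
```

Calling `parallel_sum(list(range(1_000_000)))` from under an `if __name__ == "__main__":` guard runs the slices in separate processes; passing `executor_cls=ThreadPoolExecutor` runs the same code on threads instead, which is mostly useful for I/O-bound work.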
>> On Tue, Sep 8, 2015 at 8:28 AM, Tanya Schlusser <tanya at tickel.net> wrote:
>>>> Where is the ChiPy meeting this Thursday evening?  I checked the
>>>> website, but the location of the meeting had not yet been decided.
>>> It's at Braintree again (8th floor, Merchandise Mart) -- sorry for
>>> taking so long to get it up!
>>>> When I think of "big data analysis" I think of something like, "Okay,
>>>> read all these data from an Excel spreadsheet into a huge Python array
>>>> or matrix, and then construct various Q-Q plots to see if the data are
>>>> normally distributed, exponentially distributed or something else, and
>>>> then determine the parameters of the distribution".  In other words,
>>>> when I hear "big data" I'm really thinking of a mixture of statistics
>>>> and computer programming.  Is that correct or is my "definition" a
>>>> little too narrow?
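The workflow described in the question above (load values, estimate the parameters, compare against a candidate distribution with a Q-Q plot) can be sketched with the standard library alone. This is an illustration under assumptions: the data are simulated rather than read from a spreadsheet, only the normal distribution is tried, and the helper names `normal_qq_points` and `qq_correlation` are made up here. Instead of drawing the plot, the sketch reports how close the Q-Q points are to a straight line.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    mu, sigma = mean(ordered), stdev(ordered)  # estimated parameters
    fitted = NormalDist(mu, sigma)
    # Plotting positions (i + 0.5) / n keep probabilities strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; values near 1.0 suggest a good fit."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)  # fixed seed so the illustration is repeatable
normal_data = [random.gauss(10.0, 2.0) for _ in range(500)]
skewed_data = [random.expovariate(1.0) for _ in range(500)]

print(round(qq_correlation(normal_qq_points(normal_data)), 3))
print(round(qq_correlation(normal_qq_points(skewed_data)), 3))
```

The normally distributed sample should score closer to 1.0 than the exponential one; plotting the pairs from `normal_qq_points` gives the usual Q-Q picture.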
>>> It's pretty correct. The 'analysis' part is correct -- it's still
>>> statistics / machine learning. The 'big data' part really was a catchall
>>> phrase for "anything that can't be done right now in a standard database".
>>> So 'big data' can mean a handful of things:
>>>    - Workarounds to key generation because inserts are happening faster
>>>    than a standard database can deal with them. This was the 'Velocity' part
>>>    of the big data marketers' advertising campaigns. Twitter's Snowflake
>>>    <https://blog.twitter.com/2010/announcing-snowflake> is a good
>>>    example of working around this.
>>>    - NoSQL (Not Only SQL) -- storing and doing computation over images,
>>>    MRIs, genetic data, PDFs, entire log files, et cetera... This is the
>>>    'Variety' in the big data marketing. Hadoop's Distributed File System and
>>>    MongoDB are good examples of systems that can store these sorts of files.
>>>    - Parallel computation on a(n inexpensive) cluster because it would
>>>    take too long or the data would not fit on one computer. This means the
>>>    algorithms had to be rewritten for parallel execution. This was the
>>>    'Volume' part of big data marketing.
>>>       - Apache Mahout
>>>       <https://mahout.apache.org/users/basics/algorithms.html> (in
>>>       Java) was, I think, one of the first open-source implementations of
>>>       parallelized machine learning algorithms.
>>>       - The hottest thing for this now is the Spark Machine Learning
>>>       library <http://spark.apache.org/docs/latest/mllib-guide.html>
>>>       (see the PyCon 2015 presentation on Spark with Python
>>>       <http://pyvideo.org/video/3407/introduction-to-spark-with-python>).
>>>       There is also a Chicago Spark Meetup
>>>       <http://www.meetup.com/Chicago-Spark-Users/>.
>>>       - And the newcomer Apache Flink, also in Java, bypasses Java's
>>>       garbage collection for speed, optimizes SQL queries (unlike Hive), and
>>>       claims to provide a truly streaming analytics option without some of the
>>>       hangups of Storm. It also has Python bindings
>>>       <http://mvnrepository.com/artifact/org.apache.flink/flink-python>.
>>>       There is a Chicago Flink meetup
>>>       <http://www.meetup.com/Chicago-Apache-Flink-Meetup/> -- I think
>>>       it's the 3rd Flink user group in North America.
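The key-generation workaround in the first bullet above can be sketched as follows. This illustrates the general idea behind Snowflake-style IDs (a timestamp, a worker number, and a per-millisecond counter packed into one 64-bit integer, so each machine can mint keys without asking a central database); it is not Twitter's code. The class name `IdGenerator`, the epoch constant, and the choice to borrow the next millisecond instead of waiting when the counter runs out are assumptions made for this example.

```python
import threading
import time

EPOCH_MS = 1420070400000  # 2015-01-01 UTC, an arbitrary custom epoch (assumption)
WORKER_BITS = 10          # up to 1024 workers
SEQUENCE_BITS = 12        # up to 4096 IDs per worker per millisecond
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class IdGenerator:
    """Mints unique, increasing 64-bit IDs without a central database."""

    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            # Never go backwards, even if the system clock does.
            now = max(int(time.time() * 1000), self.last_ms)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Counter exhausted for this millisecond: borrow the next one.
                    now = self.last_ms + 1
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


generator = IdGenerator(worker_id=3)
print(generator.next_id())
```

Because the timestamp occupies the high bits, IDs from one worker sort in creation order, and IDs from different workers cannot collide as long as each worker has its own `worker_id`.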
>>> hope it was useful...see you @ Braintree Thursday!
>>> _______________________________________________
>>> Chicago mailing list
>>> Chicago at python.org
>>> https://mail.python.org/mailman/listinfo/chicago