[Chennaipy] MOM - September 2018 Chennaipy Meetup
Vijay Kumar
vijaykumar at bravegnu.org
Sun Oct 14 22:12:15 EDT 2018
Hi Bharathwaaj,
Thanks for the minutes. The description of my talk, in the minutes, is a
little terse, and might not accurately reflect what I said. A more
accurate description can be obtained from my slides available at
https://www.dropbox.com/s/uwzvgvfrs7o0nhf/slides.html?dl=1
Regards,
Vijay
On Sunday 07 October 2018 10:35 AM, Bharathwaaj S wrote:
> Hello,
>
> Apologies for the delay. Please find below the minutes of the September 2018 meetup.
>
> *Data Compression Techniques*
> Data compression involves minimizing the size in bytes without
> degrading quality to an unacceptable level. There are two kinds:
> lossy and lossless compression.
>
> But how can we measure information? Information theory provides the
> answer: it defines a unit of information. Uncertainty, information and
> entropy are the key terms. An event that is uncertain has a low
> probability and hence carries a lot of information; entropy is the
> average information over all possible outcomes. (For example, "it is
> hot in Chennai" carries hardly any information, but "snow in Chennai"
> does.)
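These two ideas can be made concrete in a few lines of Python. This is a minimal sketch, not code from the talk: the self-information of an event with probability p is -log2(p) bits, and entropy is the average self-information over a source's symbols. The probabilities 0.99 and 0.001 for the Chennai weather examples are made-up illustrative numbers.

```python
import math
from collections import Counter


def self_information(p):
    """Information (in bits) carried by an event of probability p.

    The less likely the event, the more bits it carries.
    """
    return -math.log2(p)


def entropy(text):
    """Shannon entropy of the symbol distribution in `text`, in bits per symbol.

    This is the average self-information over all symbols.
    """
    counts = Counter(text)
    n = len(text)
    return sum((c / n) * self_information(c / n) for c in counts.values())


# Illustrative (made-up) probabilities for the Chennai examples:
print(round(self_information(0.99), 4))   # "it is hot in Chennai" -> 0.0145 bits
print(round(self_information(0.001), 2))  # "snow in Chennai"      -> 9.97 bits
print(round(entropy("abracadabra"), 2))   # -> 2.04 bits/symbol
```

Entropy also gives a lower bound on the average number of bits per symbol that any lossless symbol-by-symbol code can achieve for that distribution.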
>
> If data needs to be compressed, instead of coding the bits directly,
> we can choose each symbol's codeword based on its probability of
> occurrence, giving shorter codewords to more frequent symbols. The
> Huffman coding algorithm uses this method to achieve lossless data
> compression: it maps symbols to probability-based codewords.
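A compact sketch of Huffman coding using the standard library's heapq, assuming symbol frequencies are counted from the input text itself (the talk may have used a different formulation). It repeatedly merges the two least frequent subtrees, prefixing their codewords with 0 and 1, which yields a prefix-free code.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Build a Huffman code for the symbols in `text`.

    Frequent symbols end up with shorter codewords; the code is prefix-free,
    so the encoded bit string can be decoded without separators.
    """
    freq = Counter(text)
    if len(freq) == 1:
        # A single distinct symbol still needs a (1-bit) codeword.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique tie-breaker so heapq never compares dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 / 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    inverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:  # prefix-free: the first match is the right one
            out.append(inverse[buf])
            buf = ""
    return "".join(out)
```

For "abracadabra" ('a' five times, 'b' and 'r' twice, 'c' and 'd' once), 'a' gets a 1-bit codeword and the other symbols 3 bits each, so the 11 characters encode to 23 bits instead of 88 bits of 8-bit ASCII, and decode() recovers the text exactly.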
>
> Information theory is a well-developed field, and many ideas in data
> science are drawn from it. On a lighter note, the idea of shorter codes
> for more frequent symbols was already used in Morse code in 1836, well
> before Shannon formalised it in 1948.
>
> *Last mile problem in ML*
> In software engineering, we write a function that takes an input and
> gives an output. Machine learning is about finding a good function,
> which is called a model.
>
> For ML we now have dead-simple APIs with abundant power. The
> cornerstone of science is repeatable results. Since data science is a
> science, it is important to produce repeatable results and hence to
> track experiments. When this is not done, we end up with zombie
> models.
>
> We need a way to do the following (wishlist):
> - Remember what training data was used
> - Remember what code was used
> - Remember the configuration and hyperparameters used
> - Remember the results
> - Save the model
> - Compare the results
>
> We have a tool called MLflow that provides these. With the help of APIs
> such as set_tracking_uri, start_run, log_param, log_metric and
> log_artifact, these can be achieved. We can also deploy models to AWS
> SageMaker.
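To make the wishlist concrete, here is a minimal, stdlib-only sketch of what an experiment tracker records for each run. It is not MLflow's implementation or API: log_run and compare_runs are hypothetical helpers, and the comments only point at the MLflow calls that play a similar role.

```python
import hashlib
import json
import pickle
import time
from pathlib import Path


def log_run(run_dir, data_bytes, code_version, params, metrics, model):
    """Hypothetical helper: record one experiment so it can be reproduced."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "timestamp": time.time(),
        # Fingerprint of the training data (which data was used).
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "code_version": code_version,  # which code was used, e.g. a git commit hash
        "params": params,    # configuration/hyperparameters (cf. mlflow.log_param)
        "metrics": metrics,  # results (cf. mlflow.log_metric)
    }
    (run_dir / "run.json").write_text(json.dumps(record, indent=2))
    # Save the model itself (cf. mlflow.log_artifact).
    with open(run_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    return record


def compare_runs(root, metric):
    """Hypothetical helper: (run_name, value) pairs, lowest metric first."""
    rows = []
    for path in Path(root).glob("*/run.json"):
        record = json.loads(path.read_text())
        rows.append((path.parent.name, record["metrics"][metric]))
    return sorted(rows, key=lambda row: row[1])
```

A real tracker such as MLflow stores the same kinds of information for you and adds a UI for browsing and comparing runs, so hand-rolling this is mainly useful for seeing what "tracking an experiment" means.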
>
> The code should be well structured and should expose the models like a
> library. A sample code structure was shared.
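As an illustration only (the sample structure shared at the meetup is not reproduced here), a model exposed like a library might give callers a small fit/predict/save/load surface and hide everything else. MeanRegressor, save_model and load_model are hypothetical names, and the toy model simply predicts the training mean.

```python
import json


class MeanRegressor:
    """Toy model standing in for a real one: always predicts the training mean."""

    def __init__(self, mean=0.0):
        self.mean = mean

    def fit(self, targets):
        self.mean = sum(targets) / len(targets)
        return self

    def predict(self, inputs):
        return [self.mean for _ in inputs]


def save_model(model, path):
    """Persist the trained parameters so other code can load the model."""
    with open(path, "w") as f:
        json.dump({"mean": model.mean}, f)


def load_model(path):
    """Rebuild a ready-to-use model from saved parameters."""
    with open(path) as f:
        return MeanRegressor(**json.load(f))
```

Callers only need load_model(path).predict(inputs); the training code can change without breaking them.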
>
> *Pysangamam - Lessons learnt*
> The event had 2 keynotes, 16 20-minute slots, 16 poster slots and 12
> lightning-talk slots. The idea started on Dec 8, 2017.
>
> Zen: local > national > international. Stick to where the community
> base is largest. Use the mailing lists in TN (Tamil Nadu).
>
> Prototyping before implementing was the rule, and constraints lead to
> quality. Organizers could not be speakers, and they ensured the venue
> was left clean after the event.
>
> Good part:
> All tasks were completed on time. There were rehearsals and the
> quality was good. Posters were very engaging. Lightning talks were
> time-managed using a countdown timer. Food was served on time. The
> reception was positive.
>
> The website was great and social media were kept updated. The name and
> logo were well appreciated. The venue was spacious yet compact.
> Contributor tickets helped provide discounts to students.
>
> Hard part:
> The process was painful. It was difficult to keep up enthusiasm and to
> get the ball rolling. The organizers made sure to meet face to face
> (F2F) once every week. Sponsorship was difficult due to insufficient
> contacts.
>
> There were few takers for posters, and logo approval took time. Video
> recording and banners had issues. There was no on-the-spot
> registration, there were very few volunteers, and food was wasted.
>
> *Importance of unit testing*
> Unit testing makes the product stable and prevents regressions. A good
> unit test needs no network, no database, no file modification and no
> special environment, and it can run in parallel with other tests.
> Artima link: https://www.artima.com/weblogs/viewpost.jsp?thread=126923
>
> Mock external dependencies in unit tests.
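A minimal sketch of mocking an external dependency with the standard library's unittest.mock. fetch_status is a hypothetical function under test; patching urllib.request.urlopen means the test never touches the network, in line with the rules above.

```python
import unittest
import urllib.request
from unittest import mock


def fetch_status(url):
    """Hypothetical code under test: return the HTTP status of a URL."""
    with urllib.request.urlopen(url) as resp:
        return resp.status


class FetchStatusTest(unittest.TestCase):
    @mock.patch("urllib.request.urlopen")
    def test_fetch_status_without_network(self, fake_urlopen):
        # urlopen is used as a context manager, so configure what __enter__ returns.
        fake_urlopen.return_value.__enter__.return_value.status = 200
        self.assertEqual(fetch_status("https://example.com"), 200)
        fake_urlopen.assert_called_once_with("https://example.com")
```

Run it with python -m unittest. The patch target is the name as it is looked up at call time (here the urllib.request module attribute); patching a different reference would leave the real network call in place.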
>
> Install exam (pip install exam), which provides decorators such as
> fixture, before and after. Use flake8 for linting.
>
> Importance of logging in the database: it helps with disaster recovery.
>
> Kind regards,
> Bharath
>
>
>
> _______________________________________________
> Chennaipy mailing list
> Chennaipy at python.org
> https://mail.python.org/mailman/listinfo/chennaipy