[Chennaipy] MOM - September 2018 Chennaipy Meetup

Vijay Kumar vijaykumar at bravegnu.org
Sun Oct 14 22:12:15 EDT 2018


Hi Bharathwaaj,
Thanks for the minutes. The description of my talk, in the minutes, is a 
little terse, and might not accurately reflect what I said. A more 
accurate description can be obtained from my slides available at 
https://www.dropbox.com/s/uwzvgvfrs7o0nhf/slides.html?dl=1

Regards,
Vijay



On Sunday 07 October 2018 10:35 AM, Bharathwaaj S wrote:
> Hello,
>
> Apologies for the delay. Please find below the minutes of the
> September 2018 meetup.
>
> *Data Compression Techniques*
> Data compression reduces the size of data in bytes without degrading
> quality to an unacceptable level. Compression can be lossy or
> lossless.
>
> But how can we measure information? Information theory provides an
> answer. It defines a unit of information, the bit. Uncertainty,
> information and entropy are terms used in information theory. An
> uncertain event has low probability and hence carries more
> information; entropy is the average information of a source. (For
> example, "it is hot in Chennai" carries little information, but "it
> is snowing in Chennai" carries a lot.)
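
To make the hot/snow example concrete, here is a minimal sketch of
self-information and entropy, measured in bits. The probabilities are
made up for illustration; they are not from the talk or from any
weather data.

```python
import math


def self_information(p):
    """Information, in bits, carried by an event of probability p."""
    return -math.log2(p)


def entropy(probabilities):
    """Average information, in bits, of a source with these outcomes."""
    return sum(p * self_information(p) for p in probabilities if p > 0)


# Assumed, illustrative probabilities (not measured data).
p_hot, p_snow = 0.999, 0.001

print("hot in Chennai :", self_information(p_hot), "bits")
print("snow in Chennai:", self_information(p_snow), "bits")
print("fair coin      :", entropy([0.5, 0.5]), "bits per toss")
```

The rarer event yields the larger number, which is the sense in which
low probability means high information.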
>
> To compress data, instead of giving every symbol a fixed-length
> code, we can choose each symbol's codeword based on its probability
> of occurrence, so that frequent symbols get shorter codewords. The
> Huffman coding algorithm uses this method to achieve lossless data
> compression. It maps symbols to probability-based codewords.
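
A rough sketch of Huffman coding as described above, using only the
standard library. The function names (huffman_code, encode, decode)
and the sample string are my own choices for illustration, not code
from the talk.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each symbol in text to a Huffman codeword."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: codeword_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codewords.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    """Concatenate the codeword of every symbol in text."""
    return "".join(code[s] for s in text)


def decode(bits, code):
    """Invert encode(); works because Huffman codes are prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


sample = "abracadabra"
code = huffman_code(sample)
bits = encode(sample, code)
print(code)
print(len(bits), "bits; round trip ok:", decode(bits, code) == sample)
```

The most frequent symbol ("a" here) ends up with the shortest
codeword, and decoding recovers the input exactly, which is what makes
the scheme lossless.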
>
> Information theory is a well-developed field, and data science draws
> many ideas from it. On a lighter note, this idea was already
> implemented in Morse code in 1836, before Shannon formalised it in
> 1948.
>
> *Last mile problem in ML*
> In software engineering, we write a function which takes an input
> and gives an output. In machine learning, we look for a good
> function, which is called a model.
>
> For ML we now have dead simple APIs with abundant power. The
> cornerstone of science is repeatable results. Since data science
> involves science, it is important to produce repeatable results and
> hence to track experiments. When this is not done, we end up with
> zombie models.
>
> We need a way to do the following (wishlist):
> - Remember what training data was used
> - Remember what code was used
> - Remember the configuration and hyperparameters used
> - Remember the results
> - Save the model
> - Compare the results
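
As a tool-agnostic illustration of the wishlist, here is a small
sketch that stores one record per training run and compares runs. The
record layout, the function names and the sample values are all
hypothetical; the talk did not prescribe them.

```python
import hashlib
import json
import time


def fingerprint(data: bytes) -> str:
    """Hash the training data so the exact dataset can be recognised."""
    return hashlib.sha256(data).hexdigest()


def make_run_record(training_data, code_version, params, metrics, model_path):
    """Bundle everything the wishlist asks us to remember about a run."""
    return {
        "timestamp": time.time(),
        "data_sha256": fingerprint(training_data),  # what data was used
        "code_version": code_version,               # what code was used
        "params": params,                           # config, hyperparameters
        "metrics": metrics,                         # results
        "model_path": model_path,                   # where the model is saved
    }


def best_run(records, metric):
    """Compare runs and return the one with the highest given metric."""
    return max(records, key=lambda r: r["metrics"][metric])


# Hypothetical runs with made-up values.
data = b"feature,label\n1.0,0\n2.0,1\n"
runs = [
    make_run_record(data, "abc123", {"lr": 0.1}, {"accuracy": 0.81}, "m1.pkl"),
    make_run_record(data, "abc123", {"lr": 0.01}, {"accuracy": 0.88}, "m2.pkl"),
]
print(json.dumps(best_run(runs, "accuracy"), indent=2))
```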
>
> A tool called MLflow provides these. With the help of APIs such as
> set_tracking_uri, start_run, log_param, log_metric and log_artifact,
> the wishlist can be achieved. Models can also be deployed to AWS
> SageMaker.
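
A minimal sketch of how these calls fit together, using
mlflow.set_tracking_uri, mlflow.start_run, mlflow.log_param,
mlflow.log_metric and mlflow.log_artifact. The wrapper function
log_training_run and its arguments are my own naming, and mlflow is
imported inside the function only so that the snippet loads on a
machine without it installed.

```python
def log_training_run(params, metrics, artifact_path=None, tracking_uri=None):
    """Record one training run using MLflow's tracking API."""
    import mlflow  # third-party: pip install mlflow

    if tracking_uri is not None:
        # Point MLflow at a tracking server or a local directory.
        mlflow.set_tracking_uri(tracking_uri)
    with mlflow.start_run():
        for name, value in params.items():
            mlflow.log_param(name, value)      # configuration, hyperparameters
        for name, value in metrics.items():
            mlflow.log_metric(name, value)     # numeric results
        if artifact_path is not None:
            mlflow.log_artifact(artifact_path) # e.g. a saved model file
```

A call might look like log_training_run({"lr": 0.01}, {"accuracy":
0.9}), where the parameter name and the values are placeholders.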
>
> The code should be well structured and should expose the models
> like a library. A sample code structure was shared.
>
> *Pysangamam - Lessons learnt*
> Timeline: 2 keynotes, 16 twenty-minute slots, 16 poster slots, 12
> lightning slots. The idea started on Dec 8, 2017.
>
> Zen - Local > national > international. Stick with where the base is
> largest. Use mailing lists in TN.
>
> Prototyping before implementing was the rule, and constraints lead
> to quality. Organizers cannot be speakers, and must ensure the
> environment is kept clean after the event.
>
> Good part:
> All tasks were completed on time. There were rehearsals and the
> quality was good. Posters were very engaging. Lightning talk timing
> was managed using a countdown timer. Food was served on time.
> Reception was positive.
>
> Website was great and social media were updated. The name & logo were 
> well appreciated. Venue was spacious and compact. Contributor tickets 
> helped provide discounts to students.
>
> Hard part:
> The process was painful. It was difficult to keep up enthusiasm and
> set the ball rolling. Ensured that the organizers met face to face
> once every week. Sponsorship was difficult. Not enough contacts.
>
> Few takers for posters, and logo approval took time. Video recording
> and banners had issues. No on-the-spot registration. Very few
> volunteers. And food got wasted.
>
> *Importance of unit testing:*
> Unit testing makes the product stable and prevents regressions. Good
> unit test = no network, no DB, no file modification, can run in
> parallel, no special environment. Artima link:
> https://www.artima.com/weblogs/viewpost.jsp?thread=126923
>
> Mock external dependencies in unit tests.
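
For instance, a test can replace a network call with a mock so that it
needs no network at all. This is only a sketch with the standard
library's unittest.mock; fetch_status and the URL are hypothetical
stand-ins for real application code.

```python
import unittest
import urllib.request
from unittest import mock


def fetch_status(url):
    """Code under test: normally performs a real HTTP request."""
    with urllib.request.urlopen(url) as response:
        return response.status


class FetchStatusTest(unittest.TestCase):
    def test_returns_status_without_network(self):
        # Fake response object that also works as a context manager.
        fake_response = mock.MagicMock()
        fake_response.status = 200
        fake_response.__enter__.return_value = fake_response

        # Replace urlopen for the duration of the test only.
        with mock.patch(
            "urllib.request.urlopen", return_value=fake_response
        ) as fake_open:
            self.assertEqual(fetch_status("http://example.invalid/"), 200)

        fake_open.assert_called_once_with("http://example.invalid/")
```

Saved in a test module, this runs with python -m unittest and never
touches the network, so it also stays fast and safe to run in
parallel.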
>
> The exam package (pip install exam) provides decorators like
> fixture, before and after. Use flake8 for linting.
>
> Importance of logging in databases: for disaster recovery.
>
> Kind regards,
> Bharath
>
>
>
> _______________________________________________
> Chennaipy mailing list
> Chennaipy at python.org
> https://mail.python.org/mailman/listinfo/chennaipy
