[Chennaipy] Meeting minutes of 6/27/2020 Chennai py
Pradeep Padmanaban
pradeeppadmanaban7 at gmail.com
Sat Jun 27 07:43:25 EDT 2020
*Role of Python in ETL - Deepak*
ETL - Extract, Transform and Load
Python is used to customize ETL because it is easy to read, write and
execute
Python ETL tools:
1. petl - Example of converting a 4x2 array to HTML
2. Pandas - has many features for ETL, including transformations and
   conditional operations
   Example - load the data from a file, find duplicates and write them to
   another file
3. Mara - Lightweight, web-based UI (special feature)
4. Apache Airflow - created by Airbnb, based on DAGs (Directed Acyclic Graphs)
5. PySpark - Big Data tool, data streaming, ML on top of streaming
6. Bonobo - Supports parallel processing
7. Luigi - Created by Spotify, for enterprise-level solutions (more incoming
   data every minute)
8. AVIK Cloud - Not open source. Python can be used in it directly. It is a
   software-as-a-service product
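
The duplicates example mentioned under Pandas (load from a file, find duplicates, send them to another file) can be sketched roughly as below. This is not the code shown in the talk: it uses only the standard library's csv module so that it runs without extra installs (pandas itself offers DataFrame.duplicated for this step). The sample data, the column names and the function names are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def extract(source):
    """Extract: read all rows from a CSV file-like object as dicts."""
    return list(csv.DictReader(source))


def find_duplicates(rows, key):
    """Transform: keep only rows whose `key` value occurs more than once."""
    counts = Counter(row[key] for row in rows)
    return [row for row in rows if counts[row[key]] > 1]


def load(rows, fieldnames, target):
    """Load: write the selected rows to another CSV file-like object."""
    writer = csv.DictWriter(target, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample data; io.StringIO stands in for real files on disk.
source = io.StringIO(
    "id,email\n1,a@example.com\n2,b@example.com\n3,a@example.com\n"
)
target = io.StringIO()

rows = extract(source)
duplicates = find_duplicates(rows, key="email")
load(duplicates, fieldnames=["id", "email"], target=target)
print(target.getvalue())
```

With real files, open(...) objects would replace the two StringIO buffers; the three functions stay the same.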
Doubts:
Ashok: Can you explain what a pipeline is?
Deepak: In enterprise-level work there are multiple operations running in
parallel, so pipelines are created.
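
As a rough illustration of the pipeline idea, the sketch below models tasks as a directed acyclic graph (the structure Airflow is built around) and groups them into stages whose tasks could run in parallel. It does not use Airflow; it relies on the standard library's graphlib module (Python 3.9+), and the task names and the plan_stages helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load": {"transform"},
}


def plan_stages(graph):
    """Group tasks into stages; tasks within one stage do not depend on
    each other, so they could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = plan_stages(dependencies)
for number, tasks in enumerate(stages, start=1):
    print(f"stage {number}: {tasks}")
```

Here the two extract tasks land in the same stage, while transform and load each wait for the stage before them.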
*Introduction to Cyclic Redundancy Check (CRC) - Ashok*
Also called frame check sequence
Types of errors:
1. Single-bit error
2. Burst error
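
The two error types can be modelled, as a minimal sketch, by XOR-ing a frame with an error mask: a single-bit error flips exactly one bit, while a burst error corrupts a run of bits, with the burst length measured from the first corrupted bit to the last. The 16-bit frame, the masks and the helper names below are made up for illustration.

```python
def apply_error(frame, mask):
    """Flip every bit of `frame` where `mask` has a 1 (XOR models the noise)."""
    return frame ^ mask


def count_flipped(sent, received):
    """Number of bit positions in which the two frames differ."""
    return bin(sent ^ received).count("1")


def burst_length(mask):
    """Distance from the first to the last corrupted bit, inclusive."""
    lowest_set_index = (mask & -mask).bit_length() - 1
    return mask.bit_length() - lowest_set_index


sent = 0b1011001110001111          # hypothetical 16-bit frame

single_mask = 0b0000000000010000   # one corrupted bit
burst_mask = 0b0000011101000000    # corrupted bits spread over 5 positions

single = apply_error(sent, single_mask)
burst = apply_error(sent, burst_mask)

print(count_flipped(sent, single), burst_length(single_mask))
print(count_flipped(sent, burst), burst_length(burst_mask))
```

In the burst case only 4 bits are actually flipped, but the burst length is 5 because one bit inside the run happened to survive.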
Error detection in computer networks - add redundancy bits
1. A basic example - transmitting 1000 bits from CP1 to CP2 plus 125
   redundancy bits
2. State machine
3. Two-dimensional parity check - for 32 bits, 13 extra bits are sent
4. Checksum - binary addition
5. CRC - most commonly used in digital systems
   i) State machine
   ii) CRC computation - XOR in polynomial division
   iii) Code walkthrough
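
The CRC computation step (XOR in polynomial division) can be sketched as follows. This is not the code walked through in the session, only a small illustration under stated assumptions: bits are passed as strings, the generator starts with a 1 and is no longer than the dividend, and the generator 1101 (x^3 + x^2 + 1) is a textbook-sized example rather than a standard CRC polynomial. The function names are hypothetical.

```python
def mod2_remainder(dividend_bits, generator_bits):
    """Remainder of modulo-2 long division, where subtraction is XOR."""
    n = len(generator_bits) - 1          # number of check bits
    generator = int(generator_bits, 2)
    remainder = int(dividend_bits, 2)
    # Align the generator under each position, from the left to the right.
    for shift in range(len(dividend_bits) - len(generator_bits), -1, -1):
        if remainder & (1 << (shift + n)):   # leading bit set: "subtract"
            remainder ^= generator << shift
    return format(remainder, f"0{n}b")


def crc_encode(data_bits, generator_bits):
    """Sender: append n zeros, divide, and attach the remainder as the CRC."""
    n = len(generator_bits) - 1
    return data_bits + mod2_remainder(data_bits + "0" * n, generator_bits)


def crc_check(codeword_bits, generator_bits):
    """Receiver: a zero remainder means no error was detected."""
    return int(mod2_remainder(codeword_bits, generator_bits), 2) == 0


codeword = crc_encode("100100", "1101")
print(codeword)                          # data followed by 3 check bits
print(crc_check(codeword, "1101"))       # intact frame
print(crc_check("100101001", "1101"))    # same frame with one bit flipped
```

The receiver repeats the same division on the whole codeword; any non-zero remainder signals that the frame was corrupted in transit.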
Conclusion - we may get a CRC error when we open a hard drive.
*Open Slot:*
Ashok - Documentary suggestions - Prediction by the Numbers, The Code; both
are available on Netflix
Rengaraj - PyCon India 2020
Pradeep - MIT OpenCourseWare https://ocw.mit.edu/index.htm
Vijay Ravider - Python modules used in infrastructure-based provisioning
services