PROJECT 10: DATA ENGINEERING PROJECTS
Welcome to my Cliche…
The advent of big data has no doubt increased the demand for data engineers.
But what is big data?
It is data characterized by the 3 V’s that can’t be processed by traditional methods.
The 3 V’s are:
- Volume (large, e.g. gigabytes, petabytes…)
- Velocity (frequent, streaming, real-time…)
- Variety (different formats like CSV, Parquet, audio, video…)
This kind of data needs specialized infrastructure, schemas, storage, processing tools, automation, and monitoring.
Who is going to do that?
DATA ENGINEERS
Earlier this year, I took a course on Udacity with a lot of projects. These projects are:
- Data Modelling with PostgreSQL & Cassandra
- ETL in Cloud Data Warehouses
- Data Lakes with Spark
- Data Pipelines with Airflow
- Data Engineering Final Capstone Project: US Migration data ETL pipeline with Spark
I have decided to share them on my GitHub.