An ELT Pipeline for NYC Taxi & Weather Analysis
Data Engineering
This project is an end-to-end data platform built on AWS. It automates the ingestion, transformation, and analysis of NYC taxi trip and weather data to measure how precipitation affects travel times.
The entire cloud environment is provisioned with Terraform, which keeps the infrastructure reproducible and version-controlled. An ELT pipeline, orchestrated by Apache Airflow, extracts raw data from a public data lake and a live weather API and loads it into a Redshift data warehouse. The data is then transformed, cleaned, and validated with dbt, backed by a suite of over 50 data quality tests. The final, analysis-ready data is presented in an interactive Streamlit dashboard.
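To make the transform, validate, and analyze steps concrete, here is a minimal sketch in plain Python of the kind of logic the warehouse models and tests express. It is not the project's code: the real pipeline does this work in dbt on Redshift, and every table shape, column name (`trip_id`, `pickup_at`, `duration_min`, `hour`, `precip_mm`), and sample row below is an assumption made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows standing in for the loaded warehouse tables.
trips = [
    {"trip_id": 1, "pickup_at": "2024-01-09 08:15:00", "duration_min": 22.0},
    {"trip_id": 2, "pickup_at": "2024-01-09 08:40:00", "duration_min": 26.0},
    {"trip_id": 3, "pickup_at": "2024-01-10 08:05:00", "duration_min": 15.0},
    {"trip_id": 4, "pickup_at": "2024-01-10 08:50:00", "duration_min": 17.0},
]
weather = [
    {"hour": "2024-01-09 08:00:00", "precip_mm": 4.2},
    {"hour": "2024-01-10 08:00:00", "precip_mm": 0.0},
]


def hour_key(timestamp):
    """Truncate a 'YYYY-MM-DD HH:MM:SS' string to the start of its hour."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(minute=0, second=0)


def check_quality(trips, weather):
    """Return the names of failed checks (an empty list means all passed).

    These mirror common warehouse tests: not-null, unique, and value range.
    """
    failures = []
    ids = [t["trip_id"] for t in trips]
    if any(i is None for i in ids):
        failures.append("trip_id not_null")
    if len(set(ids)) != len(ids):
        failures.append("trip_id unique")
    if any(t["duration_min"] is None or t["duration_min"] <= 0 for t in trips):
        failures.append("duration_min positive")
    if any(w["precip_mm"] is None or w["precip_mm"] < 0 for w in weather):
        failures.append("precip_mm non_negative")
    return failures


def mean_duration_by_condition(trips, weather):
    """Join trips to hourly weather and average duration for wet vs. dry hours."""
    precip_by_hour = {hour_key(w["hour"]): w["precip_mm"] for w in weather}
    buckets = defaultdict(list)
    for trip in trips:
        precip = precip_by_hour.get(hour_key(trip["pickup_at"]))
        if precip is None:
            continue  # no weather reading for this hour; skip the trip
        buckets["wet" if precip > 0 else "dry"].append(trip["duration_min"])
    return {condition: mean(values) for condition, values in buckets.items()}


if __name__ == "__main__":
    failed = check_quality(trips, weather)
    if failed:
        raise SystemExit(f"Data quality checks failed: {failed}")
    print(mean_duration_by_condition(trips, weather))
```

Running the checks before the aggregation reflects the ordering described above: data that fails validation should not reach the dashboard. Treating any hour with precipitation above zero as "wet" is a simplification chosen for the sketch; a real analysis would likely bucket by intensity and control for time of day.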