Peter Opapa

Data Engineer & Cloud Architect building reliable, cloud-native pipelines and real-time systems.

I design end-to-end data platforms on Azure, from ingestion to analytics-ready models, using tools like Kafka, Spark, dbt, and Snowflake.

About Me

I build data systems that are dependable, observable, and easy to scale. My work spans medallion architectures, real-time streaming, and batch pipelines that move data from raw sources to analytics-ready layers.

I focus on cloud-native delivery in Azure, using modern tooling like Databricks, Airflow, Kafka, dbt, and Snowflake to ship production-grade pipelines with clean models and clear documentation.

Cloud-native delivery with Azure and Databricks
Real-time and batch pipelines with Kafka, Spark, and Airflow
ELT modeling with dbt and Snowflake

Skills

Cloud Data Platforms

Azure (ADF, ADLS, Synapse)
Databricks
Snowflake
Oracle Cloud (OCI)

Big Data Frameworks

Apache Spark
Apache Flink
Apache Kafka
Apache Airflow
dbt Core

Programming & Databases

Python
SQL
PostgreSQL
MongoDB
SQL Server

DevOps & Tools

Docker
Git & GitHub
GitHub Actions
Terraform

Analytics & Visualization

Power BI
Excel

Certifications

Awards

Coming soon...

Projects

Architecture diagram for a Medallion data warehouse

Data Warehouse: Medallion Architecture

Built a production-ready SQL Server data warehouse using the Medallion architecture and T-SQL ETL. It features automated loading, transformation scripts, and data quality checks. The final output is a star schema optimized for BI, serving as a reusable reference for SQL data pipelines.
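A minimal sketch of how such a layered load might be driven from Python, assuming hypothetical stored procedures (bronze.load_bronze, silver.load_silver, gold.load_gold) and a local SQL Server connection; none of these names come from the actual project:

```python
# Sketch: run a bronze -> silver -> gold warehouse load in dependency order.
# Procedure names and the connection string are illustrative placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost;DATABASE=DataWarehouse;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;"
)
conn.autocommit = True
cursor = conn.cursor()

for proc in ("bronze.load_bronze", "silver.load_silver", "gold.load_gold"):
    print(f"Running {proc}...")
    cursor.execute(f"EXEC {proc};")  # each layer is a T-SQL stored procedure

conn.close()
```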

Microsoft SQL Server PowerShell Data Modelling ETL Git
View on GitHub
Architecture diagram for a real-time data pipeline with Kafka and Spark

Real-time Data Pipeline

A real-time streaming pipeline, orchestrated by Airflow, that ingests user profile data from an API. Kafka decouples ingestion from processing, with raw events backed up in PostgreSQL. Apache Spark processes the stream and writes enriched records to Cassandra. The entire system is containerized with Docker for easy deployment and scaling.
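A minimal sketch of the ingestion step, assuming a hypothetical profile API and topic name (both placeholders, not the project's actual configuration):

```python
# Sketch: poll a profile API and publish each record to Kafka.
# The API URL and topic name are illustrative placeholders.
import json
import time

import requests
from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    profile = requests.get("https://example.com/api/profiles").json()
    producer.send("users_created", profile)  # Spark consumes this topic downstream
    time.sleep(1)
```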

Apache Airflow Apache Kafka Apache Spark Apache Cassandra Docker PostgreSQL Python
View on GitHub
Financial transactions pipeline with Flink and Datadog

Financial Transactions Pipeline

Built a real-time financial analytics pipeline using Kafka, Flink SQL, and PostgreSQL. It ingests events from a Python producer, performs event-time aggregations in Apache Flink, and stores the results in PostgreSQL, while Datadog monitors pipeline health in real time. The entire system is containerized with Docker for easy deployment.
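A minimal sketch of an event-time aggregation of this kind, written with PyFlink's Table API; the topic, schema, and window size are assumptions for illustration:

```python
# Sketch: tumbling-window aggregation over Kafka events, keyed on event time.
# Topic, schema, and window size are illustrative placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE transactions (
        transaction_id STRING,
        amount         DOUBLE,
        ts             TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'financial_transactions',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("""
    SELECT window_start, SUM(amount) AS total_amount
    FROM TABLE(TUMBLE(TABLE transactions, DESCRIPTOR(ts), INTERVAL '1' MINUTE))
    GROUP BY window_start, window_end
""").print()
```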

Python Apache Kafka Apache Flink PostgreSQL Datadog Docker
View on GitHub
Azure and Terraform infrastructure diagram

Azure Infrastructure Deployment Using Terraform

This project automates the deployment and management of Azure Kubernetes Service (AKS) infrastructure using Terraform, with CI/CD pipelines powered by GitHub Actions and Azure DevOps. The setup includes AKS, Azure Active Directory, Azure Resource Manager, storage accounts, and Azure Key Vault, ensuring a consistent and repeatable infrastructure deployment process.
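A minimal sketch of the kind of non-interactive plan-and-apply step a CI job might run, assuming a hypothetical infra/aks module directory (the actual pipelines likely invoke Terraform through GitHub Actions or Azure DevOps tasks directly):

```python
# Sketch: run Terraform non-interactively from a CI step.
# The working directory is an illustrative placeholder.
import subprocess

def terraform(*args: str) -> None:
    """Run a terraform command in the module directory, failing fast on errors."""
    subprocess.run(["terraform", *args], cwd="infra/aks", check=True)

terraform("init", "-input=false")
terraform("plan", "-input=false", "-out=tfplan")
terraform("apply", "-input=false", "tfplan")  # a saved plan applies without prompting
```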

Terraform Azure CLI GitHub Actions Git Azure Kubernetes Service Azure Key Vault Azure Active Directory
View on GitHub
Jumia ELT web scraping pipeline architecture

Jumia ELT Pipeline

This project implements a production-grade ELT data pipeline that scrapes laptop product data from Jumia Kenya using BeautifulSoup and Requests, then processes it through a medallion architecture using PostgreSQL stored procedures. Apache Airflow orchestrates the scraping, loading, and transformation tasks. The entire pipeline is containerized with Docker for portability and scalability, and integrated with GitHub Actions for CI/CD.
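A minimal sketch of the extract-and-load step, with selectors, table, and procedure names as illustrative assumptions rather than the project's actual code:

```python
# Sketch: scrape listing cards, land them in a raw (bronze) Postgres table,
# then hand off to a stored-procedure transform. Names are placeholders.
import psycopg2
import requests
from bs4 import BeautifulSoup

resp = requests.get(
    "https://www.jumia.co.ke/laptops/",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=30,
)
soup = BeautifulSoup(resp.text, "html.parser")

rows = [
    (card.select_one("h3.name").get_text(strip=True),
     card.select_one("div.prc").get_text(strip=True))
    for card in soup.select("article.prd")  # hypothetical product-card selector
]

conn = psycopg2.connect("dbname=jumia user=etl")
with conn, conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO bronze.laptops_raw (name, price) VALUES (%s, %s)", rows
    )
    cur.execute("CALL silver.transform_laptops();")  # hypothetical transform proc
```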

Python (Pandas, Requests, Beautiful Soup) Apache Airflow PostgreSQL Docker GitHub Actions
View on GitHub
dbt and Snowflake ELT pipeline architecture

MovieLens Data Pipeline - ADLS Gen2 + dbt + Snowflake

This project showcases a modern ELT pipeline. It extracts the raw MovieLens 20M dataset from Azure Data Lake Storage Gen2, loads it into a Snowflake data warehouse, and then uses dbt against Snowflake for data modeling and transformations. The result is a dimensional model with auto-generated documentation, optimized for analytics and ML applications.
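A minimal sketch of the load step, assuming a hypothetical external stage over the ADLS Gen2 container and placeholder credentials; dbt then models on top of the raw tables:

```python
# Sketch: copy a raw MovieLens file from an external ADLS Gen2 stage into Snowflake.
# Account, credentials, stage, and table names are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="loader",
    password="...",          # placeholder; use a secrets manager in practice
    warehouse="LOAD_WH",
    database="MOVIELENS",
    schema="RAW",
)
cur = conn.cursor()

# @adls_stage would be an external stage pointing at the ADLS Gen2 container.
cur.execute("""
    COPY INTO raw_ratings
    FROM @adls_stage/ratings.csv
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
conn.close()
```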

Data Build Tool (dbt) Snowflake Azure Data Lake Storage Gen2 Git GitHub Actions GitHub Pages
View on GitHub
Azure weather streaming pipeline architecture

Weather Streaming

This project demonstrates a comprehensive real-time weather data streaming pipeline using Azure cloud services. The system ingests weather data from external APIs, processes it through Azure Event Hubs, and visualizes real-time insights through Power BI dashboards. The project showcases cost optimization strategies by providing both Databricks and Azure Functions implementations.
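A minimal sketch of the ingestion step, with the connection string and weather API URL as placeholders (the actual project wires this through Databricks or Azure Functions):

```python
# Sketch: fetch one weather reading and publish it to Azure Event Hubs.
# The connection string and API URL are illustrative placeholders.
import json

import requests
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;...",
    eventhub_name="weather-events",
)

reading = requests.get("https://example.com/weather/current", timeout=10).json()

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)
```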

Databricks Azure Functions Azure Event Hubs Azure Stream Analytics Power BI Azure Key Vault Azure Cost Management
View on GitHub
Azure enterprise-scale medallion architecture

Azure Data Engineering Project: Enterprise-Scale Medallion Architecture

This comprehensive Azure data engineering solution demonstrates the implementation of a production-ready Medallion Architecture (Bronze → Silver → Gold) pattern. The project leverages Microsoft Azure's native cloud services to create a scalable, maintainable, and cost-effective data pipeline that processes AdventureWorks business data from ingestion through to business intelligence reporting.
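A minimal sketch of a bronze-to-silver step in PySpark on this kind of layout; the storage paths and column names are assumptions, not the project's actual schema:

```python
# Sketch: clean a bronze table into the silver layer on ADLS Gen2.
# Storage paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.parquet("abfss://bronze@storage.dfs.core.windows.net/sales")

silver = (
    bronze.dropDuplicates(["OrderNumber"])
    .withColumn("OrderDate", F.to_date("OrderDate"))
    .withColumn("ingested_at", F.current_timestamp())
)

silver.write.mode("overwrite").parquet(
    "abfss://silver@storage.dfs.core.windows.net/sales"
)
```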

PySpark Azure Data Factory Azure Data Lake Storage Databricks Azure Synapse Analytics Power BI
View on GitHub
Azure Databricks and Unity Catalog architecture

Azure Databricks Project with Unity Catalog

This project uses Azure Data Factory to ingest data from GitHub into ADLS Gen2. It follows a medallion architecture: Databricks Auto Loader streams raw data to the Bronze layer, Delta Lake tables clean it for the Silver layer, and the Gold layer provides aggregated data for analytics.
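A minimal sketch of the Bronze ingestion step with Auto Loader; the storage paths and the Unity Catalog table name are illustrative assumptions:

```python
# Sketch: incrementally ingest raw files into a Bronze Delta table with Auto Loader.
# Paths and the three-part Unity Catalog table name are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(spark.readStream
    .format("cloudFiles")                      # Databricks Auto Loader source
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation",
            "abfss://bronze@storage.dfs.core.windows.net/_schemas/orders")
    .load("abfss://raw@storage.dfs.core.windows.net/orders")
    .writeStream
    .option("checkpointLocation",
            "abfss://bronze@storage.dfs.core.windows.net/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("main.bronze.orders"))            # Unity Catalog three-part name
```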

Azure Data Factory Azure Data Lake Databricks Git Delta Lake PySpark
View on GitHub

Contact

Let's connect and discuss how we can work together on your next data project!