Peter Opapa

Data Engineer & Cloud Architect building reliable, cloud-native pipelines and real-time systems.

I design end-to-end data platforms on Azure, from ingestion to analytics-ready models, using tools like Kafka, Spark, dbt, and Snowflake.

About Me

I build data systems that are dependable, observable, and easy to scale. My work spans medallion architectures, real-time streaming, and batch pipelines that move data from raw sources to analytics-ready layers.

I focus on cloud-native delivery in Azure, using modern tooling like Databricks, Airflow, Kafka, dbt, and Snowflake to ship production-grade pipelines with clean models and clear documentation.

Cloud-native delivery with Azure and Databricks
Real-time and batch pipelines with Kafka, Spark, and Airflow
ELT modeling with dbt and Snowflake

Skills

Cloud Data Platforms

Azure (ADF, ADLS, Synapse)
Databricks
Snowflake
Oracle Cloud (OCI)

Big Data Frameworks

Apache Spark
Apache Flink
Apache Kafka
Apache Airflow
dbt Core

Programming & Databases

Python
SQL
PostgreSQL
MongoDB
SQL Server

DevOps & Tools

Docker
Git & GitHub
GitHub Actions
Terraform

Analytics & Visualization

Power BI
Excel

Certifications

Awards

Coming soon...

Projects

Architecture diagram for a Medallion data warehouse

Data Warehouse: Medallion Architecture

Built a production-ready SQL Server data warehouse using the Medallion architecture and T-SQL ETL. It features automated loading, transformation scripts, and data quality checks. The final output is a star schema optimized for BI, serving as a reusable reference for SQL data pipelines.
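A minimal sketch of how such a layered load might be driven from Python, assuming hypothetical stored procedures (bronze.load_bronze, silver.load_silver, gold.load_gold) and a local SQL Server connection; none of these names come from the actual project:

```python
# Sketch: run a bronze -> silver -> gold warehouse load in dependency order.
# Procedure names and the connection string are illustrative placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost;DATABASE=DataWarehouse;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;"
)
conn.autocommit = True
cursor = conn.cursor()

for proc in ("bronze.load_bronze", "silver.load_silver", "gold.load_gold"):
    print(f"Running {proc}...")
    cursor.execute(f"EXEC {proc};")  # each layer is a T-SQL stored procedure

conn.close()
```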

Microsoft SQL Server PowerShell Data Modelling ETL Git
View on GitHub
Architecture diagram for a real-time data pipeline with Kafka and Spark

Real-time Data Pipeline

A real-time streaming pipeline, orchestrated by Airflow, that ingests user profile data from an API. Kafka decouples ingestion from processing, with raw events backed up in PostgreSQL. Apache Spark processes the stream and writes enriched records to Cassandra. The entire system is containerized with Docker for easy deployment and scaling.
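A minimal sketch of the ingestion step, assuming a hypothetical profile API and topic name (both placeholders, not the project's actual configuration):

```python
# Sketch: poll a profile API and publish each record to Kafka.
# The API URL and topic name are illustrative placeholders.
import json
import time

import requests
from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    profile = requests.get("https://example.com/api/profiles").json()
    producer.send("users_created", profile)  # Spark consumes this topic downstream
    time.sleep(1)
```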

Apache Airflow Apache Kafka Apache Spark Apache Cassandra Docker PostgreSQL Python
View on GitHub
Financial transactions pipeline with Flink and Datadog

Financial Transactions Pipeline

Built a real-time financial analytics pipeline using Kafka, Flink SQL, and PostgreSQL. It ingests events from a Python producer, performs event-time aggregations in Apache Flink, and stores the results in PostgreSQL, while Datadog monitors pipeline health in real time. The entire system is containerized with Docker for easy deployment.
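A minimal sketch of an event-time aggregation of this kind, written with PyFlink's Table API; the topic, schema, and window size are assumptions for illustration:

```python
# Sketch: tumbling-window aggregation over Kafka events, keyed on event time.
# Topic, schema, and window size are illustrative placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE transactions (
        transaction_id STRING,
        amount         DOUBLE,
        ts             TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'financial_transactions',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("""
    SELECT window_start, SUM(amount) AS total_amount
    FROM TABLE(TUMBLE(TABLE transactions, DESCRIPTOR(ts), INTERVAL '1' MINUTE))
    GROUP BY window_start, window_end
""").print()
```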

Python Apache Kafka Apache Flink PostgreSQL Datadog Docker
View on GitHub
Azure and Terraform infrastructure diagram

Azure Infrastructure Deployment Using Terraform

This project automates the deployment and management of Azure Kubernetes Service (AKS) infrastructure using Terraform, with CI/CD pipelines powered by GitHub Actions and Azure DevOps. The setup includes AKS, Azure Active Directory, Azure Resource Manager, storage accounts, and Azure Key Vault, ensuring a consistent and repeatable infrastructure deployment process.
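A minimal sketch of the kind of non-interactive plan-and-apply step a CI job might run, assuming a hypothetical infra/aks module directory (the actual pipelines likely invoke Terraform through GitHub Actions or Azure DevOps tasks directly):

```python
# Sketch: run Terraform non-interactively from a CI step.
# The working directory is an illustrative placeholder.
import subprocess

def terraform(*args: str) -> None:
    """Run a terraform command in the module directory, failing fast on errors."""
    subprocess.run(["terraform", *args], cwd="infra/aks", check=True)

terraform("init", "-input=false")
terraform("plan", "-input=false", "-out=tfplan")
terraform("apply", "-input=false", "tfplan")  # a saved plan applies without prompting
```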

Terraform Azure CLI GitHub Actions Git Azure Kubernetes Service Azure Key Vault Azure Active Directory
View on GitHub
Jumia ELT web scraping pipeline architecture

Jumia ELT Pipeline

This project implements a production-grade ELT data pipeline that scrapes laptop product data from Jumia Kenya using BeautifulSoup and Requests, then processes it through a medallion architecture using PostgreSQL stored procedures. Apache Airflow orchestrates the scraping, loading, and transformation tasks. The entire pipeline is containerized with Docker for portability and scalability, and integrated with GitHub Actions for CI/CD.
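A minimal sketch of the extract-and-load step, with selectors, table, and procedure names as illustrative assumptions rather than the project's actual code:

```python
# Sketch: scrape listing cards, land them in a raw (bronze) Postgres table,
# then hand off to a stored-procedure transform. Names are placeholders.
import psycopg2
import requests
from bs4 import BeautifulSoup

resp = requests.get(
    "https://www.jumia.co.ke/laptops/",
    headers={"User-Agent": "Mozilla/5.0"},
    timeout=30,
)
soup = BeautifulSoup(resp.text, "html.parser")

rows = [
    (card.select_one("h3.name").get_text(strip=True),
     card.select_one("div.prc").get_text(strip=True))
    for card in soup.select("article.prd")  # hypothetical product-card selector
]

conn = psycopg2.connect("dbname=jumia user=etl")
with conn, conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO bronze.laptops_raw (name, price) VALUES (%s, %s)", rows
    )
    cur.execute("CALL silver.transform_laptops();")  # hypothetical transform proc
```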

Python (Pandas, Requests, Beautiful Soup) Apache Airflow PostgreSQL Docker GitHub Actions
View on GitHub
dbt and Snowflake ELT pipeline architecture

MovieLens Data Pipeline - ADLS Gen2 + dbt + Snowflake

This project showcases a modern ELT pipeline. It extracts the raw MovieLens 20M dataset from Azure Data Lake Storage Gen2, loads it into a Snowflake data warehouse, and then uses dbt against Snowflake for data modeling and transformations. The result is a dimensional model with auto-generated documentation, optimized for analytics and ML applications.
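A minimal sketch of the load step, assuming a hypothetical external stage over the ADLS Gen2 container and placeholder credentials; dbt then models on top of the raw tables:

```python
# Sketch: copy a raw MovieLens file from an external ADLS Gen2 stage into Snowflake.
# Account, credentials, stage, and table names are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="loader",
    password="...",          # placeholder; use a secrets manager in practice
    warehouse="LOAD_WH",
    database="MOVIELENS",
    schema="RAW",
)
cur = conn.cursor()

# @adls_stage would be an external stage pointing at the ADLS Gen2 container.
cur.execute("""
    COPY INTO raw_ratings
    FROM @adls_stage/ratings.csv
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
conn.close()
```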

Data Build Tool (dbt) Snowflake Azure Data Lake Storage Gen2 Git GitHub Actions GitHub Pages
View on GitHub
Azure weather streaming pipeline architecture

Weather Streaming

This project demonstrates a comprehensive real-time weather data streaming pipeline using Azure cloud services. The system ingests weather data from external APIs, processes it through Azure Event Hubs, and visualizes real-time insights through Power BI dashboards. The project showcases cost optimization strategies by providing both Databricks and Azure Functions implementations.
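A minimal sketch of the ingestion step, with the connection string and weather API URL as placeholders (the actual project wires this through Databricks or Azure Functions):

```python
# Sketch: fetch one weather reading and publish it to Azure Event Hubs.
# The connection string and API URL are illustrative placeholders.
import json

import requests
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;...",
    eventhub_name="weather-events",
)

reading = requests.get("https://example.com/weather/current", timeout=10).json()

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)
```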

Databricks Azure Functions Azure Event Hubs Azure Stream Analytics Power BI Azure Key Vault Azure Cost Management
View on GitHub
Azure enterprise-scale medallion architecture

Azure Data Engineering Project: Enterprise-Scale Medallion Architecture

This comprehensive Azure data engineering solution demonstrates the implementation of a production-ready Medallion Architecture (Bronze → Silver → Gold) pattern. The project leverages Microsoft Azure's native cloud services to create a scalable, maintainable, and cost-effective data pipeline that processes AdventureWorks business data from ingestion through to business intelligence reporting.
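A minimal sketch of a bronze-to-silver step in PySpark on this kind of layout; the storage paths and column names are assumptions, not the project's actual schema:

```python
# Sketch: clean a bronze table into the silver layer on ADLS Gen2.
# Storage paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.parquet("abfss://bronze@storage.dfs.core.windows.net/sales")

silver = (
    bronze.dropDuplicates(["OrderNumber"])
    .withColumn("OrderDate", F.to_date("OrderDate"))
    .withColumn("ingested_at", F.current_timestamp())
)

silver.write.mode("overwrite").parquet(
    "abfss://silver@storage.dfs.core.windows.net/sales"
)
```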

PySpark Azure Data Factory Azure Data Lake Storage Databricks Azure Synapse Analytics Power BI
View on GitHub
Azure Databricks and Unity Catalog architecture

Azure Databricks Project with Unity Catalog

This project uses Azure Data Factory to ingest data from GitHub into ADLS Gen2. It follows a medallion architecture: Databricks Auto Loader streams raw data to the Bronze layer, Delta Lake tables clean it for the Silver layer, and the Gold layer provides aggregated data for analytics.
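A minimal sketch of the Bronze ingestion step with Auto Loader; the storage paths and the Unity Catalog table name are illustrative assumptions:

```python
# Sketch: incrementally ingest raw files into a Bronze Delta table with Auto Loader.
# Paths and the three-part Unity Catalog table name are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

(spark.readStream
    .format("cloudFiles")                      # Databricks Auto Loader source
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation",
            "abfss://bronze@storage.dfs.core.windows.net/_schemas/orders")
    .load("abfss://raw@storage.dfs.core.windows.net/orders")
    .writeStream
    .option("checkpointLocation",
            "abfss://bronze@storage.dfs.core.windows.net/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("main.bronze.orders"))            # Unity Catalog three-part name
```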

Azure Data Factory Azure Data Lake Databricks Git Delta Lake PySpark
View on GitHub

Contact

Let's connect and discuss how we can work together on your next data project!