Peter Opapa

Data Engineer & Cloud Architect building reliable, cloud-native pipelines and real-time systems.

I design end-to-end data platforms on Azure, from ingestion to analytics-ready models, using tools like Kafka, Spark, dbt, and Snowflake.

9+
Certifications
9+
Projects
5+
Awards
4+
Clients

About Me

I'm a Data Engineer and pan-African innovator with a track record of building data systems that matter. I've represented Kenya at Carnegie Mellon University's AI programme, led cross-border teams across the Afretec network, and earned recognition — including a Silver Award, seed funding, and a CMU-Africa incubation invitation — for work at the intersection of data engineering and social impact.

Technically, I specialise in scalable cloud-native pipelines on Azure — from raw ingestion through real-time streaming to analytics-ready models — using tools like Databricks, Kafka, Spark, dbt, and Snowflake. I care about systems that are dependable, observable, and built to last.

Cloud-native delivery with Azure and Databricks
Real-time and batch pipelines with Kafka, Spark, and Airflow
ELT modeling with dbt and Snowflake

Education

BSc. Electrical and Electronics Engineering

University of Nairobi

2021 – 2026
Expected: First Class Honours
2nd Best 4th Year Student, Faculty of Engineering

Kenya Certificate of Secondary Education (KCSE)

Homabay High School

2017 – 2020
Grade: A Plain — 82 Points

Experience

KamiLimu Fellow

KamiLimu Limited — Nairobi, Kenya (Hybrid)

Mar 2026 – Present

Selected as 1 of 40 mentees in the KamiLimu Mentorship Program, focusing on Data Engineering and professional development.

Tariff Equity — EPRA Hackathon 2026

EPRA — Nairobi, Kenya

Apr 2026
  • Competed in the EPRA Research Week Youth Summit Hackathon 2026 under Optimal Tariff Design for Energy Affordability and Equity.
  • Led the data science component of FairGrid, a regulatory platform that helps EPRA reclassify Kenya's 8.8M residential electricity customers using 7 socioeconomic variables and ML clustering. It reduced simulated subsidy leakage from 43.2% to 11.4% and improved the equity index from 0.41 to 0.74.
  • Outcome: 2nd Runner-Up + Best Use of Data Award, judged by EPRA regulators and energy sector experts.

Silver Medallist & Innovator – Financial Inclusion

C4DLab / Afretec Network

Feb 2026 – Mar 2026
  • Led a Pan-African team of 7 researchers to engineer blockchain-driven fintech solutions for Libex, raising blockchain adoption for underserved communities from 30% to 60%.
  • Awarded the Silver Award (2nd Place) out of 7 teams; secured an exclusive invitation to, and seed funding for, the CMU-Africa Business Incubation Program in Kigali, Rwanda.

Participant – AI & Digital Manufacturing Track

African Inclusive Digital Industries, Carnegie Mellon University

Dec 2025
  • Selected to represent Kenya in a competitive, fully funded pan-African programme focused on AI, digital manufacturing, IoT, and optimization.
  • Designed and implemented an edge computing–based health monitoring system to reduce unnecessary hospital visits and enable continuous remote patient care.
  • Project ranked #1 among all participating teams.

Freelance Junior Data Engineer

Upwork — Remote

Mar 2024 – May 2025
  • Delivered scalable data engineering solutions for 4 clients; designed and optimised ETL workflows for structured and unstructured data.
  • Implemented cloud-based data infrastructure and translated client requirements into actionable analytics pipelines while ensuring data quality, security, and compliance.

Electronics Engineering Intern (IoT)

Ubuntu Water Hub

May 2025 – Aug 2025
  • Assisted in the design and simulation of IoT devices for real-time water pump monitoring and control.
  • Evaluated prototype performance focusing on sensor data accuracy, reliability, and signal integrity.
  • Analysed real-time data (flow rate, status, pressure) to inform system design and evaluation.

Skills

Cloud Data Platforms

Azure (ADF, ADLS, Synapse)
Databricks
Snowflake
Oracle Cloud (OCI)

Big Data Frameworks

Apache Spark
Apache Flink
Apache Kafka
Apache Airflow
dbt Core

Programming & Databases

Python
SQL
PostgreSQL
MongoDB
SQL Server

DevOps & Tools

Docker
Git & GitHub
GitHub Actions
Terraform

Analytics & Visualization

Power BI
Excel

Certifications

Awards

Seed Funding & CMU-Africa Incubation Invitation

Carnegie Mellon University Africa

Exclusively invited to the CMU-Africa Business Incubation Program in Kigali, Rwanda with seed funding following the C4DLab Makerthon.

2026

Silver Award (2nd Place) – C4DLab Makerthon

Afretec Network Pan-African Competition

Awarded 2nd Place out of 7 Pan-African teams for engineering blockchain-driven fintech solutions for financial inclusion.

Mar 2026

Top Project Award – African Inclusive Digital Industries

Carnegie Mellon University, Kigali

Represented Kenya at the African Inclusive Digital Industries Programme. Group project on edge computing health monitoring ranked #1 among all participating teams.

Dec 2025

2nd Runner-Up + Best Use of Data

EPRA Research Week Youth Summit Hackathon 2026

Recognised for FairGrid, a data-driven regulatory platform, judged by EPRA regulators and energy sector experts.

2026

Second Best 4th Year Student

University of Nairobi – Faculty of Engineering

Recognised as the second best 4th Year student in the Faculty of Engineering for the academic year 2024/2025.

2024/2025

Projects

Data Warehouse: Medallion Architecture

Built a production-ready SQL Server data warehouse using the Medallion architecture and T-SQL ETL. It features automated loading, transformation scripts, and data quality checks. The final output is a star schema optimized for BI, serving as a reusable reference for SQL data pipelines.

Microsoft SQL Server, PowerShell, Data Modelling, ETL, Git
View on GitHub
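
The Medallion flow described above (raw landing, cleaning with quality checks, then a BI-ready output) can be illustrated with a minimal sketch. This is plain Python rather than the project's T-SQL, and the sample rows, field names, and the functions to_silver and to_gold are hypothetical, chosen only to show the layering idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw extract: every field arrives as text, as in a bronze layer.
BRONZE_ORDERS = [
    {"order_id": "1", "customer": " alice ", "order_date": "2024-01-05", "amount": "120.50"},
    {"order_id": "2", "customer": "Bob", "order_date": "2024-01-05", "amount": "80"},
    {"order_id": "2", "customer": "Bob", "order_date": "2024-01-05", "amount": "80"},
    {"order_id": "3", "customer": "Carol", "order_date": "2024-01-06", "amount": "n/a"},
]


def to_silver(bronze_rows):
    """Clean and type the raw rows; return (clean_rows, rejected_rows)."""
    clean, rejected, seen_ids = [], [], set()
    for row in bronze_rows:
        try:
            typed = {
                "order_id": int(row["order_id"]),
                "customer": row["customer"].strip().title(),
                "order_date": date.fromisoformat(row["order_date"]),
                "amount": float(row["amount"]),
            }
        except ValueError:
            rejected.append(row)  # quality check: unparseable values are set aside
            continue
        if typed["order_id"] in seen_ids:
            continue  # drop rows that repeat an already-seen order_id
        seen_ids.add(typed["order_id"])
        clean.append(typed)
    return clean, rejected


def to_gold(silver_rows):
    """Aggregate cleaned rows into a daily revenue summary for reporting."""
    revenue_by_day = defaultdict(float)
    for row in silver_rows:
        revenue_by_day[row["order_date"]] += row["amount"]
    return dict(sorted(revenue_by_day.items()))


silver, rejected = to_silver(BRONZE_ORDERS)
gold = to_gold(silver)
```

Keeping rejected rows instead of silently dropping them is one possible design choice; it lets a quality report show what failed and why.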

Real-time Data Pipeline

A real-time streaming pipeline that ingests user profile data from an API, orchestrated by Airflow. Kafka decouples ingestion from downstream processing, and the data is backed up in PostgreSQL. Apache Spark processes the stream and stores the enriched data in Cassandra. The entire system is containerized with Docker for easy deployment and scalability.

Apache Airflow, Apache Kafka, Apache Spark, Apache Cassandra, Docker, PostgreSQL, Python
View on GitHub

Financial Transactions Pipeline

Built a real-time financial analytics pipeline using Kafka, Flink SQL, and PostgreSQL. It ingests events from a Python producer, performs event-time aggregations in Apache Flink, and stores the results in a PostgreSQL database. Datadog monitors the pipeline's health in real time. The entire system is containerized with Docker for easy deployment.

Python, Apache Kafka, Apache Flink, PostgreSQL, Datadog, Docker
View on GitHub
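
As an illustration of what an event-time aggregation means, here is a minimal sketch in plain Python. It is not Flink SQL and not the project's code: the 60-second tumbling window, the event tuples, and the function names are assumptions made for the example. Events are grouped by the timestamp they carry, not by arrival order, so a late event still lands in the correct window.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 60  # assumed tumbling-window size, for illustration only

# Hypothetical events: (event_time, account_id, amount).
EVENTS = [
    ("2024-01-01T10:00:05+00:00", "acc-1", 50.0),
    ("2024-01-01T10:01:10+00:00", "acc-1", 20.0),
    ("2024-01-01T10:00:40+00:00", "acc-1", 25.0),  # arrives late, belongs to the 10:00 window
    ("2024-01-01T10:00:59+00:00", "acc-2", 10.0),
]


def window_start(event_time: str) -> datetime:
    """Floor an ISO-8601 event timestamp to the start of its tumbling window."""
    epoch = int(datetime.fromisoformat(event_time).timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)


def aggregate(events):
    """Sum amounts per (window start, account), keyed by event time."""
    totals = defaultdict(float)
    for event_time, account_id, amount in events:
        totals[(window_start(event_time), account_id)] += amount
    return dict(totals)


totals = aggregate(EVENTS)
```

A streaming engine such as Flink additionally uses watermarks to decide when a window is complete and can be emitted; this batch-style sketch omits that step.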

Azure Infrastructure Deployment Using Terraform

This project automates the deployment and management of Azure Kubernetes Service (AKS) infrastructure using Terraform, with CI/CD pipelines powered by GitHub Actions or Azure DevOps. The setup includes an AKS cluster, Azure Active Directory, Azure Resource Manager, Storage Accounts, and Azure Key Vault, ensuring a consistent and repeatable infrastructure deployment process.

Terraform, Azure CLI, GitHub Actions, Git, Azure Kubernetes Service, Azure Key Vault, Azure Active Directory
View on GitHub

Jumia ELT Pipeline

This project implements a production-grade ELT data pipeline that scrapes laptop product data from Jumia Kenya using BeautifulSoup and Requests, then processes it through a medallion architecture using PostgreSQL stored procedures. Apache Airflow orchestrates scraping, loading, and transformation tasks. The entire pipeline is containerized with Docker for portability and scalability and integrated with GitHub Actions for CI/CD.

Python (Pandas, Requests, Beautiful Soup), PostgreSQL, Docker
View on GitHub

MovieLens Data Pipeline - ADLS Gen2 + dbt + Snowflake

This project showcases a modern ELT pipeline. It extracts the raw MovieLens 20M dataset from Azure Data Lake Storage Gen2, loads it into a Snowflake data warehouse, and then uses dbt to model and transform the data inside Snowflake. The result is a dimensional model, with auto-generated documentation, optimized for analytics and ML applications.

dbt (data build tool), Snowflake, Azure Data Lake Storage Gen2, Git, GitHub Actions, GitHub Pages
View on GitHub

Weather Streaming

This project demonstrates a comprehensive real-time weather data streaming pipeline using Azure cloud services. The system ingests weather data from external APIs, processes it through Azure Event Hubs, and visualizes real-time insights through Power BI dashboards. The project showcases cost optimization strategies by providing both Databricks and Azure Functions implementations.

Databricks, Azure Functions, Azure Event Hubs, Azure Stream Analytics, Power BI, Azure Key Vault, Azure Cost Management
View on GitHub

Azure Data Engineering Project: Enterprise-Scale Medallion Architecture

This comprehensive Azure data engineering solution demonstrates the implementation of a production-ready Medallion Architecture (Bronze → Silver → Gold) pattern. The project leverages Microsoft Azure's native cloud services to create a scalable, maintainable, and cost-effective data pipeline that processes AdventureWorks business data from ingestion through to business intelligence reporting.

PySpark, Azure Data Factory, Azure Data Lake Storage, Databricks, Azure Synapse Analytics, Power BI
View on GitHub

Azure-Databricks Project with Unity Catalog

This project uses Azure Data Factory to ingest data from GitHub into ADLS Gen2. It follows a medallion architecture: Databricks Auto Loader streams raw data into the Bronze layer, the Silver layer holds the cleaned data as Delta Lake tables, and the Gold layer provides aggregated data for analytics.

Azure Data Factory, Azure Data Lake, Databricks, Git, Delta Lake, PySpark
View on GitHub
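
The idea behind incremental Bronze ingestion, where each run picks up only files that have not been processed before, can be sketched in plain Python. This is a conceptual illustration, not the Databricks Auto Loader API and not the project's code: the function ingest_new_files, the JSON checkpoint file, and the CSV file names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def ingest_new_files(landing_dir: Path, checkpoint_file: Path) -> list[str]:
    """Return names of files not seen in earlier runs and record them in the checkpoint."""
    seen = set(json.loads(checkpoint_file.read_text())) if checkpoint_file.exists() else set()
    new_files = sorted(p.name for p in landing_dir.glob("*.csv") if p.name not in seen)
    # A real pipeline would append the rows of these files to the Bronze table here.
    checkpoint_file.write_text(json.dumps(sorted(seen | set(new_files))))
    return new_files


# Usage with a temporary directory standing in for the landing zone.
with tempfile.TemporaryDirectory() as tmp:
    landing = Path(tmp) / "landing"
    landing.mkdir()
    checkpoint = Path(tmp) / "checkpoint.json"

    (landing / "a.csv").write_text("id\n1\n")
    first_run = ingest_new_files(landing, checkpoint)   # picks up a.csv

    (landing / "b.csv").write_text("id\n2\n")
    second_run = ingest_new_files(landing, checkpoint)  # picks up only b.csv

    third_run = ingest_new_files(landing, checkpoint)   # nothing new
```

Persisting the set of processed files is what makes reruns safe: the same file is never loaded into Bronze twice.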

Contact

Let's connect and discuss how we can work together on your next data project!