I design end-to-end data platforms on Azure, from ingestion to analytics-ready models, using tools like Kafka, Spark, dbt, and Snowflake.
I'm a Data Engineer and pan-African innovator with a track record of building data systems that matter. I've represented Kenya at Carnegie Mellon University's AI programme, led cross-border teams across the Afretec network, and earned recognition — including a Silver Award, seed funding, and a CMU-Africa incubation invitation — for work at the intersection of data engineering and social impact.
Technically, I specialise in scalable cloud-native pipelines on Azure — from raw ingestion through real-time streaming to analytics-ready models — using tools like Databricks, Kafka, Spark, dbt, and Snowflake. I care about systems that are dependable, observable, and built to last.
University of Nairobi
Homabay High School
KamiLimu Limited — Nairobi, Kenya (Hybrid)
Selected as 1 of 40 mentees in the KamiLimu Mentorship Program, focusing on Data Engineering and professional development.
EPRA — Nairobi, Kenya
C4DLab / Afretec Network
African Inclusive Digital Industries, Carnegie Mellon University
Upwork — Remote
Ubuntu Water Hub
Associate Level
Associate Level
Associate Level
Professional Level
Entry Level
Entry Level
Entry Level
Entry Level
Entry Level
Carnegie Mellon University Africa
Invited to the CMU-Africa Business Incubation Program in Kigali, Rwanda, with seed funding, following the C4DLab Makerthon.
Afretec Network Pan-African Competition
Awarded 2nd place out of 7 pan-African teams for engineering blockchain-driven fintech solutions for financial inclusion.
Carnegie Mellon University, Kigali
Represented Kenya at the African Inclusive Digital Industries Programme. Our group project on edge-computing health monitoring ranked first among all participating teams.
EPRA Research Week Youth Summit Hackathon 2026
Recognised for FairGrid, a data-driven regulatory platform, in a competition judged by EPRA regulators and energy sector experts.
University of Nairobi – Faculty of Engineering
Recognised as the second-best fourth-year student in the Faculty of Engineering for the 2024/2025 academic year.
Built a production-ready SQL Server data warehouse using the Medallion architecture and T-SQL ETL. It features automated loading, transformation scripts, and data quality checks. The final output is a star schema optimized for BI, serving as a reusable reference for SQL data pipelines.
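The data quality checks on a star schema can be sketched roughly as follows. This is a minimal illustration, not the project's actual T-SQL: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table and column names (dim_customer, fact_sales, customer_key, amount) are hypothetical. Each check counts offending rows, so 0 means the check passed.

```python
import sqlite3

# Hypothetical star-schema checks; table/column names are illustrative only.
CHECKS = {
    # Every fact row must reference an existing dimension row (no orphans).
    "orphan_fact_rows": """
        SELECT COUNT(*)
        FROM fact_sales f
        LEFT JOIN dim_customer d ON f.customer_key = d.customer_key
        WHERE d.customer_key IS NULL
    """,
    # Surrogate keys in the dimension must be unique.
    "duplicate_dim_keys": """
        SELECT COUNT(*) FROM (
            SELECT customer_key
            FROM dim_customer
            GROUP BY customer_key
            HAVING COUNT(*) > 1
        ) AS dupes
    """,
    # Measures should never be negative.
    "negative_amounts": "SELECT COUNT(*) FROM fact_sales WHERE amount < 0",
}


def run_quality_checks(conn: sqlite3.Connection) -> dict:
    """Return the number of offending rows per check (0 means the check passed)."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER, name TEXT);
        CREATE TABLE fact_sales (customer_key INTEGER, amount REAL);
        INSERT INTO dim_customer VALUES (1, 'Customer A'), (2, 'Customer B'), (2, 'Customer B');
        INSERT INTO fact_sales VALUES (1, 100.0), (2, 50.0), (3, 75.0), (1, -5.0);
    """)
    print(run_quality_checks(conn))
    # → {'orphan_fact_rows': 1, 'duplicate_dim_keys': 1, 'negative_amounts': 1}
```

Returning counts instead of stopping at the first failure lets a load job log every failed check in a single pass before deciding whether to halt the load.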
This is a real-time streaming pipeline, orchestrated by Airflow, that ingests user profile data from an API. Kafka decouples ingestion from processing, and the data is also backed up in PostgreSQL. Apache Spark processes the stream and stores the enriched data in Cassandra. The entire system is containerized with Docker for easy deployment and scalability.
Built a real-time financial analytics pipeline using Kafka, Flink SQL, and PostgreSQL. It ingests events from a Python producer, performs event-time aggregations in Apache Flink, and stores the results in PostgreSQL. Datadog monitors the pipeline's health in real time. The entire system is containerized with Docker for easy deployment.
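Flink SQL usually expresses event-time aggregation with a window such as TUMBLE. The pure-Python sketch below only illustrates the underlying idea: bucketing events by their own timestamps into fixed, non-overlapping windows. It is not Flink code, the event fields (account, amount, ts) and the 60-second window are assumptions, and it ignores watermarks and late data, which Flink manages for you.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str   # hypothetical grouping key
    amount: float  # hypothetical measure
    ts: int        # event time in epoch seconds, assigned by the producer


def tumbling_window_sums(events, window_seconds=60):
    """Sum amounts per (window_start, account), using event time rather than arrival order."""
    totals = defaultdict(float)
    for e in events:
        # Align each event to the start of the fixed window containing its timestamp.
        window_start = e.ts - (e.ts % window_seconds)
        totals[(window_start, e.account)] += e.amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    WINDOW = 60
    # Events arrive out of order; event-time windowing still groups them correctly.
    stream = [
        Event("acc-1", 10.0, 125),
        Event("acc-1", 5.0, 61),
        Event("acc-2", 7.5, 119),
        Event("acc-1", 2.5, 59),
    ]
    for (start, account), total in tumbling_window_sums(stream, WINDOW).items():
        print(f"[{start}, {start + WINDOW}) {account}: {total}")
    # prints:
    # [0, 60) acc-1: 2.5
    # [60, 120) acc-1: 5.0
    # [60, 120) acc-2: 7.5
    # [120, 180) acc-1: 10.0
```

Because each window is derived from the event's own timestamp, out-of-order arrivals still land in the right bucket. A real Flink job would additionally rely on watermarks to decide when a window is complete and can be emitted.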
This project automates the deployment and management of Azure Kubernetes Service (AKS) infrastructure using Terraform, with CI/CD pipelines powered by GitHub Actions/Azure DevOps. The setup includes an AKS cluster, Azure Active Directory, Azure Resource Manager, Storage Accounts, and Azure Key Vault, ensuring a consistent and repeatable infrastructure deployment process.
This project implements a production-grade ELT data pipeline that scrapes laptop product data from Jumia Kenya using BeautifulSoup and Requests, then processes it through a medallion architecture using PostgreSQL stored procedures. Apache Airflow orchestrates the scraping, loading, and transformation tasks. The entire pipeline is containerized with Docker for portability and scalability, and integrated with GitHub Actions for CI/CD.
This project showcases a modern ELT pipeline. It extracts the raw MovieLens 20M dataset from Azure Data Lake Storage Gen2 and loads it into a Snowflake data warehouse, where dbt performs the data modeling and transformations. The result is a dimensional model with auto-generated documentation, optimized for analytics and ML applications.
This project implements a real-time weather data streaming pipeline on Azure. The system ingests weather data from external APIs, streams it through Azure Event Hubs, and visualizes real-time insights in Power BI dashboards. To explore cost optimization, it provides both a Databricks and an Azure Functions implementation.
This Azure data engineering project implements a production-ready Medallion Architecture (Bronze → Silver → Gold). It uses Microsoft Azure's native cloud services to build a scalable, maintainable, and cost-effective pipeline that carries AdventureWorks business data from ingestion through to business intelligence reporting.
This project uses Azure Data Factory to ingest data from GitHub into ADLS Gen2. It follows a medallion architecture: Databricks Auto Loader incrementally streams raw data into the Bronze layer, the data is cleaned into Delta Lake tables in the Silver layer, and the Gold layer provides aggregated data for analytics.
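To make the responsibilities of each medallion layer concrete, here is a minimal pure-Python sketch of a Bronze → Silver → Gold flow. It stands in for Auto Loader and Delta Lake with plain lists of dictionaries, and the record fields (repo, language), the ingestion tag, and the cleaning rules are all hypothetical, chosen only to show what each layer typically does.

```python
from collections import Counter


def to_bronze(raw_rows):
    """Bronze: keep records exactly as ingested, plus a hypothetical ingestion-order tag."""
    return [{**row, "_ingest_seq": i} for i, row in enumerate(raw_rows)]


def to_silver(bronze_rows):
    """Silver: drop malformed rows, normalise values, and de-duplicate on a business key."""
    seen = set()
    silver = []
    for row in bronze_rows:
        repo = (row.get("repo") or "").strip().lower()
        language = (row.get("language") or "").strip().lower()
        if not repo or not language:
            continue  # reject rows missing required fields
        if repo in seen:
            continue  # keep only the first occurrence of each repo
        seen.add(repo)
        silver.append({"repo": repo, "language": language})
    return silver


def to_gold(silver_rows):
    """Gold: aggregate cleaned rows into an analytics-ready summary (repos per language)."""
    return dict(Counter(row["language"] for row in silver_rows))


if __name__ == "__main__":
    raw = [
        {"repo": "Org/etl-tools", "language": "Python"},
        {"repo": "org/etl-tools ", "language": "python"},  # duplicate once normalised
        {"repo": "org/dash", "language": "TypeScript"},
        {"repo": "", "language": "Go"},  # malformed: empty repo name
    ]
    print(to_gold(to_silver(to_bronze(raw))))
    # → {'python': 1, 'typescript': 1}
```

Leaving Bronze untouched means Silver and Gold can always be rebuilt from the raw data if a cleaning rule changes.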
Let's connect and discuss how we can work together on your next data project!