Transforming raw data into actionable insights through innovative engineering solutions
I’m a results-driven Data Engineer with over a year of experience designing and building scalable data pipelines, cloud-native architectures, and data analytics solutions. I’m passionate about turning complex data challenges into elegant, automated, and reliable systems that drive informed decision-making.
My expertise spans the modern data stack — from ETL/ELT processes and orchestration to data lakes and data warehousing. I thrive in cloud environments like Azure and bring a solid foundation in big data technologies and scalable analytics platforms.
Drawing on that experience, here are the services I offer.
Using Airflow, dbt, and Azure Data Factory, I build robust, automated ETL and ELT pipelines from virtually any data source (see the DAG sketch after this list).
I architect and implement scalable, modern data platforms on Azure, Snowflake, and Databricks.
I have hands-on experience building high-throughput, real-time data pipelines using Kafka, Flink, and Spark Streaming.
I can build and optimize scalable data warehouses on Snowflake and Synapse, implementing Medallion architectures and star-schema models.
I can automate your infrastructure using Terraform (IaC) and build production-grade CI/CD pipelines with Docker and GitHub Actions.
I am skilled in connecting data to insights by building and optimizing interactive, real-time dashboards in Power BI.
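To make the first service above concrete, here is a minimal Airflow DAG sketch. It assumes Airflow 2.4+ and uses hypothetical extract/load helpers with placeholder bodies; it illustrates the pattern rather than a production implementation.

```python
# Minimal daily ELT DAG sketch. The "orders" source and the
# extract/load helpers are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders():
    """Placeholder: pull rows from the source system."""


def load_orders():
    """Placeholder: write the extracted rows to the warehouse."""


with DAG(
    dag_id="orders_elt",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+ parameter name
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    load = PythonOperator(task_id="load", python_callable=load_orders)
    extract >> load  # extract runs before load
```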
University of Nairobi
My background in Electrical and Electronics Engineering gave me a strong foundation in system-level design. I was trained to build and analyze complex, interconnected systems, and I found a natural parallel between that and the challenge of architecting large-scale data infrastructure. This 'engineering-first' mindset is what propelled my specialization into data pipelines, where I now use tools like Spark, Flink, and Kafka to manage and process data at scale.
Associate Level
Associate Level
Associate Level
Professional Level
Entry Level
Entry Level
Entry Level
Entry Level
Entry Level
Built a production-ready SQL Server data warehouse using the Medallion architecture and T-SQL ETL, with automated loading, transformation scripts, and data quality checks. The final output is a star schema optimized for BI, serving as a reusable reference for SQL data pipelines.
A real-time streaming pipeline that ingests user profile data from a public API, orchestrated by Airflow. Kafka decouples ingestion from processing, with raw records backed up in PostgreSQL. Apache Spark processes the stream and stores the enriched data in Cassandra. The entire system is containerized with Docker for easy deployment and scaling.
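For illustration, here is a minimal PySpark sketch of the Kafka-to-Cassandra leg. The "users" topic, the "profiles.users" Cassandra table, and the simplified schema are assumptions, and it presumes the Kafka and Cassandra connector packages are on the Spark classpath.

```python
# Read JSON profile events from Kafka and append them to Cassandra.
# Requires the spark-sql-kafka and spark-cassandra-connector packages.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("profiles-stream").getOrCreate()

schema = StructType([            # simplified, assumed profile schema
    StructField("id", StringType()),
    StructField("name", StringType()),
    StructField("email", StringType()),
])

profiles = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "users")            # hypothetical topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("p"))
    .select("p.*")
)

def write_batch(batch_df, _epoch_id):
    # The Cassandra connector writes per micro-batch via foreachBatch.
    (batch_df.write.format("org.apache.spark.sql.cassandra")
     .mode("append")
     .options(keyspace="profiles", table="users")
     .save())

profiles.writeStream.foreachBatch(write_batch).start().awaitTermination()
```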
Built a real-time financial analytics pipeline using Kafka, Flink SQL, and PostgreSQL. It ingests events from a Python producer, performs event-time aggregations in Apache Flink, and stores the results in PostgreSQL. Datadog monitors pipeline health in real time, and the entire system is containerized with Docker for easy deployment.
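For a sense of the producer side, here is a minimal kafka-python sketch. The "transactions" topic and the event shape are assumptions for illustration; the real producer may differ.

```python
# Emit synthetic transaction events to Kafka as JSON.
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    event = {
        "account_id": random.randint(1, 100),
        "amount": round(random.uniform(-500.0, 500.0), 2),
        "ts": int(time.time() * 1000),  # event time used by Flink windows
    }
    producer.send("transactions", event)  # hypothetical topic name
    time.sleep(0.5)
```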
This project automates the deployment and management of Azure Kubernetes Service (AKS) infrastructure using Terraform, with CI/CD pipelines powered by GitHub Actions and Azure DevOps. The setup includes an AKS cluster, Azure Active Directory, Azure Resource Manager, Storage Accounts, and Azure Key Vault, ensuring a consistent and repeatable infrastructure deployment process.
This project implements a production-grade ELT pipeline that scrapes laptop product data from Jumia Kenya using BeautifulSoup and Requests, then processes it through a medallion architecture using PostgreSQL stored procedures. Apache Airflow orchestrates the scraping, loading, and transformation tasks. The entire pipeline is containerized with Docker for portability and integrated with GitHub Actions for CI/CD.
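Here is a minimal sketch of the scraping step with Requests and BeautifulSoup. The listing URL and CSS selectors are assumptions for illustration; Jumia's actual markup may differ and should be inspected before use.

```python
# Scrape laptop names and prices from a listing page into raw rows.
import requests
from bs4 import BeautifulSoup

URL = "https://www.jumia.co.ke/laptops/"  # example listing page

resp = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

rows = []
for card in soup.select("article.prd"):      # hypothetical card selector
    name = card.select_one("h3.name")        # hypothetical field selectors
    price = card.select_one("div.prc")
    if name and price:
        rows.append({
            "name": name.get_text(strip=True),
            "price": price.get_text(strip=True),
        })
print(f"Scraped {len(rows)} products")
```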
This project showcases a modern ELT pipeline. It extracts the raw MovieLens 20M dataset from Azure Data Lake Storage Gen2, loads it into a Snowflake data warehouse, and then uses dbt against Snowflake for data modeling and transformations. The result is a dimensional model, with auto-generated documentation, optimized for analytics and ML applications.
This project demonstrates a comprehensive real-time weather data streaming pipeline using Azure cloud services. The system ingests weather data from external APIs, processes it through Azure Event Hubs, and visualizes real-time insights through Power BI dashboards. The project showcases cost optimization strategies by providing both Databricks and Azure Functions implementations.
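For illustration, here is a minimal ingestion sketch with the azure-eventhub SDK. The weather API URL, the "weather-events" hub name, and the connection string are placeholder assumptions.

```python
# Fetch one weather reading and publish it to Azure Event Hubs.
import json

import requests
from azure.eventhub import EventData, EventHubProducerClient

CONN_STR = "<event-hub-connection-string>"                    # placeholder
WEATHER_URL = "https://api.example.com/current?city=Nairobi"  # hypothetical API

reading = requests.get(WEATHER_URL, timeout=30).json()

producer = EventHubProducerClient.from_connection_string(
    CONN_STR, eventhub_name="weather-events"  # assumed hub name
)
with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)  # downstream consumers feed Power BI
```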
This comprehensive Azure data engineering solution demonstrates the implementation of a production-ready Medallion Architecture (Bronze → Silver → Gold) pattern. The project leverages Microsoft Azure's native cloud services to create a scalable, maintainable, and cost-effective data pipeline that processes AdventureWorks business data from ingestion through to business intelligence reporting.
This project uses Azure Data Factory to ingest data from GitHub into ADLS Gen2. It follows a medallion architecture: Databricks Autoloader streams raw data to the Bronze layer, Delta Lake tables clean it for the Silver layer, and the Gold layer provides aggregated data for analytics.
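A minimal sketch of the Bronze-layer Autoloader stream, assuming hypothetical ADLS Gen2 paths and JSON input; it is written to run in a Databricks notebook, where `spark` is predefined.

```python
# Incrementally load new raw files into a Bronze Delta table.
raw_path = "abfss://raw@storageacct.dfs.core.windows.net/github/"        # assumed
bronze_path = "abfss://bronze@storageacct.dfs.core.windows.net/github/"  # assumed

stream = (
    spark.readStream.format("cloudFiles")              # Autoloader source
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", bronze_path + "_schema")
    .load(raw_path)
)

(
    stream.writeStream.format("delta")
    .option("checkpointLocation", bronze_path + "_checkpoint")
    .trigger(availableNow=True)   # process all new files, then stop
    .start(bronze_path)
)
```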
Let's connect and discuss how we can work together on your next data project!