Peter Opapa

Data Engineer & Architect specializing in Big Data, Cloud and AI.

Transforming raw data into actionable insights through innovative engineering solutions

About Me

I’m a results-driven Data Engineer with over 1 year of experience designing and building scalable data pipelines, cloud-native architectures, and data analytics solutions. I’m passionate about transforming complex data challenges into elegant, automated, and reliable systems that drive informed decision-making.

My expertise spans the modern data stack — from ETL/ELT processes and orchestration to data lakes and data warehousing. I thrive in cloud environments like Azure and bring a solid foundation in big data technologies and scalable analytics platforms.

1+ years in data engineering and analytics
Cloud-native experience in Azure
Specialized in Big Data, ETL/ELT, Python, and SQL

Services I Offer

Drawing on my experience, I offer the following services.

Data Pipeline Development

With my expertise in Airflow, dbt, and Azure Data Factory, I build robust, automated ETL/ELT pipelines from a wide range of data sources.
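
To give a flavour of what such a pipeline skeleton looks like, here is a minimal, hypothetical Airflow DAG; the DAG name, task names, and callables are illustrative only, not from a specific project:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Pull raw records from a source system (API, database, files).
    ...


def load():
    # Load the extracted records into a staging table or data lake.
    ...


with DAG(
    dag_id="example_elt_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```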

Cloud Data Solutions

I am skilled in architecting and implementing scalable, modern data platforms on Azure, Snowflake, and Databricks.

Real-time Stream Processing

I have hands-on experience building high-throughput, real-time data pipelines using Kafka, Flink, and Spark Streaming.

Data Warehousing & Modeling

I can build and optimize scalable data warehouses on Snowflake and Synapse, implementing modern Medallion or Star Schema models.

DataOps & Automation

I can automate your infrastructure using Terraform (IaC) and build production-grade CI/CD pipelines with Docker and GitHub Actions.

Analytics & Visualization

I am skilled in connecting data to insights by building and optimizing interactive, real-time dashboards in Power BI.

Education


University of Nairobi

Bachelor's Degree in Electrical & Electronics Engineering


My background in Electrical and Electronics Engineering gave me a strong foundation in system-level design. I was trained to build and analyze complex, interconnected systems, and I found a natural parallel between that and the challenge of architecting large-scale data infrastructure. This 'engineering-first' mindset is what propelled my specialization into data pipelines, where I now use tools like Spark, Flink, and Kafka to manage and process data at scale.

Skills

Cloud Data Platforms

Azure (ADF, ADLS, Synapse)
Databricks
Snowflake
Oracle Cloud (OCI)

Big Data Frameworks

Apache Spark
Apache Flink
Apache Kafka
Apache Airflow
dbt Core

Programming & Databases

Python
SQL
PostgreSQL
MongoDB
SQL Server

DevOps & Tools

Docker
Git & GitHub
GitHub Actions
Terraform

Analytics & Visualization

Power BI
Excel


Projects

Architecture diagram for a Medallion data warehouse

Data Warehouse: Medallion Architecture

Built a production-ready SQL Server data warehouse using the Medallion architecture and T-SQL ETL. It features automated loading, transformation scripts, and data quality checks. The final output is a star schema optimized for BI, serving as a reusable reference for SQL data pipelines.

Microsoft SQL Server PowerShell Data Modelling ETL Git
View on GitHub
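
To illustrate the kind of T-SQL load step involved, here is a minimal sketch invoked from Python via pyodbc; the connection string, schemas, and columns are all hypothetical, not the warehouse's actual objects:

```python
import pyodbc

# Hypothetical connection string; server, database, and auth are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=DataWarehouse;Trusted_Connection=yes;"
)

# Illustrative T-SQL: cleanse bronze rows and promote them to the silver layer.
bronze_to_silver = """
INSERT INTO silver.customers (customer_id, customer_name, country)
SELECT customer_id,
       TRIM(customer_name),
       UPPER(COALESCE(country, 'N/A'))
FROM bronze.customers
WHERE customer_id IS NOT NULL;
"""

with conn.cursor() as cursor:
    cursor.execute(bronze_to_silver)
conn.commit()
```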
Architecture diagram for a real-time data pipeline with Kafka and Spark

Real-time Data Pipeline

A real-time streaming pipeline that ingests user profile data from an API, orchestrated by Airflow. Kafka decouples ingestion from processing, with raw data backed up in PostgreSQL. Apache Spark processes the stream and stores enriched records in Cassandra. The entire system is containerized with Docker for easy deployment and scaling.

Apache Airflow Apache Kafka Apache Spark Apache Cassandra Docker PostgreSQL Python
View on GitHub
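
To sketch the Spark side of this pipeline: the snippet below reads the Kafka stream, parses JSON profiles, and writes them to Cassandra. The broker address, topic, schema, and keyspace/table names are assumptions, and the Kafka and Cassandra Spark connector packages must be on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("profiles-stream").getOrCreate()

# Hypothetical schema for the profile events on the Kafka topic.
schema = StructType([
    StructField("id", StringType()),
    StructField("first_name", StringType()),
    StructField("email", StringType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
    .option("subscribe", "users_created")              # assumed topic name
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("data"))
    .select("data.*")
)

# Requires the spark-cassandra-connector package.
query = (
    stream.writeStream.format("org.apache.spark.sql.cassandra")
    .option("keyspace", "spark_streams")  # assumed keyspace
    .option("table", "created_users")     # assumed table
    .option("checkpointLocation", "/tmp/checkpoints")
    .start()
)
query.awaitTermination()
```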
Financial transactions pipeline with Flink and Datadog

Financial Transactions Pipeline

Built a real-time financial analytics pipeline using Kafka, Flink SQL, and PostgreSQL. It ingests events from a Python producer, performs event-time aggregations in Apache Flink, and stores the results in PostgreSQL. Datadog monitors pipeline health in real time. The entire system is containerized with Docker for easy deployment.

Python Apache Kafka Apache Flink PostgreSQL Datadog Docker
View on GitHub
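
A minimal PyFlink sketch of this kind of event-time aggregation; the topic, fields, and window size are assumptions, and the Flink Kafka SQL connector jar must be on the classpath:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: transaction events from Kafka with event-time watermarks.
t_env.execute_sql("""
    CREATE TABLE transactions (
        transaction_id STRING,
        amount DOUBLE,
        ts TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'financial_transactions',  -- assumed topic name
        'properties.bootstrap.servers' = 'broker:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

-- One-minute tumbling-window totals: the shape of aggregation described above.
t_env.execute_sql("""
    SELECT window_start, window_end, SUM(amount) AS total_amount
    FROM TABLE(TUMBLE(TABLE transactions, DESCRIPTOR(ts), INTERVAL '1' MINUTE))
    GROUP BY window_start, window_end
""").print()
```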
Azure and Terraform infrastructure diagram

Azure Infrastructure Deployment Using Terraform

This project automates the deployment and management of Azure Kubernetes Service (AKS) infrastructure using Terraform, with CI/CD pipelines powered by GitHub Actions and Azure DevOps. The setup includes an AKS cluster, Azure Active Directory, Azure Resource Manager, storage accounts, and Azure Key Vault, ensuring a consistent and repeatable infrastructure deployment process.

Terraform Azure CLI GitHub Actions Git Azure Kubernetes Service Azure Key Vault Azure Active Directory
View on GitHub
Jumia ELT web scraping pipeline architecture

Jumia ELT Pipeline

This project implements a production-grade ELT data pipeline that scrapes laptop product data from Jumia Kenya using BeautifulSoup and Requests, then processes it through a medallion architecture using PostgreSQL stored procedures. Apache Airflow orchestrates the scraping, loading, and transformation tasks. The entire pipeline is containerized with Docker for portability and scalability, and integrated with GitHub Actions for CI/CD.

Python (Pandas, Requests, Beautiful Soup) Apache Airflow PostgreSQL Docker GitHub Actions
View on GitHub
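
To illustrate the extract step, here is a minimal scraper sketch; the listing URL and CSS selectors are assumptions, since the real ones depend on Jumia's live markup:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.jumia.co.ke/laptops/"  # assumed listing URL

response = requests.get(URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Hypothetical selectors; the real ones depend on Jumia's current markup.
for card in soup.select("article.prd"):
    name = card.select_one("h3.name")
    price = card.select_one("div.prc")
    if name and price:
        print(name.get_text(strip=True), "-", price.get_text(strip=True))
```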
dbt and Snowflake ELT pipeline architecture

MovieLens Data Pipeline - ADLS Gen2 + dbt + Snowflake

This project showcases a modern ELT pipeline. It extracts the raw MovieLens 20M dataset from Azure Data Lake Storage Gen2, loads it into a Snowflake data warehouse, and uses dbt to model and transform the data in Snowflake. The result is a dimensional model, with auto-generated documentation, optimized for analytics and ML applications.

dbt (Data Build Tool) Snowflake Azure Data Lake Storage Gen2 Git GitHub Actions GitHub Pages
View on GitHub
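
To give a flavour of the modeling layer, here is a minimal sketch written as a dbt Python model (a dbt 1.3+ feature on Snowflake via Snowpark); the staging model name and columns are hypothetical, and the project's actual models may well be plain SQL:

```python
# models/marts/avg_ratings.py -- a dbt Python model on Snowflake/Snowpark.
import snowflake.snowpark.functions as F


def model(dbt, session):
    dbt.config(materialized="table")
    # dbt.ref returns the referenced model as a Snowpark DataFrame.
    ratings = dbt.ref("stg_ratings")  # assumed staging model name
    return ratings.group_by("movie_id").agg(
        F.avg("rating").alias("avg_rating"),
        F.count("rating").alias("num_ratings"),
    )
```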
Azure weather streaming pipeline architecture

Weather Streaming

This project demonstrates a comprehensive real-time weather data streaming pipeline using Azure cloud services. The system ingests weather data from external APIs, processes it through Azure Event Hubs, and visualizes real-time insights through Power BI dashboards. The project showcases cost optimization strategies by providing both Databricks and Azure Functions implementations.

Databricks Azure Functions Azure Event Hubs Azure Stream Analytics Power BI Azure Key Vault Azure Cost Management
View on GitHub
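
A minimal sketch of the ingestion step using the azure-eventhub SDK; the weather endpoint, hub name, and connection string below are placeholders:

```python
import json

import requests
from azure.eventhub import EventData, EventHubProducerClient

# Placeholders; in the project these would come from Azure Key Vault.
CONNECTION_STR = "<event-hubs-connection-string>"
EVENT_HUB_NAME = "weather-events"  # assumed hub name
WEATHER_API = "https://api.example.com/current?city=Nairobi"  # hypothetical API

producer = EventHubProducerClient.from_connection_string(
    CONNECTION_STR, eventhub_name=EVENT_HUB_NAME
)

reading = requests.get(WEATHER_API, timeout=30).json()

with producer:
    batch = producer.create_batch()
    batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)
```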
Azure enterprise-scale medallion architecture

Azure Data Engineering Project: Enterprise-Scale Medallion Architecture

This comprehensive Azure data engineering solution demonstrates the implementation of a production-ready Medallion Architecture (Bronze → Silver → Gold) pattern. The project leverages Microsoft Azure's native cloud services to create a scalable, maintainable, and cost-effective data pipeline that processes AdventureWorks business data from ingestion through to business intelligence reporting.

PySpark Azure Data Factory Azure Data Lake Storage Databricks Azure Synapse Analytics Power BI
View on GitHub
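
As an illustration of the Silver-layer step in such a pipeline, here is a hedged Databricks notebook sketch; the storage paths and column names are hypothetical, not the project's actual ones:

```python
# Runs in a Databricks notebook, where `spark` is predefined.
from pyspark.sql.functions import col, to_date

# Read raw AdventureWorks extracts from the Bronze layer (path is illustrative).
bronze = spark.read.parquet(
    "abfss://bronze@<storage-account>.dfs.core.windows.net/sales/"
)

# Standardize names and types, drop bad rows, and write to Silver as Delta.
silver = (
    bronze.withColumnRenamed("SalesOrderID", "sales_order_id")
    .withColumn("order_date", to_date(col("OrderDate")))
    .filter(col("sales_order_id").isNotNull())
)

silver.write.format("delta").mode("overwrite").save(
    "abfss://silver@<storage-account>.dfs.core.windows.net/sales/"
)
```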
Azure Databricks and Unity Catalog architecture

Azure-Databricks Project with Unity Catalog

This project uses Azure Data Factory to ingest data from GitHub into ADLS Gen2. It follows a medallion architecture: Databricks Autoloader streams raw data to the Bronze layer, Delta Lake tables clean it for the Silver layer, and the Gold layer provides aggregated data for analytics.

Azure Data Factory Azure Data Lake Databricks Git Delta Lake PySpark
View on GitHub
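
The Bronze ingestion step with Databricks Auto Loader looks roughly like the sketch below; the source format, container names, and schema/checkpoint paths are assumptions:

```python
# Runs in a Databricks notebook, where `spark` is predefined.
bronze_stream = (
    spark.readStream.format("cloudFiles")                         # Auto Loader
    .option("cloudFiles.format", "parquet")                       # assumed source format
    .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas")  # assumed path
    .load("abfss://raw@<storage-account>.dfs.core.windows.net/")
)

(
    bronze_stream.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/bronze/_checkpoints")
    .outputMode("append")
    .trigger(availableNow=True)   # process all available files, then stop
    .start("/mnt/bronze/events")  # hypothetical Bronze table path
)
```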

Contact

Let's connect and discuss how we can work together on your next data project!