Igor Kołodziej - Software Engineer, Data & ML Systems

Projects

Payment Event Processing Pipeline — Scala Streaming Backend

Scala 3, Cats Effect, FS2, Redpanda, PostgreSQL, MongoDB

Scala 3 backend for deterministic JSONL replay, Redpanda-backed stream ingestion, PostgreSQL enrichment, explainable risk decisions, and idempotent MongoDB persistence.

Put JSONL file, paced replay, and Redpanda input modes behind a single source abstraction.
Implemented parsing, validation, normalization, customer enrichment, eligibility checks, and deterministic risk scoring.
Used PostgreSQL for customer profiles and MongoDB for processed transactions, eligibility violations, alerts, and risk history.
Structured the code around hexagonal architecture with narrow ports and functional streaming, backed by MUnit tests, Docker Compose, and GitHub Actions CI.

GitHub

Aegis AI — GCP SRE/ChatOps Platform

Google Cloud, Terraform, Cloud Run, GKE, Pub/Sub, BigQuery, Firestore, Slack, Gemini

GCP prototype for cross-project incident detection, Slack alerting, metric-backed follow-up answers, and auditable incident storage.

Built a cross-project SRE/ChatOps workflow connecting GKE workload logs, Cloud Logging sinks, Pub/Sub, Cloud Run services, and Slack.
Used Firestore for incident sessions and BigQuery for incident lifecycle events, reporting views, and SLO evidence.
Integrated Cloud Monitoring and Gemini so engineers could ask follow-up questions in Slack and get answers grounded in real incident context and metrics.
Provisioned Hub and Client infrastructure with separate Terraform stacks, least-privilege IAM, Secret Manager, and documented demo runbooks.

GitHub

NMAR — R package for estimation under nonignorable nonresponse

CRAN, R, CI/tests, Docs, Simulation studies

CRAN R package with a unified nmar() API and method comparisons in simulation studies.

Implemented estimators from the literature behind a unified nmar() API.
Built reproducible simulation studies for method comparison and validation.
Packaged for CRAN with documentation, vignettes, CI, and tests.

CRAN · Docs · GitHub

Mamut — AutoML toolkit for tabular classification

Python, PyPI, scikit-learn, Optuna, Ensembles, Reports

AutoML workflow for tabular classification: preprocessing, hyperparameter optimization, model comparison, ensemble search, and generated reports.

Built preprocessing pipelines for imputation, scaling, encoding, skew correction, outlier handling, and optional feature reduction.
Supported model search across common classifiers using Bayesian optimization or grid search.
Added dynamic ensemble search with hard/soft voting, plus HTML reports, notebook plots, and optional SHAP explanations.

PyPI · Docs · GitHub

Other projects

Real-Time Finance Pipeline — Dockerized big-data stack with NiFi/Kafka, HDFS, Spark, Hive, and HBase
GitHub

QuantumRAG — RAG benchmarking prototype with FAISS, Qiskit, and SQuAD evaluation
GitHub

DermNet — DINOv2 embeddings for clustering
GitHub

DoomRL — PPO/A2C agents for ViZDoom
GitHub

Leadership

President, Data Science Club (WUT)

2024–2025

Organized talks and workshops; hosted guests from Google, ING, and Allegro.
Worked with a student team on outreach and events.

Co-organizer, ensembleAI hackathon

2024–2026

Handled sponsor relations, logistics, venue coordination, and on-site operations.

Capitalize (student venture, Enactus WUT)

Demo app shipped to Google Play (testing track)

Built backend features and APIs with FastAPI.
Wrote Python scripts for basic telemetry analysis of Amplitude exports.

Awards

2nd place - Enactus Poland National Competition (Capitalize), 2023
Finalist - Consult IT business/technology hackathon (SGH Warsaw School of Economics), 2023
Laureate - AGH “Diamond Index” Olympiad in Physics, 2022
Finalist - National Technical Knowledge Olympiad (OWT), 2022

Skills

Backend / Systems

Scala, Java, Python, R
Cats Effect, FS2, FastAPI, Spring Boot
Git, Linux, CI/testing, Docker

Data / Streaming

SQL, Spark
Kafka/Redpanda, NiFi
Hive, HDFS, HBase

ML / Evaluation

PyTorch, scikit-learn, Transformers
Optuna, NumPy, Pandas
Model evaluation, reporting, experiment workflows

Languages

Polish - native
English - C2 (CAE Grade A)
German - basic

Contact

Open to software engineering roles in data-intensive backend, data engineering, and machine learning systems.

Email · CV (PDF) · GitHub · LinkedIn