Systems / Production Engineer
I support high-performance platforms across Linux, Kubernetes, AWS, Kafka, and observability stacks. My work spans automation, incident response, production engineering, and performance tuning for data-intensive systems.
Site reliability, cloud operations, observability, and low-latency distributed systems.
MTTR reduction through automation
On-call paging reduction
Events/day in data pipelines
Users supported in production
About
I am a systems and production engineer with experience supporting high-performance, real-time distributed systems on Linux, Kubernetes, and AWS. I build operational tooling in Python and Bash, investigate complex incidents, and work closely with engineers, traders, and research teams to improve platform reliability, speed, and resilience.
My background combines software engineering, infrastructure, automation, observability, and production support. I enjoy turning noisy operational problems into repeatable systems, robust runbooks, and measurable reliability gains.
Skills
Incident response, on-call support, root-cause analysis, SLO/SLI design, runbooks, chaos and load testing, production support during market hours.
Python, Bash, Go (familiar), Ruby, operational tooling, workflow automation, Ansible, Terraform, deployment and recovery scripting.
Linux systems administration, Kubernetes debugging, Helm, operators, Docker, systemd, networking, DNS, firewalls, and performance tuning.
Prometheus, Grafana, Splunk, OpenTelemetry, ELK, Apache Kafka, alerting, dashboards, and log aggregation.
AWS, Azure, GCP, PostgreSQL, Oracle, MySQL, MongoDB, Redis, CI/CD with Jenkins and GitHub Actions.
Market data feed handlers, CME MDP 3.0, FIX protocol, tick and bar data, low-latency trading workflows, market microstructure.
Experience
Chicago, IL
Chicago, IL
Remote
Projects
C++20 · Low latency · Trading infra
A low-latency feed handler that decodes live CME tick data using lock-free queues, per-instrument threading, and seqlock snapshots to support real-time trading-system workflows.
Observability · Kubernetes · Kafka · Python
An observability platform for trading systems showing order flow, latency distributions, pod health, Kafka lag, and Prometheus alerts, with auto-remediation utilities for common production failures.
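As a rough illustration of the Kafka-lag check behind that kind of auto-remediation, here is a minimal sketch. It is not code from the project: the `PartitionOffsets` type, the thresholds, and the action names are all hypothetical, and a real deployment would read the offsets from the Kafka admin or consumer API rather than from hard-coded values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionOffsets:
    """Offsets for one partition (hypothetical type, illustrative values)."""
    partition: int
    end_offset: int        # latest offset written by producers
    committed_offset: int  # last offset committed by the consumer group


def partition_lag(p: PartitionOffsets) -> int:
    """Lag is the count of messages not yet consumed, clamped at zero."""
    return max(p.end_offset - p.committed_offset, 0)


def choose_action(partitions, warn_at=1_000, act_at=10_000):
    """Map total consumer-group lag to a hypothetical remediation step.

    The thresholds and action names are assumptions for this sketch.
    """
    total = sum(partition_lag(p) for p in partitions)
    if total >= act_at:
        return total, "restart-consumers"
    if total >= warn_at:
        return total, "alert"
    return total, "ok"


if __name__ == "__main__":
    sample = [
        PartitionOffsets(0, end_offset=5_000, committed_offset=4_900),
        PartitionOffsets(1, end_offset=20_000, committed_offset=8_000),
    ]
    print(choose_action(sample))
```

Keeping the decision a pure function of the offsets makes it easy to unit-test separately from whatever collects the offsets or performs the restart.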
Publication
IEEE INCOFT 2021 · DOI: 10.1109/INCOFT55651.2022.10094480
Education
Illinois Institute of Technology
National Institute of Technology, Calicut
Contact
I’m open to software engineering, systems, SRE, production engineering, and trading infrastructure opportunities.