I'm Dev Kumar — a C++ software engineer with 7+ years architecting high-performance, low-latency systems on Linux. I obsess over cache lines, lock-free queues, packet paths and flamegraphs. My work lives where software, hardware and markets collide — DPDK, Solarflare Onload, FPGA offload, modern C++20.
Low-latency isn't a niche. It's a way of thinking — where every cache miss, every syscall, every branch mispredict is a design decision you have to own.
I didn't stumble into low-latency engineering. I'm drawn to it because it's the one place where software, hardware and mathematics fuse into a single discipline.
Every line of C++ I write is a negotiation with a CPU cache, a TLB, a branch predictor, and a NIC. Kernel syscalls are not free. Virtual calls are not free. Allocation on the hot path is definitely not free. I build as if each instruction has a receipt.
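To make "allocation on the hot path is not free" concrete, here is a minimal sketch (illustrative names, not production code) of the usual fix: reserve every byte up front in a fixed-capacity pool, so the steady-state path is a pointer swap with zero calls into the heap.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity object pool. All storage is reserved at
// construction; acquire()/release() are O(1) and never touch the heap,
// which is the whole point on a latency-critical path.
template <typename T, std::size_t N>
class Pool {
    std::array<T, N> slots_{};           // pre-reserved objects
    std::array<std::size_t, N> free_{};  // stack of free slot indices
    std::size_t top_ = N;                // number of free slots
public:
    Pool() {
        for (std::size_t i = 0; i < N; ++i) free_[i] = i;
    }
    T* acquire() {                       // nullptr when exhausted
        return top_ ? &slots_[free_[--top_]] : nullptr;
    }
    void release(T* p) {                 // recycle, never deallocate
        free_[top_++] = static_cast<std::size_t>(p - slots_.data());
    }
};
```

Exhaustion returns `nullptr` instead of falling back to `new` — on a hot path, a silent heap allocation is worse than an explicit back-pressure signal.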
There's nothing like watching a flamegraph hotspot disappear after an hour with perf,
or seeing p99 collapse by 40% after a ring-buffer realignment. Trading is the ultimate
proving ground — correctness and speed both translate directly into basis points.
I want to build at that edge: market-data handlers that unpack CME MDP 3.0 at line rate, strategy engines whose tail latency doesn't drift, FPGA parsers that shave a handful of microseconds off a critical path — and C++ code that my future self is proud to debug at 3 a.m. during a live incident.
Every microsecond on the critical path has a home. Below is the data-flow I obsess over: kernel-bypass ingress, lock-free parsing, strategy hand-off, and FPGA-offloaded egress.
Two views of the same system. Left: traditional kernel stack. Right: the kernel-bypass re-architecture I deployed in production. Same hardware, fundamentally different budget.
Seven years of building at the intersection of modern C++, Linux internals, networking hardware and FPGA fabric.
Each role deepened the same obsession — from kernel modules on factory floors to kernel-bypass stacks powering AI services.
Architected a multi-threaded C++20 backend on Linux — concepts, ranges and coroutines
alongside lock-free queues, std::atomic primitives and custom thread pools — sustaining
sub-millisecond p99 across 500K+ daily requests. Re-engineered hot paths with move semantics,
cache-line alignment and custom allocators. Deployed DPDK user-space stack with FPGA-offloaded
packet parsing over PCIe DMA, cutting round-trip latency from 12 μs to 2.4 μs.
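The cache-line alignment mentioned above usually means one thing in practice: keeping hot, independently-written data off shared cache lines. A small hedged sketch (the struct and function names are illustrative, not from that codebase):

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// alignas(64) gives each counter its own 64-byte cache line (the common
// line size on x86), so two writer threads never false-share: neither
// write invalidates the other core's line.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};
static_assert(sizeof(PaddedCounter) == 64, "one full cache line per counter");

// Two threads hammer adjacent counters. With the padding they run without
// cross-core contention; returns the combined count for verification.
inline std::uint64_t hammer(std::uint64_t iters) {
    PaddedCounter counters[2];
    auto worker = [&](int i) {
        for (std::uint64_t n = 0; n < iters; ++n)
            counters[i].value.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread a(worker, 0), b(worker, 1);
    a.join(); b.join();
    return counters[0].value.load() + counters[1].value.load();
}
```

Dropping the `alignas(64)` changes nothing about correctness — only the cache-coherence traffic, which is exactly the kind of cost a flamegraph won't show but a tail-latency histogram will.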
Built high-performance C++ backend components for high-traffic endpoints on Linux,
sustaining 10K+ concurrent connections with deterministic response times. Modernised
the legacy codebase to C++20 — concepts, ranges, coroutines — and bound sockets to
Solarflare ef_vi for user-space packet processing, cutting wire-to-application
latency by 70%.
Wrote low-level C++ device drivers and kernel-space modules for real-time industrial monitoring, achieving deterministic 5 ms response times across 50+ concurrent processes. Offloaded CRC validation and filtering to FPGA over PCIe/DMA for 8× throughput. Architected scalable multi-tiered networking with TCP/UDP sockets, epoll event loops and custom binary protocols — 99.99% uptime across distributed systems. Also built financial-analytics systems processing CME feeds with futures/options pricing and hedging logic.
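The epoll event-loop pattern behind that multi-tiered networking layer boils down to: register readable fds, block in `epoll_wait`, drain the ready ones. A minimal self-contained sketch — a self-pipe stands in for a real TCP socket, and the function name is illustrative:

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cstring>

// Register one readable fd with epoll, wait for readiness, read the data.
// Returns the number of bytes read, or -1 on failure/timeout.
inline ssize_t demo_epoll_roundtrip(const char* msg, char* buf, size_t cap) {
    int fds[2];
    if (pipe(fds) != 0) return -1;        // self-pipe: [0] read, [1] write

    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;                  // level-triggered read readiness
    ev.data.fd = fds[0];
    epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev);

    write(fds[1], msg, strlen(msg));      // simulate a peer sending data

    epoll_event out{};
    int n = epoll_wait(ep, &out, 1, 1000);            // block up to 1s
    ssize_t got = (n == 1) ? read(out.data.fd, buf, cap) : -1;

    close(fds[0]); close(fds[1]); close(ep);
    return got;
}
```

A production loop would run `epoll_wait` with a larger event array inside a `while` and use edge-triggered mode with non-blocking sockets, but the control flow is the same.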
Where I reach beyond my day job — building HFT-grade infrastructure in the open, with measurable performance I can defend line by line.
A production-grade CME MDP 3.0 market-data feed handler in C++17 on Linux. Decodes SBE-encoded messages from UDP multicast groups, rebuilds the order book and hands ticks off via a lock-free SPSC ring. Benchmarked at 500K+ packets/sec with sub-microsecond parse latency and zero-copy hot path.
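The lock-free SPSC ring behind that hand-off follows a standard recipe — single producer publishes with a release store, single consumer observes with an acquire load, no mutex anywhere. A hedged sketch under those assumptions (names are illustrative, not the project's actual API):

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer / single-consumer ring. The release store on tail_
// publishes the written slot; the consumer's acquire load on tail_
// guarantees it sees the data. Capacity must be a power of two so the
// index wrap is a cheap mask instead of a modulo.
template <typename T, std::size_t N>
class SpscRing {
    static_assert((N & (N - 1)) == 0, "capacity must be a power of two");
    std::array<T, N> buf_{};
    alignas(64) std::atomic<std::size_t> head_{0};  // consumer index
    alignas(64) std::atomic<std::size_t> tail_{0};  // producer index
public:
    bool push(const T& v) {                         // producer thread only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N)
            return false;                           // full: back-pressure
        buf_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);  // publish slot
        return true;
    }
    std::optional<T> pop() {                        // consumer thread only
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return std::nullopt;                    // empty
        T v = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // free slot
        return v;
    }
};
```

The two indices live on separate cache lines so producer and consumer never false-share — the same alignment discipline that paid off in the latency numbers above.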
Real-time trade surveillance platform: low-latency C++ market-data ingestion, multi-threaded order-flow analysis, and anomaly detection for aberrant price action. Designed as a trading-grade monitoring layer — the kind of tooling a quant desk would actually run alongside their strategies.
Published IEEE paper exploring feature-engineering techniques and loss-function design for
large-scale recommender systems. DOI: 10.1109/INCOFT55651.2022.10094480.
Hiring a C++ engineer to sit on the hot path? I'm actively interviewing and would love to hear what you're building.