Data Software Engineer - Python, SQL, ETL/ELT, RAG

Data products that turn messy signals into measurable growth.

I build Python and SQL data platforms that connect pipelines, warehouses, embeddings, retrieval APIs, and customer-facing analytics into reliable products for marketing measurement, attribution, and AI-assisted decisions.

Building

View Projects GitHub LeetCode

0 % recommendation accuracy lift

0 K+ monthly requests processed

0 % manual processing reduction

0 % API response improvement

Marketing Intelligence Data Platform

Useful analytics starts where raw events become governed datasets, decision models, retrieval systems, and product APIs people can trust every day.

Ingest with context

Batch and streaming pipelines bring media, customer, revenue, and product events into cloud data stores with source contracts and replay paths.

Model for decisions

SQL and NoSQL models, warehouses, data quality checks, and lineage make attribution, engagement, and performance reporting more dependable.

Add intelligent retrieval

Embeddings, pgvector, prompt templates, LangChain, OpenAI APIs, and RAG workflows turn data platforms into fast, grounded assistants.

Serve product teams

FastAPI, Node.js REST APIs, CI/CD, documentation, Agile delivery, and stakeholder translation move clean data into shipped software.

Architecture Diagram

A marketing analytics data platform from source signals to ingestion, transformation, warehouses, vector search, APIs, and measurable outcomes.

Technical Stack

Data engineering, cloud platforms, warehouses, AI retrieval, backend APIs, and production delivery.

Python SQL bash JavaScript TypeScript Java ETL/ELT Kafka batch pipelines streaming pipelines Airflow-style orchestration data quality lineage OpenAI APIs LangChain HuggingFace RAG embeddings pgvector AWS Lambda S3 EC2 Glue RDS GCP BigQuery Snowflake Redshift PostgreSQL MySQL MongoDB Redis Oracle PL/SQL FastAPI Node.js REST APIs microservices GitHub Actions Docker Jenkins PyTest Jest

Professional Experience

Full work history with every impact bullet from the resume.

Mar 2025 - Present Sprouts AI Data & Software Engineer (AI Workflows) | Chicago, IL

Built production AI agent platform in Python/FastAPI orchestrating LLM calls, prompt templates, and RAG workflows - integrating OpenAI APIs and Greenhouse ATS for analytics-driven recommendation, boosting match accuracy 35%.
Designed ETL/ELT pipelines and embedding-based retrieval on AWS (S3, Lambda, RDS) with vector database storage - processing large datasets of candidate and job records for AI and analytics use cases.
Built lightweight internal LLM tools - scripted helpers, prompt templates, and RAG agents - leveraging AI-assisted development (GitHub Copilot, Claude, Cursor) to accelerate delivery 40% while maintaining code quality and security.
Collaborated with stakeholders to translate business goals into technical requirements, contributing to documentation, evaluation plans, and knowledge sharing across product, ML, and design teams.

Aug 2023 - Mar 2025 Resilience Inc Data & Software Engineer | Chicago, IL

Built data-driven web applications on Node.js/TypeScript and AWS managed services (Lambda, S3, RDS) - processing 500K+ monthly requests with REST APIs for data retrieval and customer analytics workflows.
Designed SQL and NoSQL data models (PostgreSQL, DynamoDB) supporting product analytics, marketing-style attribution, and engagement reporting for cross-functional product teams.
Integrated data models within full-stack applications via REST APIs and contributed to data/table architecture decisions, balancing performance and developer velocity in Agile iterations.
Implemented CI/CD pipelines (GitHub Actions, Jest, Docker), code review, and technical documentation upholding SDLC and DevOps best practices.

Apr 2018 - Aug 2023 Tata Steel Data & Software Engineer (Pipelines + Apps)

Built and maintained ETL/ELT pipelines and Java microservices ingesting and transforming large enterprise datasets across PostgreSQL, Oracle, Hive, and Hadoop - supporting analytics, reporting, and downstream ML workloads.
Architected cloud-native data services on AWS with optimized SQL queries, indexing, and partitioning across data warehouses - cutting manual processing 60% and lifting API response times 40%.
Built REST APIs for data retrieval and integrated data models within Java applications for 40K+ enterprise users, translating business requirements into technical solutions in Agile SDLC.
Established Jenkins CI/CD with JUnit and PyTest testing, code review, and technical documentation - sharing knowledge and providing mentorship across a 12-engineer cross-functional team.

Selected Projects

Data pipelines, RAG applications, financial analytics, published ML research, systems programming, and product builds with source and demo links.

LLM Agent Python

LangChain Chat with Search

Lightweight internal-style LLM workflow orchestrating OpenAI/Groq calls, prompt templates, web retrieval, and Wikipedia tools for scripted agent patterns.

Source Code

Agentic RAG Node.js + AWS

AskMyStore

Production agentic RAG application with embedding retrieval, vector storage, serverless AWS backend, and data retrieval APIs.

Demo Source Code

Market Data Java + Kafka

Real-time Stock Market Analysis Pipeline

Distributed event-driven ETL pipeline ingesting real-time market data with Kafka and SQL analytics using scalable streaming patterns.

Source Code

Financial Insights Full Stack

TradeSentinel

Full-stack TypeScript platform with AI-powered market analysis, serverless REST APIs, and interactive financial decision support.

Demo Source Code

Feed Handler C++

CME MDP 3.0 Multicast Feed Handler

Multi-threaded market data feed handler demonstrating high-throughput message parsing, reliability, and low-latency systems discipline.

Source Code

IEEE Publication ML Research

Comparative Study of Movie Recommendation Systems

Published IEEE research comparing collaborative, content-based, and hybrid recommendation models with feature engineering and improved error functions.

Paper Source Code Certificate

Operating Systems C

Experimental Operating System

Custom OS project with scheduling, memory management, and device driver concepts that strengthen runtime and data movement judgment.

Source Code

Linux Kernel C

Character Device Driver

Linux character device driver project showing kernel-facing C development, interface design, debugging, and careful API contracts.

Source Code

Computer Architecture Hardware

16-Bit RISC Processor

Hardware lab project implementing processor fundamentals and reinforcing performance, correctness, and data-path discipline.

Source Code

Application Build Web

STRANGERS

Interactive application project focused on maintainable product delivery, backend integration, and end-to-end implementation discipline.

Source Code

iOS App Swift

Pitch Perfect

iOS application demonstrating product execution, media handling, and mobile app fundamentals.

Source Code

Education & Publication

Computer science foundations backed by production data platforms, applied AI, and published recommendation-systems research.

Graduate Study

Illinois Institute of Technology

M.S., Computer Science

Undergraduate Study

National Institute of Technology, Calicut

B.Tech, Computer Science

Publication

IEEE Research

Comparative Study of Movie Recommendation Systems using Feature Engineering.

Read Paper

Contact

Data pipelines, marketing intelligence, AI retrieval, and production software with a bias toward measurable outcomes.

Dev Kumar | Chicago, IL | +1-312-532-8223 | devkumar.dklv@gmail.com

Email GitHub LeetCode