Data Software Engineer - Python, SQL, ETL/ELT, RAG
Data products that turn messy signals into measurable growth.
I build Python and SQL data platforms that connect pipelines, warehouses, embeddings, retrieval APIs, and customer-facing analytics into reliable products for marketing measurement, attribution, and AI-assisted decisions.
Marketing Intelligence Data Platform
Useful analytics starts where raw events become governed datasets, decision models, retrieval systems, and product APIs people can trust every day.
Ingest with context
Batch and streaming pipelines bring media, customer, revenue, and product events into cloud data stores with source contracts and replay paths.
Model for decisions
SQL and NoSQL models, warehouses, data quality checks, and lineage make attribution, engagement, and performance reporting more dependable.
Add intelligent retrieval
Embeddings, pgvector, prompt templates, LangChain, OpenAI APIs, and RAG workflows turn data platforms into fast, grounded assistants.
Serve product teams
FastAPI, Node.js REST APIs, CI/CD, documentation, Agile delivery, and stakeholder translation move clean data into shipped software.
Architecture Diagram
A marketing analytics data platform from source signals to ingestion, transformation, warehouses, vector search, APIs, and measurable outcomes.
Technical Stack
Data engineering, cloud platforms, warehouses, AI retrieval, backend APIs, and production delivery.
Professional Experience
Full work history with every impact bullet from the resume.
- Built production AI agent platform in Python/FastAPI orchestrating LLM calls, prompt templates, and RAG workflows - integrating OpenAI APIs and Greenhouse ATS for analytics-driven recommendation, boosting match accuracy 35%.
- Designed ETL/ELT pipelines and embedding-based retrieval on AWS (S3, Lambda, RDS) with vector database storage - processing large datasets of candidate and job records for AI and analytics use cases.
- Built lightweight internal LLM tools - scripted helpers, prompt templates, and RAG agents - leveraging AI-assisted development (GitHub Copilot, Claude, Cursor) to accelerate delivery 40% while maintaining code quality and security.
- Collaborated with stakeholders to translate business goals into technical requirements, contributing to documentation, evaluation plans, and knowledge sharing across product, ML, and design teams.
- Built data-driven web applications on Node.js/TypeScript and AWS managed services (Lambda, S3, RDS) - processing 500K+ monthly requests with REST APIs for data retrieval and customer analytics workflows.
- Designed SQL and NoSQL data models (PostgreSQL, DynamoDB) supporting product analytics, marketing-style attribution, and engagement reporting for cross-functional product teams.
- Integrated data models within full-stack applications via REST APIs and contributed to data/table architecture decisions, balancing performance and developer velocity in Agile iterations.
- Implemented CI/CD pipelines (GitHub Actions, Jest, Docker), code review, and technical documentation upholding SDLC and DevOps best practices.
- Built and maintained ETL/ELT pipelines and Java microservices ingesting and transforming large enterprise datasets across PostgreSQL, Oracle, Hive, and Hadoop - supporting analytics, reporting, and downstream ML workloads.
- Architected cloud-native data services on AWS with optimized SQL queries, indexing, and partitioning across data warehouses - cutting manual processing 60% and lifting API response times 40%.
- Built REST APIs for data retrieval and integrated data models within Java applications for 40K+ enterprise users, translating business requirements into technical solutions in Agile SDLC.
- Established Jenkins CI/CD with JUnit and PyTest testing, code review, and technical documentation - sharing knowledge and providing mentorship across a 12-engineer cross-functional team.
Selected Projects
Data pipelines, RAG applications, financial analytics, published ML research, systems programming, and product builds with source and demo links.
LangChain Chat with Search
Lightweight internal-style LLM workflow orchestrating OpenAI/Groq calls, prompt templates, web retrieval, and Wikipedia tools for scripted agent patterns.
AskMyStore
Production agentic RAG application with embedding retrieval, vector storage, serverless AWS backend, and data retrieval APIs.
Real-time Stock Market Analysis Pipeline
Distributed event-driven ETL pipeline ingesting real-time market data with Kafka and SQL analytics using scalable streaming patterns.
TradeSentinel
Full-stack TypeScript platform with AI-powered market analysis, serverless REST APIs, and interactive financial decision support.
CME MDP 3.0 Multicast Feed Handler
Multi-threaded market data feed handler demonstrating high-throughput message parsing, reliability, and low-latency systems discipline.
Comparative Study of Movie Recommendation Systems
Published IEEE research comparing collaborative, content-based, and hybrid recommendation models with feature engineering and improved error functions.
Experimental Operating System
Custom OS project with scheduling, memory management, and device driver concepts that strengthen runtime and data movement judgment.
Character Device Driver
Linux character device driver project showing kernel-facing C development, interface design, debugging, and careful API contracts.
16-Bit RISC Processor
Hardware lab project implementing processor fundamentals and reinforcing performance, correctness, and data-path discipline.
STRANGERS
Interactive application project focused on maintainable product delivery, backend integration, and end-to-end implementation discipline.
Pitch Perfect
iOS application demonstrating product execution, media handling, and mobile app fundamentals.
Education & Publication
Computer science foundations backed by production data platforms, applied AI, and published recommendation-systems research.
Illinois Institute of Technology
M.S., Computer Science
National Institute of Technology, Calicut
B.Tech, Computer Science
IEEE Research
Comparative Study of Movie Recommendation Systems using Feature Engineering.
Read PaperContact
Data pipelines, marketing intelligence, AI retrieval, and production software with a bias toward measurable outcomes.
Dev Kumar | Chicago, IL | +1-312-532-8223 | devkumar.dklv@gmail.com