Karthik Venugopal — ML Engineer

01/Skills & Tools

ML / AI

PyTorch · TensorFlow · Keras · Hugging Face · LangChain · LangGraph · OpenAI API · XGBoost · scikit-learn

MLOps & Cloud

MLflow · Prefect · Docker · Kubernetes · GCP · Azure · Weaviate · Pinecone · ONNX Runtime

Languages

Python · C++20 · Java · SQL · TypeScript · JavaScript

Systems & Infra

gRPC · Protobuf · FastAPI · Flask · Redis · Apache Spark

Data & Libraries

pandas · NumPy · OpenCV · Matplotlib · Alibi-Detect

02/Selected Work

Multimodal RAG Agent

Python · LangGraph · NVIDIA NIM · Vision-Language Model (Nemotron) · Llama-Nemotron · RAG

—Built a multimodal agentic RAG system with LangGraph that routes retrieved figures through a vision-language model (Nemotron) and fuses them with text passages to answer questions a text-only pipeline cannot. Includes a faithfulness-gated self-correction loop that escalates to force_vision, query rewrite, or question decomposition before abstaining on unanswerable inputs.
—Ran a vision-ablation benchmark scored by an LLM-as-judge: +28.6 points overall accuracy and +60 on figure-only questions versus a vision-off baseline. Self-correction loop triggered on 9 of 14 responses, recovering 8 of 9 ungrounded answers with 1 correct abstention. 26 tests, CI green.

Grounded RAG Pipeline with Faithfulness Evaluation

Python · Cohere Embed v4 · Rerank v3.5 · Command · RAG

—Built a grounded RAG pipeline using dense embeddings and cross-encoder reranking to retrieve and anchor LLM responses with inline citations, ensuring every factual claim in the answer maps back to a specific retrieved source passage.
—Implemented a three-signal faithfulness evaluation layer — citation coverage, grounded-sentence rate, and LLM-as-judge agreement — to detect unsupported claims in generated answers and surface grounding gaps before output is returned.

LLM Hallucination Detection Pipeline

Python · RoBERTa · Hugging Face Transformers · NLI · PyTorch · FastAPI

—Built a claim-level hallucination detection system by fine-tuning a RoBERTa NLI classifier for factual consistency scoring — decomposing LLM outputs into atomic claims and scoring each against source context independently, enabling span-level attribution of unsupported content.
—Conducted systematic evaluation against prompted LLM baselines (GPT-3.5, LLaMA-2) across multiple decoding strategies; NLI classifier achieved F1 0.87, outperforming LLM judges on out-of-distribution factual claims while remaining model-agnostic and 10x cheaper to run. Benchmarked across LLaMA-2, Mistral, GPT-3.5, and Falcon — surfaced model-specific failure patterns with different models breaking down on distinct claim types.

Multi-Source Research Analyst Agent

Python · LangGraph · Weaviate · Tavily · OpenAI API · Docker · FastAPI

—Built a stateful multi-agent system with LangGraph that coordinates web and academic search (Tavily) with Weaviate dense retrieval for complex multi-hop question answering, with an LLM-as-judge faithfulness loop scoring each answer's grounding against retrieved sources at every reasoning step.
—Containerized the application with Docker and served it via a FastAPI endpoint supporting 100+ concurrent requests; per-step faithfulness scores surfaced reasoning failures and grounding gaps across the agent graph, improving answer accuracy by 30%.

Real-Time Content Moderation Pipeline

Python · Kafka (Redpanda) · Faust · Llama 3.2 · sentence-transformers · Redis · Docker · Streamlit

—Built an end-to-end streaming ML pipeline ingesting 1,000-5,000 posts/minute from live data sources into Redpanda (Kafka-compatible), orchestrating per-event embedding, LLM classification, and online topic clustering via a Faust async agent.
—Deployed a 5-class content safety classifier (safe, spam, hate, NSFW, violence) using Llama 3.2 3B via Ollama with fail-open fallback under peak load. Designed separate classifiers per harm dimension with per-class confidence calibration; persisted trend metrics in Redis TimeSeries and surfaced live moderation rates on a Streamlit dashboard.

MLOps Fraud Detection Pipeline

Python · XGBoost · MLflow · Prefect · Alibi-Detect · ONNX Runtime · FastAPI · Docker

—Built an end-to-end model lifecycle pipeline with drift detection (Alibi-Detect), automated retraining (Prefect), and experiment tracking (MLflow); exported the model to ONNX for cross-platform inference, reaching 94% F1 and sub-10ms inference latency.
—Reduced model inference latency by 35% (150ms to 97ms) by serving XGBoost with ONNX Runtime and implementing request batching in FastAPI, scaling to 500+ requests/minute under stress testing.

RaftScope: Distributed Raft Consensus

C++20 · gRPC · Protocol Buffers · Multi-threading · D3.js

—Implemented the Raft consensus protocol from scratch in C++20 with a gRPC and Protocol Buffers RPC layer covering RequestVote and AppendEntries, running a multi-node cluster with leader election, log replication, and thread-safe state-machine transitions using mutexes. Built with two collaborators for CSCI 546 (Distributed Systems) at USC.
—Instrumented the cluster with Lamport logical clocks and built a D3.js browser-based space-time visualizer to trace message ordering and leader changes. All 6 integration tests pass, covering network partitions and node failures.

NewsInterview LangGraph Agent

Python · LangGraph · Qwen-2.5-7B

—Built a training-free interviewer agent as an alternative to RL fine-tuning (Huang et al., EMNLP 2025), using the same Qwen-2.5-7B base model to conduct follow-up question generation and information elicitation in a news interview setting without any fine-tuning.
—Verified on n=20 cases: 78.8% acknowledgement rate versus 53.8% (CoT baseline) and 35.0% (prompt-only baseline), with information-item recall effectively flat across conditions — demonstrating that structured prompting matches fine-tuned behavior on acknowledgement without the training cost.

03/Experience

Sep 2024 — Dec 2024

Bangalore, India

Akamai Technologies

Software Engineer II — App Architecture & Integration

—Deployed a Dockerized Apache Spark platform for 20+ ETL pipelines, reducing job runtimes by 60% (5 hrs → 2 hrs) and accelerating data insights for cross-functional teams.

Aug 2022 — Aug 2024

Bangalore, India

Akamai Technologies

Software Engineer — Logistics & External Tools

—Cut data retrieval time by 50% (10s → 5s) for datacenter technician portals by adding Redis caching.
—Built REST APIs with FastAPI using hash tables and caching, cutting query latency from 250ms to 90ms and supporting a 3× increase in concurrent requests.

Jan 2022 — Jun 2022

Bangalore, India

Akamai Technologies

Software Engineer Intern — Logistics & External Tools

—Implemented and optimized Flask REST APIs handling 1M+ records with real-time access, improving portal responsiveness for field technicians.
—Refactored portal into modular Ant Design UI components, standardizing accessibility and achieving a 20% reduction in codebase size.

May 2021 — Jul 2021

Bangalore, India

Akamai Technologies

Software Engineer Intern — Logistics & External Tools

—Built a real-time Shipping Details Dashboard using React, Redux, and Ant Design to surface live shipment data for logistics teams, reducing shipping-related support inquiries by 30%.
—Re-architected AutoShipNotify’s frontend with Angular and Angular Material and refactored the backend with Perl and Flask, cutting UI-related bug reports by 35% and reducing feature turnaround time by half.

Oct 2020 — Mar 2021

Bangalore, India

Samsung Research

Machine Learning Research Intern

—Architected and deployed a prompt-classification service for Samsung Bixby to automatically route in-domain vs. out-of-domain inputs, achieving 96% test accuracy and enabling reliable production integration.
—Researched and benchmarked NLP transformer models (sBERT, RoBERTa) to improve Bixby’s intent classification pipeline, achieving a 15% F1-score improvement over baseline approaches.

04/Education

USC · Jan 2025 — Dec 2026

University of Southern California

MS in Computer Science · Los Angeles, CA

GPA: 3.77 / 4.00

Coursework

Machine Learning · Natural Language Processing · Distributed Systems · Analysis of Algorithms · Information Retrieval & Web Search

BMSCE · Aug 2018 — Jul 2022

BMS College of Engineering

BE in Computer Science · Bangalore, India

GPA: 9.12 / 10.00

Coursework

Linear Algebra · Statistics · Artificial Intelligence · Big Data Analytics · Cloud Computing · Database Management Systems
