Open to Full-Time ML/NLP Roles — Starting Early 2027

Karthik Venugopal

Machine Learning / NLP Engineer

Los Angeles, CA

USC · MS Computer Science

GPA 3.77 / 4.00

MS CS student at USC building production ML systems — MLOps pipelines, LLM agents, and distributed infrastructure. Prior experience deploying ML models at Samsung Research and engineering scalable backend systems at Akamai Technologies.

01/Skills & Tools

ML / AI
PyTorch · TensorFlow · Keras · Hugging Face · LangChain · LangGraph · OpenAI API · XGBoost · scikit-learn
MLOps & Cloud
MLflow · Prefect · Docker · Kubernetes · GCP · Azure · Weaviate · Pinecone · ONNX Runtime
Languages
Python · C++20 · Java · SQL · TypeScript · JavaScript
Systems & Infra
gRPC · Protobuf · FastAPI · Flask · Redis · Apache Spark
Data & Libraries
pandas · NumPy · OpenCV · Matplotlib · Alibi-Detect

02/Selected Work

01

Multimodal RAG Agent

Python · LangGraph · NVIDIA NIM · Vision-Language Model (Nemotron) · Llama-Nemotron · RAG
  • Built a multimodal agentic RAG system with LangGraph that routes retrieved figures through a vision-language model (Nemotron) and fuses them with text passages to answer questions a text-only pipeline cannot. Includes a faithfulness-gated self-correction loop that escalates to force_vision, query rewrite, or question decomposition before abstaining on unanswerable inputs.
  • Ran a vision-ablation benchmark scored by an LLM-as-judge: +28.6 points overall accuracy and +60 on figure-only questions versus a vision-off baseline. Self-correction loop triggered on 9 of 14 responses, recovering 8 of 9 ungrounded answers with 1 correct abstention. 26 tests, CI green.
02

Grounded RAG Pipeline with Faithfulness Evaluation

Python · Cohere Embed v4 · Rerank v3.5 · Command · RAG
  • Built a grounded RAG pipeline that retrieves passages with dense embeddings and cross-encoder reranking, then anchors LLM responses with inline citations so that every factual claim in the answer maps back to a specific retrieved source passage.
  • Implemented a three-signal faithfulness evaluation layer — citation coverage, grounded-sentence rate, and LLM-as-judge agreement — to detect unsupported claims in generated answers and surface grounding gaps before output is returned.
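A minimal sketch of how the first two signals could be computed, assuming bracketed numeric citations such as [1] and a simple token-overlap check for grounding. The function names, the naive sentence splitter, and the `min_overlap` threshold are illustrative assumptions, not the project's actual implementation; the LLM-as-judge signal is omitted because it depends on a model call.

```python
import re

# Assumed citation format: bracketed passage ids such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def split_sentences(answer: str) -> list[str]:
    """Naive splitter: break on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for the overlap check."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def citation_coverage(answer: str) -> float:
    """Fraction of answer sentences that carry at least one citation."""
    sentences = split_sentences(answer)
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if CITATION.search(s))
    return cited / len(sentences)


def grounded_sentence_rate(
    answer: str, passages: dict[int, str], min_overlap: float = 0.5
) -> float:
    """Fraction of sentences whose tokens overlap a cited passage enough.

    A sentence counts as grounded when at least `min_overlap` of its tokens
    appear in one of the passages it cites (hypothetical criterion).
    """
    sentences = split_sentences(answer)
    if not sentences:
        return 0.0
    grounded = 0
    for sentence in sentences:
        cited_ids = [int(m) for m in CITATION.findall(sentence)]
        claim = _tokens(CITATION.sub("", sentence))
        if not claim:
            continue
        for pid in cited_ids:
            source = _tokens(passages.get(pid, ""))
            if len(claim & source) / len(claim) >= min_overlap:
                grounded += 1
                break
    return grounded / len(sentences)
```

Under these assumptions, an uncited sentence lowers both scores, while a cited sentence that shares little vocabulary with its source lowers only the grounded-sentence rate.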
03

LLM Hallucination Detection Pipeline

Python · RoBERTa · Hugging Face Transformers · NLI · PyTorch · FastAPI
  • Built a claim-level hallucination detection system by fine-tuning a RoBERTa NLI classifier for factual consistency scoring — decomposing LLM outputs into atomic claims and scoring each against source context independently, enabling span-level attribution of unsupported content.
  • Evaluated systematically against prompted LLM baselines (GPT-3.5, LLaMA-2) across multiple decoding strategies. The NLI classifier achieved an F1 of 0.87, outperforming LLM judges on out-of-distribution factual claims while remaining model-agnostic and 10× cheaper to run. Benchmarking across LLaMA-2, Mistral, GPT-3.5, and Falcon surfaced model-specific failure patterns, with different models breaking down on distinct claim types.
04

Multi-Source Research Analyst Agent

Python · LangGraph · Weaviate · Tavily · OpenAI API · Docker · FastAPI
  • Built a stateful multi-agent system with LangGraph that coordinates web and academic search (Tavily) with Weaviate dense retrieval for complex multi-hop question answering, with an LLM-as-judge faithfulness loop scoring each answer's grounding against retrieved sources at every reasoning step.
  • Containerized the application with Docker and served it via a FastAPI endpoint supporting 100+ concurrent requests; per-step faithfulness scores surfaced reasoning failures and grounding gaps across the agent graph, improving answer accuracy by 30%.
05

Real-Time Content Moderation Pipeline

Python · Kafka (Redpanda) · Faust · Llama 3.2 · sentence-transformers · Redis · Docker · Streamlit
  • Built an end-to-end streaming ML pipeline ingesting 1,000-5,000 posts/minute from live data sources into Redpanda (Kafka-compatible), orchestrating per-event embedding, LLM classification, and online topic clustering via a Faust async agent.
  • Deployed a 5-class content safety classifier (safe, spam, hate, NSFW, violence) using Llama 3.2 3B via Ollama with fail-open fallback under peak load. Designed separate classifiers per harm dimension with per-class confidence calibration; persisted trend metrics in Redis TimeSeries and surfaced live moderation rates on a Streamlit dashboard.
06

MLOps Fraud Detection Pipeline

Python · XGBoost · MLflow · Prefect · Alibi-Detect · ONNX Runtime · FastAPI · Docker
  • Built an end-to-end model lifecycle pipeline with drift detection (Alibi-Detect), automated retraining (Prefect), and experiment tracking (MLflow). Exported the model to ONNX for cross-platform inference, achieving 94% F1 and sub-10ms inference latency.
  • Reduced model inference latency by 35% (150ms to 97ms) by serving XGBoost with ONNX Runtime and implementing request batching in FastAPI, scaling to 500+ requests/minute under stress testing.
07

RaftScope: Distributed Raft Consensus

C++20 · gRPC · Protocol Buffers · Multi-threading · D3.js
  • Implemented the Raft consensus protocol from scratch in C++20 with a gRPC and Protocol Buffers RPC layer covering RequestVote and AppendEntries, running a multi-node cluster with leader election, log replication, and thread-safe state-machine transitions using mutexes. Built with two collaborators for CSCI 546 (Distributed Systems) at USC.
  • Instrumented the cluster with Lamport logical clocks and built a D3.js browser-based space-time visualizer to trace message ordering and leader changes. All 6 integration tests pass, covering network partitions and node failures.
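A minimal sketch of the Lamport clock rules used for this kind of message-ordering trace, written in Python for brevity even though the project itself is in C++20. The class and method names are illustrative assumptions, not the project's code.

```python
class LamportClock:
    """Logical clock following Lamport's rules (illustrative sketch)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Advance the clock and return the timestamp to attach to a message."""
        return self.tick()

    def receive(self, msg_time: int) -> int:
        """Merge the sender's timestamp: take the max, then advance by one."""
        self.time = max(self.time, msg_time) + 1
        return self.time
```

With two nodes, a message stamped by `send()` on one node always receives a strictly larger timestamp from `receive()` on the other, which is what lets a visualizer draw each send before its matching receive.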
08

NewsInterview LangGraph Agent

Python · LangGraph · Qwen-2.5-7B
  • Built a training-free interviewer agent as an alternative to RL fine-tuning (Huang et al., EMNLP 2025), using the same Qwen-2.5-7B base model to generate follow-up questions and elicit information in a news interview setting.
  • Verified on n=20 cases: 78.8% acknowledgement rate versus 53.8% (CoT baseline) and 35.0% (prompt-only baseline), with information-item recall effectively flat across conditions. On this small sample, structured prompting appears to match fine-tuned behavior on acknowledgement without the training cost.

03/Experience


Sep 2024 — Dec 2024

Bangalore, India

Akamai Technologies

Software Engineer II — App Architecture & Integration
  • Deployed a Dockerized Apache Spark platform for 20+ ETL pipelines, reducing job runtimes by 60% (5 hrs → 2 hrs) and accelerating data insights for cross-functional teams.

Aug 2022 — Aug 2024

Bangalore, India

Akamai Technologies

Software Engineer — Logistics & External Tools
  • Improved data retrieval speed by 50% (10s → 5s) for datacenter technician portals by adding Redis caching.
  • Built REST APIs with FastAPI using hash tables and caching, cutting query latency from 250ms to 90ms and supporting a 3× increase in concurrent requests.

Jan 2022 — Jun 2022

Bangalore, India

Akamai Technologies

Software Engineer Intern — Logistics & External Tools
  • Implemented and optimized Flask REST APIs handling 1M+ records with real-time access, improving portal responsiveness for field technicians.
  • Refactored the portal into modular Ant Design UI components, standardizing accessibility and reducing codebase size by 20%.

May 2021 — Jul 2021

Bangalore, India

Akamai Technologies

Software Engineer Intern — Logistics & External Tools
  • Built a real-time Shipping Details Dashboard using React, Redux, and Ant Design to surface live shipment data for logistics teams, reducing shipping-related support inquiries by 30%.
  • Re-architected AutoShipNotify’s frontend with Angular and Angular Material and refactored the backend with Perl and Flask, cutting UI-related bug reports by 35% and reducing feature turnaround time by half.

Oct 2020 — Mar 2021

Bangalore, India

Samsung Research

Machine Learning Research Intern
  • Architected and deployed a prompt-classification service for Samsung Bixby to automatically route in-domain vs. out-of-domain inputs, achieving 96% test accuracy and enabling reliable production integration.
  • Researched and benchmarked NLP transformer models (sBERT, RoBERTa) to improve Bixby’s intent classification pipeline, achieving a 15% F1-score improvement over baseline approaches.

04/Education

USC · Jan 2025 – Dec 2026

University of Southern California

MS in Computer Science · Los Angeles, CA

GPA 3.77 / 4.00

Coursework

Machine Learning · Natural Language Processing · Distributed Systems · Analysis of Algorithms · Information Retrieval & Web Search

BMSCE · Aug 2018 – Jul 2022

BMS College of Engineering

BE in Computer Science · Bangalore, India

GPA 9.12 / 10.00

Coursework

Linear Algebra · Statistics · Artificial Intelligence · Big Data Analytics · Cloud Computing · Database Management Systems