Saravanan Kannan — Resume · Agentic LLM Orchestration · Quantitative Risk

Executive Summary

Senior engineering leader with 20+ years driving quantified business transformation at the intersection of institutional trading infrastructure and frontier AI systems. I build autonomous Agentic Risk Firewalls that operate at sub-millisecond latency, rebuild legacy Java OMS / margin stacks as Python-native agentic pipelines without breaking a single NaN-arithmetic edge case, and own the GPU infrastructure underneath — a dual-node NVIDIA Grace-Blackwell GB10 cluster running 235B-parameter LLMs in production over 200 Gbps RDMA/RoCE.

Currently leading the AI-native modernization of Organization's Risk Monitor (P0) while independently architecting UnifiedQuant (broker-unified trading platform), UnifiedRBM (multi-asset Risk-Based Margin engine), Mercurius (sub-millisecond Agentic Risk Firewall), and LocalInfra (Blackwell inference cluster). Fluent in CME SPAN 2 mathematics, Agentic LLM Orchestration (DSPy DAGs, LangChain, Model Context Protocol / MCP, x402), Blackwell-class GPU optimization (NVFP4, TensorRT-LLM, RDMA), and the strategy / factory / visitor patterns that keep a JVM-era trading core safe during translation.

Quantified Business Transformation — Selected Outcomes

10× analyst triage speedup on Organization's Risk Monitor via LLM-generated risk briefs (10 min → <1 min per account) — democratizing senior-analyst capability across the desk.
Eliminated six-figure recurring licensing spend (Flyer FIX, Oracle Coherence) by replacing the licensed Java OMS with a Python-native agentic stack — zero loss of Java margin semantics in the cutover.
Sub-0.3% MAPE vs. published CME maintenance margins on a from-scratch SPAN 2 implementation (60+ calibration tests) — production-grade quant accuracy from a clean-room rebuild.
3.5× memory compression via NVFP4 quantization on NVIDIA Blackwell, enabling 235B-parameter inference on a 2-node Grace-Blackwell GB10 cluster at 15.4 tok/s — frontier-model serving on roughly two desktop boxes.
Sub-millisecond internal dispatch latency behind a 14-state order lifecycle with 113+ async integration tests and HMAC-signed agent ledgers.

Strategic Pillars

Pillar 1

Agentic LLM Orchestration & Risk Firewalls

I design autonomous, multi-agent pipelines — not "AI integrations." Mercurius is a sub-millisecond Agentic Risk Firewall that intercepts every order before it touches a venue and validates it against Reg T, Portfolio Margin (TIMS), and SPAN through a 28-station sequencer. UnifiedQuant runs a DSPy DAG with MCP servers and ChromaDB-backed RAG. The Agentic OMS uses HMAC-signed inter-agent ledgers across a 5-agent pipeline (Spark → Mercurius → OMS → SOR → Settlement). This is the architecture that separates "shipped a chatbot" from "moved the P&L."
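
A minimal sketch of how an HMAC-signed ledger entry could work, using only the Python standard library. The secret key, the entry fields, and the helper names `sign_entry` / `verify_entry` are hypothetical illustrations, not taken from the Agentic OMS code.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; a real deployment would load this from a secret store.
SECRET_KEY = b"demo-only-secret"


def sign_entry(entry: dict, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the entry."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_entry(entry: dict, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_entry(entry, key), tag)


# Illustrative entry passed from one agent to the next.
entry = {"agent": "Mercurius", "order_id": "ORD-1", "action": "VALIDATED"}
tag = sign_entry(entry)

# Any change to the entry invalidates the tag.
tampered = {**entry, "action": "FILLED"}
```

Sorting the keys before signing keeps the tag stable regardless of dictionary insertion order, so two agents that build the same entry in different field order still agree on the signature.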

Pillar 2

Hardware-Aware Infrastructure Depth

I don't just use AI — I build the metal underneath it. NVIDIA Grace-Blackwell GB10 dual-node cluster, ARM64, 200 Gbps ConnectX-7 RDMA/RoCE (verified 109 Gbps host-staged), NVLink-C2C unified memory, NVFP4 / FP8 quantization, TensorRT-LLM and vLLM with pipeline parallelism, NCCL distributed inference, and the Mamba-hybrid KV-cache and parallel-weight-load tunings that turn benchmarks into production. This is the depth that lets a senior leader credibly own LLM platform decisions instead of outsourcing them.

Pillar 3

Legacy Translation Without Semantic Drift

Large institutions are terrified of AI migrations because the math has to come out the same. My Java-margin-library → Mercurius port preserves Java NaN-arithmetic semantics in Python by using float('nan') (not None) as the sentinel for unset financial fields and routing every check through math.isnan() — eliminating an entire class of porting bugs that would otherwise silently corrupt margin calculations. Same playbook for ngOMS → Agentic OMS: strategy/factory/visitor patterns mirrored, fixed-point precision preserved, frozen-dataclass immutability layered on top. Six-figure licensing spend gone, zero margin-semantic drift.
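
A short sketch of why the NaN sentinel matters, under the assumption that an unset field must flow through arithmetic the way a Java `double` NaN would. The helper `maintenance_requirement` and its inputs are hypothetical.

```python
import math

UNSET = float("nan")  # sentinel for an unset financial field


def maintenance_requirement(market_value: float, rate: float) -> float:
    """Hypothetical helper: NaN in either input propagates to the result."""
    return market_value * rate


def is_unset(value: float) -> bool:
    """NaN never equals itself, so the check must go through math.isnan()."""
    return math.isnan(value)


known = maintenance_requirement(10_000.0, 0.25)
unknown = maintenance_requirement(UNSET, 0.25)
```

With `None` as the sentinel, the same multiplication raises `TypeError` instead of propagating, and a truthiness check such as `if not value` would also treat a legitimate `0.0` as unset. Both behaviors diverge from the Java reference.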

Core Technical Competencies

Quantitative Finance & Risk

Margin: CME SPAN 2 (HVaR + Stress + Liquidity + Concentration), Reg T, Portfolio Margin / TIMS risk arrays (10-point stress), SPAN futures, cross-asset RBM (stocks, bonds, indices, futures, options, FX, crypto).

Risk Models: GARCH-filtered Historical VaR, EVT/GPD tail modeling, DCC dynamic correlation, Jorion LVaR, EWMA covariance, ADV-based liquidity buckets, P&L attribution (PNR), Hidden Markov regime detection.

Pricing: Black-Scholes, Black-76, Barone-Adesi-Whaley (BAW), Bjerksund-Stensland (BS2002), CRR & Mean Binomial, Heston, SABR smile, Newton-Raphson + Regula Falsi implied vol, closed-form and finite-difference Greeks.

Regulatory: Reg T haircuts, Reg SHO short-sale logic, options approval-level matching, IRA/cash/margin/day-trade distinctions.

AI & Machine Learning

Agentic LLM Orchestration: Multi-agent autonomous pipelines, MCP servers, DSPy DAGs, LangChain, Gemini, ChromaDB-backed RAG, sub-ms Agentic Risk Firewalls, HMAC-signed agent ledgers, prompt caching, structured-output JSON repair, Chain-of-Visual-Thought.

Inference Stacks: TensorRT-LLM (Qwen3-235B-A22B FP4, Llama-3.3-70B NVFP4, Nemotron MoE FP8), vLLM 0.13 with Ray (PP=2 over RDMA), llama.cpp / GGUF, speculative decoding, LoRA.

Generative Media: Wan2.2-TI2V image-to-video (720×1280 → 1080×1920), FLUX.1 latent diffusion, edge-tts, NVENC via FFmpeg, ComfyUI.

Quantization: NVFP4 / FP8 / INT4 PTQ via NVIDIA Model Optimizer (3.5× memory compression), Mamba-hybrid KV cache tuning, parallel-weight-load CPU optimization (96% → 20–40%).

Software Architecture

Agentic Patterns: Autonomous multi-agent pipelines, sub-ms risk firewalls, HMAC-SHA256 inter-agent ledgers, 14-state validated order lifecycles, graph-based SOR (NetworkX visitor pattern).

Legacy Translation: Strategy / factory / visitor patterns ported Java→Python; NaN-sentinel financial arithmetic (float('nan') + math.isnan(), never None); fixed-point precision; XML-driven rate-rule fidelity.

Distributed Systems: Redis job queues, asyncio TaskGroups, websockets, eventkit, Apache Ignite (legacy), 200 Gbps RoCE/RDMA inter-node fabrics, frozen-dataclass functional immutability.

Web/API: FastAPI (Prometheus, slowapi, Pydantic), Streamlit dashboards, OpenAI-compatible /v1/chat/completions, MCP servers, x402 micropayment gating.

Infrastructure & Hardware

Hardware: Dual-node NVIDIA DGX Spark / Grace-Blackwell GB10 cluster (ARM64, ~120 GB unified CPU+GPU memory per node via NVLink-C2C), ConnectX-7 200 Gbps RDMA fabric (verified 109 Gbps host-staged via ib_send_bw), CUDA 13.0, NCCL.

Tooling: Docker, GCP Cloud Run, Cloudflare Tunnel, Tailscale, SSH tunneling, Gradle (Java), Hatchling/setuptools (Python), pytest / pytest-asyncio, ruff, loguru structured logging, Prometheus + Grafana.

Trading APIs: major-broker TraderAPI (HMAC, OAuth), IBKR via ib-insync with fully unattended IBC v3.23 and 2FA bypass-device automation, Polygon.io, FRED, FIX (QuickFIX, Flyer reference), exchange-calendars, NBBO/ack-latency/fill-ratio routing telemetry.

Professional Experience

Senior Manager, Trading Services

Organization (Major Brokerage)

20XX – Present

Lead, AI-Native Risk Monitor Initiative (P0)

Architected the agentic modernization of Organization's Risk Monitor, replacing static threshold-based detection with an autonomous LLM-driven interpretation layer on private GCP-hosted models — moving the platform from "alert spam" to "decision-grade intelligence."
Quantified business transformation: auto-generated plain-English risk briefs cut analyst triage time ~10× (from ~10 min to under 1 min per account), democratizing senior-analyst capability across the desk.
Built a conversational interface so risk managers can query complex portfolio exposures (tech concentration, earnings risk, sector beta) in natural language, with citation-backed answers grounded in live position data.
Delivered "What-If" predictive stress testing: risk managers parameterize scenarios in plain English (e.g., "10% tech drop + 50 bps rate hike") and receive portfolio-level impact decompositions in seconds.

Legacy OMS Modernization

Directing the transition of a licensed Java-based Order Management System to a Python-native Agentic OMS architecture, eliminating recurring six-figure licensing fees (Flyer FIX, Oracle Coherence).
Authored the five-phase migration roadmap: distributed state (Redis + PostgreSQL) → agentic order lifecycle → data-driven smart routing → QuickFIX connectivity → Prometheus/Grafana observability.

Strategic AI-Native Platform Roadmap (futureweb)

Performed an architectural audit of Organization's ~140-module Java/Gradle monorepo, identifying four AI-native modernization wedges: Risk Monitor, Chat Support triage, anomaly-driven Alerts, and natural-language Market Scanner.
Authored stakeholder-facing benefits analysis and python-pptx-generated executive deck to align engineering, business, and compliance leadership behind the initiative.

Lead Quantitative Developer & Platform Architect

UnifiedQuant · IgniteEdge AI (Independent / Open Source)

20XX – Present

UnifiedQuant Platform

Built an AI-powered quantitative trading dashboard unifying major-broker TraderAPI and IBKR (ib-insync) behind a single broker-abstraction layer, with Streamlit visualization and a FastAPI REST surface (Pydantic validation, slowapi rate limiting, Prometheus instrumentation).
Architected a DSPy-based DAG LLM pipeline fusing Gemini reasoning with a ChromaDB-backed RAG knowledge base for real-time trade-idea generation and research synthesis.
Integrated Polygon.io and FRED market/macroeconomic data, scikit-optimize hyperparameter search, arch / pmdarima for GARCH and ARIMA/SARIMA forecasting, and exchange-calendars for session-aware execution.
Implemented MCP servers and x402 micropayment gating to enable third-party agentic clients and pay-as-you-go premium tiers. Live at spark.igniteedge.ai.

Mercurius — Sub-Millisecond Agentic Risk Firewall

Architected an autonomous Agentic Risk Firewall that sits in front of every venue connection and validates each proposed order against Reg T, Portfolio Margin (TIMS), and SPAN rules before execution, dispatching through a thread-safe singleton at sub-millisecond latency.
Clean-room Python port of the licensed Java margin-library, deployed as the pre-execution gateway in the 5-agent OMS pipeline. Zero margin-semantic drift vs. the Java reference.
28-station sequencer pipeline: validation → pricing (4 pluggable option models: Bjerksund-Stensland, Black-Scholes, Mean Binomial, CRR) → XML-driven rate rules → strategy matching across 50+ concrete strategy classes (covered calls, verticals, collars, butterflies, strangles, iron condors, naked) → flattening → effect calculation → processor selection (RegT / PM / SPAN).
Legacy Translation pitch in action: preserved Java NaN-arithmetic semantics using float('nan') (not Python None) as the sentinel for unset financial fields, with math.isnan() checks throughout — eliminating a class of porting bugs that would otherwise silently corrupt margin calculations.
Frozen-dataclass Account / SubAccount / Position / Order model with dataclasses.replace() mutation; 863-line INTEGRATION.md adapter guide. Live at mercurius.igniteedge.ai.

Agentic OMS — Modernized Order Management

Quantified Business Transformation: replaced a licensed Java OMS (Flyer FIX, Oracle Coherence) with a Python-native, 5-agent autonomous pipeline — Spark → Mercurius → OMS → SOR → Settlement — eliminating recurring six-figure annual licensing fees.
Sub-millisecond internal dispatch latency across the 14-state order lifecycle (ACCEPTED → VALIDATED → ENRICHED → ROUTED → WORKING → PARTIALLY_FILLED → FILLED → SETTLED, plus cancel/reject/expire/suspend/resubmit transitions) with validated state-machine guards on every transition.
HMAC-SHA256-signed tokens for tamper-evident inter-agent communication and ledger entries — every action is cryptographically auditable end-to-end.
Built a graph-based Smart Order Router (NetworkX visitor-pattern DFS) scoring routing groups by Expected Value using live NBBO delta, ack latency, fill-rate, and rejection-rate telemetry.
113+ async integration tests (pytest-asyncio, asyncio_mode=auto); broker abstraction via BrokerAdapter ABC; asyncio TaskGroup parallelism throughout. Live at oms.igniteedge.ai.
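
A sketch of a guarded state machine for part of the lifecycle above. Only the eight happy-path states named in the text are taken from the source; `CANCELLED`, `REJECTED`, and the whole transition table are assumptions for illustration and cover a subset of the 14 states.

```python
from enum import Enum, auto


class OrderState(Enum):
    ACCEPTED = auto()
    VALIDATED = auto()
    ENRICHED = auto()
    ROUTED = auto()
    WORKING = auto()
    PARTIALLY_FILLED = auto()
    FILLED = auto()
    SETTLED = auto()
    CANCELLED = auto()  # assumed name for the cancel transition's target
    REJECTED = auto()   # assumed name for the reject transition's target


S = OrderState

# Assumed transition table; terminal states have no outgoing edges.
ALLOWED = {
    S.ACCEPTED: {S.VALIDATED, S.REJECTED},
    S.VALIDATED: {S.ENRICHED, S.REJECTED},
    S.ENRICHED: {S.ROUTED},
    S.ROUTED: {S.WORKING, S.REJECTED},
    S.WORKING: {S.PARTIALLY_FILLED, S.FILLED, S.CANCELLED},
    S.PARTIALLY_FILLED: {S.PARTIALLY_FILLED, S.FILLED, S.CANCELLED},
    S.FILLED: {S.SETTLED},
}


class InvalidTransition(Exception):
    """Raised when a transition is not in the allowed table."""


def transition(current: OrderState, target: OrderState) -> OrderState:
    """Return the target state if the guard passes, otherwise raise."""
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current.name} -> {target.name}")
    return target
```

Keeping the table as data rather than scattered `if` statements makes each guard easy to audit and easy to test exhaustively.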

CMESpan2 — Quantitative Risk Engine

Built a from-scratch CME SPAN 2 implementation: x · HVaR + (1−x) · Stress + Liquidity + Concentration, with pod-based grouping, cross-pod correlation offsets, and directional long/short margining.
Two-stage calibration engine (scipy.optimize) tunes model parameters to match published CME maintenance margins at <0.3% MAPE across the validation suite.
60+ pytest unit/integration tests covering Black-76, BAW (Barone-Adesi-Whaley), BS2002 (Bjerksund-Stensland), SABR smile, EVT/GPD tails, DCC dynamic correlation.
Delta-gamma options P&L approximation; optional library imports with HAS_* graceful-degradation flags. Live at span2.igniteedge.ai.
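
The blend formula and the MAPE accuracy metric quoted above can be sketched as follows. The function names and all numbers are illustrative assumptions, not CME data or the engine's actual API; the real engine also applies pod grouping and correlation offsets that are omitted here.

```python
def span2_margin(hvar: float, stress: float, liquidity: float,
                 concentration: float, x: float) -> float:
    """Blend x * HVaR + (1 - x) * Stress, then add liquidity and concentration."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("blend weight x must lie in [0, 1]")
    return x * hvar + (1.0 - x) * stress + liquidity + concentration


def mape(predicted: list[float], published: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(predicted) != len(published) or not published:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = (abs(p - a) / abs(a) for p, a in zip(predicted, published))
    return 100.0 * sum(errors) / len(published)


# Illustrative inputs only.
margin = span2_margin(hvar=1000.0, stress=1400.0, liquidity=50.0,
                      concentration=25.0, x=0.75)
error_pct = mape([1002.0, 1996.0], [1000.0, 2000.0])
```

In this toy case the blended margin is 1175.0 and the error is about 0.2%, which would sit inside a 0.3% MAPE target.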

UnifiedRBM — Cross-Asset Risk-Based Margin Platform

Extended CMESpan2 into a unified, multi-asset enterprise margining engine spanning equities, bonds, indices, futures, options-on-futures, equity options, FX, and crypto.
Implemented the six-pillar RBM architecture: standardized scenario engine, intra-group offsets, inter-commodity (cross-asset) offsets, non-linear pricing, add-on charges, and floor minimums.
Supports Reg T (30% equity haircut), Portfolio Margin (TIMS 10-point stress arrays), and SPAN within a single framework, with cross-asset waterfall visualization in Streamlit.
Authored 39 quantitative modules covering pricing, volatility, regime detection (Hidden Markov via hmmlearn), and basis-risk modeling, plus a P&L Replication (PNR) attribution dashboard.

LocalInfra — NVIDIA Blackwell GB10 AI Supercomputing Cluster

Infrastructure Depth pitch in action: stood up a dual-node NVIDIA DGX Spark (Grace-Blackwell GB10) cluster on ARM64 with a 200 Gbps ConnectX-7 RDMA/RoCE fabric, NVLink-C2C unified memory (~120 GB CPU+GPU per node), CUDA 13.0, and NCCL distributed inference. Verified 109 Gbps host-staged transfer via ib_send_bw.
Production frontier-model inference on roughly two desktop-class boxes:
- Qwen3-235B-A22B FP4 at 15.4 tok/s dual-node (TensorRT-LLM, PP=2 over RDMA)
- Qwen3-Next-80B Mamba-hybrid at 31.4 tok/s single-node
- Llama-3.3-70B NVFP4, Nemotron-3-Nano-30B FP8 MoE at 46.8 tok/s, and Gemma-27B
Hardware-aware numerics decisions, not afterthoughts: diagnosed the Triton SM121 (Blackwell) gap that blocks vLLM TP=2 and standardized on TensorRT-LLM TP=2 + vLLM PP=2 as the dual-strategy serving baseline; tuned Mamba hybrid KV-cache (block reuse disabled); reduced TensorRT-LLM CPU usage from 96% → 20–40% by disabling parallel weight loading.
NVFP4 / FP8 quantization via NVIDIA Model Optimizer delivering 3.5× memory compression that makes 235B-parameter inference fit at all.
Exposed an OpenAI-compatible /v1/chat/completions gateway and Streamlit front-end accessed via Tailscale HTTPS — same API surface as commercial LLM providers, fully on-prem.
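
A back-of-envelope check on where a compression figure near 3.5× can come from. It assumes a BF16 baseline, that every weight is quantized, and that NVFP4 stores a 4-bit value plus one 8-bit scale per 16-weight block. KV cache, activations, and per-tensor scales are ignored, so this is a rough estimate and not a measurement.

```python
PARAMS = 235e9                 # parameter count from the model name
BF16_BITS = 16                 # assumed baseline precision
NVFP4_BITS = 4 + 8 / 16        # 4-bit value + 8-bit scale shared by 16 weights

bf16_gb = PARAMS * BF16_BITS / 8 / 1e9      # weight footprint at BF16
nvfp4_gb = PARAMS * NVFP4_BITS / 8 / 1e9    # weight footprint at NVFP4
ratio = BF16_BITS / NVFP4_BITS              # compression factor

cluster_gb = 2 * 120           # two nodes at roughly 120 GB unified memory each
```

Under these assumptions the weights shrink from about 470 GB to about 132 GB, a factor of roughly 3.56, which is what lets them fit in the roughly 240 GB the two nodes provide.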

SSETF — Single Stock ETF HFT Platform

Live HFT platform covering 47 ETFs across 17 underlyings with IBKR integration via ib-insync. Live at ssetf.igniteedge.ai.
Bifurcated regime detection: HMM micro (tick-level) + LLM macro (Ollama qwen3:4b) with detect() returning (regime, confidence, backend_name).
Auction signal engine on NYSE tick 588 imbalance data (3-phase lifecycle); Friction model capturing iNAV, leverage drift, vol decay (L_eff = L·(1+r)/(1+L·r)).
Rebalance Flow Engine with 8 TRS formulas including swap gamma, hedge velocity, TEV (B = L(L−1)·A·r).
Dual-spark LLM agent layer: System 1 (Qwen3-80B-Instruct, reactive) + System 2 (Qwen3-80B-Thinking, deliberative), with Ollama fallback. Episodic memory in Redis; EOD learning loop; VETO-only autonomy with human-in-the-loop for proposals.
219 passing tests; 11-page Streamlit dashboard covering account, auction analytics, HTB, rebalance flow, tracking error, regime heatmap, P&L monitor, and agent reasoning.
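
The two closed-form expressions quoted above can be sketched directly. The symbol meanings are my assumptions, since the text does not define them: `L` is the target leverage, `r` the underlying's return over the period, and `A` the fund's net assets before the move. The function names are hypothetical.

```python
def effective_leverage(L: float, r: float) -> float:
    """Leverage after the underlying moves by r and before rebalancing.

    Exposure grows to L * A * (1 + r) while net assets grow to A * (1 + L * r),
    so the ratio is L * (1 + r) / (1 + L * r).
    """
    return L * (1 + r) / (1 + L * r)


def rebalance_flow(L: float, A: float, r: float) -> float:
    """Notional to trade to restore leverage L: B = L * (L - 1) * A * r."""
    return L * (L - 1) * A * r


# Illustrative case: a 2x fund with 100M in assets after a +5% move.
lev_after = effective_leverage(2.0, 0.05)
flow = rebalance_flow(2.0, 100e6, 0.05)
```

In the example, leverage drifts down to about 1.91 and the fund must buy roughly 10M of exposure to get back to 2×. An inverse fund (`L = -1`) gives a positive flow on an up move as well, which is why both fund types trade in the direction of the day's move.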

Binary Options (CBOE XSPBW / QSB) Agentic Tab

Integrated a full CBOE binary options workflow (XSPBW binary $100/$0 + QSB vertical) into the UnifiedQuant dashboard as tab index 3, ahead of the June 2026 product launch.
Pricing: binary_call_price(), binary_put_price(), binary_greeks(), qsb_vertical_price(); settlement compute_xspbx(spx_close) = spx_close / 10.
Agent orchestrator on Qwen3.6-35B-A3B (System 1 FP8 + System 2 BF16 thinking), with a Streamlit bridge that drives BinaryOptionsOrchestrator from a daemon thread + dedicated asyncio loop so the UI never awaits on LLM calls.
Ask-the-Agent free-text panel hardwired to cloud Gemini via generate_content_with_backoff to sidestep Gemma's tool-calling rejection; proposal path stays on Qwen3.6 with --enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3.
Human-in-the-loop approval flow with cached proposal status, live backend health strip (S1 / S2 / Ollama / Cloud Gemini), and 9 passing unit tests for the bridge.
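
A sketch of what the pricing functions might look like, assuming a Black-Scholes cash-or-nothing model. The source names the functions but not the model or the signatures, so the parameters, defaults, and the `_d2` helper are assumptions. Only `compute_xspbx(spx_close) = spx_close / 10` is taken from the text.

```python
import math
from statistics import NormalDist

_norm_cdf = NormalDist().cdf


def _d2(spot: float, strike: float, vol: float, t: float, r: float, q: float) -> float:
    """Black-Scholes d2 term; t is time to expiry in years."""
    return (math.log(spot / strike) + (r - q - 0.5 * vol * vol) * t) / (vol * math.sqrt(t))


def binary_call_price(spot: float, strike: float, vol: float, t: float,
                      r: float = 0.0, q: float = 0.0, payout: float = 100.0) -> float:
    """Discounted risk-neutral probability of finishing above the strike, times payout."""
    return payout * math.exp(-r * t) * _norm_cdf(_d2(spot, strike, vol, t, r, q))


def binary_put_price(spot: float, strike: float, vol: float, t: float,
                     r: float = 0.0, q: float = 0.0, payout: float = 100.0) -> float:
    """Discounted risk-neutral probability of finishing below the strike, times payout."""
    return payout * math.exp(-r * t) * _norm_cdf(-_d2(spot, strike, vol, t, r, q))


def compute_xspbx(spx_close: float) -> float:
    """Settlement value as stated in the text: one tenth of the SPX close."""
    return spx_close / 10
```

A useful sanity check for any binary pricer is that the call and the put on the same strike sum to the discounted payout, since exactly one of them pays.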

SPAN 2 Orchestrator & SAO (Synthetic Agentic Observer)

Built an "automated secret shopper" that runs synthetic trade lifecycles across every IgniteEdge service — Discovery (/health + /llms.txt) → Spark → Mercurius → SPAN 2 → OMS → x402 — and reports PASS/FAIL/SKIP with latency per service.
x402 verification is opt-in via X402_PAYER_PRIVATE_KEY, using eth_account + x402 v2 SDK; treats HTTP 402 as PASS for connectivity on gated endpoints. Verified sub-2-second end-to-end pulse.

ibkrapi — Unattended IBKR Integration

Built a production Python library that fully automates Interactive Brokers TWS Gateway login via IBC v3.23, including 2FA bypass-device handling — eliminating manual browser login from the deployment loop.
Auto-reconnection with exponential backoff, rate-limiting (60 historical requests / 10 min), session persistence across the IBKR 24-hour auto-restart, and SMART-routing examples for stocks, options chains with Greeks, futures, international equities, and watchlist monitoring.
Structured config split: .env credentials, config/settings.yaml for retry/market-data defaults, config/ibc_config.ini for IBC parameters.
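
A sketch of the two pacing mechanisms described above: a sliding-window limiter for the 60-requests-per-10-minutes budget and a capped exponential backoff schedule. The class name, function name, and backoff parameters are hypothetical; only the 60 / 10 min figure comes from the text.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window of window_s seconds."""

    def __init__(self, max_requests: int = 60, window_s: float = 600.0,
                 clock=time.monotonic):
        self._max_requests = max_requests
        self._window_s = window_s
        self._clock = clock            # injectable so tests need not sleep
        self._stamps = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if the budget allows it."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self._window_s:
            self._stamps.popleft()
        if len(self._stamps) < self._max_requests:
            self._stamps.append(now)
            return True
        return False


def backoff_delays(base_s: float = 1.0, factor: float = 2.0,
                   cap_s: float = 60.0, attempts: int = 8) -> list[float]:
    """Delay before each reconnect attempt, doubling up to a cap."""
    return [min(cap_s, base_s * factor ** i) for i in range(attempts)]
```

With the assumed defaults the backoff schedule is 1, 2, 4, 8, 16, 32, 60, 60 seconds. In practice a small random jitter is often added to each delay so that several clients do not reconnect in lockstep.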

tiktok — Distributed Generative-AI Content Pipeline

Designed a three-node financial-news-to-short-form-video pipeline split across the DGX Spark cluster: Node 1 (intelligence/scripting, GPU-light), Node 2 (full VFX/media), Node 3 (parallel render farm).
Three-pass agentic LLM pipeline: Pass 1 generates an 8-line narrative; Pass 2 produces per-line visual prompts in parallel via asyncio.gather (~8× speedup); Pass 3 emits a Chain-of-Visual-Thought (CoVT) blueprint for autoregressive image-to-video generation.
Negation-aware safety gate filters financial-advice content before GPU rendering, preventing wasted cycles on non-publishable clips.
Media stack: Wan2.2-TI2V-5B I2V (720×1280 native, crop-upscaled to 1080×1920 zero-black-bar), FLUX.1 thumbnails, edge-tts narration, FFmpeg NVENC H.264 with xfade crossfades. Inter-node transfer via rsync over 200 Gbps RDMA. Per-clip latency ~40–50 min via parallel rendering.
Redis-backed job queue, Pydantic state model, Click CLI for ops, structured JSON logging via loguru, and a Claude-powered manager loop for interactive pipeline control.
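
The Pass 2 fan-out can be sketched with `asyncio.gather`. The `visual_prompt` coroutine is a hypothetical stand-in for the real LLM call, with a short sleep simulating network latency.

```python
import asyncio


async def visual_prompt(line: str) -> str:
    """Stand-in for an LLM request; the sleep simulates I/O latency."""
    await asyncio.sleep(0.01)
    return f"visual prompt for: {line}"


async def pass_two(lines: list[str]) -> list[str]:
    """Issue one request per narrative line concurrently and keep input order."""
    return await asyncio.gather(*(visual_prompt(line) for line in lines))


narrative = [f"line {i}" for i in range(1, 9)]  # eight lines, as in Pass 1
prompts = asyncio.run(pass_two(narrative))
```

Because the eight calls are I/O-bound and independent, running them concurrently brings wall-clock time close to that of a single call, which is consistent with the roughly 8× speedup quoted above. `gather` returns results in the order the awaitables were passed, so each prompt stays aligned with its narrative line.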

Cross-Cutting Engineering Themes

Quantified Business Transformation

Every initiative ships with a number behind it: 10× analyst speedup, sub-0.3% MAPE, 3.5× memory compression, six-figure license elimination, sub-millisecond dispatch latency. I optimize for moves that survive a CFO review, not for technical novelty.

Agentic LLM Orchestration (not "AI integration")

Multi-agent pipelines with HMAC-signed ledgers, MCP servers, DSPy DAGs, frozen-dataclass state, and validated 14-state lifecycles. The Organization Risk Monitor work, Mercurius, the 5-agent OMS, the 3-pass agentic media pipeline — all built around the assumption that LLMs are components in autonomous systems, not chatbots bolted onto dashboards.

Hardware-Aware Numerics & Infrastructure Depth

NVFP4 / FP8 quantization, Mamba KV-cache tuning, RDMA path selection, NCCL transport debugging, Triton SM121 workarounds, and CPU-vs.-GPU weight-load tradeoffs are first-class design constraints — not afterthoughts. This is what turns a 2-box Grace-Blackwell cluster into a 235B-parameter inference platform.

Legacy Translation Without Semantic Drift

Java margin-library → Mercurius and ngOMS → Agentic OMS preserve every subtle Java semantic the auditors care about: NaN sentinels via float('nan') + math.isnan() (not None truthiness), fixed-point precision, strategy/factory/visitor pattern shape, XML rate-rule fidelity. This is the de-risked AI-modernization playbook for any JVM-era trading core.

Frozen-Dataclass Functional Architecture

Every account, order, position, and quote is immutable; mutations go through dataclasses.replace(). Threading and reasoning about state both get easier; the audit trail writes itself.
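
A minimal sketch of the pattern, with a hypothetical `Position` model whose fields are chosen only for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Position:
    symbol: str
    quantity: int
    avg_price: float


original = Position("SPY", 100, 500.0)

# "Mutation" produces a new object; the original is left untouched.
updated = replace(original, quantity=150)


def try_in_place_update(position: Position) -> bool:
    """Return True if direct assignment is blocked, as expected for a frozen dataclass."""
    try:
        position.quantity = 0  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False
```

Since every change yields a new value, the sequence of `replace()` results is itself a history of the object's states, and instances can be shared across threads without locks.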

Optional-Dependency Discipline

Quant modules degrade gracefully when optional libraries (arch, statsmodels, pmdarima, hmmlearn) are absent, with HAS_* flags and self-contained fallbacks — critical for laptop-to-cluster portability and reproducibility under audit.
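
A sketch of the `HAS_*` pattern for one optional library. The function names and the choice of an EWMA fallback are assumptions for illustration; the GARCH branch runs only when `arch` is installed.

```python
import math

try:
    import arch  # noqa: F401  (optional dependency)
    HAS_ARCH = True
except ImportError:
    HAS_ARCH = False


def ewma_volatility(returns: list[float], lam: float = 0.94) -> float:
    """Self-contained fallback: EWMA volatility with decay factor lam."""
    variance = returns[0] ** 2
    for r in returns[1:]:
        variance = lam * variance + (1.0 - lam) * r * r
    return math.sqrt(variance)


def garch_volatility(returns: list[float]) -> float:
    """Preferred path: last fitted GARCH(1,1) conditional volatility."""
    import numpy as np
    from arch import arch_model

    model = arch_model(np.asarray(returns, dtype=float), vol="GARCH", p=1, q=1)
    result = model.fit(disp="off")
    return float(np.asarray(result.conditional_volatility)[-1])


def volatility(returns: list[float]) -> float:
    """Dispatch on the flag so callers never need to know which backend ran."""
    return garch_volatility(returns) if HAS_ARCH else ewma_volatility(returns)
```

Resolving the flag once at import time keeps the check out of hot paths and makes it easy to log which backend a given run used.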

Languages & Tools

Languages

Python 3.12 (expert), Java (deep — margin/risk/OMS legacy core), C++, SQL, Bash, Gradle build DSL, Cython (light).

Frameworks

FastAPI, Streamlit, PydanticAI, LangChain, DSPy, PyTorch, NumPy, SciPy, Pandas, Plotly, Matplotlib, scikit-learn, scikit-optimize, statsmodels, arch (GARCH), pmdarima (ARIMA), hmmlearn, NetworkX, ib-insync, QuickFIX.

LLM / Inference

TensorRT-LLM, vLLM, llama.cpp, Ray, NVIDIA Model Optimizer, ChromaDB, Hugging Face, edge-tts, ComfyUI, FLUX.1, Wan2.2, Gemini, Qwen3 family, Ollama.

Infrastructure

Docker, GCP Cloud Run, Cloudflare Tunnel, Redis, PostgreSQL, SQLite, Apache Ignite, Tailscale, Prometheus, Grafana, loguru, Git, pytest / pytest-asyncio, ruff, hatchling, setuptools.

Hardware

NVIDIA DGX Spark (Grace-Blackwell GB10), ConnectX-7 RDMA/RoCE, NVLink-C2C, NCCL, CUDA 13.0, NVENC.

Education & Certifications

MBA — Cornell University
Engineering Graduate — University of Madras - Anna University - College of Engineering
FCC Licensed Amateur Radio Operator — KD2UET
NYC-ARECS / RACES — Active member providing emergency communications support for New York City events and infrastructure.

Live Proof of Work

Every system below is running right now on the primary server. Click through and verify.

spark.igniteedge.ai

Agentic Trade Intelligence API + Streamlit — DSPy + Gemini, MCP, x402. llms.txt

mercurius.igniteedge.ai

Sub-ms Agentic Risk Firewall — 28-station sequencer, RegT/PM/SPAN. llms.txt

oms.igniteedge.ai

Agentic OMS — 5-agent pipeline, 14-state lifecycle, HMAC-signed ledger. llms.txt

span2.igniteedge.ai

CME SPAN 2 engine — calibrated to <0.3% MAPE. llms.txt

ssetf.igniteedge.ai

Single Stock ETF HFT — 47 ETFs, dual-spark LLM agent layer. llms.txt

igniteedge.ai

Platform landing & infrastructure overview. Home