Executive Summary
Senior engineering leader with 20+ years driving quantified business transformation at the intersection of institutional trading infrastructure and frontier AI systems. I build autonomous Agentic Risk Firewalls that operate at sub-millisecond latency, rebuild legacy Java OMS / margin stacks as Python-native agentic pipelines without breaking a single NaN-arithmetic edge case, and own the GPU infrastructure underneath — a dual-node NVIDIA Grace-Blackwell GB10 cluster running 235B-parameter LLMs in production over 200 Gbps RDMA/RoCE.
Currently leading the AI-native modernization of Organization's Risk Monitor (P0) while independently architecting UnifiedQuant (broker-unified trading platform), UnifiedRBM (multi-asset Risk-Based Margin engine), Mercurius (sub-millisecond Agentic Risk Firewall), and LocalInfra (Blackwell inference cluster). Fluent in CME SPAN 2 mathematics, Agentic LLM Orchestration (DSPy DAGs, LangChain, Model Context Protocol / MCP, x402), Blackwell-class GPU optimization (NVFP4, TensorRT-LLM, RDMA), and the strategy / factory / visitor patterns that keep a JVM-era trading core safe during translation.
Quantified Business Transformation — Selected Outcomes
- 10× analyst triage speedup on Organization's Risk Monitor via LLM-generated risk briefs (10 min → <1 min per account) — democratizing senior-analyst capability across the desk.
- Eliminated six-figure recurring licensing spend (Flyer FIX, Oracle Coherence) by replacing the licensed Java OMS with a Python-native agentic stack — zero loss of Java margin semantics in the cutover.
- Sub-0.3% MAPE vs. published CME maintenance margins on a from-scratch SPAN 2 implementation (60+ calibration tests) — production-grade quant accuracy from a clean-room rebuild.
- 3.5× memory compression via NVFP4 quantization on NVIDIA Blackwell, enabling 235B-parameter inference on a 2-node Grace-Blackwell GB10 cluster at 15.4 tok/s — frontier-model serving on roughly two desktop boxes.
- Sub-millisecond internal dispatch latency behind a 14-state order lifecycle with 113+ async integration tests and HMAC-signed agent ledgers.
Strategic Pillars
Agentic LLM Orchestration & Risk Firewalls
I design autonomous, multi-agent pipelines — not "AI integrations." Mercurius is a sub-millisecond Agentic Risk Firewall that intercepts every order before it touches a venue and validates it against Reg T, Portfolio Margin (TIMS), and SPAN through a 28-station sequencer. UnifiedQuant runs a DSPy DAG with MCP servers and ChromaDB-backed RAG. The Agentic OMS uses HMAC-signed inter-agent ledgers across a 5-agent pipeline (Spark → Mercurius → OMS → SOR → Settlement). This is the architecture that separates "shipped a chatbot" from "moved the P&L."
Hardware-Aware Infrastructure Depth
I don't just use AI — I build the metal underneath it. NVIDIA Grace-Blackwell GB10 dual-node cluster, ARM64, 200 Gbps ConnectX-7 RDMA/RoCE (verified 109 Gbps host-staged), NVLink-C2C unified memory, NVFP4 / FP8 quantization, TensorRT-LLM and vLLM with pipeline parallelism, NCCL distributed inference, and the Mamba-hybrid KV-cache and parallel-weight-load tunings that turn benchmarks into production. This is the depth that lets a senior leader credibly own LLM platform decisions instead of outsourcing them.
Legacy Translation Without Semantic Drift
Large institutions are terrified of AI migrations because the math has to come out the same. My Java-margin-library → Mercurius port preserves Java NaN-arithmetic semantics in Python by using float('nan') (not None) as the sentinel for unset financial fields and routing every check through math.isnan() — eliminating an entire class of porting bugs that would otherwise silently corrupt margin calculations. Same playbook for ngOMS → Agentic OMS: strategy/factory/visitor patterns mirrored, fixed-point precision preserved, frozen-dataclass immutability layered on top. Six-figure licensing spend gone, zero margin-semantic drift.
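A minimal sketch of the NaN-sentinel convention described above. The Position class, its fields, and the requirement() helper are hypothetical illustrations, not code from Mercurius.

```python
import math
from dataclasses import dataclass

NAN = float("nan")  # sentinel for "unset", mirroring Java's Double.NaN


@dataclass(frozen=True)
class Position:  # hypothetical model, for illustration only
    quantity: float
    price: float = NAN        # unset until a quote arrives
    margin_rate: float = NAN  # unset until a rate rule matches


def requirement(pos: Position) -> float:
    """NaN propagates through the arithmetic, as it does in Java."""
    return abs(pos.quantity) * pos.price * pos.margin_rate


def is_set(value: float) -> bool:
    # NaN never compares equal to itself, so use math.isnan(), not ==.
    return not math.isnan(value)


unpriced = Position(quantity=100)
priced = Position(quantity=100, price=50.0, margin_rate=0.25)

print(is_set(requirement(unpriced)))  # False
print(requirement(priced))            # 1250.0
```

With None as the sentinel, the same multiplication would raise a TypeError, and a truthiness check such as "if not price" would also treat a legitimate 0.0 as unset.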
Core Technical Competencies
Quantitative Finance & Risk
Margin: CME SPAN 2 (HVaR + Stress + Liquidity + Concentration), Reg T, Portfolio Margin / TIMS risk arrays (10-point stress), SPAN futures, cross-asset RBM (stocks, bonds, indices, futures, options, FX, crypto).
Risk Models: GARCH-filtered Historical VaR, EVT/GPD tail modeling, DCC dynamic correlation, Jorion LVaR, EWMA covariance, ADV-based liquidity buckets, P&L attribution (PNR), Hidden Markov regime detection.
Pricing: Black-Scholes, Black-76, Barone-Adesi-Whaley (BAW), Bjerksund-Stensland (BS2002), CRR & Mean Binomial, Heston, SABR smile, Newton-Raphson + Regula Falsi implied vol, closed-form and finite-difference Greeks.
Regulatory: Reg T haircuts, Reg SHO short-sale logic, options approval-level matching, IRA/cash/margin/day-trade distinctions.
AI & Machine Learning
Agentic LLM Orchestration: Multi-agent autonomous pipelines, MCP servers, DSPy DAGs, LangChain, Gemini, ChromaDB-backed RAG, sub-ms Agentic Risk Firewalls, HMAC-signed agent ledgers, prompt caching, structured-output JSON repair, Chain-of-Visual-Thought.
Inference Stacks: TensorRT-LLM (Qwen3-235B-A22B FP4, Llama-3.3-70B NVFP4, Nemotron MoE FP8), vLLM 0.13 with Ray (PP=2 over RDMA), llama.cpp / GGUF, speculative decoding, LoRA.
Generative Media: Wan2.2-TI2V image-to-video (720×1280 → 1080×1920), FLUX.1 latent diffusion, edge-tts, NVENC via FFmpeg, ComfyUI.
Quantization: NVFP4 / FP8 / INT4 PTQ via NVIDIA Model Optimizer (3.5× memory compression), Mamba-hybrid KV cache tuning, parallel-weight-load CPU optimization (96% → 20–40%).
Software Architecture
Agentic Patterns: Autonomous multi-agent pipelines, sub-ms risk firewalls, HMAC-SHA256 inter-agent ledgers, 14-state validated order lifecycles, graph-based SOR (NetworkX visitor pattern).
Legacy Translation: Strategy / factory / visitor patterns ported Java→Python; NaN-sentinel financial arithmetic (float('nan') + math.isnan(), never None); fixed-point precision; XML-driven rate-rule fidelity.
Distributed Systems: Redis job queues, asyncio TaskGroups, websockets, eventkit, Apache Ignite (legacy), 200 Gbps RoCE/RDMA inter-node fabrics, frozen-dataclass functional immutability.
Web/API: FastAPI (Prometheus, slowapi, Pydantic), Streamlit dashboards, OpenAI-compatible /v1/chat/completions, MCP servers, x402 micropayment gating.
Infrastructure & Hardware
Hardware: Dual-node NVIDIA DGX Spark / Grace-Blackwell GB10 cluster (ARM64, ~120 GB unified CPU+GPU memory per node via NVLink-C2C), ConnectX-7 200 Gbps RDMA fabric (verified 109 Gbps host-staged via ib_send_bw), CUDA 13.0, NCCL.
Tooling: Docker, GCP Cloud Run, Cloudflare Tunnel, Tailscale, SSH tunneling, Gradle (Java), Hatchling/setuptools (Python), pytest / pytest-asyncio, ruff, loguru structured logging, Prometheus + Grafana.
Trading APIs: major-broker TraderAPI (HMAC, OAuth), IBKR via ib-insync with fully unattended IBC v3.23 and 2FA bypass-device automation, Polygon.io, FRED, FIX (QuickFIX, Flyer reference), exchange-calendars, NBBO/ack-latency/fill-ratio routing telemetry.
Professional Experience
- Architected the agentic modernization of Organization's Risk Monitor, replacing static threshold-based detection with an autonomous LLM-driven interpretation layer on private GCP-hosted models — moving the platform from "alert spam" to "decision-grade intelligence."
- Quantified business transformation: auto-generated plain-English risk briefs cut analyst triage time ~10× (from ~10 min to under 1 min per account), democratizing senior-analyst capability across the desk.
- Built a conversational interface so risk managers can query complex portfolio exposures (tech concentration, earnings risk, sector beta) in natural language, with citation-backed answers grounded in live position data.
- Delivered "What-If" predictive stress testing: risk managers parameterize scenarios in plain English (e.g., "10% tech drop + 50 bps rate hike") and receive portfolio-level impact decompositions in seconds.
- Directing the transition of a licensed Java-based Order Management System to a Python-native Agentic OMS architecture, eliminating recurring six-figure licensing fees (Flyer FIX, Oracle Coherence).
- Authored the five-phase migration roadmap: distributed state (Redis + PostgreSQL) → agentic order lifecycle → data-driven smart routing → QuickFIX connectivity → Prometheus/Grafana observability.
- Performed an architectural audit of Organization's ~140-module Java/Gradle monorepo, identifying four AI-native modernization wedges: Risk Monitor, Chat Support triage, anomaly-driven Alerts, and natural-language Market Scanner.
- Authored stakeholder-facing benefits analysis and python-pptx-generated executive deck to align engineering, business, and compliance leadership behind the initiative.
- Built an AI-powered quantitative trading dashboard unifying major-broker TraderAPI and IBKR (ib-insync) behind a single broker-abstraction layer, with Streamlit visualization and a FastAPI REST surface (Pydantic validation, slowapi rate limiting, Prometheus instrumentation).
- Architected a DSPy-based DAG LLM pipeline fusing Gemini reasoning with a ChromaDB-backed RAG knowledge base for real-time trade-idea generation and research synthesis.
- Integrated Polygon.io and FRED market/macroeconomic data, scikit-optimize hyperparameter search, arch / pmdarima for GARCH and ARIMA/SARIMA forecasting, and exchange-calendars for session-aware execution.
- Implemented MCP servers and x402 micropayment gating to enable third-party agentic clients and pay-as-you-go premium tiers. Live at spark.igniteedge.ai.
- Architected an autonomous Agentic Risk Firewall that sits in front of every venue connection and validates each proposed order against Reg T, Portfolio Margin (TIMS), and SPAN rules before execution, with sub-millisecond latency via thread-safe singleton dispatch.
- Clean-room Python port of the licensed Java margin-library, deployed as the pre-execution gateway in the 5-agent OMS pipeline. Zero margin-semantic drift vs. the Java reference.
- 28-station sequencer pipeline: validation → pricing (4 pluggable option models: Bjerksund-Stensland, Black-Scholes, Mean Binomial, CRR) → XML-driven rate rules → strategy matching across 50+ concrete strategy classes (covered calls, verticals, collars, butterflies, strangles, iron condors, naked) → flattening → effect calculation → processor selection (RegT / PM / SPAN).
- Legacy Translation pitch in action: preserved Java NaN-arithmetic semantics using float('nan') (not Python None) as the sentinel for unset financial fields, with math.isnan() checks throughout, eliminating a class of porting bugs that would otherwise silently corrupt margin calculations.
- Frozen-dataclass Account/SubAccount/Position/Order model with dataclasses.replace() mutation; 863-line INTEGRATION.md adapter guide. Live at mercurius.igniteedge.ai.
- Quantified Business Transformation: replaced a licensed Java OMS (Flyer FIX, Oracle Coherence) with a Python-native, 5-agent autonomous pipeline — Spark → Mercurius → OMS → SOR → Settlement — eliminating recurring six-figure annual licensing fees.
- Sub-millisecond internal dispatch latency across the 14-state order lifecycle (ACCEPTED → VALIDATED → ENRICHED → ROUTED → WORKING → PARTIALLY_FILLED → FILLED → SETTLED, plus cancel/reject/expire/suspend/resubmit transitions) with validated state-machine guards on every transition.
- HMAC-SHA256-signed tokens for tamper-evident inter-agent communication and ledger entries, so every action is cryptographically auditable end-to-end.
- Built a graph-based Smart Order Router (NetworkX visitor-pattern DFS) scoring routing groups by Expected Value using live NBBO delta, ack latency, fill-rate, and rejection-rate telemetry.
- 113+ async integration tests (pytest-asyncio, asyncio_mode=auto); broker abstraction via BrokerAdapter ABC; asyncio TaskGroup parallelism throughout. Live at oms.igniteedge.ai.
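The HMAC-SHA256 ledger signing mentioned in this group of bullets can be sketched with the standard library alone. The entry fields, the shared secret, and the sign_entry / verify_entry names are assumptions for illustration; the source does not describe the actual token format.

```python
import hashlib
import hmac
import json


def sign_entry(secret: bytes, entry: dict) -> str:
    """Sign a ledger entry with HMAC-SHA256 over a canonical JSON encoding."""
    # sort_keys plus fixed separators make the byte payload deterministic.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_entry(secret: bytes, entry: dict, signature: str) -> bool:
    """Constant-time comparison avoids leaking signature bytes via timing."""
    return hmac.compare_digest(sign_entry(secret, entry), signature)


secret = b"demo-secret"  # hypothetical; a real key would come from a secret store
entry = {"agent": "Mercurius", "order_id": "ord-1", "action": "VALIDATED"}
signature = sign_entry(secret, entry)

tampered = {**entry, "action": "FILLED"}
print(verify_entry(secret, entry, signature))     # True
print(verify_entry(secret, tampered, signature))  # False
```

Any downstream agent holding the shared key can recompute the digest, so a modified entry fails verification without needing a central authority.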
- Built a from-scratch CME SPAN 2 implementation: x · HVaR + (1−x) · Stress + Liquidity + Concentration, with pod-based grouping, cross-pod correlation offsets, and directional long/short margining.
- Two-stage calibration engine (scipy.optimize) tunes model parameters to match published CME maintenance margins at <0.3% MAPE across the validation suite.
- 60+ pytest unit/integration tests covering Black-76, BAW (Barone-Adesi-Whaley), BS2002 (Bjerksund-Stensland), SABR smile, EVT/GPD tails, DCC dynamic correlation.
- Delta-gamma options P&L approximation; optional library imports with HAS_* graceful-degradation flags. Live at span2.igniteedge.ai.
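The margin blend and the MAPE calibration metric from these bullets reduce to a few lines. This is a sketch only: the function names are hypothetical, the restriction of x to [0, 1] is an assumption inferred from the x / (1−x) weighting, and all numbers are made up rather than CME data.

```python
def span2_margin(hvar: float, stress: float, liquidity: float,
                 concentration: float, x: float) -> float:
    """Blend: x * HVaR + (1 - x) * Stress + Liquidity + Concentration."""
    if not 0.0 <= x <= 1.0:  # assumed range for the blend weight
        raise ValueError("blend weight x must lie in [0, 1]")
    return x * hvar + (1.0 - x) * stress + liquidity + concentration


def mape(model: list[float], published: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    errors = [abs(m - p) / p for m, p in zip(model, published)]
    return 100.0 * sum(errors) / len(errors)


# Illustrative inputs only.
margin = span2_margin(hvar=1000.0, stress=1500.0,
                      liquidity=50.0, concentration=25.0, x=0.5)
print(margin)  # 1325.0
print(mape([101.0, 198.0], [100.0, 200.0]))
```

A calibration loop would adjust x and the component parameters to drive the MAPE against published margins down; the optimizer itself is omitted here.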
- Extended CMESpan2 into a unified, multi-asset enterprise margining engine spanning equities, bonds, indices, futures, options-on-futures, equity options, FX, and crypto.
- Implemented the six-pillar RBM architecture: standardized scenario engine, intra-group offsets, inter-commodity (cross-asset) offsets, non-linear pricing, add-on charges, and floor minimums.
- Supports Reg T (30% equity haircut), Portfolio Margin (TIMS 10-point stress arrays), and SPAN within a single framework, with cross-asset waterfall visualization in Streamlit.
- Authored 39 quantitative modules covering pricing, volatility, regime detection (Hidden Markov via hmmlearn), and basis-risk modeling, plus a P&L Replication (PNR) attribution dashboard.
- Infrastructure Depth pitch in action: stood up a dual-node NVIDIA DGX Spark (Grace-Blackwell GB10) cluster on ARM64 with a 200 Gbps ConnectX-7 RDMA/RoCE fabric, NVLink-C2C unified memory (~120 GB CPU+GPU per node), CUDA 13.0, and NCCL distributed inference. Verified 109 Gbps host-staged transfer via ib_send_bw.
- Production frontier-model inference on roughly two desktop-class boxes:
- Qwen3-235B-A22B FP4 at 15.4 tok/s dual-node (TensorRT-LLM, PP=2 over RDMA)
- Qwen3-Next-80B Mamba-hybrid at 31.4 tok/s single-node
- Llama-3.3-70B NVFP4, Nemotron-3-Nano-30B FP8 MoE at 46.8 tok/s, and Gemma-27B
- Hardware-aware numerics decisions, not afterthoughts: diagnosed the Triton SM121 (Blackwell) gap that blocks vLLM TP=2 and standardized on TensorRT-LLM TP=2 + vLLM PP=2 as the dual-strategy serving baseline; tuned Mamba hybrid KV-cache (block reuse disabled); reduced TensorRT-LLM CPU usage from 96% → 20–40% by disabling parallel weight loading.
- NVFP4 / FP8 quantization via NVIDIA Model Optimizer delivering 3.5× memory compression that makes 235B-parameter inference fit at all.
- Exposed an OpenAI-compatible /v1/chat/completions gateway and Streamlit front-end accessed via Tailscale HTTPS: same API surface as commercial LLM providers, fully on-prem.
- Live HFT platform covering 47 ETFs across 17 underlyings with IBKR integration via ib-insync. Live at ssetf.igniteedge.ai.
- Bifurcated regime detection: HMM micro (tick-level) + LLM macro (Ollama qwen3:4b) with detect() returning (regime, confidence, backend_name).
- Auction signal engine on NYSE tick 588 imbalance data (3-phase lifecycle); Friction model capturing iNAV, leverage drift, vol decay (L_eff = L·(1+r)/(1+L·r)).
- Rebalance Flow Engine with 8 TRS formulas including swap gamma, hedge velocity, TEV (B = L(L−1)·A·r).
- Dual-spark LLM agent layer: System 1 (Qwen3-80B-Instruct, reactive) + System 2 (Qwen3-80B-Thinking, deliberative), with Ollama fallback. Episodic memory in Redis; EOD learning loop; VETO-only autonomy with human-in-the-loop for proposals.
- 219 passing tests; 11-page Streamlit dashboard covering account, auction analytics, HTB, rebalance flow, tracking error, regime heatmap, P&L monitor, and agent reasoning.
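The two leverage formulas quoted in the bullets above, L_eff = L·(1+r)/(1+L·r) and B = L(L−1)·A·r, can be checked with a worked example. The source does not define the symbols; the sketch assumes L is target leverage, r is the underlying's return since the last rebalance, and A is fund net assets before the move. Function names are hypothetical.

```python
def effective_leverage(L: float, r: float) -> float:
    """L_eff = L * (1 + r) / (1 + L * r): leverage after the underlying moves by r."""
    return L * (1.0 + r) / (1.0 + L * r)


def rebalance_amount(L: float, A: float, r: float) -> float:
    """B = L * (L - 1) * A * r: exposure to trade to restore target leverage."""
    return L * (L - 1.0) * A * r


# Worked check for a 2x fund with A = 100 and a +5% move (illustrative numbers):
#   exposure 200 -> 210, net assets 100 -> 110, so leverage drifts to 210 / 110;
#   target exposure is 2 * 110 = 220, so the fund must buy 10 more.
L, A, r = 2.0, 100.0, 0.05
print(effective_leverage(L, r))   # about 1.909
print(rebalance_amount(L, A, r))  # about 10.0
```

Under the same assumptions a −1x inverse fund also has to buy after an up move, since L(L−1) is positive for L = −1; that is why rebalance flow from leveraged and inverse products points in the same direction.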
- Integrated a full CBOE binary options workflow (XSPBW binary $100/$0 + QSB vertical) into the UnifiedQuant dashboard as tab index 3, ahead of the June 2026 product launch.
- Pricing: binary_call_price(), binary_put_price(), binary_greeks(), qsb_vertical_price(); settlement compute_xspbx(spx_close) = spx_close / 10.
- Agent orchestrator on Qwen3.6-35B-A3B (System 1 FP8 + System 2 BF16 thinking), with a Streamlit bridge that drives BinaryOptionsOrchestrator from a daemon thread + dedicated asyncio loop so the UI never awaits on LLM calls.
- Ask-the-Agent free-text panel hardwired to cloud Gemini via generate_content_with_backoff to sidestep Gemma's tool-calling rejection; proposal path stays on Qwen3.6 with --enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3.
- Human-in-the-loop approval flow with cached proposal status, live backend health strip (S1 / S2 / Ollama / Cloud Gemini), and 9 passing unit tests for the bridge.
- Built an "automated secret shopper" that runs synthetic trade lifecycles across every IgniteEdge service, Discovery (/health + /llms.txt) → Spark → Mercurius → SPAN 2 → OMS → x402, and reports PASS/FAIL/SKIP with latency per service.
- x402 verification is opt-in via X402_PAYER_PRIVATE_KEY, using eth_account + x402 v2 SDK; treats HTTP 402 as PASS for connectivity on gated endpoints. Verified sub-2-second end-to-end pulse.
- Built a production Python library that fully automates Interactive Brokers TWS Gateway login via IBC v3.23, including 2FA bypass-device handling — eliminating manual browser login from the deployment loop.
- Auto-reconnection with exponential backoff, rate-limiting (60 historical requests / 10 min), session persistence across the IBKR 24-hour auto-restart, and SMART-routing examples for stocks, options chains with Greeks, futures, international equities, and watchlist monitoring.
- Structured config split: .env credentials, config/settings.yaml for retry/market-data defaults, config/ibc_config.ini for IBC parameters.
- Designed a three-node financial-news-to-short-form-video pipeline split across the DGX Spark cluster: Node 1 (intelligence/scripting, GPU-light), Node 2 (full VFX/media), Node 3 (parallel render farm).
- Three-pass agentic LLM pipeline: Pass 1 generates an 8-line narrative; Pass 2 produces per-line visual prompts in parallel via asyncio.gather (~8× speedup); Pass 3 emits a Chain-of-Visual-Thought (CoVT) blueprint for autoregressive image-to-video generation.
- Negation-aware safety gate filters financial-advice content before GPU rendering, preventing wasted cycles on non-publishable clips.
- Media stack: Wan2.2-TI2V-5B I2V (720×1280 native, crop-upscaled to 1080×1920 zero-black-bar), FLUX.1 thumbnails, edge-tts narration, FFmpeg NVENC H.264 with xfade crossfades. Inter-node transfer via rsync over 200 Gbps RDMA. Per-clip latency ~40–50 min via parallel rendering.
- Redis-backed job queue, Pydantic state model, Click CLI for ops, structured JSON logging via loguru, and a Claude-powered manager loop for interactive pipeline control.
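The Pass 2 fan-out via asyncio.gather in the media pipeline can be sketched as follows. The visual_prompt coroutine is a hypothetical stand-in that sleeps instead of calling a model; only the concurrency pattern is the point.

```python
import asyncio


async def visual_prompt(line: str) -> str:
    """Hypothetical stand-in for one LLM call producing a visual prompt."""
    await asyncio.sleep(0.01)  # simulated network/model latency
    return f"visual prompt for: {line}"


async def pass_two(narrative: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(visual_prompt(line) for line in narrative))


narrative = [f"line {i}" for i in range(1, 9)]  # an 8-line narrative
prompts = asyncio.run(pass_two(narrative))
print(len(prompts))  # 8
print(prompts[0])    # visual prompt for: line 1
```

Because the eight calls wait on I/O concurrently, wall-clock time approaches that of a single call, which is consistent with the roughly 8× speedup quoted for an 8-line narrative.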
Cross-Cutting Engineering Themes
Quantified Business Transformation
Every initiative ships with a number behind it: 10× analyst speedup, sub-0.3% MAPE, 3.5× memory compression, six-figure license elimination, sub-millisecond dispatch latency. I optimize for moves that survive a CFO review, not for technical novelty.
Agentic LLM Orchestration (not "AI integration")
Multi-agent pipelines with HMAC-signed ledgers, MCP servers, DSPy DAGs, frozen-dataclass state, and validated 14-state lifecycles. The Organization Risk Monitor work, Mercurius, the 5-agent OMS, the 3-pass agentic media pipeline — all built around the assumption that LLMs are components in autonomous systems, not chatbots bolted onto dashboards.
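A validated lifecycle of this kind is, at its core, a transition table plus a guard. The sketch below covers only the eight happy-path states named elsewhere in this document, plus two terminal states; the CANCELLED and REJECTED names and the whole transition table are assumptions, not the production state machine.

```python
from enum import Enum, auto


class OrderState(Enum):
    ACCEPTED = auto()
    VALIDATED = auto()
    ENRICHED = auto()
    ROUTED = auto()
    WORKING = auto()
    PARTIALLY_FILLED = auto()
    FILLED = auto()
    SETTLED = auto()
    CANCELLED = auto()  # assumed state name
    REJECTED = auto()   # assumed state name


S = OrderState

# Assumed transition table; states absent as keys are terminal.
ALLOWED = {
    S.ACCEPTED: {S.VALIDATED, S.REJECTED},
    S.VALIDATED: {S.ENRICHED, S.REJECTED},
    S.ENRICHED: {S.ROUTED, S.CANCELLED},
    S.ROUTED: {S.WORKING, S.REJECTED, S.CANCELLED},
    S.WORKING: {S.PARTIALLY_FILLED, S.FILLED, S.CANCELLED},
    S.PARTIALLY_FILLED: {S.PARTIALLY_FILLED, S.FILLED, S.CANCELLED},
    S.FILLED: {S.SETTLED},
}


class IllegalTransition(ValueError):
    """Raised when a transition is not in the allowed table."""


def transition(current: OrderState, target: OrderState) -> OrderState:
    """Guard: return the new state only if the table permits the move."""
    if target not in ALLOWED.get(current, set()):
        raise IllegalTransition(f"{current.name} -> {target.name}")
    return target


state = transition(S.ACCEPTED, S.VALIDATED)
print(state.name)  # VALIDATED
```

Keeping the table as data rather than scattered if-statements makes the legal transitions reviewable in one place and easy to test exhaustively.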
Hardware-Aware Numerics & Infrastructure Depth
NVFP4 / FP8 quantization, Mamba KV-cache tuning, RDMA path selection, NCCL transport debugging, Triton SM121 workarounds, and CPU-vs.-GPU weight-load tradeoffs are first-class design constraints — not afterthoughts. This is what turns a 2-box Grace-Blackwell cluster into a 235B-parameter inference platform.
Legacy Translation Without Semantic Drift
Java margin-library → Mercurius and ngOMS → Agentic OMS preserve every subtle Java semantic the auditors care about: NaN sentinels via float('nan') + math.isnan() (not None truthiness), fixed-point precision, strategy/factory/visitor pattern shape, XML rate-rule fidelity. This is the de-risked AI-modernization playbook for any JVM-era trading core.
Frozen-Dataclass Functional Architecture
Every account, order, position, and quote is immutable; mutations go through dataclasses.replace(). Threading and reasoning about state both get easier; the audit trail writes itself.
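A minimal sketch of the pattern, using a hypothetical Order with illustrative fields: the frozen dataclass rejects in-place assignment, and dataclasses.replace() returns a new instance with the changed field.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Order:  # hypothetical fields, for illustration only
    order_id: str
    symbol: str
    quantity: int
    status: str = "ACCEPTED"


original = Order("ord-1", "SPY", 100)
validated = replace(original, status="VALIDATED")  # new object; original untouched

try:
    original.status = "VALIDATED"  # direct mutation is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

print(original.status, validated.status, mutated)  # ACCEPTED VALIDATED False
```

Since every change yields a new object, keeping the sequence of instances gives a before/after record of each state change at no extra cost.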
Optional-Dependency Discipline
Quant modules degrade gracefully when optional libraries (arch, statsmodels, pmdarima, hmmlearn) are absent, with HAS_* flags and self-contained fallbacks — critical for laptop-to-cluster portability and reproducibility under audit.
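One way the HAS_* pattern can look, sketched with arch as the optional dependency. The volatility() function, its use_garch parameter, and the sample-standard-deviation fallback are assumptions for illustration; the actual fallbacks are not described in the source.

```python
import statistics

try:
    from arch import arch_model  # optional GARCH dependency
    HAS_ARCH = True
except ImportError:
    HAS_ARCH = False


def volatility(returns: list[float], use_garch: bool = True) -> tuple[float, str]:
    """Return (volatility estimate, backend name), degrading if arch is absent."""
    if use_garch and HAS_ARCH:
        import numpy as np  # installed alongside arch
        fit = arch_model(np.asarray(returns), vol="GARCH", p=1, q=1).fit(disp="off")
        return float(np.asarray(fit.conditional_volatility)[-1]), "arch-garch"
    # Self-contained fallback: plain sample standard deviation.
    return statistics.stdev(returns), "sample-stdev"


sample = [0.010, -0.020, 0.015, -0.005, 0.007]  # illustrative returns
estimate, backend = volatility(sample, use_garch=False)  # force the fallback path
print(backend)  # sample-stdev
```

Returning the backend name alongside the number lets callers and audit logs record which code path produced a given estimate.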
Languages & Tools
Languages
Python 3.12 (expert), Java (deep — margin/risk/OMS legacy core), C++, SQL, Bash, Gradle build DSL, Cython (light).
Frameworks
FastAPI, Streamlit, PydanticAI, LangChain, DSPy, PyTorch, NumPy, SciPy, Pandas, Plotly, Matplotlib, scikit-learn, scikit-optimize, statsmodels, arch (GARCH), pmdarima (ARIMA), hmmlearn, NetworkX, ib-insync, QuickFIX.
LLM / Inference
TensorRT-LLM, vLLM, llama.cpp, Ray, NVIDIA Model Optimizer, ChromaDB, Hugging Face, edge-tts, ComfyUI, FLUX.1, Wan2.2, Gemini, Qwen3 family, Ollama.
Infrastructure
Docker, GCP Cloud Run, Cloudflare Tunnel, Redis, PostgreSQL, SQLite, Apache Ignite, Tailscale, Prometheus, Grafana, loguru, Git, pytest / pytest-asyncio, ruff, hatchling, setuptools.
Hardware
NVIDIA DGX Spark (Grace-Blackwell GB10), ConnectX-7 RDMA/RoCE, NVLink-C2C, NCCL, CUDA 13.0, NVENC.
Education & Certifications
- MBA — Cornell University
- Engineering Graduate — University of Madras - Anna University - College of Engineering
- FCC Licensed Amateur Radio Operator — KD2UET
- NYC-ARECS / RACES — Active member providing emergency communications support for New York City events and infrastructure.
Live Proof of Work
Every system below is running right now on the primary server. Click through and verify.
spark.igniteedge.ai
Agentic Trade Intelligence API + Streamlit — DSPy + Gemini, MCP, x402. llms.txt
mercurius.igniteedge.ai
Sub-ms Agentic Risk Firewall — 28-station sequencer, RegT/PM/SPAN. llms.txt
oms.igniteedge.ai
Agentic OMS — 5-agent pipeline, 14-state lifecycle, HMAC-signed ledger. llms.txt
span2.igniteedge.ai
CME SPAN 2 engine — calibrated to <0.3% MAPE. llms.txt
ssetf.igniteedge.ai
Single Stock ETF HFT — 47 ETFs, dual-spark LLM agent layer. llms.txt
igniteedge.ai
Platform landing & infrastructure overview. Home