AYUSH SAMANTARAY

Building intelligent systems for finance, analytics, and real-world decision making. Trying to build useful things that people actually need and hopefully keep using.

ABOUT ME

Engineering student building scalable fintech platforms and intelligent data systems.

Engineering student turning caffeine, overthinking, and deadlines into functional systems.

WHAT I BUILD
FINTECH (idempotency keys, mostly)
PIPELINES (mostly fixing broken ones)
BACKEND (FastAPI + Docker, mostly)
AI SYSTEMS (RAG and prompt tweaking)

The stack I actually reach for.

Airflow
Kafka
Spark
Docker
Python
FastAPI
Prometheus
Grafana
Redis
PostgreSQL
GitHub
Notion
LangChain / LangGraph
dbt
BigQuery
GCP
Cassandra
OpenAI
Chroma
n8n
Machine Learning

EXPERIENCE

Working on data systems, backend infrastructure, and distributed applications across domains.

HISTORY
2026 · AI Application Developer · DeepGrid Pvt Ltd · first real job, shipped fast
2025 · Electrical Engineer Intern · HAL India (Koraput Division) · jet engine shenanigans

SELECTED WORK

Payment System

Distributed payment processing and transaction management system built for scalable financial operations and secure real-time workflows.

  • Kafka
  • Spark
  • Airflow
  • dbt
  • BigQuery
  • PostgreSQL
  • Redis
  • GCP
  • XGBoost
  • Docker

Real-Time Data Streaming

End-to-end streaming data engineering pipeline for ingesting, processing, monitoring, and visualising high-throughput event streams.

  • Apache Kafka
  • Apache Spark
  • Cassandra
  • Airflow
  • Docker
  • Grafana

AI OS Finance

Agentic financial analysis platform for deterministic valuation modeling, scenario analysis, and AI-assisted investment research workflows.

  • FastAPI
  • LangChain
  • n8n
  • Pydantic
  • Docker

A real system, told properly.

2026 · FINTECH · COMPLIANCE · LANGGRAPH · POSTGRES · FASTAPI

Autonomous AML investigation agent — multi-hop reasoning over 6 deterministic tools

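A LangGraph state machine that investigates flagged transactions, chains tool evidence, and escalates uncertain cases. It is cost-capped by design, and its evidence chain is built only from logged tool calls, so a hallucinated tool call cannot reach the compliance report.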

  • 5.7s p50 latency
  • $0.0003 avg cost / investigation
  • ≤4 max tool hops
  • 0.70 min confidence

AML alert queues pile up because every flag demands a manual multi-system investigation.

Rule engines surface velocity breaches, round-trip patterns, and watchlist hits. ML models score transaction risk. Each flag lands in an analyst queue — requiring lookups across transaction history, counterparty registers, velocity metrics, and global watchlists. A 2,000-alert queue at 15 minutes per case is 500 hours of backlog before the week starts.

The challenge: automate multi-hop investigations without losing the audit trail — and build a hard escape hatch for cases the agent shouldn't guess on.

A LangGraph state machine with 6 deterministic tools — LLM reasons, tools execute.

  • Six deterministic tools: txn_history_query and counterparty_risk_lookup (Postgres), round_trip_detector (recursive CTE following transaction hops up to 168h), velocity_check (Redis ZRANGEBYSCORE pipeline across 1h/6h/24h windows), watchlist_lookup (OFAC-style CSV loaded at module import), and kafka_lag_check (false-positive filter — rules out pipeline delay as the cause of apparent bursts).
  • The LangGraph state machine drives the loop: IDLE → INVESTIGATING → TOOL_CALLING → EVALUATING → RESOLVED or ESCALATED. Each node is a pure function, each transition is an explicit conditional edge — testable in isolation without a live LLM.
  • Hard limits enforced by the state machine before any LLM call: max 4 hops, confidence ≥ 0.70, 30s wall-clock timeout, $0.05 cost cap. Guard order in node_evaluating: empty evidence → max hops → timeout → cost cap → LLM call → post-call cost re-check → confidence gate. The LLM cannot route around any of them.
  • Every tool call is written to tool_execution_log in a finally block, so successes and failures are both recorded. The evidence chain reads exclusively from this table, not from state.evidence_chain, which the LLM populates. Built in 3 milestones: M1 (state machine + Docker stack, mock tools), M2 (6 real tools + gpt-4o-mini), M3 (FastAPI, Kafka consumer, Redis mutex, Streamlit dashboard, Prometheus).
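A minimal sketch of the guard order in node_evaluating, written as plain Python rather than LangGraph's API so it runs without a live LLM. The limits and their order come from the list above; the state fields, the `call_llm` callback (assumed to return a confidence and that call's dollar cost), and the reason strings are hypothetical. One more assumption: the hop guard fires once the 4-hop budget is used up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hard limits from the write-up.
MAX_HOPS = 4
MIN_CONFIDENCE = 0.70
TIMEOUT_S = 30.0
COST_CAP_USD = 0.05


@dataclass
class InvestigationState:
    # Hypothetical field names; only the limits above come from the write-up.
    evidence: List[str] = field(default_factory=list)
    hops: int = 0
    elapsed_s: float = 0.0
    cost_usd: float = 0.0
    confidence: Optional[float] = None
    status: str = "EVALUATING"
    escalation_reason: Optional[str] = None


# call_llm(state) -> (confidence, cost of that call in USD); assumed signature.
LlmCall = Callable[[InvestigationState], Tuple[float, float]]


def escalate(state: InvestigationState, reason: str) -> InvestigationState:
    # The node writes the reason, so an escalated state never has reason None.
    state.status = "ESCALATED"
    state.escalation_reason = reason
    return state


def node_evaluating(state: InvestigationState, call_llm: LlmCall) -> InvestigationState:
    # Guards run in the documented order; the LLM is reached only if all pass.
    if not state.evidence:
        return escalate(state, "empty_evidence")
    if state.hops >= MAX_HOPS:  # assumption: budget used up means stop
        return escalate(state, "max_hops")
    if state.elapsed_s >= TIMEOUT_S:
        return escalate(state, "timeout")
    if state.cost_usd >= COST_CAP_USD:
        return escalate(state, "cost_cap")
    confidence, call_cost = call_llm(state)
    state.cost_usd += call_cost
    if state.cost_usd > COST_CAP_USD:  # post-call cost re-check
        return escalate(state, "cost_cap")
    state.confidence = confidence
    if round(confidence, 4) < MIN_CONFIDENCE:  # confidence gate
        return escalate(state, "low_confidence")
    state.status = "RESOLVED"
    return state


def route_after_evaluating(state: InvestigationState) -> str:
    # Conditional edge: reads the status the node wrote, never mutates state.
    return state.status
```

Because the edge only reads `status`, every escalation path leaves a non-None `escalation_reason` behind for the report INSERT, and each guard can be unit-tested with a stub `call_llm`.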

Why these tradeoffs — each came from a real failure during the build.

  • Hard limits in node_evaluating, not the routing edge. Edge functions return a route but cannot write state. Putting the limit check in the edge leaves escalation_reason as None, and the required-field constraint on the compliance report INSERT fires. One enforcement site only: the node writes the reason, the edge only reads it.
  • Evidence chain from tool_execution_log, not state.evidence_chain. The LLM populates state.evidence_chain during inference. If it hallucinates a tool call, the compliance report contains fabricated evidence. tool_execution_log is written in the finally block of dispatch_tool before the result reaches the LLM — it is ground-truth regardless of LLM behavior.
  • tool_input_json: str, not dict[str, Any]. OpenAI structured output rejects open-ended dict types. The LLM serializes tool input as a JSON string; the dispatcher deserializes it with json.loads. Strict schema enforcement is a feature: it prevents the LLM from inventing fields that don't map to any real tool parameter.
  • LLM fallback is escalation, never a default tool call. The original fallback on LLM parse failure called velocity_check with account_id = payload.txn_id — a wrong type that returns 0 transactions, making the investigation look clean. In AML, a false-clean verdict is categorically worse than a false escalation. Safe failure = immediate ESCALATED.
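The evidence-log, JSON-string, and fallback rules above fit together roughly as follows. This is a sketch, not the project's code: an in-memory SQLite table stands in for the Postgres tool_execution_log, the `TOOLS` registry and its stubbed `velocity_check` are hypothetical, and `next_step` is an illustrative name for wherever the LLM's output gets parsed.

```python
import json
import sqlite3
import time
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical tool registry; the real tools query Postgres and Redis.
TOOLS: Dict[str, Callable[..., Any]] = {
    "velocity_check": lambda account_id: {"account_id": account_id, "txn_count_1h": 3},
}


def open_log() -> sqlite3.Connection:
    # In-memory SQLite stands in for the Postgres tool_execution_log table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tool_execution_log (txn_id TEXT, tool TEXT, "
        "input_json TEXT, ok INTEGER, output_json TEXT, ms REAL)"
    )
    return conn


def dispatch_tool(conn: sqlite3.Connection, txn_id: str, tool: str,
                  tool_input_json: str) -> Any:
    """Run one tool call and log it whether it succeeds or raises."""
    started = time.perf_counter()
    ok, output = False, None
    try:
        kwargs = json.loads(tool_input_json)  # the LLM sends a JSON string
        output = TOOLS[tool](**kwargs)
        ok = True
        return output
    finally:
        # Runs on success and failure, before the result goes back to the LLM.
        elapsed_ms = (time.perf_counter() - started) * 1000
        conn.execute(
            "INSERT INTO tool_execution_log VALUES (?, ?, ?, ?, ?, ?)",
            (txn_id, tool, tool_input_json, int(ok), json.dumps(output), elapsed_ms),
        )
        conn.commit()


def evidence_chain(conn: sqlite3.Connection, txn_id: str) -> List[Tuple]:
    # The report's evidence comes from the log, never from LLM-populated state.
    return conn.execute(
        "SELECT tool, input_json, ok, output_json FROM tool_execution_log "
        "WHERE txn_id = ? ORDER BY rowid",
        (txn_id,),
    ).fetchall()


def next_step(llm_raw: str) -> Dict[str, Any]:
    # Unparseable LLM output escalates; it never becomes a default tool call.
    try:
        return json.loads(llm_raw)
    except json.JSONDecodeError:
        return {"route": "ESCALATED", "escalation_reason": "llm_parse_failure"}
```

A call with a bad argument still leaves a row with `ok = 0`, so the evidence chain shows the failure instead of silently omitting it.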

10 runs · 3 scenarios · gpt-4o-mini · real OpenAI API — 2026-04-18, WSL2 (10GB RAM, 8 cores).

  • 5.7s p50 investigation latency
  • 13.1s p95 investigation latency
  • $0.0003 avg cost / investigation
  • 1.0 avg hops (seeded data resolves fast; adversarial inputs push toward the 4-hop ceiling)

What actually broke — and what I learned from fixing it.

  • Float precision kills compliance thresholds. A confidence that should be exactly 0.70 can come out of float arithmetic a hair below it (e.g. 0.6999999999999999), triggering a false low_confidence escalation. Fixed with round(confidence, 4) before every threshold comparison. Never compare raw floats to exact decimal values in compliance logic.
  • Seeded data resolves in 1 hop every time. The 3 demo scenarios aren't adversarial enough to force multi-hop LLM reasoning — the agent calls one tool, finds sufficient evidence, resolves. Avg 1.0 hops is correct but not a meaningful measure of multi-hop capability. Real demonstration of the 4-hop ceiling needs fresh, live transaction payloads via the API.
  • Module-level DB pool init poisons the test session. Pool init at import time runs during pytest collection — if the DB isn't running, the pool is set to None permanently for that session. Switched to lazy init so the pool is only attempted when a tool is actually invoked. Applies to any module-level singleton that requires infrastructure.
  • The first bottleneck at 10x load isn't Postgres — it's the OpenAI rate limit. gpt-4o-mini has a requests-per-minute cap; each investigation makes 2–4 LLM calls. Fix: async LangGraph nodes with a token-bucket rate limiter and Kafka consumer back-pressure rather than queuing investigations in memory.
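Two of those fixes are small enough to show directly. A sketch, assuming the names `passes_confidence_gate` and `LazyPool` (both hypothetical) and a `factory` callable that stands in for the real Postgres pool constructor:

```python
from typing import Callable, Optional

MIN_CONFIDENCE = 0.70


def passes_confidence_gate(confidence: float, threshold: float = MIN_CONFIDENCE) -> bool:
    # Round before comparing so a value a hair below 0.70 still counts as 0.70.
    return round(confidence, 4) >= threshold


class LazyPool:
    """Build the DB pool on first use instead of at import time.

    `factory` is a hypothetical stand-in for the real Postgres pool constructor.
    """

    def __init__(self, factory: Callable[[], object]) -> None:
        self._factory = factory
        self._pool: Optional[object] = None

    def get(self) -> object:
        if self._pool is None:
            # If the factory raises, _pool stays None and the next call retries,
            # instead of the pool being stuck at None for the whole session.
            self._pool = self._factory()
        return self._pool
```

Importing a module that holds a `LazyPool(...)` touches no infrastructure, so pytest collection succeeds with the database down; only tests that actually call `get()` need it.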

How the data actually flows.

aml-agent · v1.0
Flow: flag → consume → mutex check → dispatch → tool call → write
Stages: ingest → route / dedupe → store → serve
  • Rule Engine: velocity · watchlist
  • ML Scoring: XGBoost · score > 0.75
  • Direct API: POST /investigate
  • Kafka Consumer: flagged-txns topic
  • Agent Runner: Redis mutex per txn
  • LangGraph: IDLE → RESOLVED
  • tool_execution_log: every hop · ms
  • compliance_reports: verdict · evidence chain
  • escalation_queue: human analyst queue

Flag a transaction. Watch it investigate.


Each "flag" triggers an investigation with a unique transaction ID. The agent runs multi-hop tool calls and produces a verdict. Duplicate flags for the same transaction are blocked by a Redis mutex.

How it works: Each flagged transaction carries a unique ID. The agent runner checks a Redis mutex — duplicate investigations are blocked instantly. Resolved cases write a compliance report with verdict, confidence score, and full evidence chain.
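The per-transaction mutex behaves like Redis `SET key value NX EX ttl`: the first runner to set the key wins, later duplicates are refused, and the TTL frees the key if a runner dies mid-investigation. The sketch below mimics those semantics in memory so it runs without Redis; the class, the key format `aml:lock:{txn_id}`, and the 60s TTL are assumptions. With redis-py the acquire step would be `r.set(key, owner, nx=True, ex=ttl)`, and an owner-checked release has to run atomically on the server (typically a small Lua script) rather than as the read-then-delete shown here.

```python
import time
from typing import Callable, Dict, Tuple


class InMemoryMutex:
    """In-memory stand-in for a Redis SET NX EX lock (semantics only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def acquire(self, key: str, owner: str, ttl_s: float) -> bool:
        now = self._clock()
        held = self._locks.get(key)
        if held is not None and held[1] > now:
            return False  # another runner is already investigating this txn
        self._locks[key] = (owner, now + ttl_s)
        return True

    def release(self, key: str, owner: str) -> None:
        # Only the owner may release, so a slow worker can't drop a newer lock.
        held = self._locks.get(key)
        if held is not None and held[0] == owner:
            del self._locks[key]


def try_start_investigation(mutex: InMemoryMutex, txn_id: str, worker_id: str,
                            ttl_s: float = 60.0) -> bool:
    # Hypothetical key format; the TTL should exceed the 30s investigation timeout.
    return mutex.acquire(f"aml:lock:{txn_id}", worker_id, ttl_s)
```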

What I'm actually on.

This month

  • SHIPPING
    Payment ledger v3.2 — exactly-once consumer
    Cutover happens this week, behind a flag at 5%.
    wk 19
  • BUILDING
    RAG eval harness for support transcripts
    Tracking groundedness vs. retrieval-recall on a 4k-row golden set.
    wk 17–20
  • BUILDING
    Open-source: kafka-replay-cli
    Tiny tool to re-emit a partition window into a dead-letter topic.
    side
  • READING
    Designing Data-Intensive Applications, ch. 11
    Stream processing — re-reading because v3.2 review went sideways.
    ongoing
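For context on the exactly-once item: a common way to get exactly-once effects on top of Kafka's at-least-once delivery is an idempotent consumer, where the dedupe marker and the ledger write commit in the same database transaction. The sketch below shows that pattern with SQLite standing in for the real database; the table names and `apply_once` are hypothetical, and this is not necessarily how ledger v3.2 does it.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    # In-memory SQLite stands in for the real ledger database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY);
        CREATE TABLE ledger_entries (
            message_id TEXT, account_id TEXT, amount_cents INTEGER
        );
        """
    )
    return conn


def apply_once(conn: sqlite3.Connection, message_id: str,
               account_id: str, amount_cents: int) -> bool:
    """Apply a ledger entry at most once per message_id.

    The dedupe marker and the entry commit together, so a redelivered
    message hits the primary key and the whole transaction rolls back.
    """
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute("INSERT INTO processed_messages VALUES (?)", (message_id,))
            conn.execute(
                "INSERT INTO ledger_entries VALUES (?, ?, ?)",
                (message_id, account_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: already applied
```

Offsets can then be committed after the database transaction; if the consumer crashes in between, the replayed message is simply skipped.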

  • site uptime · 90d: 99.97% (▲ 0.02% vs prev)
  • commits / week: 47 (▲ 12 vs avg)
  • coffees today: 3 (▲ 1 above limit)

Credentials that shaped how I build.


01 · Google · Coursera

Data Analytics Professional

SQL, spreadsheets, R, Tableau, and data cleaning at scale. The analyst's foundation behind every pipeline I ship.

Completed · 8 courses · Capstone
02 · Udemy

Machine Learning A-Z

Supervised and unsupervised learning, Python and R. 43 hours of hands-on ML — from regression to neural nets.

Feb 2026 · 43 hrs · Completed
03 · Yale University · Coursera

Financial Markets

Risk, behavioral finance, bonds, and options — taught by Robert Shiller. The theory behind the fintech systems I build.

Completed · Non-credit · Yale
04 · NY Institute of Finance · Coursera

Risk Management

Best-practice frameworks to measure, assess, and manage organizational risk. Applied directly to how I think about fault-tolerant system design.

Completed · Specialization · NYIF
05 · LangChain Academy

Introduction to LangChain — Python

Foundation track: building LLM applications with LangChain — chains, prompts, and the orchestration layer behind the AI systems I ship.

Completed · Foundation · Python
CONNECT
© 2026 Ayush Samantaray Data Engineer · Fintech Builder