AYUSH SAMANTARAY

Building intelligent systems for finance, analytics, and real-world decision making. Trying to build useful things that people actually need and hopefully keep using.

ABOUT ME

Engineering student building scalable fintech platforms and intelligent data systems.

Engineering student turning caffeine, overthinking, and deadlines into functional systems.

WHAT I BUILD

FINTECH (idempotency keys, mostly)

Payment rails, double-entry ledgers, and AML systems that move money without losing a cent.

PIPELINES (mostly fixing broken ones)

Kafka, Spark, and Airflow pipelines built to survive retries, replays, and duplicate events.

BACKEND (FastAPI + Docker, mostly)

Event-driven services and APIs designed around idempotency, audit logs, and partial failure.

AI SYSTEMS (RAG and prompt tweaking)

Retrieval and agentic systems that reason over data, deterministic where the math matters.

TOOLBELT

The stack I actually reach for.

Airflow

Kafka

Spark

Docker

Python

FastAPI

Prometheus

Grafana

Redis

PostgreSQL

GitHub

Notion

LangChain / LangGraph

dbt

BigQuery

GCP

Cassandra

OpenAI

Chroma

n8n

Machine Learning

EXPERIENCE

Working on data systems, backend infrastructure, and distributed applications across domains.

HISTORY

2026 · AI Application Developer · DeepGrid Pvt Ltd (first real job, shipped fast)

2025 · Electrical Engineer Intern · HAL India · Koraput Division (jet engine shenanigans)

SELECTED WORK

Autonomous AML Investigation Agent

AI-powered AML compliance system for RBI/SEBI-style regulatory workflows using RAG, clause-aware retrieval, and autonomous reasoning agents.

LangGraph
LangChain
Kafka
PostgreSQL
Redis
FastAPI
Prometheus

Case Study ↓ GitHub →

Payment System

Distributed payment processing and transaction management system built for scalable financial operations and secure real-time workflows.

Kafka
Spark
Airflow
dbt
BigQuery
PostgreSQL
Redis
GCP
XGBoost
Docker

Case Study → GitHub →

Real-Time Data Streaming

End-to-end streaming data engineering pipeline for ingesting, processing, monitoring, and visualising high-throughput event streams.

Apache Kafka
Apache Spark
Cassandra
Airflow
Docker
Grafana

Case Study → GitHub →

Regulatory Intelligence Agent

Agentic AI system for regulatory query analysis and compliance intelligence across RBI, SEBI, and financial governance documents.

LangChain
OpenAI Embeddings
Chroma
Pinecone
FastAPI
PostgreSQL

Case Study → GitHub →

AI OS Finance

Agentic financial analysis platform for deterministic valuation modeling, scenario analysis, and AI-assisted investment research workflows.

FastAPI
LangChain
n8n
Pydantic
Docker

Case Study → GitHub →

FLAGSHIP / CASE STUDY

A real system, told properly.

2026 · FINTECH · COMPLIANCE · LANGGRAPH · POSTGRES · FASTAPI

Autonomous AML investigation agent — multi-hop reasoning over 6 deterministic tools

A LangGraph state machine that investigates flagged transactions, chains tool evidence, and escalates uncertain cases — cost-capped and zero-hallucination by design.

p50 latency: 5.7s
avg cost / investigation: $0.0003
max tool hops: ≤4
min confidence: 0.70

AML alert queues pile up because every flag demands a manual multi-system investigation.

Rule engines surface velocity breaches, round-trip patterns, and watchlist hits. ML models score transaction risk. Each flag lands in an analyst queue — requiring lookups across transaction history, counterparty registers, velocity metrics, and global watchlists. A 2,000-alert queue at 15 minutes per case is 500 hours of backlog before the week starts.

The challenge: automate multi-hop investigations without losing the audit trail — and build a hard escape hatch for cases the agent shouldn't guess on.

A LangGraph state machine with 6 deterministic tools — LLM reasons, tools execute.

Six deterministic tools: txn_history_query and counterparty_risk_lookup (Postgres), round_trip_detector (recursive CTE following transaction hops up to 168h), velocity_check (Redis ZRANGEBYSCORE pipeline across 1h/6h/24h windows), watchlist_lookup (OFAC-style CSV loaded at module import), and kafka_lag_check (false-positive filter — rules out pipeline delay as the cause of apparent bursts).
The LangGraph state machine drives the loop: IDLE → INVESTIGATING → TOOL_CALLING → EVALUATING → RESOLVED or ESCALATED. Each node is a pure function, each transition is an explicit conditional edge — testable in isolation without a live LLM.
Hard limits are enforced by the state machine, not the LLM: max 4 hops, confidence ≥ 0.70, 30s wall-clock timeout, $0.05 cost cap. Guard order in node_evaluating: empty evidence → max hops → timeout → cost cap → LLM call → post-call cost re-check → confidence gate. The first four guards run before any LLM call; the last two run on its result. The LLM cannot route around any of them.
Every tool call is written to tool_execution_log in a finally block, covering success and failure. The evidence chain reads exclusively from this table, not from state.evidence_chain, which the LLM populates.
Built in 3 milestones: M1 (state machine + Docker stack, mock tools), M2 (6 real tools + gpt-4o-mini), M3 (FastAPI, Kafka consumer, Redis mutex, Streamlit dashboard, Prometheus).
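The guard order described above can be sketched in plain Python, without LangGraph. This is an illustration under stated assumptions, not the project's code: the `InvestigationState` fields, the `escalate` helper, the `llm_call` signature, every reason string except `low_confidence`, and the exact comparison operators are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Limits quoted in the text above.
MAX_HOPS = 4
MIN_CONFIDENCE = 0.70
TIMEOUT_S = 30.0
COST_CAP_USD = 0.05


@dataclass
class InvestigationState:
    # Hypothetical state shape; the real graph state is not shown in the source.
    evidence: list = field(default_factory=list)
    hops: int = 0
    started_at: float = 0.0
    cost_usd: float = 0.0
    status: str = "EVALUATING"
    escalation_reason: Optional[str] = None
    confidence: Optional[float] = None


def escalate(state: InvestigationState, reason: str) -> InvestigationState:
    # The node writes the reason; the routing edge only reads it.
    state.status = "ESCALATED"
    state.escalation_reason = reason
    return state


def node_evaluating(
    state: InvestigationState,
    llm_call: Callable[[InvestigationState], tuple[float, float]],
    now: Callable[[], float] = time.monotonic,
) -> InvestigationState:
    """llm_call returns (confidence, cost_of_this_call_usd); assumed signature."""
    # Four guards that run before any LLM call.
    if not state.evidence:
        return escalate(state, "empty_evidence")
    if state.hops >= MAX_HOPS:  # assumed: a used-up hop budget escalates
        return escalate(state, "max_hops")
    if now() - state.started_at > TIMEOUT_S:
        return escalate(state, "timeout")
    if state.cost_usd >= COST_CAP_USD:
        return escalate(state, "cost_cap")

    confidence, call_cost = llm_call(state)
    state.cost_usd += call_cost
    state.confidence = confidence

    # Two guards that run on the result of the LLM call.
    if state.cost_usd > COST_CAP_USD:
        return escalate(state, "cost_cap")
    if round(confidence, 4) < MIN_CONFIDENCE:
        return escalate(state, "low_confidence")

    state.status = "RESOLVED"
    return state


def route_after_evaluating(state: InvestigationState) -> str:
    # Edge function: returns a route, never writes state.
    return state.status
```

Because the node takes the LLM call and the clock as parameters, each guard can be exercised with a stub and a fixed timestamp, which is the "testable in isolation without a live LLM" property the text describes.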

Why these tradeoffs — each came from a real failure during the build.

Hard limits in node_evaluating, not the routing edge. Edge functions return a route but cannot write state. Putting the limit check in the edge leaves escalation_reason as None — the compliance report INSERT has a required-field constraint that fires. One enforcement site only: the node writes the reason, the edge only reads it.
Evidence chain from tool_execution_log, not state.evidence_chain. The LLM populates state.evidence_chain during inference. If it hallucinates a tool call, the compliance report contains fabricated evidence. tool_execution_log is written in the finally block of dispatch_tool before the result reaches the LLM — it is ground-truth regardless of LLM behavior.
tool_input_json: str not dict[str, Any]. OpenAI structured output rejects open-ended dict types. The LLM serializes tool input as a JSON string; the dispatcher deserializes with json.loads. Strict schema enforcement is a feature — it prevents the LLM from inventing fields that don't map to any real tool parameter.
LLM fallback is escalation, never a default tool call. The original fallback on LLM parse failure called velocity_check with account_id = payload.txn_id — a wrong type that returns 0 transactions, making the investigation look clean. In AML, a false-clean verdict is categorically worse than a false escalation. Safe failure = immediate ESCALATED.
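Three of these tradeoffs meet in the dispatcher, so a small sketch may help. It assumes an in-memory list standing in for the `tool_execution_log` table, a placeholder `velocity_check` body, and a hypothetical `EscalateInvestigation` exception; only the names `dispatch_tool`, `tool_input_json`, `velocity_check`, and `tool_execution_log` come from the text.

```python
import json
import time
from typing import Any, Callable

# Stand-in for the Postgres tool_execution_log table (assumption for this sketch).
TOOL_EXECUTION_LOG: list[dict[str, Any]] = []


class EscalateInvestigation(Exception):
    """Hypothetical signal: unparseable LLM output escalates, it never defaults to a tool."""


def velocity_check(account_id: str) -> dict[str, Any]:
    # Placeholder body; the real tool queries Redis.
    return {"account_id": account_id, "txn_count_1h": 0}


TOOLS: dict[str, Callable[..., dict[str, Any]]] = {"velocity_check": velocity_check}


def dispatch_tool(tool_name: str, tool_input_json: str) -> dict[str, Any]:
    started = time.perf_counter()
    status = "error"
    error = None
    try:
        try:
            # The LLM hands over a JSON string, not an open-ended dict.
            kwargs = json.loads(tool_input_json)
            if not isinstance(kwargs, dict):
                raise TypeError("tool input must be a JSON object")
            tool = TOOLS[tool_name]
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            # Safe failure: escalate instead of guessing a default tool call.
            raise EscalateInvestigation(f"bad tool request: {tool_name}") from exc
        result = tool(**kwargs)  # invented fields fail here with TypeError
        status = "ok"
        return result
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        # Written on success and failure, before the caller sees the result.
        TOOL_EXECUTION_LOG.append({
            "tool": tool_name,
            "input": tool_input_json,
            "status": status,
            "error": error,
            "duration_ms": (time.perf_counter() - started) * 1000,
        })


def evidence_chain() -> list[dict[str, Any]]:
    # Evidence is read back from the log, never from LLM-populated state.
    return [row for row in TOOL_EXECUTION_LOG if row["status"] == "ok"]
```

In this sketch a hallucinated tool call leaves no trace in `evidence_chain()`, because only calls that actually went through `dispatch_tool` and succeeded are ever logged with status `ok`.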

10 runs · 3 scenarios · gpt-4o-mini · real OpenAI API — 2026-04-18, WSL2 (10GB RAM, 8 cores).

p50 investigation latency: 5.7s
p95 investigation latency: 13.1s
avg cost / investigation: $0.0003
avg hops: 1.0 (seeded data resolves fast; adversarial inputs push toward the 4-hop ceiling)

What actually broke — and what I learned from fixing it.

Float precision kills compliance thresholds. A confidence of 0.70 in IEEE 754 can be stored as 0.6999999999999999 — triggering a false low_confidence escalation. Fixed with round(confidence, 4) before every threshold comparison. Never compare raw floats to exact decimal values in compliance logic.
Seeded data resolves in 1 hop every time. The 3 demo scenarios aren't adversarial enough to force multi-hop LLM reasoning — the agent calls one tool, finds sufficient evidence, resolves. Avg 1.0 hops is correct but not a meaningful measure of multi-hop capability. Real demonstration of the 4-hop ceiling needs fresh, live transaction payloads via the API.
Module-level DB pool init poisons the test session. Pool init at import time runs during pytest collection — if the DB isn't running, the pool is set to None permanently for that session. Switched to lazy init so the pool is only attempted when a tool is actually invoked. Applies to any module-level singleton that requires infrastructure.
The first bottleneck at 10x load isn't Postgres — it's the OpenAI rate limit. gpt-4o-mini has a requests-per-minute cap; each investigation makes 2–4 LLM calls. Fix: async LangGraph nodes with a token-bucket rate limiter and Kafka consumer back-pressure rather than queuing investigations in memory.
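Two of the fixes above are small enough to show directly. The sketch below assumes Python 3.9+ (for `math.nextafter`); `passes_confidence_gate` and `LazyResource` are hypothetical names, and the factory stands in for whatever creates the real DB pool.

```python
import math
from typing import Callable, Generic, Optional, TypeVar

MIN_CONFIDENCE = 0.70
T = TypeVar("T")


def passes_confidence_gate(confidence: float, threshold: float = MIN_CONFIDENCE) -> bool:
    # Round both sides before comparing, so a value one ulp under 0.70 still passes.
    return round(confidence, 4) >= round(threshold, 4)


class LazyResource(Generic[T]):
    """Create the resource on first use instead of at import time."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None

    def get(self) -> T:
        if self._value is None:
            # A failed attempt raises and caches nothing, so the next call retries.
            self._value = self._factory()
        return self._value


# The largest float strictly below 0.70 fails a raw comparison but passes the gate.
just_below = math.nextafter(0.70, 0.0)
print(just_below >= MIN_CONFIDENCE, passes_confidence_gate(just_below))
```

With this shape, importing the module during pytest collection touches no infrastructure; the factory only runs when a tool actually asks for the pool.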

SYSTEM ARCHITECTURE

How the data actually flows.

Rule Engine (velocity · watchlist)
ML Scoring (XGBoost · score > 0.75)
Direct API (POST /investigate)
Kafka Consumer (flagged-txns topic)
Agent Runner (Redis mutex per txn)
LangGraph (IDLE → RESOLVED)
tool_execution_log (every hop · ms)
compliance_reports (verdict · evidence chain)
escalation_queue (human analyst queue)

Stages: ingest · route / dedupe · store · serve

LIVE DEMO · AML INVESTIGATION

Flag a transaction. Watch it investigate.

aml-agent / investigation

Each "flag" triggers an investigation with a unique transaction ID. The agent runs multi-hop tool calls and produces a verdict. Duplicate flags for the same transaction are blocked by a Redis mutex.

How it works: Each flagged transaction carries a unique ID. The agent runner checks a Redis mutex — duplicate investigations are blocked instantly. Resolved cases write a compliance report with verdict, confidence score, and full evidence chain.
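A per-transaction mutex of this kind is commonly built on Redis `SET` with the `NX` and `EX` options. The sketch below is one way to do it, not the demo's actual code: the key prefix, the TTL, and the function name are assumptions, and `InMemoryFakeRedis` is a stand-in that mimics only the `set(..., nx=True)` behavior of redis-py (it returns `None` when the key already exists) and ignores expiry.

```python
from typing import Optional, Protocol


class SupportsSetNX(Protocol):
    # The subset of redis-py's Redis.set() that this sketch relies on.
    def set(self, name: str, value: str, *, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]: ...


def try_acquire_investigation(client: SupportsSetNX, txn_id: str, ttl_s: int = 60) -> bool:
    """Return True if this caller owns the investigation for txn_id."""
    # SET key value NX EX ttl: succeeds only if the key does not exist yet.
    key = f"aml:investigation:{txn_id}"  # hypothetical key naming
    return bool(client.set(key, "1", nx=True, ex=ttl_s))


class InMemoryFakeRedis:
    """Test double with SET NX semantics only; expiry is not simulated."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, name: str, value: str, *, nx: bool = False,
            ex: Optional[int] = None) -> Optional[bool]:
        if nx and name in self._data:
            return None
        self._data[name] = value
        return True


client = InMemoryFakeRedis()
print(try_acquire_investigation(client, "txn-1"))  # first flag acquires the mutex
print(try_acquire_investigation(client, "txn-1"))  # duplicate flag is blocked
```

The TTL matters in a real deployment: without it, a runner that crashes mid-investigation would leave the transaction locked indefinitely.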

/NOW · LIVE

What I'm actually on.

This month

SHIPPING

Payment ledger v3.2 — exactly-once consumer

Cutover happens this week, behind a flag at 5%.

wk 19

BUILDING

RAG eval harness for support transcripts

Tracking groundedness vs. retrieval-recall on a 4k-row golden set.

wk 17–20

BUILDING

Open-source: kafka-replay-cli

Tiny tool to re-emit a partition window into a dead-letter topic.

side

READING

Designing Data-Intensive Applications, ch. 11

Stream processing — re-reading because v3.2 review went sideways.

ongoing

site uptime · 90d

99.97%

▲ 0.02% vs prev

commits / week

▲ 12 vs avg

coffees today

▲ 1 above limit

CERTIFICATIONS

Credentials that shaped how I build.

01 · Google · Coursera

Data Analytics Professional

SQL, spreadsheets, R, Tableau, and data cleaning at scale. The analyst's foundation behind every pipeline I ship.

Completed · 8 courses · Capstone

02 · Udemy

Machine Learning A-Z

Supervised and unsupervised learning, Python and R. 43 hours of hands-on ML — from regression to neural nets.

Feb 2026 · 43 hrs · Completed

03 · Yale University · Coursera

Financial Markets

Risk, behavioral finance, bonds, and options — taught by Robert Shiller. The theory behind the fintech systems I build.

Completed · Non-credit · Yale

04 · NY Institute of Finance · Coursera

Risk Management

Best-practice frameworks to measure, assess, and manage organizational risk. Applied directly to how I think about fault-tolerant system design.

Completed · Specialization · NYIF

05 · LangChain Academy

Introduction to LangChain — Python

Foundation track: building LLM applications with LangChain — chains, prompts, and the orchestration layer behind the AI systems I ship.

Completed · Foundation · Python

CONNECT

GitHub

Email ayushsam3@gmail.com
