
Building an Auditable AI Gateway with Platformatic Watt

Published · 11 min read

Every engineering team that adopts AI quickly hits the same wall: a simple provider integration that worked for a demo turns into an operational bottleneck at scale. Tracking usage, containing costs, and keeping an audit trail across growing models and teams can slip out of reach fast. AI features are moving fast, but production teams still need the same thing they have always needed: not just control, but auditability.

That is exactly what ai-gateway-auditable delivers: an OpenAI-compatible gateway built with Platformatic Watt that combines provider routing, fallback resiliency, and durable audit logging to S3.

For production teams, this translates directly into risk reduction and regulatory readiness: your audit trail is always preserved, and resilient routing keeps incidents contained. In real terms, this leads to fewer lost logs or broken provider integrations (and fewer 3 a.m. pages as a result), and reliable evidence when you need to answer compliance or security reviews.

This architecture is not only production-ready but already operating at scale for one of our early adopters. One application (proxy) serves traffic, another (audit worker) persists audits, and a durable, filesystem-backed queue between them keeps latency low while preserving records. The same early adopter halved its application latency using this pattern with Watt. With clear audit trails and resilient traffic handling, they were able to trace errors quickly and keep their on-call load under control, while giving their LLM-enabled end users performance approaching parity with direct API calls, which was critical for their real-time use cases.

Source code: github.com/platformatic/ai-gateway-auditable

Why this matters now

The direct integration pattern is usually the first stop for teams, but it often leads to gaps in the audit trail. Finance needs clean attribution by key or team, security needs auditable traces of model interactions, and product needs stronger uptime when upstream providers degrade.

As a real-world example, the same early adopter hit this with their initial production rollout, which missed up to 15% of request logs during peak volume and saw request latency spike by more than 2x when provider response times flared. At the same time, you want a single, stable integration surface instead of scattering provider-specific logic across multiple services. An AI gateway is where all of these needs converge into a single, manageable control point.

With ai-gateway-auditable, every request has a clear path, every response is traceable, and fallback behavior is visible instead of opaque.

Why Watt

Platformatic Watt is well-suited to this pattern because it lets us run the API-facing proxy and the audit worker as separate applications with a shared operational model, using them as worker threads. That separation is the foundation of reliability here: the proxy can stay focused on low-latency responses, while the worker can focus on durable queue consumption, batching, and S3 shipping.

Most importantly, this design is tolerant of worker crashes. Watt supervises applications (worker threads), so if an audit worker crashes, it is automatically restarted, and unhealthy workers are automatically replaced. During that window, the proxy can keep accepting requests and persisting audit jobs in FileStorage. When the replacement worker is up, it resumes consuming from the same queue path and drains pending jobs.

The result is graceful degradation rather than data loss: temporary worker failures increase audit lag but do not break the request path or discard audit events. This distinction is critical from a business perspective. Losing audit data can put regulatory compliance at risk and expose the company to possible fines or a loss of trust, while a short delay in audit processing only postpones analysis or reporting. In other words, our design trades brief insight delays for the certainty that no evidence is lost.

Why filesystem-based storage

We use filesystem-backed queue storage on purpose. Writing audit jobs to local disk is crash-tolerant because queued data survives process failures and restarts, unlike in-memory buffers.

It also keeps resource usage and request-path performance under control. We do not need to hold full audit payloads in memory while awaiting remote writes, and we do not put every request on the critical path of an external storage service. That removes network latency and remote availability as immediate blockers to request handling, while still providing durable buffering before batches are shipped to S3.

Architecture at a glance

The system runs as two applications (worker threads) inside Platformatic Watt, the Node.js application server.

The proxy is optimized for low-latency request/response flow, while the audit-worker is optimized for durability, retries, and batch shipping. Keeping these concerns separate avoids a common failure mode: heavy audit I/O slowing down user-facing traffic.

How do the two applications communicate? Through the same FileStorage queue path on disk. proxy writes audit jobs to ./data/queue, paying only the cost of a local queue operation, and audit-worker consumes those jobs independently in the background. This gives you explicit producer/consumer decoupling: the request path does not wait for S3 uploads, retries, or batch rotation. If the worker restarts, queued jobs remain on disk and are resumed when it comes back. If S3 is slow or temporarily unavailable, jobs continue to accumulate durably in the queue instead of being lost or pushing latency back to callers.

In other words, even when storage is under pressure or S3 is temporarily unavailable, the gateway can keep serving requests while the audit pipeline catches up safely in the background.
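A minimal sketch of this producer/consumer decoupling over a shared on-disk queue directory. The real gateway uses @platformatic/job-queue with FileStorage; enqueueAudit and drainQueue are illustrative names only, not the repository's API.

```javascript
// Hypothetical sketch of the proxy/audit-worker decoupling over a shared
// queue directory. The real gateway uses @platformatic/job-queue with
// FileStorage; this is only an illustration of the pattern.
import { mkdirSync, writeFileSync, readdirSync, readFileSync, unlinkSync } from 'node:fs';
import { join } from 'node:path';

// Producer side (proxy): a fast local write, never blocked by S3.
function enqueueAudit(job, queueDir = './data/queue') {
  mkdirSync(queueDir, { recursive: true });
  // One file per job; the job id doubles as the filename, so repeated ids
  // overwrite instead of duplicating.
  writeFileSync(join(queueDir, `${job.id}.json`), JSON.stringify(job));
}

// Consumer side (audit-worker): drains whatever is on disk, independently of
// the request path. Jobs written while the worker was down are still here.
function drainQueue(handle, queueDir = './data/queue') {
  for (const name of readdirSync(queueDir)) {
    const path = join(queueDir, name);
    handle(JSON.parse(readFileSync(path, 'utf8')));
    unlinkSync(path); // delete only after successful handling
  }
}
```

Because the queue lives on disk rather than in memory, a crash on either side of this pair loses no enqueued jobs.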

What the gateway gives you

At a product level, this gateway provides four strong guarantees:

  1. OpenAI Completions compatible endpoint (/v1/chat/completions) for clients and SDKs.

  2. Model-based routing with fallback across providers.

  3. Complete request/response audit records for every successful exchange.

  4. Durable archival to S3 with batched JSONL files partitioned by time (JSON Lines is a text file format where each line is a valid, independent JSON object, separated by newline characters).

This means reduced provider lock-in, lower operational risk, and better observability.

Service responsibilities

The key behavior is role decoupling: proxy only produces queue jobs, while audit-worker handles all downstream storage and shipping work.

proxy (external entrypoint)

proxy exposes:

  • GET /health

  • POST /v1/chat/completions

For each request, it:

  1. Selects a provider chain based on model routing rules.

  2. Executes upstream calls with fallback on retryable failures.

  3. Returns the upstream response to the client.

  4. Enqueues an audit payload into the shared durable queue.
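The four steps above can be sketched as one handler. This is a self-contained illustration, not the repository's code: the routing table, upstream call, and queue write are stubbed, where in the real gateway they come from providers.json, the provider adapters, and the durable queue.

```javascript
// Hypothetical sketch of the proxy's four per-request steps. callUpstream and
// enqueueAudit are injected stubs standing in for real adapter and queue code.
const routingTable = { 'gpt-4o': ['openai'], '*': ['openai'] };

function providerChainFor(model) {
  return routingTable[model] ?? routingTable['*']; // 1. select provider chain
}

async function handleChatCompletion(req, { callUpstream, enqueueAudit }) {
  const chain = providerChainFor(req.body.model);
  const started = Date.now();
  let lastError;
  for (const provider of chain) {      // 2. fallback across the chain on failure
    try {
      const response = await callUpstream(provider, req.body);
      enqueueAudit({                   // 4. audit job off the critical path
        id: req.id,
        duration_ms: Date.now() - started,
        request: req.body,
        response,
        routing: { used_provider: provider }
      });
      return response;                 // 3. upstream response back to the client
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```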

audit-worker (internal service)

audit-worker is an internal Node application with no HTTP API (hasServer = false).

It owns the full audit persistence path:

  • queue consumption with @platformatic/job-queue

  • durable local buffering with FileStorage

  • batched JSONL writing

  • S3 uploads signed with AWS SigV4.

Queue settings used in the current implementation:

  • concurrency: 1

  • maxRetries: 3

  • resultTTL: 60_000

  • visibilityTimeout: 30_000

This is optimized for predictable sequential writes and safe retry semantics. Filesystem queue storage is chosen because it needs no external setup (no Redis/Valkey), making local development and single-node production rollouts much simpler. At the same time, it still provides crash resilience: queue state is persisted to disk, so in-flight and pending audit jobs survive process restarts.

That combination is the key trade-off here: you gain operational simplicity and zero external dependencies without sacrificing durability for the audit trail. Note that filesystem storage still carries some residual risk: if the node's disk is lost before a batch ships, the queued audit data goes with it. The alternative, moving the audit write back into the main response cycle, introduces latency and causes a hard failure whenever the audit cannot be completed. The trade-off, as always, is in the hands of engineers: availability or consistency?
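The queue settings above can be gathered into a single configuration object. The option names mirror the bullet list; exactly how they are passed to @platformatic/job-queue is an assumption here, so treat this as illustrative rather than exact API usage.

```javascript
// Queue settings from the current implementation, as one config object.
// How these are wired into @platformatic/job-queue is an assumption.
const queueOptions = {
  concurrency: 1,            // one job at a time: predictable sequential writes
  maxRetries: 3,             // a failing job is retried up to three times
  resultTTL: 60_000,         // completed-job results are kept for 60 s
  visibilityTimeout: 30_000  // an unacknowledged in-flight job reappears after 30 s
};
```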

Routing and fallback configuration

Routing lives in providers.json and uses two lists:

  • providers: upstream connection and adapter definitions

  • routing: per-model routing rules with ordered provider chains

{
 "providers": [
   {
     "id": "openai",
     "type": "openai",
     "baseUrl": "https://api.openai.com",
     "apiKey": "{OPENAI_API_KEY}"
   },
   {
     "id": "anthropic",
     "type": "anthropic",
     "baseUrl": "https://api.anthropic.com",
     "apiKey": "{ANTHROPIC_API_KEY}"
   }
 ],
 "routing": [
   {
     "id": "gpt-4o",
     "providers": ["openai"],
     "strategy": "fallback"
   },
   {
     "id": "claude-sonnet-4-6",
     "providers": ["anthropic"],
     "strategy": "fallback"
   },
   {
     "id": "*",
     "providers": ["openai"],
     "strategy": "fallback"
   }
 ]
}

Environment variables like {OPENAI_API_KEY} are resolved from process env at startup.
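A minimal sketch of that placeholder resolution, assuming plain string substitution from the process environment at startup (resolvePlaceholders is a hypothetical name, not the project's function):

```javascript
// Hypothetical sketch: resolve {VAR} placeholders from the environment.
// Failing fast on a missing variable surfaces misconfiguration at startup.
function resolvePlaceholders(value, env = process.env) {
  return value.replace(/\{([A-Z0-9_]+)\}/g, (_, name) => {
    if (env[name] === undefined) {
      throw new Error(`Missing environment variable: ${name}`);
    }
    return env[name];
  });
}
```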

Fallback behavior is explicit and policy-driven: by exposing a clearly configurable list of retryable statuses, teams can align gateway failover with internal governance or incident playbooks. For example, you can tune which upstream failures (such as 429, 500, 502, 503, 504) trigger fallback based on your own risk, compliance, or incident response thresholds. Because configuration maps directly to governance, compliance and security teams can review and pre-approve response handling in line with internal standards, a step that accelerates approval and audit readiness.

  • retryable statuses: 429, 500, 502, 503, 504

  • Connection failures are retryable

  • Non-retryable responses (400, 401, 403) are returned immediately.
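That policy reduces to a small predicate. The status set below matches the defaults listed above; shouldFallback is an illustrative name, not the repository's function.

```javascript
// The fallback policy as a predicate: a connection failure (no HTTP status
// at all) is always retryable, non-retryable statuses go straight back to
// the caller, and everything in the retryable set moves to the next provider.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504]);

function shouldFallback(result) {
  if (result.connectionError) return true;       // network failure: try next provider
  return RETRYABLE_STATUSES.has(result.status);  // status policy from config
}
```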

If you want delegated provider orchestration, you can configure OpenRouter as an openai-type provider and route * traffic to it.

Adapter model: one external contract, many upstreams

The gateway keeps a single OpenAI-compatible API surface, while adapters normalize provider differences behind the scenes.

  • OpenAI adapter supports OpenAI-compatible endpoints, including Azure/OpenRouter-compatible APIs.

  • The anthropic adapter translates OpenAI chat requests and responses to Anthropic Messages API semantics.

This removes provider-specific branching logic from your application layer.
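A hedged sketch of the core translation the anthropic adapter performs: OpenAI-style requests carry the system prompt inside the messages array, while the Anthropic Messages API takes it as a top-level `system` field and requires `max_tokens`. This simplified version covers only those differences, and the 1024-token default is an assumption for illustration.

```javascript
// Simplified OpenAI -> Anthropic request translation: lift system messages
// into the top-level `system` field and ensure `max_tokens` is set.
function toAnthropicRequest(openaiReq) {
  const system = openaiReq.messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n');
  return {
    model: openaiReq.model,
    max_tokens: openaiReq.max_tokens ?? 1024, // required by the Messages API
    ...(system ? { system } : {}),
    messages: openaiReq.messages.filter((m) => m.role !== 'system')
  };
}
```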

Streaming support with full audit fidelity

Streaming UX matters, so the proxy preserves token-by-token delivery.

For stream: true requests, the proxy:

  1. Pipes SSE chunks to the client in real time.

  2. Buffers chunks internally.

  3. Reconstructs a complete Chat Completions response.

  4. Emits a single audit record with streamed set to true.

Users get low-latency streaming, and operators still get complete records for replay and analysis.
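A simplified sketch of that flow: each chunk is forwarded to the client immediately and buffered so one complete record can be reconstructed at the end. Real SSE parsing is more involved; here a chunk is assumed to already be a decoded content delta, and makeStreamAuditor is a hypothetical helper name.

```javascript
// Illustrative stream auditor: pipe chunks out in real time while buffering
// them, then emit a single reconstructed record marked streamed: true.
function makeStreamAuditor(forwardToClient) {
  const parts = [];
  return {
    onChunk(delta) {
      forwardToClient(delta); // 1. pipe to the client in real time
      parts.push(delta);      // 2. buffer internally
    },
    finish() {
      // 3.-4. reconstruct the full response for one audit record
      return { streamed: true, content: parts.join('') };
    }
  };
}
```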

Audit record shape

Each JSONL line is a complete record with request, response, latency, caller hash, status, and routing metadata:

{
 "id": "a8f3b2c1-...",
 "timestamp": "2026-03-03T11:44:00.000Z",
 "duration_ms": 1243,
 "request": {
   "model": "gpt-4o",
   "messages": [{ "role": "user", "content": "Hello" }]
 },
 "response": {
   "id": "chatcmpl-...",
   "choices": [{ "message": { "role": "assistant", "content": "Hi!" } }]
 },
 "upstream_status": 200,
 "caller": "7a3f2b1c",
 "streamed": false,
 "routing": {
   "model": "gpt-4o",
   "planned_providers": [{ "id": "openai", "status": 200, "duration_ms": 1200 }],
   "used_provider": "openai"
 }
}

The caller is an 8-character SHA-256 prefix of the bearer token value, so attribution is possible without storing raw API keys.
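That attribution scheme is a one-liner with Node's built-in crypto module (callerId is an illustrative name):

```javascript
// Caller attribution: first eight hex characters of the SHA-256 digest of
// the bearer token, so keys are attributable without being stored.
import { createHash } from 'node:crypto';

function callerId(bearerToken) {
  return createHash('sha256').update(bearerToken).digest('hex').slice(0, 8);
}
```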

Durable audit pipeline in detail

Inside the request path, proxy enqueues each payload using the request ID as the job ID, which naturally supports deduplication when IDs repeat.

audit-worker consumes those jobs and writes them into local JSONL batches before upload.

The writer then:

  1. Appends each record as one JSON line to a local batch file using flush semantics.

  2. Rotates to a new batch when the size or time threshold is reached.

  3. Uploads the batch file to S3 using undici and SigV4 headers.

  4. Deletes local batch files only after successful upload.

Current thresholds:

  • BATCH_SIZE = 100

  • FLUSH_INTERVAL_MS = 5000

S3 object keys are hour-partitioned for downstream querying:

audits/2026/03/03/11/batch-1741003090000-3bb7....jsonl

This structure works well with tools like Athena and other data lake pipelines.
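The key scheme above can be sketched as a small helper. randomUUID stands in for whatever unique suffix the real batch writer appends, so treat the exact suffix format as an assumption.

```javascript
// Build an hour-partitioned S3 object key like
// audits/2026/03/03/11/batch-<epoch-ms>-<uuid>.jsonl
import { randomUUID } from 'node:crypto';

function batchKey(date = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  return [
    'audits',
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1), // JS months are 0-based
    pad(date.getUTCDate()),
    pad(date.getUTCHours()),
    `batch-${date.getTime()}-${randomUUID()}.jsonl`
  ].join('/');
}
```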

Operating under failure

The gateway is intentionally designed to degrade gracefully.

Several components work together to make that degradation safe:

  • The file-backed queue directory (such as ./data/queue) serves as the communication bridge between proxy and audit-worker.

  • Platformatic Watt's supervised applications enable single-node deployment with automatic restart of failed workers.

  • A default S3 bucket holds the audit archives.

  • providers.json defines routing logic and provider chains, while runtime environment variables control credentials and logging.

Together these form the durable, fault-tolerant foundation of the architecture, keeping user-facing availability high while preserving eventual audit consistency.

Run it locally

git clone https://github.com/platformatic/ai-gateway-auditable.git
cd ai-gateway-auditable
npx wattpm-utils install
docker compose up

Then call the gateway with any OpenAI-compatible client or a simple curl:

curl http://localhost:3042/v1/chat/completions \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer sk-your-key' \
 -d '{
   "model": "gpt-4o",
   "messages": [{"role": "user", "content": "Hello"}]
 }'

Final take

ai-gateway-auditable is a practical pattern for teams that need to move fast with AI and still satisfy the operational norms of production software. It gives you:

  • one consistent API surface with clear fallback behavior,

  • complete and queryable audit trails, and

  • a clean separation between serving traffic and persisting evidence.

If your roadmap includes multi-provider AI, compliance requirements, or strict SRE expectations, this architecture is ready to adopt and extend.

The easiest way to get started is to fork the repo, run the quick-start commands, and see the gateway in action with your own test requests. Try spinning up the service locally and sending a sample call: this practical step will show you right away how auditable AI operations can be within your own workflow.

Happy building!