# Stop Request Stampedes at the Gateway with Platformatic Deduplication

Picture an online store launching a new product and sending out a mailing list campaign. Thousands of users click the same link at once. The product page, built with a Node.js app like Next.js, needs to fetch the same product details, inventory, recommendations, and pricing for each request.

If the page is cached, everything works smoothly. Problems begin when the cache is empty, expired, or being refreshed. Then, the first group of users all miss the cache together, and every request asks the app to generate the same response. (This same thing would also happen with a trending news story, a search crawler, frontend prefetching, or after a cache reset.)

This situation is known as the “thundering herd” problem. Every user request is valid, but the work gets repeated. Your app uses CPU, database, and network resources to calculate the same result over and over, just when response times are already strained.

Rather than sending all traffic straight to your app, wouldn’t it be great if you could place a gateway in front that spots duplicate in-flight reads and combines them before they hit Node.js? 

Platformatic Gateway now does exactly this. With request deduplication, it merges concurrent requests for the same data. Only one goes upstream, while the others wait and then get the same response.

This is not a replacement for caching. Instead, it acts as a short-term coordination layer for requests in progress. The cache handles future requests, while deduplication shields your app while the first response is still being generated.

This is especially important for self-hosted Next.js apps. A popular route can cause heavy server rendering, React Server Component processing, image metadata checks, or backend API calls. When many users hit that route at once, or during cache refreshes, deduplication stops the gateway from sending the same work to Next.js over and over.

* * *

## **A Local Benchmark**

To see how much this helps, we ran a simple local test with a purposely slow route behind a proxy. The upstream route waited 100 ms before responding, simulating a page or API call that needs backend work. The test used the same leader/waiter pattern as gateway deduplication and sent requests to the same URL in waves of 100 concurrent requests.

These are the median numbers from three runs:

| Scenario | Client requests | Upstream requests | Average latency | p99 latency | Errors |
| --- | --- | --- | --- | --- | --- |
| Without deduplication | 1,000 | 1,000 | 111.31 ms | 134 ms | 0 |
| With deduplication | 1,000 | 10 | 104.88 ms | 127 ms | 0 |
| Without deduplication | 10,000 | 10,000 | 104.80 ms | 122 ms | 0 |
| With deduplication | 10,000 | 100 | 102.91 ms | 106 ms | 0 |

The key result isn’t the slight change in average latency, but the number of upstream requests. With deduplication, the proxy still answers every client, but the upstream app only handles one request per wave of 100. In the test with 10,000 requests, that meant just 100 upstream requests instead of 10,000. In real situations, this can mean the difference between a burst that overwhelms your app and one that the gateway handles smoothly.

* * *

## **How It Works**

Gateway deduplication uses a leader/waiter model that is easy to reason about in production.

The first matching request becomes the leader. It acquires a lock, goes to the upstream application, buffers the response, stores it for a short time, and notifies any waiters. Concurrent requests with the same key become waiters. They do not call the upstream service immediately; instead, they wait for the leader response and replay it when it becomes available.
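The leader/waiter idea can be sketched in a few lines of plain JavaScript. This is a minimal in-process illustration, not Platformatic Gateway's implementation: it has no locks, TTLs, or storage adapter, and the names `coalesce`, `slowUpstream`, and `runWave` are made up for the example.

```javascript
// Map of deduplication key -> promise of the leader's response.
const inFlight = new Map()

function coalesce (key, fetchUpstream) {
  // A request that finds an in-flight entry becomes a waiter and shares the promise.
  const pending = inFlight.get(key)
  if (pending) return pending

  // Otherwise this request is the leader: it calls upstream once and
  // removes the entry when the response (or error) arrives.
  const leader = Promise.resolve()
    .then(fetchUpstream)
    .finally(() => inFlight.delete(key))
  inFlight.set(key, leader)
  return leader
}

// Demo: a 100 ms upstream and a wave of concurrent requests for the same key.
let upstreamCalls = 0
function slowUpstream () {
  upstreamCalls++
  return new Promise(resolve => setTimeout(() => resolve('product page'), 100))
}

function runWave (size) {
  const requests = Array.from({ length: size }, () => coalesce('GET:/products/42', slowUpstream))
  return Promise.all(requests)
}
```

Calling `runWave(100)` resolves 100 responses while `slowUpstream` runs once. A second wave started after the first has finished triggers one more upstream call, which is the "one upstream request per wave" behaviour the benchmark measures.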

In a production gateway, deduplication often sits next to caching. You can use separate Valkey instances or separate key prefixes in the same Valkey deployment, but the two stores serve different purposes: cache storage keeps reusable responses, while deduplication storage keeps short-lived locks and response buffers for in-flight requests.

![](https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/87504c18-e5e8-4763-a428-95b8508848ea.png align="center")

To prevent deadlocks, every coordination point has an expiration. The leader lock uses a `lockTtl`, waiters have a `timeout`, and retries are limited. If the leader fails, the lock expires, a waiter times out, or retries run out, the request switches back to normal proxying. This fallback is intentional: deduplication is meant to lower load, not to leave requests stuck during traffic spikes.

![](https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/9fe7db00-4342-4464-b99c-f16306f36587.png align="center")
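As a rough sketch of that bounded wait, the loop below tries to become the leader, otherwise waits for the leader with a timeout, and falls back to direct proxying once the retries are used up. The callbacks `tryAcquireLock`, `runAsLeader`, `waitForLeader`, and `proxyDirectly` are hypothetical stand-ins for gateway internals that this article does not show.

```javascript
// Rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout (promise, ms) {
  let timer
  const timeout = new Promise((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error('waiter timeout')), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// All four callbacks are assumptions made for this illustration.
async function acquireOrWait ({ tryAcquireLock, runAsLeader, waitForLeader, proxyDirectly, timeout = 1000, retries = 3 }) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    // Whoever gets the lock becomes the leader and calls upstream.
    if (await tryAcquireLock()) return runAsLeader()
    try {
      // Everyone else waits for the leader's response, but only for `timeout` ms.
      return await withTimeout(waitForLeader(), timeout)
    } catch {
      // Leader failed or was too slow: loop and retry lock acquisition.
    }
  }
  // Retries exhausted: behave like a normal proxy so the request never hangs.
  return proxyDirectly()
}
```

The important property is that every path ends in a response: a replayed one, a fresh one from the leader path, or a normally proxied one.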

For a single request, the decision path looks like this:

![](https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/60845246-b8d2-4ce5-8d12-2707ea319098.png align="center")

* * *

## **The Simplest Configuration**

Enable deduplication globally under `gateway.deduplication`:

```json
{
  "gateway": {
    "deduplication": {
      "enabled": true
    },
    "applications": [
      {
        "id": "frontend",
        "proxy": {
          "prefix": "/"
        }
      }
    ]
  }
}
```

By default, deduplication works for `GET` and `HEAD` requests and uses `memory` for storage.

This default is intentionally cautious. `GET` and `HEAD` are the safest types of requests to deduplicate. Write requests often need custom rules before they can be safely coordinated.

* * *

## **Per-Application Overrides**

You can also configure deduplication per proxied application with `gateway.applications[].proxy.deduplication`. Application-level options override the global options.

```json
{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "methods": ["GET"]
   },
   "applications": [
     {
       "id": "frontend",
       "proxy": {
         "prefix": "/",
         "deduplication": {
           "enabled": true,
           "routes": [{ "method": "GET", "path": "/blog/*" }]
         }
       }
     }
   ]
 }
}
```

This approach lets you begin where the benefits are clear. You can deduplicate public catalogue pages, blog posts, product details, or framework prefetch routes, while keeping endpoints with strict per-user behaviour unchanged.

* * *

## **Choosing The Deduplication Key**

The default key is computed from:

*   the configured application origin
    
*   the HTTP method
    
*   the rewritten proxy URL, including the query string
    
*   selected request headers
    

The default headers are:

```plaintext
["authorization", "cookie", "accept", "accept-language"]
```

Including headers matters because many read responses are not only a function of the URL. A localized page can depend on `accept-language`. A user-specific page can depend on `cookie` or `authorization`. If those headers were ignored, unrelated callers could incorrectly share a response.
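To make the effect of the selected headers concrete, here is a small sketch of how such a key could be assembled. The function `buildKey` and the `|`-separated format are assumptions for illustration; the gateway's actual key encoding is not described in this article.

```javascript
const DEFAULT_KEY_HEADERS = ['authorization', 'cookie', 'accept', 'accept-language']

// Illustrative key builder: joins origin, method, URL, and the selected header values.
function buildKey ({ origin, method, url, headers }, keyHeaders = DEFAULT_KEY_HEADERS) {
  // A missing header contributes an empty value so the position stays stable.
  const selected = keyHeaders.map(name => `${name}=${headers[name] ?? ''}`)
  return [origin, method, url, ...selected].join('|')
}

const english = buildKey({
  origin: 'http://frontend',
  method: 'GET',
  url: '/products/42',
  headers: { 'accept-language': 'en' }
})
const italian = buildKey({
  origin: 'http://frontend',
  method: 'GET',
  url: '/products/42',
  headers: { 'accept-language': 'it' }
})
```

Here `english` and `italian` differ, so the two requests are not coalesced. If `accept-language` were left out of `keyHeaders`, both would map to the same key and share one response.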

You can adjust the headers included in the key for a deduplication configuration:

```json
{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "headers": ["authorization", "cookie", "x-tenant-id"]
   }
 }
}
```

Currently, you can’t set headers per route. If you need different header behaviour for different routes, use separate deduplication settings for each application or create a custom key function.

For full control, provide a synchronous key function:

```json
{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "key": "./deduplication-key.js"
   }
 }
}
```

```javascript
export function computeDeduplicationKey(request, context) {
 return `${context.origin}:${context.method}:${context.url}`
}
```

The function receives the request and a context object containing the origin, method, rewritten URL, parsed query, selected headers, and application configuration. It must return the key synchronously.
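One case where a custom key can pay off is campaign traffic, where the same page is requested with different tracking parameters. The sketch below drops a few such parameters and sorts the rest before building the key. It assumes that `context.url` is a path plus query string and that the upstream response does not depend on the removed parameters or on any request header, so it only suits public pages.

```javascript
// Query parameters assumed not to change the response (example values).
const IGNORED_PARAMS = new Set(['utm_source', 'utm_medium', 'utm_campaign'])

// Add `export` in front of this function when saving it as ./deduplication-key.js.
function computeDeduplicationKey (request, context) {
  // The base is a placeholder that only exists so the URL parser accepts a relative URL.
  const url = new URL(context.url, 'http://placeholder.invalid')
  for (const name of [...url.searchParams.keys()]) {
    if (IGNORED_PARAMS.has(name)) url.searchParams.delete(name)
  }
  // Sorting makes ?a=1&b=2 and ?b=2&a=1 produce the same key.
  url.searchParams.sort()
  return `${context.origin}:${context.method}:${url.pathname}${url.search}`
}
```

With this function, `/products/42?utm_source=mail&b=2&a=1` and `/products/42?a=1&b=2&utm_source=social` produce the same key, so both requests can share one upstream call.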

* * *

## **Route Whitelisting**

For tighter control, configure a route whitelist. Routes use [find-my-way](https://github.com/delvedor/find-my-way) syntax.

```json
{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "routes": [
       { "method": "GET", "path": "/blog/*" },
       { "methods": ["GET", "HEAD"], "path": "/products/:id" }
     ]
   }
 }
}
```

When `routes` are configured, route matching decides whether deduplication applies. When `routes` are not configured, the `methods` list decides.
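The precedence rule can be written down as a small decision function. This is only a sketch: `pathMatches` is a simplified stand-in that understands a trailing `*` and `:param` segments, whereas the gateway relies on find-my-way for real matching, and `shouldDeduplicate` is a hypothetical name.

```javascript
// Simplified stand-in for find-my-way: trailing `*` wildcard and `:param` segments only.
function pathMatches (pattern, path) {
  const patternParts = pattern.split('/')
  const pathParts = path.split('/')
  for (let i = 0; i < patternParts.length; i++) {
    if (i >= pathParts.length) return false
    const part = patternParts[i]
    if (part === '*') return true
    if (part.startsWith(':')) {
      if (pathParts[i] === '') return false
      continue
    }
    if (part !== pathParts[i]) return false
  }
  return patternParts.length === pathParts.length
}

// `path` is assumed to be the URL path without the query string.
function shouldDeduplicate (config, method, path) {
  if (Array.isArray(config.routes) && config.routes.length > 0) {
    // Routes are configured: only route matching decides.
    return config.routes.some(route => {
      const methods = route.methods ?? [route.method]
      return methods.includes(method) && pathMatches(route.path, path)
    })
  }
  // No routes: fall back to the methods list (GET and HEAD by default).
  const methods = config.methods ?? ['GET', 'HEAD']
  return methods.includes(method)
}
```

With the route list from the example above, `GET /blog/hello` and `HEAD /products/42` are deduplicated, while `GET /cart` and `HEAD /blog/hello` are proxied normally.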

* * *

## **Storage: Memory Or Valkey**

By default, `memory` is used for storage. It handles duplicate requests within a single gateway instance and doesn’t need any external service. This setup is great for local development, single-instance deployments, and easy rollouts.

```json
{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "storage": {
       "adapter": "memory"
     }
   }
 }
}
```

For deployments that scale horizontally, use the `valkey` adapter. It stores locks, response pointers, and buffered responses in a Redis-compatible Valkey server, allowing multiple gateway workers, instances, or pods to coordinate. This way, you get the same deduplication benefits even when traffic is spread across several replicas.

![](https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/860d66e5-8f7d-4f7a-a2e3-c04094af5abf.png align="center")

```json
{
 "gateway": {
   "deduplication": {
     "enabled": true,
     "storage": {
       "adapter": "valkey",
       "url": "redis://127.0.0.1:6379",
       "prefix": "my-application"
     }
   }
 }
}
```

Use a `prefix` if several applications share the same Valkey instance and need separate key spaces.
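For intuition about how several replicas can agree on a single leader, a common pattern with Redis-compatible servers is an atomic `SET` with the `NX` and `PX` options. The sketch below shows that pattern only. It is not taken from the Platformatic source: the key layout in `lockKey` is invented, and `client` is assumed to be an ioredis-compatible client whose `set(key, value, 'PX', ms, 'NX')` resolves to `'OK'` or `null`.

```javascript
// Hypothetical key layout: <prefix>:lock:<deduplication key>.
function lockKey (prefix, key) {
  return `${prefix}:lock:${key}`
}

// Returns true only for the first caller; the lock expires on its own after lockTtlMs.
async function tryAcquireLock (client, prefix, key, ownerId, lockTtlMs) {
  // NX: set only if the key does not exist. PX: expire after the given milliseconds.
  const result = await client.set(lockKey(prefix, key), ownerId, 'PX', lockTtlMs, 'NX')
  return result === 'OK'
}
```

Because the server applies `SET ... NX` atomically, two gateway pods racing for the same key cannot both become the leader, and the expiry means a crashed leader cannot hold the lock forever.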

* * *

## **Operational Behavior**

Deduplication is a best-effort feature, not a guarantee of exactly-once processing.

Duplicate upstream requests can still occur if the in-flight lock expires before the upstream response is ready, if a gateway instance fails while handling the leader request, if a waiter times out, or if retries run out. In these cases, the gateway just switches back to normal proxying.

The main timing options are:

*   `timeout`: how long a duplicate request waits for the leader response before retrying lock acquisition
    
*   `retries`: how many additional deduplication attempts are made before falling back to normal proxying
    
*   `ttl`: how long stored responses remain available for waiting requests
    
*   `lockTtl`: how long an in-flight lock can live before it expires
    

Defaults are:

```json
{
 "timeout": 1000,
 "retries": 3,
 "ttl": 10000,
 "lockTtl": 500
}
```
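When tuning these values, two quick checks are useful. The sketch below assumes all values are in milliseconds and that each deduplication attempt can wait for the full `timeout`; both helper names are hypothetical.

```javascript
const defaults = { timeout: 1000, retries: 3, ttl: 10000, lockTtl: 500 }

// Rough upper bound on the time a waiter spends before falling back to
// normal proxying: one initial attempt plus `retries` additional attempts.
function maxWaitMs ({ timeout, retries }) {
  return timeout * (retries + 1)
}

// True when the lock can expire while the upstream is still working,
// which lets a second leader start a duplicate upstream request.
function lockMayExpireEarly ({ lockTtl }, expectedUpstreamMs) {
  return lockTtl < expectedUpstreamMs
}

console.log(maxWaitMs(defaults)) // 4000
console.log(lockMayExpireEarly(defaults, 100)) // false
console.log(lockMayExpireEarly(defaults, 800)) // true
```

Under these assumptions, a route that regularly takes longer than `lockTtl` to render is a candidate for a larger `lockTtl`, since otherwise the lock expires before the leader's response is ready.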

Responses are fully buffered before being replayed. This works well for short bursts of duplicate reads, but large responses can use more gateway memory and, with Valkey, add some storage overhead. It’s best to start with routes where responses are small and where repeated upstream work is already costing you in money, speed, or capacity.

* * *

## **Custom Gateway Handlers**

Deduplication composes with custom gateway handlers. When both a custom handler and deduplication are configured, deduplication runs first, and the leader request is delegated to the handler.

Handlers that use `reply.from()` do not need special handling. Platformatic Gateway uses `reply.from()` from [@fastify/reply-from](https://github.com/fastify/fastify-reply-from) to proxy upstream requests.

```javascript
export function handler(request, reply, dest, options) {
 return reply.from(dest, options)
}
```

If a handler overrides `onResponse` or `onError`, it can call the helper functions provided in `options`, so waiting requests still receive the correct signal:

```javascript
export function handler(request, reply, dest, options) {
 return reply.from(dest, {
   ...options,
   async onResponse(request, reply, res) {
     reply.header('x-custom-handler', 'true')
     return options.deduplicateResponse(request, reply, res)
   },
   async onError(reply, error) {
     return options.deduplicateError(reply, error)
   }
 })
}
```

Handlers that send responses directly without `reply.from()` cannot be replayed by gateway deduplication.

* * *

## **Metrics**

The feature also adds Gateway metrics so you can verify whether deduplication is helping in production:

*   `gateway_deduplication_leader_count`
    
*   `gateway_deduplication_waiter_count`
    
*   `gateway_deduplication_replay_count`
    
*   `gateway_deduplication_fallback_count`
    
*   `gateway_deduplication_error_count`
    

These counters help answer real-world questions: how many requests became leaders, how many waited, how many were replayed, and how often the gateway had to switch back to normal proxying.
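One way to turn the counters into a single health signal is to compute ratios over a time window. The helper below is a sketch: it assumes you have already read the counter values from your metrics backend and that each deduplicated request is counted once as either a leader or a waiter, which this article does not spell out.

```javascript
// Input values are counter increases over the same time window.
function summarize ({ leaders, waiters, replays, fallbacks }) {
  const coordinated = leaders + waiters
  if (coordinated === 0) return { replayRatio: 0, fallbackRatio: 0 }
  return {
    // Share of coordinated requests answered from a leader's response.
    replayRatio: replays / coordinated,
    // Share of coordinated requests that ended up being proxied normally.
    fallbackRatio: fallbacks / coordinated
  }
}

// Numbers shaped like the 10,000-request benchmark run above.
const summary = summarize({ leaders: 100, waiters: 9900, replays: 9900, fallbacks: 0 })
console.log(summary) // { replayRatio: 0.99, fallbackRatio: 0 }
```

A high replay ratio with a low fallback ratio suggests deduplication is absorbing bursts. A rising fallback ratio is a hint to revisit `timeout`, `retries`, or `lockTtl` for the affected routes.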

* * *

## **Conclusion**

Gateway deduplication works best when lots of clients request the same resource at once, and the upstream response can be safely reused for matching keys. This is just like the product launch or hot-news example from earlier: many users show up at once, the cache is cold or refreshing, and your app is about to generate the same page over and over.

The best places to start are public, read-heavy routes, framework prefetch endpoints, cache refresh paths, and expensive upstream reads with limited response sizes. Turn it on for a small set of routes first. Keep an eye on the leader, waiter, replay, and fallback metrics. Then, expand to other routes where you see duplicate work causing the most trouble.

You’ll see results right away: the gateway handles traffic spikes, your services do less repeated work, and users get more consistent response times during busy periods.

This kind of optimization adds up over time. You protect your upstream resources without changing your app code. You cut down on unnecessary backend load before it hits your databases, APIs, or rendering services. You also get metrics to see if the feature is delivering value. And if deduplication can’t help in a certain case, the request just goes through as usual.

For teams using Platformatic Gateway in front of modern web apps, especially self-hosted Next.js apps, request deduplication is a practical way to make read-heavy traffic more manageable. It gives you a safety net for traffic bursts, an easier way to scale with Valkey, and a rollout approach that doesn’t require changing your backend services.
