<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Platformatic Blog]]></title><description><![CDATA[Platformatic Blog]]></description><link>https://blog.platformatic.dev</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1727183405468/9f1d8161-aee0-4422-af77-111e9ea87aef.png</url><title>Platformatic Blog</title><link>https://blog.platformatic.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Sat, 11 Apr 2026 17:02:08 GMT</lastBuildDate><atom:link href="https://blog.platformatic.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[@platformatic/kafka Now Supports Confluent Schema Registry ]]></title><description><![CDATA[If you run Kafka in production, you can’t skip schema evolution. Teams need clear data types, compatibility checks, and a safe way to update contracts without breaking consumers or downstream services]]></description><link>https://blog.platformatic.dev/platformatic-kafka-confluent-schema-registry-support</link><guid isPermaLink="true">https://blog.platformatic.dev/platformatic-kafka-confluent-schema-registry-support</guid><category><![CDATA[kafka]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[Devops]]></category><category><![CDATA[json]]></category><dc:creator><![CDATA[Paolo Insogna]]></dc:creator><pubDate>Tue, 07 Apr 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/8c08d7dc-515b-4506-94f1-70c6ef98d1b2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you run Kafka in production, you can’t skip schema evolution. Teams need clear data types, compatibility checks, and a safe way to update contracts without breaking consumers or downstream services.</p>
<p>Before now, using <code>@platformatic/kafka</code> with Confluent Schema Registry meant writing extra code to connect the pieces. With <code>@platformatic/kafka</code> <strong>v1.27.0</strong>, that’s no longer needed.</p>
<p><code>@platformatic/kafka</code> now has built-in support for Confluent Schema Registry, including:</p>
<ul>
<li><p>AVRO</p>
</li>
<li><p>Protocol Buffers</p>
</li>
<li><p>JSON Schema</p>
</li>
<li><p>Basic and Bearer authentication</p>
</li>
<li><p>Automatic schema fetch and caching</p>
</li>
<li><p>Integrated Producer and Consumer hooks</p>
</li>
</ul>
<p>You get schema-aware messaging, and the project still focuses on being fast and predictable for Node.js Kafka clients.</p>
<h2><strong>Why This Matters</strong></h2>
<p>Most schema registry integrations add complexity where you don’t want it: in the message serialization and deserialization paths. Fetching remote schemas is asynchronous, but encoding and decoding should stay synchronous for speed and consistency.</p>
<p>Put simply, network I/O and cache coordination should happen before the main data processing, not during it. Keeping these steps separate helps maintain stable throughput and latency as traffic increases.</p>
<p>This release introduces a two-layer architecture to keep that separation clear:</p>
<ol>
<li><p><strong>Low-level hooks</strong> for async pre-processing:</p>
<ul>
<li><p><a href="https://github.com/platformatic/kafka/blob/main/docs/producer.md">beforeSerialization</a></p>
</li>
<li><p><a href="https://github.com/platformatic/kafka/blob/main/docs/consumer.md">beforeDeserialization</a></p>
</li>
</ul>
</li>
<li><p><strong>High-level registry API</strong> via <a href="https://github.com/platformatic/kafka/blob/main/docs/confluent-schema-registry.md">ConfluentSchemaRegistry</a></p>
</li>
</ol>
<p>In practice, this means schemas are fetched and cached before encode/decode happens, so your serializers and deserializers stay synchronous when messages are processed.</p>
<p>This gives application teams a simpler way to think about things: do the asynchronous prep first, then keep codec behavior predictable during main processing.</p>
<p>At a high level, the flow is:</p>
<ul>
<li><p>Extract schema ID from message metadata (producer) or wire payload (consumer).</p>
</li>
<li><p>Resolve schema from local cache when available.</p>
</li>
<li><p>On cache miss, fetch asynchronously via <code>beforeSerialization/beforeDeserialization</code> hooks and cache the schema.</p>
</li>
<li><p>Run synchronous serialization/deserialization with the resolved schema.</p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/043ca54f-3808-4811-8e47-b772feb30666.png" alt="" style="display:block;margin:0 auto" />

<p>In multi-instance deployments, that cache layer can be backed by Redis or Valkey, so workers share schema state across nodes while keeping encode/decode synchronous in the hot path.</p>
<h2><strong>What You Can Do Now</strong></h2>
<p>You can connect a registry directly to both the Producer and Consumer, letting <code>@platformatic/kafka</code> handle schema-aware serialization from start to finish.</p>
<p>This is especially helpful when several services publish and consume the same topics on different deployment cycles, since consistent schema handling is a must.</p>
<pre><code class="language-javascript">import { Consumer, Producer } from '@platformatic/kafka'
import { ConfluentSchemaRegistry } from '@platformatic/kafka/registries'

const registry = new ConfluentSchemaRegistry({
  url: 'http://localhost:8081'
})

const producer = new Producer({
  clientId: 'orders-producer',
  bootstrapBrokers: ['localhost:9092'],
  registry
})

const consumer = new Consumer({
  groupId: 'orders-consumers',
  clientId: 'orders-consumer',
  bootstrapBrokers: ['localhost:9092'],
  registry
})
</code></pre>
<p>When producing, pass schema IDs in message metadata:</p>
<pre><code class="language-javascript">await producer.send({
  messages: [
    {
      topic: 'orders',
      key: { orderId: 101 },
      value: { customerId: 'cust-44', total: 129.99 },
      metadata: {
        schemas: {
          key: 10,
          value: 11
        }
      }
    }
  ]
})
</code></pre>
<p>When consuming, payloads are automatically decoded with the cached schema. If a schema isn’t found, the registry fetches it before deserialization continues.</p>
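<p>As a rough sketch, consuming from the example above might look like the snippet below (only the minimal options are shown; see the consumer docs for the full <code>consume()</code> API):</p>
<pre><code class="language-javascript">// Minimal sketch: messages arrive with key and value already decoded
// using the schema resolved from the registry cache.
const stream = await consumer.consume({ topics: ['orders'] })

for await (const message of stream) {
  // e.g. { orderId: 101 } and { customerId: 'cust-44', total: 129.99 }
  console.log(message.key, message.value)
}
</code></pre>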
<p>This makes it easy to move from custom codec code to a single registry integration in your client setup.</p>
<h2><strong>Authentication and Enterprise Scenarios</strong></h2>
<p>Schema Registry deployments are often protected. The new integration includes:</p>
<ul>
<li><p>Basic auth (<code>username</code> + <code>password</code>)</p>
</li>
<li><p>Bearer token auth (<code>token</code>)</p>
</li>
<li><p>Dynamic credentials via providers</p>
</li>
</ul>
<p>This makes it easier to connect to managed or secured registry instances without writing custom transport code. It also makes credential rotation simpler when you use providers.</p>
<p>If your setup uses short-lived credentials, provider functions let you refresh tokens and secrets without having to rebuild your producer or consumer logic.</p>
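<p>As a hedged sketch, wiring credentials into the registry could look like this, reusing the <code>ConfluentSchemaRegistry</code> import from the first example. The option names mirror the list above (<code>username</code>/<code>password</code> for Basic auth, <code>token</code> for Bearer); check the Confluent Schema Registry docs for the exact option shape, including how provider functions are passed.</p>
<pre><code class="language-javascript">// Sketch only: option names follow the bullets above and may differ slightly
// from the final API; consult the registry documentation before relying on them.
const registry = new ConfluentSchemaRegistry({
  url: 'https://schema-registry.internal.example.com',
  // Basic auth
  username: process.env.REGISTRY_USERNAME,
  password: process.env.REGISTRY_PASSWORD
  // ...or Bearer auth, e.g. token: process.env.REGISTRY_TOKEN
})
</code></pre>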
<h2><strong>Performance and Reliability Considerations</strong></h2>
<p>One main design goal was to avoid adding unnecessary overhead to message processing.</p>
<p>The implementation focuses on cache locality and step-by-step pre-processing:</p>
<ul>
<li><p>Schema IDs are extracted from the wire format <em>(or message metadata).</em></p>
</li>
<li><p>Unknown schemas are fetched once and cached.</p>
</li>
<li><p>Repeated schema IDs in a batch are resolved from the cache.</p>
</li>
<li><p>Encode/decode continues in synchronous paths.</p>
</li>
</ul>
<p>This setup cuts down on unnecessary async work while still supporting remote schema registries safely. It also helps keep throughput and performance steady, as you’d expect from a Node.js client.</p>
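<p>Conceptually (this is not the library’s internal code), the per-batch behaviour looks something like this:</p>
<pre><code class="language-javascript">// Conceptual sketch of the flow described above, not @platformatic/kafka internals.
const schemaCache = new Map()

// Async pre-processing: fetch every unknown schema once, then cache it.
async function prepareSchemas (schemaIds, fetchSchema) {
  const missing = [...new Set(schemaIds)].filter((id) =&gt; !schemaCache.has(id))
  for (const id of missing) {
    schemaCache.set(id, await fetchSchema(id))
  }
}

// Hot path stays synchronous: every schema is already in the cache.
function decodeBatch (messages, decode) {
  return messages.map((m) =&gt; decode(m.payload, schemaCache.get(m.schemaId)))
}
</code></pre>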
<p>Operationally, this also makes failures easier to understand. Schema resolution errors happen during fetch or preparation, while codec errors are still linked to payload and schema compatibility.</p>
<h2><strong>Also Included in This Release</strong></h2>
<p>The v1.27.0 release also shipped quality improvements around consumer behaviour and protocol handling, with broad test coverage and new playground clients for:</p>
<ul>
<li><p>AVRO</p>
</li>
<li><p>Protobuf</p>
</li>
<li><p>JSON Schema</p>
</li>
<li><p>Authenticated Schema Registry setups</p>
</li>
</ul>
<p>The end result is a production-ready integration you can try out quickly, starting in local development and moving to secure production registries.</p>
<h2><strong>Experimental API Notice</strong></h2>
<p><code>ConfluentSchemaRegistry</code> and its related hooks are currently <strong>experimental</strong>. They may change in minor or patch releases as we keep improving them based on real-world use and feedback.</p>
<p>If you plan to use this in production, make sure to pin your versions and check the release notes. We’ll keep refining the API based on feedback from real deployments.</p>
<p>If your team is rolling this out, here’s a practical way to start:</p>
<ol>
<li><p>Start with one topic and one schema format <em>(typically AVRO or JSON Schema)</em></p>
</li>
<li><p>Validate serialization/deserialization behaviour in staging with real payloads.</p>
</li>
<li><p>Expand topic coverage and introduce auth/credential providers as needed.</p>
</li>
</ol>
<h2><strong>Getting Started</strong></h2>
<p>Install the package:</p>
<pre><code class="language-plaintext">npm install @platformatic/kafka
</code></pre>
<p>For Protobuf support, also install:</p>
<pre><code class="language-plaintext">npm install protobufjs
</code></pre>
<p>Next, follow the full integration guide in the documentation:</p>
<ul>
<li><p><a href="https://github.com/platformatic/kafka/blob/main/docs/confluent-schema-registry.md">Confluent Schema Registry docs</a></p>
</li>
<li><p><a href="https://github.com/platformatic/kafka/releases/tag/v1.27.0">v1.27.0 release notes</a></p>
</li>
</ul>
<p>If you give it a try, we’d love to hear your feedback at <a href="mailto:hello@platformatic.dev">hello@platformatic.dev</a>. Real-world schema workflows will help shape the next version of this API and guide our priorities for future improvements.</p>
<p>Thanks for building with us! 🚀</p>
]]></content:encoded></item><item><title><![CDATA[SSR Framework Benchmarks v2: What We Got Wrong, and the Real Numbers]]></title><description><![CDATA[TL;DR
We ran our SSR framework benchmarks again after finding out that compression was not applied the same way across all frameworks. In the original tests, TanStack did not have compression enabled.]]></description><link>https://blog.platformatic.dev/ssr-framework-benchmarks-v2-corrected-results</link><guid isPermaLink="true">https://blog.platformatic.dev/ssr-framework-benchmarks-v2-corrected-results</guid><category><![CDATA[Next.js]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Tue, 24 Mar 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/a6bfaa4d-f9dd-406f-8199-de4c38ac1824.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>TL;DR</h3>
<p>We ran our SSR framework benchmarks again after finding out that <strong>compression was not applied the same way across all frameworks</strong>. In the original tests, TanStack did not have compression enabled. React Router had gzip compression turned on in its Express <code>server.js</code>, but Watt skips <code>server.js</code> and uses Fastify. Because of this, React Router’s Watt runs had no compression overhead, while its Node and PM2 runs did. This made Watt seem faster than it actually was compared to the other runtimes.</p>
<p>Once we removed compression from React Router to make the comparison fair, the updated results gave us a clearer picture:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/47b441d2-c720-45dc-938c-b6dccc5fbbb7.png" alt="" style="display:block;margin:0 auto" />

<p>TanStack and React Router are still the top performers, while Next.js continues to have trouble at 1,000 requests per second. The main change is that Watt’s advantage now shows up mostly in tail latencies rather than in average response times, with p(99) at 83-89ms compared to Node’s 121-298ms.</p>
<hr />
<h2>What Went Wrong in the Previous Benchmarks</h2>
<p>HTTP compression means shrinking the response body before sending it over the wire, usually with gzip or Brotli. That typically reduces bandwidth and speeds up delivery of HTML, JSON, and JavaScript, which is why some frameworks enable it by default as a sensible production optimization. Others leave it off because compression is often handled more efficiently by a CDN or reverse proxy, and because it puts extra load on the CPU.</p>
<ul>
<li><p><strong>Next.js</strong>: Its built-in <code>compress</code> option works at the framework level, not just the HTTP server. We checked and confirmed that Next.js serves gzip responses with both Node and Watt, so compression is always applied no matter the runtime.</p>
</li>
<li><p><strong>TanStack Start</strong>: Never had compression configured in any runtime. All three runtimes (Node, PM2, Watt) served uncompressed responses. No inconsistency, but it made the comparison between frameworks unfair.</p>
</li>
<li><p><strong>React Router</strong>: Does not ship a default server, but there are several templates. The one we followed enabled compression in its Express <code>server.js</code>; Watt skips that file, so its runs had no compression.</p>
</li>
</ul>
<h3><strong>The Fix</strong></h3>
<p>We turned off compression on React Router by removing the <code>compression()</code> middleware from <code>server.js</code> and uninstalling the package. We also set <code>compress: false</code> in Next.js’s <code>next.config.mjs</code> so that all three frameworks were tested the same way. With compression removed everywhere, all frameworks now serve uncompressed responses in every runtime.</p>
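<p>For reference, the two changes look roughly like this (the React Router lines come from the Express-based template we used):</p>
<pre><code class="language-javascript">// next.config.mjs: disable Next.js's built-in gzip compression
export default {
  compress: false
}

// server.js (React Router's Express template): the compression middleware
// was removed, i.e. these two lines were deleted and the package uninstalled:
//   import compression from 'compression'
//   app.use(compression())
</code></pre>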
<p>In production, it’s best to handle compression at the reverse proxy or CDN layer, not in the application server.</p>
<hr />
<h2>Corrected Results</h2>
<p>All tests run at 1,000 req/s for 3 minutes with mixed e-commerce traffic (homepage, search, card details, game browsing, sellers - you can read more about the sample app we built for these benchmarks <a href="https://github.com/platformatic/k8s-watt-performance-demo/tree/ecommerce">here</a>) on AWS EKS. No compression, no <code>Accept-Encoding</code> headers.</p>
<p><strong>Software Versions</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/5bd6ba70-b6b3-43c2-a547-0487461d9cb0.png" alt="" style="display:block;margin:0 auto" />

<p><strong>React Router: Consistent Across All Runtimes</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/372ced00-f95d-43d8-9be2-770a6cb43e26.png" alt="" style="display:block;margin:0 auto" />

<p>React Router can handle 1,000 requests per second with no failures on any of the three runtimes. Watt and PM2 have almost the same median response time at 15ms. The difference shows up at the higher end: Watt’s p(99) is 83ms, PM2’s is 123ms, and Node’s is 298ms.</p>
<p><strong>TanStack Start: Watt and Node Neck-and-Neck</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/6150e74b-08b1-42da-ad43-c1678a29fe0f.png" alt="" style="display:block;margin:0 auto" />

<p>TanStack with Watt and TanStack with Node perform almost the same: they have the same average, median, and p(95) times. Watt is slightly better at p(99), with 89ms compared to Node’s 121ms.</p>
<p><strong>PM2 stands out as the outlier.</strong> With an 81% success rate and a 2.5 second average latency, PM2’s cluster fork model does not work well with Nitro’s srvx server. This is a problem between PM2 and Nitro, not TanStack. The same PM2 cluster mode works perfectly with React Router’s Express server, giving 100% success and a 20ms average.</p>
<p><strong>Next.js: Still Struggling at 1,000 req/s</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/ab2b2822-3331-485d-883f-11162aafcc10.png" alt="" style="display:block;margin:0 auto" />

<p>Next.js cannot handle 1,000 requests per second, no matter which runtime is used. All three runtimes have about a 55% success rate, which shows that <strong>the framework itself is the bottleneck, not the runtime</strong>. The high tail latencies (p(99) over 60 seconds) mean requests are piling up and timing out.</p>
<hr />
<h3><strong>What Changed vs the Original Blog Post</strong></h3>
<p><strong>Previous React Router Results (with compression inconsistency)</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/a3f37078-da72-4752-b6a9-b13be6a7a630.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Corrected React Router Results (no compression anywhere)</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/6c693170-5ef9-4c55-9d47-69d62e2fedcd.png" alt="" style="display:block;margin:0 auto" />

<p>The average latency numbers are similar because Node’s response time was mostly affected by SSR work, not compression. Still, making this correction is important for our methods. Now we can be sure the gap is real and not just a mistake.</p>
<p><strong>TanStack’s results were always fair. The numbers changed a bit (from 13ms to 18ms) because of normal differences between runs, not because of compression changes.</strong></p>
<hr />
<h3>Key Takeaways</h3>
<ol>
<li><p><strong>Benchmark Hygiene Matters</strong><br />A single middleware inconsistency, like having compression enabled in <code>server.js</code> but skipped by Watt, was enough to make our results questionable. Always make sure your test conditions are the same for every variant, especially when runtimes load applications in different ways.</p>
</li>
<li><p><strong>Watt’s Real Advantage: Tail Latency</strong><br />With compression turned off, Watt and Node have similar average and median latencies on both TanStack and React Router. However, Watt always comes out ahead at p(99):</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/44ce9ddc-67de-4f5e-b1f4-05556c0733f7.png" alt="" style="display:block;margin:0 auto" />

<p>This is important for services where tail latency matters, like APIs for user-facing pages or services with strict SLAs.</p>
</li>
<li><p><strong>PM2 Cluster Mode Has Compatibility Issues</strong><br />PM2 works well with Express (React Router: 100% success, 20ms average), but not with Nitro (TanStack: 81% success, 2.5s average). If you use Nitro-based frameworks like TanStack Start or Nuxt, it’s better to <strong>avoid PM2 cluster mode</strong> and use Watt or plain Node instead.</p>
</li>
<li><p><strong>Next.js at 1,000 req/s: The Runtime Doesn’t Matter</strong><br />All three runtimes, Watt, PM2, and Node, perform the same on Next.js at this load, with about 55% success and a 9-second average. The bottleneck is in Next.js’s SSR pipeline, not in how connections are handled.<br />Part of Watt’s advantage comes from better handling of CPU-bound work such as compression, so disabling compression narrows that advantage.</p>
</li>
<li><p><strong>TanStack Start and React Router Are Both Excellent</strong><br />With compression handled the same way, TanStack (18ms average) and React Router (19ms average on Watt) are very close in performance. Both can handle 1,000 requests per second with 100% success. So, you should choose between them based on developer experience and ecosystem fit, not just performance.</p>
</li>
</ol>
<hr />
<h2>Reproducing These Benchmarks</h2>
<p>The complete benchmark infrastructure is available at:<br /><a href="https://github.com/platformatic/k8s-watt-performance-demo/tree/ecommerce">https://github.com/platformatic/k8s-watt-performance-demo/tree/ecommerce</a></p>
<pre><code class="language-plaintext"># Benchmark TanStack Start
AWS_PROFILE=&lt;profile-name&gt; FRAMEWORK=tanstack ./benchmark.sh

# Benchmark React Router
AWS_PROFILE=&lt;profile-name&gt; FRAMEWORK=react-router ./benchmark.sh

# Benchmark Next.js
AWS_PROFILE=&lt;profile-name&gt; FRAMEWORK=next ./benchmark.sh
</code></pre>
<hr />
<h2>Conclusion</h2>
<p>Getting benchmarks right is tough. Even if you have the same applications, the same infrastructure, and careful methods, one small inconsistency, like compression middleware in one framework’s server file that is skipped by one runtime but not others, can make your results questionable.</p>
<p>The corrected results support the main points from our original post: TanStack Start and React Router can easily handle 1,000 requests per second, Next.js struggles at that level, and Watt gives real improvements across all three frameworks, especially for tail latencies. Now, though, we have more accurate numbers and a better idea of where each runtime really helps.</p>
<p>Being open about our methods is important. We made a mistake, fixed it, and are sharing both the error and the fix so others can learn from our experience.</p>
<p>If you’d like to talk about using Watt in your setup or want to learn more, email us at <a href="mailto:hello@platformatic.dev">hello@platformatic.dev</a> or reach out to <a href="https://www.linkedin.com/in/lucamaraschi/">Luca</a> or <a href="https://www.linkedin.com/in/matteocollina/">Matteo</a> on LinkedIn.</p>
]]></content:encoded></item><item><title><![CDATA[Durable Workflows Beyond Vercel: Version-Safe Orchestration for Kubernetes]]></title><description><![CDATA[Workflow DevKit lets you write durable, long-running workflows directly in your Next.js and Node.js apps. You define steps with ’use step’, and the SDK handles persistence, retries, and replay automat]]></description><link>https://blog.platformatic.dev/durable-workflows-kubernetes-version-safe</link><guid isPermaLink="true">https://blog.platformatic.dev/durable-workflows-kubernetes-version-safe</guid><category><![CDATA[Node.js]]></category><category><![CDATA[Next.js]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[Microservices]]></category><dc:creator><![CDATA[Marco Piraccini]]></dc:creator><pubDate>Thu, 19 Mar 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/2d88b854-5194-41b5-a6d2-46054e8e0586.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://useworkflow.dev">Workflow DevKit</a> lets you write durable, long-running workflows directly in your Next.js and Node.js apps. You define steps with <code>’use step’</code>, and the SDK handles persistence, retries, and replay automatically. Workflows survive server restarts, can sleep for days, and resume exactly where they left off.</p>
<p>On Vercel, all of this works out of the box — the platform handles deployment versioning and queue routing behind the scenes. But what happens when you deploy to your own Kubernetes cluster? Version mismatch. And it’s subtle enough to corrupt data before you notice.</p>
<p>We built <strong>Platformatic World</strong> to fix this. It’s a drop-in <a href="https://useworkflow.dev/docs/deploying">World implementation</a> that brings the same deployment safety to any Kubernetes cluster. Every workflow run is pinned to the code version that created it. Queue messages are routed to the correct versioned pods. Old versions stay alive until all their in-flight runs are complete.</p>
<h2>The version mismatch problem</h2>
<p>Workflow DevKit uses <a href="https://useworkflow.dev/docs/how-it-works">deterministic replay</a>. When a workflow resumes after a step, it runs the whole function again from the start, matching each <a href="https://useworkflow.dev/docs/foundations">step</a> to its cached result by order. The correlation IDs that link steps to cached results come from a seeded random number generator tied to the run ID. If the code and seed are the same, the sequence stays the same.</p>
<p>This works perfectly until you deploy a new version.</p>
<p>If a run that started on v1 replays on v2 and the step order has changed, the correlation IDs won’t match anymore. For example, the cached result from <code>chargeCard</code> could be used for the new <code>addDiscount</code> step:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/f7ec7305-e11f-444c-b3e9-210fcf9a9986.png" alt="" style="display:block;margin:0 auto" />

<p>The workflow can quietly produce wrong results or fail in ways that are hard to spot. On Vercel, the <a href="https://useworkflow.dev/worlds/vercel">Vercel World</a> handles this for you. On self-hosted Kubernetes, you have to manage it yourself.</p>
<h2>We already solved this for HTTP</h2>
<p><a href="https://icc.platformatic.dev/">ICC</a> (Intelligent Command Center) is our Kubernetes controller for managing app deployments. We recently added <a href="https://blog.platformatic.dev/skew-protection-for-kubernetes">skew protection</a>. Here’s how it works for HTTP traffic:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/cb0afb0b-066d-462c-a099-06575a4ace76.png" alt="" style="display:block;margin:0 auto" />

<p>When a user starts a session on version N, a cookie pins all subsequent requests to version N via Gateway API HTTPRoute rules. New visitors are routed to the latest active version.</p>
<p>Workflow runs work the same way: a run that starts on version N must keep running on version N until it finishes. The difference is in the transport. HTTP requests go through the Gateway API, but workflow queue messages do not.</p>
<h2>Why we couldn’t just extend the Intelligent Command Center</h2>
<p>Our first design had pods accessing PostgreSQL directly, with ICC handling queue routing. We abandoned it because the ICC couldn’t reliably determine when a version had no in-flight runs.</p>
<p>The problem: workflow runs can be suspended in ways that are invisible to the infrastructure.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/f0822720-c14c-4f89-bc24-51a9dfceb630.png" alt="" style="display:block;margin:0 auto" />

<p>When a workflow registers a <a href="https://useworkflow.dev/docs/foundations#hooks--webhooks">webhook</a> and then suspends, the pod becomes idle. There’s no memory use, no heartbeat, and no queue message. ICC sees no activity and expires the version. If someone clicks the webhook link hours later, the run’s pods are already gone:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/0c2a5fe1-422a-4c74-887f-bb3014e5e6b4.png" alt="" style="display:block;margin:0 auto" />

<p>The only way to know if a version still has work in progress is to check the runs table. For that, you need a service that owns the data.</p>
<h2>How Platformatic World works</h2>
<p>Platformatic World consists of two packages:</p>
<ul>
<li><p><code>@platformatic/workflow</code> is a <a href="https://docs.platformatic.dev/watt/overview">Watt</a> application backed by PostgreSQL that manages all workflow state and queue routing. Every operation, like event creation, run queries, queue dispatch, hook registration, and encryption, goes through it.</p>
</li>
<li><p><code>@platformatic/world</code> is a lightweight HTTP client that implements the Workflow DevKit’s <a href="https://useworkflow.dev/docs/deploying">World</a> interface. This is what your app imports.</p>
</li>
</ul>
<p>The service enforces multi-tenant isolation at the SQL level by scoping every query to the <code>application_id</code>.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/08696b37-1ec1-419f-b10d-d5b14dc98f34.png" alt="" style="display:block;margin:0 auto" />

<h3>Version-aware queue routing</h3>
<p>Each queue message includes a <code>deployment_version</code>. The router finds the registered handler for that version and sends the message to the right pod. Messages for v1 always go to v1 pods, even after v2 is deployed:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/40953dfa-f5f6-4c20-bc88-3368d9d7ee9d.png" alt="" style="display:block;margin:0 auto" />

<p>If a dispatch fails, it uses exponential backoff and tries up to 10 times before moving the message to the dead-letter queue.</p>
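<p>Conceptually, the retry policy behaves like the sketch below (the actual delays and implementation inside <code>@platformatic/workflow</code> may differ):</p>
<pre><code class="language-javascript">// Conceptual sketch, not the service's actual code.
async function dispatchWithRetry (dispatch, message, maxAttempts = 10) {
  for (let attempt = 1; attempt &lt;= maxAttempts; attempt++) {
    try {
      return await dispatch(message)
    } catch (err) {
      // After the last attempt, the caller moves the message to the dead-letter queue.
      if (attempt === maxAttempts) throw err
      // Exponential backoff between attempts
      await new Promise((resolve) =&gt; setTimeout(resolve, 100 * 2 ** attempt))
    }
  }
}
</code></pre>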
<h3>Safe version draining</h3>
<p>When ICC finds a new version, it checks with the workflow service to see if the old version still has any work in progress. The service looks at active runs, pending hooks, waiting sleeps, and queued messages. ICC only decommissions the old version when all these counts are zero:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/4a4866bc-9d51-4135-a96d-83831abe7849.png" alt="" style="display:block;margin:0 auto" />

<p>If a version stays alive longer than allowed, ICC can force-expire it. This cancels in-flight runs, moves queued messages to the dead-letter queue, and deregisters handlers.</p>
<h2>Zero-config in Kubernetes</h2>
<p>In production with ICC, you don’t need to write any configuration code. You just set two environment variables in your Dockerfile and add three pod labels in your Deployment spec:</p>
<pre><code class="language-shell">ENV WORKFLOW_TARGET_WORLD="@platformatic/world"
ENV PLT_WORLD_SERVICE_URL="http://workflow.platformatic.svc.cluster.local"
</code></pre>
<pre><code class="language-shell"># Pod labels in your Deployment spec
labels:
 app.kubernetes.io/name: my-app
 plt.dev/version: "v1"
 plt.dev/workflow: "true"
</code></pre>
<p>The Workflow DevKit discovers the world automatically. At startup, <code>@platformatic/world</code> (the library your app imports) resolves the app ID from the <code>PLT_WORLD_APP_ID</code> env var or the <code>package.json</code> name, detects the deployment version from the <code>plt.dev/version</code> label via the K8s API, and authenticates using the pod’s ServiceAccount token. On the infrastructure side, ICC sees the <code>plt.dev/workflow</code> label and registers queue handlers with @platformatic/workflow, so dispatched messages reach the correct versioned pod.</p>
<p>You don’t need to change your workflow code. The same <code>'use workflow'</code> and <code>'use step'</code> directives work just like they do on Vercel.</p>
<h2>Local development</h2>
<p>For local development, the workflow service runs in single-tenant mode without authentication — no K8s, no ICC. Start PostgreSQL and the workflow service:</p>
<pre><code class="language-shell">npx @platformatic/workflow
postgresql://user:pass@localhost:5432/workflow
</code></pre>
<p>Then configure your app to connect to it with the same two environment variables from the Dockerfile above, just pointing at localhost:</p>
<pre><code class="language-shell">WORKFLOW_TARGET_WORLD=@platformatic/world
PLT_WORLD_SERVICE_URL=http://localhost:3042
</code></pre>
<p>Your app also needs to call <code>world.start()</code> once the server starts. This registers a queue handler so the workflow service can dispatch messages back to your app. In K8s with ICC, this is a no-op (ICC handles it). Here’s a Next.js example using <code>instrumentation.ts</code>:</p>
<pre><code class="language-typescript">// instrumentation.ts — Next.js calls register() once on server startup
export async function register() {
  if (process.env.PLT_WORLD_SERVICE_URL) {
    const { createWorld } = await import('@platformatic/world')
    const world = createWorld()
    await world.start?.()
  }
}
</code></pre>
<p>Other frameworks have different startup hooks (Fastify plugins, Express middleware, etc.) — the key is to call <code>world.start()</code> once before your app starts handling requests.</p>
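<p>For instance, a minimal Fastify setup might do it like this (a sketch, using the same <code>createWorld()</code> call as the Next.js example above):</p>
<pre><code class="language-javascript">// server.js: sketch for a Fastify app; adapt to your framework's startup hook.
import Fastify from 'fastify'

const app = Fastify()

// Register the queue handler once, before the app starts handling requests.
if (process.env.PLT_WORLD_SERVICE_URL) {
  const { createWorld } = await import('@platformatic/world')
  const world = createWorld()
  await world.start?.()
}

await app.listen({ port: 3000 })
</code></pre>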
<p>The service auto-provisions a default application, so no further setup is needed.</p>
<h2>Observability in ICC</h2>
<p>The ICC dashboard gives you full visibility into your workflow runs. The Workflows tab shows a real-time list of all runs for each application, with status, version, and duration.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/1533f1ba-637a-4d87-8787-6bc11875df86.png" alt="" style="display:block;margin:0 auto" />

<p>Click a run to inspect it. The <strong>Trace</strong> view shows a waterfall of every step, with timing bars and status indicators. You can see exactly where time was spent and which steps ran in parallel.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/88e009b5-d26a-4ab6-848b-708c0f1243b6.png" alt="" style="display:block;margin:0 auto" />

<p>The <strong>Graph</strong> tab visualizes the workflow structure as a directed graph. Sequential steps flow vertically, parallel steps are laid out side-by-side. After the first completed run of a version, the graph pre-renders immediately for subsequent runs — so you see the full structure before the workflow even starts executing.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/8348267a-3a0f-45a5-9715-0d6a339534a0.png" alt="" style="display:block;margin:0 auto" />

<p>You can also <strong>replay</strong> completed runs from the dashboard (targeting the original deployment version), cancel running workflows, and inspect hooks, events, and streams.</p>
<h2>Try it</h2>
<p>You can find the repository at <a href="https://github.com/platformatic/platformatic-world">github.com/platformatic/platformatic-world</a>. The <code>@platformatic/world</code> package is a drop-in replacement for Vercel World. If your workflows run on Vercel today, they’ll work on your cluster with Platformatic World.</p>
<p>We’d love to hear how you use it. Feel free to open an issue or contact us on <a href="https://discord.gg/platformatic">Discord</a>.</p>
]]></content:encoded></item><item><title><![CDATA[React SSR Framework Showdown: TanStack Start, React Router, and Next.js Under Load]]></title><description><![CDATA[Performance benchmarks capture a moment, not a final judgment. Results depend on a specific workload, scale, and constraints; they do not rank frameworks by value. Next.js stands out for its widesprea]]></description><link>https://blog.platformatic.dev/react-ssr-framework-benchmark-tanstack-start-react-router-nextjs</link><guid isPermaLink="true">https://blog.platformatic.dev/react-ssr-framework-benchmark-tanstack-start-react-router-nextjs</guid><category><![CDATA[Next.js]]></category><category><![CDATA[react router]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Tue, 17 Mar 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/33f199bf-9079-481b-85b2-d0cc6421f29d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>Performance benchmarks capture a moment, not a final judgment. Results depend on a specific workload, scale, and constraints; they do not rank frameworks by value. Next.js stands out for its widespread adoption, strong compatibility, and vast ecosystem trusted by millions. TanStack, as a newcomer, made bold architectural choices. React Router is positioned differently along the maturity curve. Each framework wins in its own context.</p>
<p>The numbers matter less than the response: every team addressed our shared data and delivered fixes. This collaboration with open data, shared flamegraphs, and upstream fixes makes Node.js a safe, long-term choice for enterprise teams.</p>
</blockquote>
<p><strong><mark class="bg-yellow-200 dark:bg-yellow-500/30">We updated our Benchmarks! View the new numbers </mark></strong> <a href="https://blog.platformatic.dev/ssr-framework-benchmarks-v2-corrected-results"><strong><mark class="bg-yellow-200 dark:bg-yellow-500/30">Here</mark></strong></a></p>
<h2>TL;DR</h2>
<p>With help from Claude Code, we built the same eCommerce app in three SSR frameworks and tested them at 1,000 requests per second on AWS EKS. We ran each framework both on Watt and directly on Kubernetes.</p>
<p>The results revealed big performance differences and highlighted a few key themes:</p>
<ol>
<li><p>Running Node services on Watt improves average latency.</p>
</li>
<li><p>The TanStack team is doing excellent work. Their framework outperformed the others we tested by a wide margin.</p>
</li>
<li><p>The Next.js team has made impressive performance improvements. Upgrading from v15 to v16 canary more than doubled throughput and reduced latency by six times. Their collaboration also led to a 75% speedup in React’s RSC deserialization, which benefits everyone using React.</p>
</li>
</ol>
<p>Both the TanStack and Next.js teams used <a href="https://github.com/platformatic/flame">platformatic/flame</a> to find and resolve critical performance bottlenecks the benchmark uncovered - more on that below.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/6cedd01d-637a-4f66-8fab-2a95866621ce.png" alt="" style="display:block;margin:0 auto" />

<p>TanStack Start outperformed React Router by 25% in throughput and had 35% lower latency. Both frameworks achieved a 100% success rate, meaning every request got an HTTP 200 response within our 10-second timeout. This strict definition makes the comparison fair and matches real-world SLA expectations. Next.js struggled under our benchmark load, but upgrading from v15.5.5 to v16.2.0-canary.66 more than doubled its throughput (from 322 to 701 requests per second) and reduced average latency by six times.</p>
<p>To mirror common enterprise eCommerce scenarios, no caching was used in this test, as it’s often avoided due to aggressive personalization and A/B testing. In many large-scale e-commerce deployments, personalization strategies ensure that individual user views have minimal overlap, often less than 5%, which means that cache hits provide minimal benefit compared to the invalidation overhead. This explicit trade-off reflects real-world scenarios, where companies choose to prioritize dynamic user experiences over the potential gains from caching.</p>
<p><strong>Collaboration note:</strong> We shared benchmark data and flamegraphs (via <a href="https://github.com/platformatic/flame">platformatic/flame</a>) with both the TanStack and Next.js teams. The TanStack team fixed a critical bottleneck, delivering a <strong>252x improvement</strong> in response times. The Next.js team’s <a href="https://x.com/timneutkens">Tim Neutkens</a> used our flamegraphs to identify a JSON.parse reviver overhead in React Server Components, resulting in a <a href="https://github.com/facebook/react/pull/35776">75% speedup in RSC deserialization</a> merged into React itself.</p>
<blockquote>
<p><em>While we run these benchmarks on a canary release of</em> <a href="http://next.js"><em>Next.js</em></a><em>, all the advantages are part of</em> <a href="http://next.js"><em>Next.js</em></a> <em>16.2.0, which is coming out very soon.</em></p>
</blockquote>
<hr />
<h2>The Challenge: Apples-to-Apples Framework Comparison</h2>
<p>Comparing SSR performance (or performance generally) across frameworks is notoriously tricky because teams tend to only write and deploy their apps to a single framework, so it’s rare to get a reasonable “like-for-like” comparison.</p>
<p>Luckily for us, we live in an era where writing code costs little more than the tokens it takes your favorite LLM to generate it. So we made 3 (more-or-less) identical eCommerce sample applications with the help of our dear friend Claudio (feel free to check out the code for yourself <a href="https://github.com/platformatic/k8s-watt-performance-demo/tree/ecommerce">here</a>).</p>
<h3>The Application: CardMarket</h3>
<p>For these benchmarks, we built a trading card marketplace app, similar to a simpler version of <a href="https://www.tcgplayer.com/">TCGPlayer</a> or <a href="https://www.cardmarket.com/">CardMarket</a>. The data model includes 5 games (Pokémon, Magic: The Gathering, Yu-Gi-Oh!, Digimon, and One Piece), 50 card sets (10 per game), 10,000 cards (200 per set), 100 sellers with ratings and locations, and 50,000 listings with prices, conditions, and quantities.</p>
<p>The app includes several types of pages and routes to create a realistic e-commerce experience, all generated by Claude Code:</p>
<ul>
<li><p>The <strong>homepage</strong> shows featured games, trending cards, and new releases.</p>
</li>
<li><p>There’s a <strong>search page</strong> with full-text search, filtering, and pagination.</p>
</li>
<li><p><strong>Game detail</strong> pages show info about each game and its sets, while set detail pages list cards with pagination.</p>
</li>
<li><p><strong>Card detail</strong> pages display card info and seller listings.</p>
</li>
<li><p>The <strong>sellers’ list page</strong> shows all sellers with their ratings, and each seller has a profile and listings page.</p>
</li>
<li><p>There’s also a <strong>cart page</strong> with a static shopping cart.</p>
</li>
</ul>
<p><strong>We made several design choices to keep the implementations consistent:</strong></p>
<ul>
<li><p>All data comes from JSON files, and every framework uses the same data.</p>
</li>
<li><p>We added a random 1-5ms delay to simulate real database latency (see the sketch after this list).</p>
</li>
<li><p>Every route uses full SSR with no client-side data fetching.</p>
</li>
<li><p>All versions share the same UI components, layouts, and Tailwind CSS styling.</p>
</li>
</ul>
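<p>The simulated database latency mentioned above is just a small helper along these lines (a sketch; the helper in the sample apps may be named differently):</p>
<pre><code class="language-javascript">// Sketch of the simulated data-access delay used by every loader.
const sleep = (ms) =&gt; new Promise((resolve) =&gt; setTimeout(resolve, ms))

async function readData (loadJson) {
  // Random 1-5ms pause to approximate a database round trip
  await sleep(1 + Math.random() * 4)
  return loadJson()
}
</code></pre>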
<h3><strong>The Frameworks</strong></h3>
<p>We implemented this application in three frameworks:</p>
<ol>
<li><p><strong>TanStack Start</strong> (v1.157.16) - The newest entrant, built on TanStack Router with Vite for SSR</p>
</li>
<li><p><strong>React Router</strong> (v7) - The classic routing library, now with first-class SSR support.</p>
</li>
<li><p><strong>Next.js</strong> (v15, updated to v16 canary) - The established leader in React SSR</p>
</li>
</ol>
<p>Each implementation uses the framework’s idiomatic patterns:</p>
<ul>
<li><p><strong>TanStack Start</strong>: createFileRoute with loader functions</p>
</li>
<li><p><strong>React Router</strong>: Route modules with loader exports</p>
</li>
<li><p><strong>Next.js</strong>: App Router with Server Components</p>
</li>
</ul>
<h3><strong>The Runtimes</strong></h3>
<p>For each framework, we tested two runtime configurations:</p>
<ol>
<li><p><strong>Node.js</strong> - Single-threaded, 6 pods with 1 CPU allocated for each</p>
</li>
<li><p><strong>Watt</strong> - Multi-worker with SO_REUSEPORT, 3 pods with 2 CPUs allocated, with 2 workers per pod to use those 6 CPUs to the fullest</p>
</li>
</ol>
<p>All configurations received identical total CPU allocation (6 cores) for fair comparison.</p>
<hr />
<h2>Test Methodology</h2>
<h3><strong>Infrastructure</strong></h3>
<ul>
<li><p><strong>EKS Cluster:</strong> 4 nodes running m5.2xlarge instances (8 vCPUs, 32GB RAM each)</p>
</li>
<li><p><strong>Load Testing Instance:</strong> c7gn.2xlarge (8 vCPUs, 16GB RAM, network-optimized)</p>
</li>
<li><p><strong>Region:</strong> us-west-2</p>
</li>
<li><p><strong>Load Testing Tool:</strong> Grafana k6</p>
</li>
</ul>
<h3><strong>Software Versions</strong></h3>
<p>All versions are locked in package.json for reproducible benchmarks:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/df858d83-3ee5-4e8a-b040-c6eca97fa21c.png" alt="" style="display:block;margin:0 auto" />

<h3><strong>Load Test Configuration</strong></h3>
<p>Each test followed this protocol:</p>
<ol>
<li><p><strong>NLB Warm-up:</strong> 60 seconds ramping from 10 to 500 req/s</p>
</li>
<li><p><strong>Pre-test Warm-up:</strong> 20 seconds at moderate load</p>
</li>
<li><p><strong>Cool-down:</strong> 60 seconds before the main test</p>
</li>
<li><p><strong>Main Test:</strong> 60 seconds ramp-up to 1,000 req/s, then 120 seconds sustained (see the k6 sketch after this list)</p>
</li>
<li><p><strong>Between Tests:</strong> 480 seconds cooldown</p>
</li>
</ol>
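<p>With Grafana k6, the main-test stage above maps to a ramping arrival-rate scenario roughly like this (a sketch; the repository’s script has the authoritative configuration):</p>
<pre><code class="language-javascript">// k6 options sketch for the main test: ramp to 1,000 req/s, then hold for 120s.
export const options = {
  scenarios: {
    main: {
      executor: 'ramping-arrival-rate',
      startRate: 0,
      timeUnit: '1s',
      preAllocatedVUs: 2000,
      stages: [
        { target: 1000, duration: '60s' },  // ramp-up
        { target: 1000, duration: '120s' }  // sustained load
      ]
    }
  }
}
</code></pre>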
<h3><strong>Realistic Traffic Distribution</strong></h3>
<p>The load test simulated realistic e-commerce traffic patterns:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/a05c1ae4-8bcd-494f-a46e-b2348bb82ae0.png" alt="" style="display:block;margin:0 auto" />

<h2>Results</h2>
<h3>TanStack Start: The Performance Leader</h3>
<p>After Update (v1.157.16)</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/5c3a7ddc-0ef2-49ef-8c28-2f301f57b4b4.png" alt="" style="display:block;margin:0 auto" />

<p>TanStack Start delivered exceptional performance, the highest throughput and lowest latency of all frameworks tested. With Watt, average response times stayed under 13ms even at 1,000 requests per second.</p>
<h3>React Router: Solid and Reliable</h3>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/d72846a9-c017-4232-a096-1a3f6cd7c200.png" alt="" style="display:block;margin:0 auto" />

<p>React Router managed the load well and had zero failures. Using Watt made response times 38% faster compared to standalone Node.js.</p>
<h3>Next.js: Struggling Under Load, but Making Progress</h3>
<p>Initial Benchmark (Next.js 15.5.5, Watt 3.32.0)</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/c7a1abbd-2fc1-4798-b352-d7bb305277aa.png" alt="" style="display:block;margin:0 auto" />

<p>Next.js couldn’t handle 1,000 requests per second. Response times averaged 8 to 11 seconds, and about 40% of requests failed. Even with Watt’s optimizations, Next.js lagged behind the lighter frameworks.</p>
<h3>Updated Benchmark (Next.js 16.2.0-canary.66, Watt 3.39.0)</h3>
<p>We re-ran the benchmarks after upgrading to the latest Next.js canary and Watt 3.39.0 to see if the situation had improved:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/f304a268-f7ce-4526-8d7d-f270e8d29926.png" alt="" style="display:block;margin:0 auto" />

<p>Next.js Version Improvement (Watt runtime)</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/21e0b39a-8212-41bb-93be-c66bed3ddcd7.png" alt="" style="display:block;margin:0 auto" />

<p>Upgrading from Next.js 15.5.5 to 16.2.0-canary.66, along with Watt 3.39.0, brought a big improvement:</p>
<ul>
<li><p>Throughput more than doubled</p>
</li>
<li><p>Average response times dropped by over six times</p>
</li>
<li><p>We saw an 83% reduction in latency.</p>
</li>
</ul>
<p>The success rate only improved a little (about 36% of requests still failed), but the successful requests were served much faster, with the median response time dropping from seconds to 431ms.</p>
<p>This is real progress. Next.js is still the slowest of the three frameworks at this load, but the gap is closing, and more improvements are on the way.</p>
<hr />
<h2>Framework Collaborations: Benchmarks as a Catalyst</h2>
<p>One of the best parts of this project was working directly with the framework teams. Sharing real-world benchmark data, especially flamegraphs that show where time is spent, helped turn abstract performance talks into real fixes. (If you are on a web performance team, we’d love to talk.)</p>
<h3>The Next.js Collaboration: Fixing RSC Deserialization</h3>
<p>After our initial Next.js benchmarks showed multi-second response times, we shared flamegraphs from our load tests with <a href="https://x.com/timneutkens">Tim Neutkens</a> from the Next.js team. The flamegraphs revealed a clear hotspot: <code>initializeModelChunk</code>. This function calls <code>JSON.parse</code> with a reviver callback in React Server Components (RSC) chunk deserialization.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/a6e398e0-0b0f-4cb2-b9f4-2d020f70b5b6.png" alt="" style="display:block;margin:0 auto" />

<p>The root cause was a well-known V8 performance characteristic: <code>JSON.parse</code> is implemented in C++, and passing a reviver callback forces a <strong>C++ → JavaScript boundary crossing for every key-value pair</strong> in the parsed JSON. Even a trivial no-op reviver <code>(k, v) =&gt; v</code> makes <code>JSON.parse</code> roughly 4x slower than bare <code>JSON.parse</code> without one. Since <code>initializeModelChunk</code> is called for every RSC chunk during SSR, this overhead compounds rapidly on pages with many server components.</p>
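<p>You can see the effect with a few lines of plain Node.js (this only illustrates the V8 behaviour, not React’s code; the exact ratio depends on the payload and V8 version):</p>
<pre><code class="language-javascript">// Compare JSON.parse with and without a no-op reviver on the same payload.
const payload = JSON.stringify(
  Array.from({ length: 10000 }, (_, i) =&gt; ({ id: i, name: `card-${i}`, price: i * 0.1 }))
)

console.time('plain JSON.parse')
for (let i = 0; i &lt; 200; i++) JSON.parse(payload)
console.timeEnd('plain JSON.parse')

console.time('JSON.parse with no-op reviver')
// The reviver crosses the C++/JS boundary once per key-value pair.
for (let i = 0; i &lt; 200; i++) JSON.parse(payload, (k, v) =&gt; v)
console.timeEnd('JSON.parse with no-op reviver')
</code></pre>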
<p>Tim identified the fix and submitted it directly to React: <a href="https://github.com/facebook/react/pull/35776">facebook/react#35776</a> (merged Feb 19, 2026). The change replaces the reviver callback with a two-step approach, plain <code>JSON.parse()</code> followed by a recursive tree walk in pure JavaScript, yielding a <strong>~75% speedup</strong> in RSC chunk deserialization.</p>
<img alt="" style="display:block;margin:0 auto" />

<p>This fix helps every React framework that uses Server Components, not just Next.js. It shows how profiling with real workloads can reveal optimization opportunities that microbenchmarks might miss.</p>
<p>The improvement is already reflected in our updated Next.js benchmarks (v16.2.0-canary.66), and we expect further gains as this optimization and others land in stable releases.</p>
<h3>The TanStack Turnaround: A Case Study in Rapid Optimization</h3>
<p>Interestingly enough, we had a similar journey with the TanStack team. Our initial benchmarks used TanStack Start v1.150.0, and the results were concerning: requests timing out, 75% success rates, and average response times exceeding 3 seconds. We shared these findings with the TanStack team, who quickly identified the critical bottlenecks (also via <a href="https://github.com/platformatic/flame">@platformatic/flame</a>) in their SSR request handling pipeline.</p>
<p>Within 7 minor versions, they shipped a fix. We re-ran the benchmarks on v1.157.16, and the transformation was extraordinary:</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/719cb4d5-f97f-4c87-9ce9-35c35355da5e.png" alt="" style="display:block;margin:0 auto" />

<p>The v1.150 numbers tell the story of a framework under distress. The p(95) latency hitting exactly 10,001ms wasn’t a coincidence, as the requests were slamming into our 10-second timeout limit. One in four requests failed entirely.</p>
<p>At 1,000 req/s, the framework was drowning.</p>
<p>After the fix, TanStack Start became the fastest framework in our benchmark. Response times dropped from seconds to milliseconds, the timeout cliff vanished, and every single request succeeded.</p>
<p>What makes this improvement even more notable is that it was <strong>runtime-agnostic</strong>. Both Watt and Node.js saw virtually identical gains: Watt improved from 3,228ms to 12.79ms average response time, while Node.js improved from 3,171ms to 13.73ms. This confirms that the bottleneck was purely in the framework’s code and that the fix benefited all users equally, regardless of their deployment strategy.</p>
<hr />
<h2>Runtime Comparison: Watt vs Node.js</h2>
<h3>Watt’s SO_REUSEPORT Advantage</h3>
<p>Watt uses the Linux kernel’s SO_REUSEPORT to let workers accept connections directly:</p>
<ol>
<li><p>Kernel distributes the connection to the worker.</p>
</li>
<li><p>The worker processes the request.</p>
</li>
</ol>
<p>No master coordination, no IPC overhead. The kernel handles load distribution efficiently.</p>
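<p>If your Node.js version exposes SO_REUSEPORT through a <code>reusePort</code> listen option (check the <code>net</code> documentation for your release), the idea can be sketched like this. Treat it as an assumption-laden illustration, not Watt’s actual code:</p>
<pre><code class="language-javascript">// Sketch only: assumes a Node.js release whose listen() options accept
// reusePort (mapping to SO_REUSEPORT); verify against your runtime's docs.
import { createServer } from 'node:http'

const server = createServer((req, res) =&gt; {
  res.end(`handled by pid ${process.pid}\n`)
})

// Start several copies of this process: each binds port 3000 itself and the
// kernel spreads incoming connections across them, with no master process.
server.listen({ port: 3000, reusePort: true })
</code></pre>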
<h3>When Does Watt Help Most?</h3>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/85617211-e4de-4034-bf4e-b880d9e81c88.png" alt="" style="display:block;margin:0 auto" />

<h2>Framework Rankings</h2>
<h3>With Watt Runtime</h3>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/c3aa5238-a425-4b20-8a39-bcdcc4595d9f.png" alt="" style="display:block;margin:0 auto" />

<p><strong>With Node.js Runtime</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/7fcfc3d9-25fa-43b2-81c5-31003f8191eb.png" alt="" style="display:block;margin:0 auto" />

<hr />
<h2>Reproducing These Benchmarks</h2>
<p>The complete benchmark infrastructure is available at:</p>
<p><a href="https://github.com/platformatic/k8s-watt-performance-demo/tree/ecommerce"><strong>https://github.com/platformatic/k8s-watt-performance-demo/tree/ecommerce</strong></a></p>
<p>To run the benchmarks:</p>
<pre><code class="language-plaintext"># Benchmark TanStack Start
AWS_PROFILE=&lt;profile-name&gt; FRAMEWORK=tanstack ./benchmark.sh

# Benchmark React Router
AWS_PROFILE=&lt;profile-name&gt; FRAMEWORK=react-router ./benchmark.sh

# Benchmark Next.js
AWS_PROFILE=&lt;profile-name&gt; FRAMEWORK=next ./benchmark.sh

# Benchmark all frameworks
AWS_PROFILE=&lt;profile-name&gt; ./benchmark-all.sh
</code></pre>
<p>The script creates an ephemeral EKS cluster, deploys all three runtime configurations (Node, PM2, Watt), executes the load tests, and tears down the infrastructure automatically. The results for PM2 were omitted from the blog post because they align with previously reported findings (read <a href="https://blog.platformatic.dev/93-faster-nextjs-in-your-kubernetes">93% Faster Next.js in (your) Kubernetes</a>).</p>
<hr />
<h2>Key Takeaways</h2>
<ol>
<li><p><strong>Watt Provides Consistent Improvements</strong><br />Watt improved performance for all frameworks compared to standalone Node.js. The gains ranged from 7% for TanStack to 38% for React Router. It’s a low-risk optimization that helps in every case.</p>
</li>
<li><p><strong>TanStack Start is Production-Ready</strong><br />Despite being the newest framework, TanStack Start delivered the best performance. The team’s rapid response to performance issues (a 252x improvement across 7 versions) demonstrates an active focus on development and optimization.</p>
</li>
<li><p><strong>Keep Dependencies Updated</strong><br />The results from TanStack and Next.js both show how important it is to keep your dependencies up to date. TanStack improved from 75% to 100% success in 7 versions. Next.js doubled its throughput between v15 and v16 canary. <strong>You only get these performance improvements if you update.</strong></p>
</li>
<li><p><strong>Framework Choice Matters More Than Runtime</strong><br />The difference between TanStack Start and Next.js (3x throughput, 690x latency difference) far exceeds the difference between Watt and Node.js on the same framework. Choose your framework wisely.</p>
</li>
<li><p><strong>Next.js Needs Caching</strong><br />At 1,000 req/s, Next.js struggled. For high-volume SSR workloads, users should consider adopting aggressive cache strategies (ISR, edge caching, component caching). Next.js has great primitives for these, and <a href="https://blog.platformatic.dev/watt-v318-unlocks-nextjs-16s-revolutionary-use-cache-directive-with-redisvalkey">you can use them in Watt.</a> We did not implement any caching solution for Next.js because, in most e-commerce (or enterprise) scenarios, caching is a no-go: companies want to implement aggressive personalization strategies and A/B testing, running thousands of experiments in parallel. That said, the jump from v15 to v16 Canary shows meaningful improvement, and if this trajectory continues, the gap will keep closing.</p>
</li>
</ol>
<p>If you want performance to be a key part of your technology choices, try setting clear latency budgets for each route before you start building or picking a framework. Setting concrete performance goals early helps guide decisions about architecture and tools, and makes sure your stack meets real-world needs. Planning for latency by route can also show when caching, framework choice, or runtime tweaks will have the biggest impact on user experience.</p>
<hr />
<h2>Conclusion</h2>
<p>These benchmarks show there are big performance differences between SSR frameworks when running the same app under load:</p>
<ul>
<li><p><strong>TanStack Start</strong> emerged as the performance leader, handling 1,000 req/s with 13ms average latency.</p>
</li>
<li><p><strong>React Router</strong> delivered reliable performance with zero failures.</p>
</li>
<li><p><strong>Next.js</strong> struggled at this load, but improved a lot after upgrading to v16 canary. Throughput doubled and latency dropped by six times.</p>
</li>
</ul>
<p>Beyond the numbers, this project showed that you can’t fix what you can’t see. We use <a href="https://github.com/platformatic/flame">platformatic/flame</a> for our own internal performance testing, and <strong>sharing benchmark data with framework teams led to real improvements</strong>. The TanStack team’s 252x improvement in 7 versions, and the Next.js team’s work that led to a <a href="https://github.com/facebook/react/pull/35776">75% speedup in React’s RSC deserialization</a>, both show that open performance data helps the whole ecosystem, not just one framework or project.</p>
<p>For teams choosing an SSR framework, these results suggest:</p>
<ul>
<li><p><strong>High-throughput requirements:</strong> Consider TanStack Start or React Router</p>
</li>
<li><p><strong>Existing Next.js projects:</strong> Upgrade to the latest version for major performance gains, and use Watt to get the best throughput.</p>
</li>
<li><p><strong>Runtime optimization:</strong> Watt provides consistent improvements across all frameworks</p>
</li>
</ul>
<p>We’re actively looking to speak with web performance teams. If that’s you, send me a DM on LinkedIn or Twitter, or email <a href="mailto:hello@platformatic.dev">hello@platformatic.dev</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Why Node.js needs a virtual file system]]></title><description><![CDATA[Node.js has always been about I/O. Streams, buffers, sockets, files. The runtime was built from day one to move data between the network and the filesystem as fast as possible. But there’s a gap that ]]></description><link>https://blog.platformatic.dev/why-nodejs-needs-a-virtual-file-system</link><guid isPermaLink="true">https://blog.platformatic.dev/why-nodejs-needs-a-virtual-file-system</guid><category><![CDATA[Node.js]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[backend]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Mon, 16 Mar 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/08e2b02f-dc24-4d51-bacd-7d51ce62d7ce.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Node.js has always been about I/O. Streams, buffers, sockets, files. The runtime was built from day one to move data between the network and the filesystem as fast as possible. But there’s a gap that has bugged me for years: you can’t virtualize the filesystem.</p>
<p>You can’t <code>import</code> or <code>require()</code> a module that only exists in memory. You can’t bundle assets into a single executable without patching half the standard library. You can’t sandbox file access for a tenant without reinventing <code>fs</code> from scratch.</p>
<p>That changes now. We’re announcing <a href="https://github.com/platformatic/vfs">@platformatic/vfs</a>, a userland Virtual File System for Node.js, and the upstream <a href="https://github.com/nodejs/node/pull/61478">node:vfs</a> module landing in Node.js core.</p>
<h2>The problem</h2>
<p>Here’s what it looks like in practice when Node.js doesn’t have a VFS:</p>
<ol>
<li><p><strong>Bundle a full application into a Single Executable.</strong> You need to ship configuration files, templates, and static assets alongside your code. This often means bolting on 20 to 40 MB of extra boilerplate just to handle asset access at runtime. Node.js SEAs can embed a single blob, but your application code still calls <code>fs.readFileSync()</code> expecting real paths, so you end up duplicating files or injecting glue code that bloats your binary.</p>
</li>
<li><p><strong>Run tests without touching the disk.</strong> You want an isolated, in-memory filesystem so tests don’t leave artifacts and don’t collide in CI. Today, you mock fs with tools like <code>memfs</code>, but those mocks don’t integrate with <code>import</code> or <code>require()</code>.</p>
</li>
<li><p><strong>Sandbox a tenant’s file access.</strong> In a multi-tenant platform, you need to confine each tenant to a directory without them escaping via <code>../</code>. You end up writing path validation logic that’s fragile and easy to get wrong.</p>
</li>
<li><p><strong>Load code generated at runtime.</strong> AI agents, plugin systems, and code generation pipelines produce JavaScript that needs to be imported. Today, that means writing to a temp file and hoping cleanup happens (sketched right after this list).</p>
</li>
</ol>
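<p>For context, here’s roughly what the temp-file workaround for that last case looks like today. This is a minimal sketch using only standard Node.js APIs; the generated module is a placeholder:</p>
<pre><code class="language-javascript">import { mkdtemp, writeFile, rm } from 'node:fs/promises'
import { tmpdir } from 'node:os'
import { join } from 'node:path'
import { pathToFileURL } from 'node:url'

// Persist the generated code to a real temp file, just so it can be imported
const dir = await mkdtemp(join(tmpdir(), 'generated-'))
const file = join(dir, 'handler.mjs')
await writeFile(file, 'export default () =&gt; "generated"')

const { default: handler } = await import(pathToFileURL(file).href)
console.log(handler())

// Cleanup is on you; if the process crashes first, the temp files stay behind
await rm(dir, { recursive: true, force: true })
</code></pre>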
<p>All four require the same primitive: a virtual filesystem that hooks into <code>node:fs</code> and Node.js module loading. The ecosystem has built approximations like <code>memfs</code>, <code>unionfs</code>, <code>mock-fs</code>, but they all share the same limitation: they patch fs but not the module resolver. Code that calls <code>import('./config.json')</code> bypasses them entirely.</p>
<p>The <a href="https://github.com/nodejs/node/issues/60021">original issue</a> requesting VFS hooks for SEAs, opened by <a href="https://github.com/robertsLando">Daniel Lando</a>, captured this well. The <a href="https://github.com/nodejs/single-executable/pull/43">FS hooks proposal</a> from the Single Executable working group documented years of requirements. People knew what they wanted. Nobody had built it yet.</p>
<h2><code>node:vfs</code> in Node.js core</h2>
<p>I started working on a VFS implementation over Christmas 2025. What began as a holiday experiment became <a href="https://github.com/nodejs/node/pull/61478">PR #61478</a>: a <code>node:vfs</code> module for Node.js, with almost 14,000 lines of code across 66 files.</p>
<p>Let me be honest: a PR that size would normally take months of full-time work. This one happened because I built it with <a href="https://docs.anthropic.com/en/docs/claude-code">Claude Code</a>. I pointed the AI at the tedious parts, the stuff that makes a 14k-line PR possible but no human wants to hand-write: implementing every <code>fs</code> method variant (sync, callback, promises), wiring up test coverage, and generating docs. I focused on the architecture, the API design, and reviewing every line. Without AI, this would not have been a holiday side project. It just wouldn’t have happened.</p>
<p>Here’s what it looks like:</p>
<pre><code class="language-javascript">import vfs from 'node:vfs'
import fs from 'node:fs'

const myVfs = vfs.create()

myVfs.mkdirSync('/app')
myVfs.writeFileSync('/app/config.json', '{"debug": true}')
myVfs.writeFileSync('/app/module.mjs', 'export default "hello from VFS"')

myVfs.mount('/virtual')

// Standard fs works
const config = JSON.parse(fs.readFileSync('/virtual/app/config.json', 'utf8'))

// import works, and so does require()
const mod = await import('/virtual/app/module.mjs')
console.log(mod.default) // "hello from VFS"

myVfs.unmount()
</code></pre>
<p>This is not a mock. When you call <code>myVfs.mount('/virtual')</code>, the VFS hooks into the actual fs module and the module resolver. Any code in the process, yours or your dependencies, that reads from paths under <code>/virtual</code> gets content from the VFS. Third-party libraries don’t need to know about it. <code>express.static('/virtual/public')</code> just works.</p>
<h3>How it’s structured</h3>
<p>The VFS has a provider layer and a mount layer.</p>
<p><strong>Providers</strong> are the storage backends. <code>MemoryProvider</code> is the default: in-memory, fast, gone when the process exits. <code>SEAProvider</code> gives read-only access to assets embedded in Single Executable Applications. <code>VirtualProvider</code> is a base class you can extend for custom backends (database, network, whatever you need).</p>
<p><strong>Mounting</strong> is how the VFS becomes visible to the rest of the process. <code>myVfs.mount('/virtual')</code> makes VFS content accessible under that path prefix. The process object emits <code>vfs-mount</code> and <code>vfs-unmount</code> events so you can track what’s going on:</p>
<pre><code class="language-javascript">process.on('vfs-mount', (info) =&gt; {
 console.log(`VFS mounted at ${info.mountPoint}, overlay: ${info.overlay}, readonly: ${info.readonly}`)
})
</code></pre>
<p>There’s also an <strong>overlay mode</strong> for when you want to intercept specific files without hiding the real filesystem:</p>
<pre><code class="language-javascript">const myVfs = vfs.create({ overlay: true })
myVfs.writeFileSync('/etc/config.json', '{"mocked": true}')
myVfs.mount('/')

// /etc/config.json comes from VFS
// /etc/hostname comes from the real filesystem
</code></pre>
<p>Only the paths that exist in the VFS are intercepted. Everything else goes to the real filesystem. For testing, this is ideal: you can override a few files and leave the rest untouched.</p>
<h3>The fs API</h3>
<p>The VFS doesn’t implement just a subset of <code>fs</code>: it covers synchronous, callback, and promise-based APIs for reading, writing, directories, symlinks, file descriptors, streams, watching, and glob. <code>VirtualStats</code> matches <code>fs.Stats</code>. Error codes match what Node.js returns (<code>ENOENT</code>, <code>ENOTDIR</code>, <code>EISDIR</code>, <code>EEXIST</code>). Code that works with the real filesystem should work with the VFS.</p>
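<p>For instance, here’s a short sketch of what that parity looks like in practice, reusing the API from the earlier examples:</p>
<pre><code class="language-javascript">import vfs from 'node:vfs'
import fs from 'node:fs'

const myVfs = vfs.create()
myVfs.mkdirSync('/data')
myVfs.writeFileSync('/data/report.txt', 'hello from the VFS')
myVfs.mount('/virtual')

// Promise-based fs APIs resolve against the mounted VFS like any other path
const text = await fs.promises.readFile('/virtual/data/report.txt', 'utf8')
console.log(text) // "hello from the VFS"

// Stats and error codes follow the usual fs conventions
console.log(fs.statSync('/virtual/data/report.txt').isFile()) // true
try {
  fs.readFileSync('/virtual/data/missing.txt')
} catch (err) {
  console.log(err.code) // 'ENOENT'
}

myVfs.unmount()
</code></pre>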
<h3>Why VFS needs to live in core Node.js</h3>
<p><code>@platformatic/vfs</code> proves the API works, but it also proves why a userland implementation will always be a compromise. Here’s what you run into when you try to build this outside of Node.js:</p>
<p><strong>Module resolution is duplicated.</strong> The userland package contains 960+ lines of module resolution logic: walking <code>node_modules</code> trees, parsing the <code>exports</code> field of <code>package.json</code>, trying index files, and resolving conditional exports. All of this already exists inside Node.js.</p>
<p><em>In core, the VFS hooks directly into the existing resolver. In userland, we re-implement it and hope we got every edge case right.</em></p>
<p><strong>Private APIs.</strong> On Node.js versions before 23.5, there’s no public API to hook module resolution. The userland package patches <code>Module._resolveFilename</code> and <code>Module._extensions</code>, both private internals with no stability guarantees. A Node.js minor release could break them.</p>
<p><em>In core, the VFS is part of the resolver, not a patch on top of it.</em></p>
<p><strong>Global fs patching is fragile.</strong> The userland package replaces <code>fs.readFileSync</code>, <code>fs.statSync</code>, and other core functions. If any code captures a reference to <code>fs.readFileSync</code> before the VFS mounts, that reference bypasses the VFS entirely.</p>
<p><em>In core, the interception happens below the public API surface, so captured references still work.</em></p>
<p><strong>Native modules don’t work.</strong> <code>dlopen()</code> needs a real file path.</p>
<p><em>A userland VFS can’t teach the native module loader to read</em> <code>.node</code> <em>files from memory. Core can.</em></p>
<p><strong>Module cache cleanup is impossible.</strong> When you unmount a VFS, modules that were <code>require()</code>'d from it stay in <code>require.cache</code>.</p>
<p><em>The userland package has no way to distinguish VFS-loaded modules from real ones, so it can’t clean them up. Core can track which modules came from which VFS and invalidate them on unmount.</em></p>
<p>None of these issues are bugs in the userland package. They’re just fundamental limits of what’s possible outside the runtime. The userland package is a bridge. Use it now, and switch to <code>node:vfs</code> when it becomes available.</p>
<h3>Where the PR stands</h3>
<p>The PR is open and in active review. The feature will be released as experimental.</p>
<p><a href="https://github.com/joyeecheung">Joyee Cheung</a> from <a href="https://www.igalia.com/">Igalia</a> has been the most thorough reviewer. She pushed hard on the security model around <code>mount()</code>, flagged that <code>internalModuleStat</code> shouldn’t be exposed as public API, and pointed to the <a href="https://github.com/nodejs/single-executable/blob/main/docs/virtual-file-system-requirements.md">VFS requirements document</a> that the Single Executable working group collected over four years. Her feedback made the implementation significantly better.</p>
<p><a href="https://github.com/jasnell">James Snell</a> and <a href="https://github.com/ShogunPanda">Paolo Insogna</a> approved the PR. <a href="https://github.com/Qard">Stephen Belanger</a> raised important questions about the security implications of global <code>mount()</code> hijacking and suggested integrating with the permission model. <a href="https://github.com/Ethan-Arrowood">Ethan Arrowood</a> did a thorough review of the docs and tests. <a href="https://github.com/avivkeller">Aviv Keller</a> caught places where code could be simplified with <code>node:path</code>. <a href="https://github.com/targos">Richard Lau</a> and <a href="https://github.com/bnb">Tierney Cyren</a> provided feedback on documentation structure.</p>
<p>Thanks to everyone involved. Reviewing a 14,000-line PR is a big job, and they all put in the effort.</p>
<h2><code>@platformatic/vfs</code>: use it today</h2>
<p>We didn’t want to wait for the core PR to be merged.</p>
<p>When Malte Ubl, CTO of Vercel, <a href="https://x.com/cramforce/status/2017691219691033080">saw the PR</a>, he tweeted:</p>
<blockquote>
<p><em>“ I saw @matteocollina Virtual File System PR for Node.js, and I’m super excited about it! And so I was wondering if it could be back-ported in user-land. Looks pretty good. May publish it to npm”</em></p>
</blockquote>
<p>We had the same idea, and so did the Vercel team, who published <a href="https://github.com/vercel-labs/node-vfs-polyfill">node-vfs-polyfill</a>. When two teams independently extract the same API into userland, it’s a good sign that the design is solid.</p>
<p>Our version is <a href="https://github.com/platformatic/vfs">@platformatic/vfs</a>, and it works on Node.js 22 and above.</p>
<pre><code class="language-plaintext">npm install @platformatic/vfs
</code></pre>
<p>The API matches what’s proposed for <code>node:vfs</code>:</p>
<pre><code class="language-javascript">import { create, MemoryProvider, SqliteProvider, RealFSProvider } from '@platformatic/vfs'

const vfs = create()
vfs.writeFileSync('/index.mjs', 'export const version = "1.0.0"')
vfs.mount('/app')

const mod = await import('/app/index.mjs')
console.log(mod.version) // "1.0.0"
</code></pre>
<p>When <code>node:vfs</code> ships in core, migrating is a one-line change: swap '<code>@platformatic/vfs</code>' for '<code>node:vfs</code>' in your import.</p>
<h3>Extra providers</h3>
<p>The userland package ships two providers that aren’t in the core PR. <code>SqliteProvider</code> gives you a persistent VFS backed by <code>node:sqlite</code>. Files survive process restarts:</p>
<pre><code class="language-javascript">import { create, SqliteProvider } from '@platformatic/vfs'

const disk = new SqliteProvider('/tmp/myfs.db')
const vfs = create(disk)

vfs.writeFileSync('/config.json', '{"saved": true}')
disk.close()

// Later, in another process:
const disk2 = new SqliteProvider('/tmp/myfs.db')
const vfs2 = create(disk2)
console.log(vfs2.readFileSync('/config.json', 'utf8')) // '{"saved": true}'
</code></pre>
<p>This is helpful for caching compiled assets or keeping generated code across deployments.</p>
<p><code>RealFSProvider</code> is sandboxed real filesystem access. It maps VFS paths to a real directory and prevents path traversal:</p>
<pre><code class="language-javascript">import { create, RealFSProvider } from '@platformatic/vfs'

const provider = new RealFSProvider('/tmp/sandbox')
const vfs = create(provider)

vfs.writeFileSync('/file.txt', 'sandboxed') // Writes to /tmp/sandbox/file.txt
vfs.readFileSync('/../../../etc/passwd') // Throws, can't escape the sandbox
</code></pre>
<h2>Use cases</h2>
<h3>Single Executable Applications</h3>
<p>Node.js SEAs can embed assets, but accessing them has always been tricky. With VFS, SEA assets are automatically mounted and can be accessed through standard <code>fs</code> calls, <code>import</code>, and <code>require()</code>. Your application code doesn’t need to know it’s running as an SEA.</p>
<h3>Testing</h3>
<p>You can create an isolated filesystem per test. No temp directories to clean up, no collisions between parallel test runs:</p>
<pre><code class="language-javascript">import { create } from '@platformatic/vfs'
import { test } from 'node:test'

test('reads config from virtual filesystem', () =&gt; {
 using vfs = create()
 vfs.writeFileSync('/config.json', '{"env": "test"}')
 vfs.mount('/app')

 // Your application code reads /app/config.json through standard fs
 // No disk I/O, no cleanup needed
 // The `using` statement automatically unmounts when the block exits
})
</code></pre>
<h3>AI agents and code generation</h3>
<p>AI agents generate code that needs to run. Writing to temp files is slow, creates cleanup problems, and increases security risks. With VFS, generated code stays in memory and can be loaded with <code>import</code>:</p>
<pre><code class="language-javascript">import { create } from '@platformatic/vfs'

const vfs = create()
vfs.writeFileSync('/handler.mjs', agentGeneratedCode)
vfs.mount('/generated')

const { default: handler } = await import('/generated/handler.mjs')
await handler(request)
</code></pre>
<h2>What’s next</h2>
<p>Both <code>node:vfs</code> and <code>@platformatic/vfs</code> are <strong>experimental</strong>. The test coverage is solid, but a virtual filesystem that hooks into module loading and <code>node:fs</code> has a huge surface area. There will be bugs. Edge cases we haven’t hit. Interactions with third-party code we didn’t anticipate.</p>
<p>If you hit something, please report it. For the userland package, open an issue on <a href="https://github.com/platformatic/vfs/issues">platformatic/vfs</a>. For the core module, comment on <a href="https://github.com/nodejs/node/pull/61478">the PR</a> or open an issue on <a href="https://github.com/nodejs/node/issues">nodejs/node</a>. Every bug report helps.</p>
<p>Once <code>node:vfs</code> lands in core, we’ll keep <code>@platformatic/vfs</code> in sync with any API changes and eventually deprecate it in favour of the built-in module.</p>
<p>In the meantime, try it out and <a href="https://github.com/platformatic/vfs/issues">let us know</a> what you build.</p>
<hr />
<p><a href="https://github.com/nodejs/node/pull/61478"><em>node:vfs</em></a> <em>PR</em> by <a href="https://github.com/mcollina">Matteo Collina</a>.</p>
<p>Fixes <a href="https://github.com/nodejs/node/issues/60021">issue #60021</a> by <a href="https://github.com/robertsLando">Daniel Lando</a>.</p>
<p><a href="https://github.com/platformatic/vfs">@platformatic/vfs</a> is now on npm.</p>
]]></content:encoded></item><item><title><![CDATA[Scale Next.js Image Optimization with a Dedicated Platformatic Application]]></title><description><![CDATA[Image optimization with Next.js is a popular feature, but one that quietly causes instability (in the form of latency spikes) for your frontend.  This is because image resizing and encoding are very C]]></description><link>https://blog.platformatic.dev/scale-nextjs-image-optimization-platformatic</link><guid isPermaLink="true">https://blog.platformatic.dev/scale-nextjs-image-optimization-platformatic</guid><category><![CDATA[Next.js]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Paolo Insogna]]></dc:creator><pubDate>Tue, 10 Mar 2026 14:46:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/2ef7031f-64e1-4f37-83e6-943d22b043b4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Image optimization with <a href="http://next.js">Next.js</a> is a popular feature, but one that quietly causes instability (in the form of latency spikes) for your frontend.  This is because image resizing and encoding are very CPU and memory-intensive, especially when traffic is highest, and users expect fast pages. During real launches, 95th percentile render times often rise from about 600ms to over 2 seconds when there are many image requests, even if the app code stays the same. If image processing shares workers with Server-Side Rendering (SSR), React Server Components (RSC), and API routes, a spike in image requests can slow down everything else, and all of a sudden, you’ve got a cascading failure on your hands.</p>
<p>That’s why teams often notice the same pattern during launches and campaigns: <code>/_next/image</code> traffic increases, CPU usage maxes out, render times get longer, and the whole frontend slows down even though the app logic hasn’t changed. In short, image optimization starts to interfere with your most important user flows.</p>
<p><a href="https://github.com/platformatic/platformatic/">Watt</a> is our open-source Node.js application server that orchestrates frontend frameworks (Next.js, Astro, Remix) and backend services (Node.js, Fastify, Express, Hono, etc) into a single system, with built-in logging, tracing, and multithreading. It leverages the Linux kernel's SO_REUSEPORT to distribute connections across workers with zero coordination overhead. In our <a href="https://blog.platformatic.dev/93-faster-nextjs-in-your-kubernetes">production benchmarks on AWS EKS</a>, Watt delivered 93.6% faster median latency and a 99.8% success rate under a sustained load of 1,000 requests per second. After investigating component rendering, it was only a question of time before we looked into images.</p>
<p>By moving image optimization into its own Watt Application, you create a clear microservice boundary. The optimizer becomes a focused service in your setup, with an API that only exposes what’s needed for safe and efficient image delivery. This keeps media processing separate from your main frontend. You can then scale image capacity on its own, let rendering workers focus on rendering, and adjust retries, timeouts, and storage for media processing without having to over-provision your whole frontend.</p>
<p><code>@platformatic/next</code> is the official Platformatic package for running Next.js inside a Watt Application. It’s fully maintained and supported by the Platformatic team, so you get long-term compatibility with Next.js updates, regular security patches, and best-practice defaults for production. Teams can count on ongoing updates and quick fixes, which lowers maintenance risk and avoids the downsides of custom or community-maintained solutions. The package now includes an Image Optimizer mode, letting you run <code>/_next/image</code> as a dedicated Watt Application, scale it separately, and keep your frontend fast even when image traffic increases.</p>
<p>This capability was introduced in <a href="https://github.com/platformatic/platformatic/pull/4605">PR #4605</a>, and it builds on top of <a href="https://github.com/platformatic/image-optimizer">@platformatic/image-optimizer</a>, our dedicated optimization engine. Our image optimizer is built on top of sharp, leveraging <a href="https://blog.platformatic.dev/job-queue-reliable-background-jobs">@platformatic/job-queue,</a> which adds flexible storage, job deduplication with caching, and producer/consumer decoupling.</p>
<p>If you are self-hosting Next.js and want the same kind of operational separation that mature platforms use internally, this is the missing building block.</p>
<p>In short, you can keep using Next.js as you always have, but with a cleaner architecture that handles high traffic more efficiently.</p>
<h2>Why split image optimization from your frontend?</h2>
<p>If your frontend handles page rendering, API routes, and image resizing as a single service, any slowdown in one will cascade to the others. This means performance suffers most exactly when traffic is highest: during product launches, campaigns, or social media spikes.</p>
<p>And it goes without saying (although it’s a blog, so yes, we will say it anyway…) that page performance isn’t just a technical issue: even a 100 ms delay can lower conversion rates by up to 7%, making slowdowns expensive during launches and campaigns.</p>
<p>The reason comes down to architecture: resizing and re-encoding images is bursty, CPU-heavy, and often I/O bound, while SSR and API routes usually need lower latency and more consistent resources. Running both in one service means you have to use the same autoscaling and resource pool for two very different types of work.</p>
<p>Splitting these responsibilities and running them as worker threads using Watt eliminates this ‘noisy neighbour’ effect and lets you apply the right scaling strategy to each path: scale optimizer replicas (or threads) when media demand rises, and keep frontend replicas sized for rendering throughput and tail latency.</p>
<p>Running Platformatic’s dedicated image optimizer as its own Watt Application gives you:</p>
<ul>
<li><p><strong>Independent scaling</strong>: add replicas for image workloads without scaling the whole frontend stack.</p>
</li>
<li><p><strong>Operational isolation</strong>: image spikes do not starve SSR/RSC rendering.</p>
</li>
<li><p><strong>Centralized controls</strong>: enforce width/quality validation, timeout, retry behaviour, and storage in one place.</p>
</li>
<li><p><strong>Flexible queue storage</strong>: choose memory, filesystem, or Redis/Valkey depending on your topology.</p>
</li>
</ul>
<p>This setup is especially useful for platform engineering and SRE teams who need predictable performance without over-provisioning the whole frontend. Clear ownership lets these teams align this approach with their KPIs for reliability, scalability, and cost efficiency.</p>
<h2>What shipped in Platformatic Next</h2>
<p>The new <code>next.imageOptimizer</code> configuration lets you turn on optimizer-only mode in <code>@platformatic/next</code>, so you can run a Watt Application focused just on image processing. In other words: flip one flag and route only <code>/_next/image</code>, making adoption fast and low-friction.</p>
<p>When enabled, the service:</p>
<ol>
<li><p>Exposes only the Next.js image endpoint (<code>/_next/image</code>, respecting base path).</p>
</li>
<li><p>Validates image parameters using Next.js rules.</p>
</li>
<li><p>Resolves relative URLs through a fallback target (URL or runtime service name).</p>
</li>
<li><p>Fetches and optimizes images through a queue-backed pipeline; if the same image is requested by multiple users at the same time, it is processed only once.</p>
</li>
<li><p>Returns optimized image bytes and cache headers.</p>
</li>
</ol>
<p>Under the hood, this relies on <a href="https://github.com/platformatic/image-optimizer">@platformatic/image-optimizer</a>, which provides a robust processing pipeline with:</p>
<ul>
<li><p>image type detection from magic bytes</p>
</li>
<li><p>optimization for <code>jpeg</code>, <code>png</code>, <code>webp</code>, and <code>avif</code></p>
</li>
<li><p>animation-aware safeguards</p>
</li>
<li><p>URL fetch + optimize helpers</p>
</li>
<li><p>queue APIs powered by <a href="https://github.com/platformatic/job-queue">@platformatic/job-queue</a></p>
</li>
</ul>
<p>The queue state can be distributed on Redis/Valkey, so retries, workload distribution, and resilience remain consistent across multiple optimizer replicas.</p>
<p>The main idea is to keep frontend rendering and image optimization separate, while still using the usual Next.js image features.</p>
<h2>What this means for teams</h2>
<ul>
<li><p><strong>Frontend teams</strong> keep using <code>next/image</code> as usual, without rewriting application code.</p>
</li>
<li><p><strong>Platform teams</strong> get explicit controls for retries, timeout budgets, and queue storage.</p>
</li>
<li><p><strong>Ops teams</strong> can scale optimizer replicas independently from the frontend tier.</p>
</li>
<li><p><strong>Product teams</strong> get a smoother user experience during peak traffic windows.</p>
</li>
</ul>
<p>The result is a platform that feels (and… is) faster to end users and more controllable to engineering teams. In recent internal benchmarks, shifting image optimization to a dedicated Watt Application reduced 95th-percentile response times during peak traffic by up to 40%, turning previously unpredictable slowdowns into consistently fast delivery even under heavy load.</p>
<h2>Choose the right runtime blueprint</h2>
<p>The easiest option is a three-application Watt setup:</p>
<ul>
<li><p><strong>gateway</strong>: Watt’s gateway service, which receives and routes incoming traffic.</p>
</li>
<li><p><strong>frontend</strong>: your standard Next.js application</p>
</li>
<li><p><strong>optimizer</strong>: <code>@platformatic/next</code> running in Image Optimizer mode</p>
</li>
</ul>
<p>Watt’s Gateway sends only <code>GET /_next/image</code> requests to the optimizer, while everything else goes to the <code>frontend</code>. This gives you a clear separation without needing a complicated network setup.</p>
<p>For relative image URLs (for example <code>/hero.jpg</code>), the optimizer fetches originals from <code>frontend</code> via runtime service discovery (<code>http://frontend.plt.local</code>). For absolute URLs, it fetches upstream directly.</p>
<p>If you are deploying on Kubernetes, your best bet is to configure your K8s ingress controller to route <code>GET /_next/image</code> to separate pods running the image optimizer. This configuration is supported and documented at <a href="https://docs.platformatic.dev/docs/guides/next-image-optimizer#10-kubernetes-ingress-example-nginx-ingress-controller">https://docs.platformatic.dev/docs/guides/next-image-optimizer#10-kubernetes-ingress-example-nginx-ingress-controller</a>.</p>
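<p>As a rough sketch of that routing (service names, namespace, host, and ports are illustrative; see the linked guide for the full, supported configuration), an NGINX Ingress could look like this:</p>
<pre><code class="language-plaintext">apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: storefront
  namespace: shop
spec:
  ingressClassName: nginx
  rules:
    - host: shop.example.com
      http:
        paths:
          # Image optimization goes to the dedicated optimizer pods
          - path: /_next/image
            pathType: Prefix
            backend:
              service:
                name: optimizer
                port:
                  number: 3042
          # Everything else goes to the Next.js frontend pods
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend
                port:
                  number: 3042
</code></pre>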
<h3><strong>How to set this up</strong></h3>
<p>Start by creating a Watt workspace with three applications: Gateway, frontend, and optimizer. The frontend remains your existing Next.js app; the optimizer is another <code>@platformatic/next</code> app with <code>next.imageOptimizer.enabled: true</code>; Gateway routes image traffic to the optimizer and everything else to the frontend.</p>
<p>Use this structure as a baseline:</p>
<pre><code class="language-plaintext">my-runtime/
 watt.json
 web/
   gateway/
     platformatic.json
   frontend/
     platformatic.json
     package.json
     next.config.js
     app/
   optimizer/
     next.config.js
     platformatic.json
     package.json
</code></pre>
<p>Then configure it in this order:</p>
<ol>
<li><p>Enable image optimizer mode in the <code>optimizer</code> Watt Application.</p>
</li>
<li><p>Set <code>optimizer.next.imageOptimizer.fallback</code> to <code>frontend</code> so relative image URLs are fetched from <code>http://frontend.plt.local</code>.</p>
</li>
<li><p>In Gateway, route only <code>GET /_next/image</code> to <code>optimizer</code> and keep all other routes on <code>frontend</code>.</p>
</li>
<li><p>Pick queue storage for your topology:</p>
<ul>
<li><p>memory for local/dev</p>
</li>
<li><p>filesystem for single-node persistent disk</p>
</li>
<li><p>Redis/Valkey for distributed replicas</p>
</li>
</ul>
</li>
<li><p>Tune <code>timeout</code> and <code>maxAttempts</code> using your target SLO and expected image profile.</p>
</li>
</ol>
<p>With this setup, app teams can keep using <code>next/image</code> as usual, while platform teams get independent scaling and more control over operations.</p>
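<p>For reference, the frontend code itself doesn’t change. A typical component keeps using the standard <code>next/image</code> API; the file and asset paths below are just illustrative:</p>
<pre><code class="language-javascript">// app/page.jsx (illustrative): nothing here knows about the optimizer application
import Image from 'next/image'

export default function Home () {
  // Optimized variants are still requested from /_next/image,
  // which the Gateway now routes to the dedicated optimizer.
  return &lt;Image src="/hero.jpg" alt="Hero banner" width={1200} height={600} priority /&gt;
}
</code></pre>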
<h2>Configuration example</h2>
<p>In your optimizer application config:</p>
<pre><code class="language-plaintext">{
 "$schema": "https://schemas.platformatic.dev/@platformatic/next/3.38.1.json",
 "next": {
   "imageOptimizer": {
     "enabled": true,
     "fallback": "frontend",
     "timeout": 30000,
     "maxAttempts": 3,
     "storage": {
       "type": "valkey",
       "url": "redis://localhost:6379",
       "prefix": "next-image:"
     }
   }
 }
}
</code></pre>
<p>And in your Gateway config, route only the image endpoint:</p>
<pre><code class="language-plaintext">{
 "$schema": "https://schemas.platformatic.dev/@platformatic/gateway/3.0.0.json",
 "gateway": {
   "applications": [
     {
       "id": "frontend",
       "proxy": {
         "prefix": "/",
         "routes": ["/*"]
       }
     },
     {
       "id": "optimizer",
       "proxy": {
         "prefix": "/",
         "routes": ["/_next/image"],
         "methods": ["GET"]
       }
     }
   ]
 }
}
</code></pre>
<h2>Storage choices: what to use and when</h2>
<ul>
<li><p><strong>memory</strong>: local development or simple single-instance setups.</p>
</li>
<li><p><strong>filesystem</strong>: single-node deployment with persistent disk.</p>
</li>
<li><p><strong>redis/valkey</strong>: distributed production environments with shared queue state.</p>
</li>
</ul>
<p>If you do not specify storage, memory is used by default.</p>
<p>For production multi-instance deployments, Redis/Valkey is usually the best default because it gives shared queue state and predictable behaviour across replicas.</p>
<h2>Failure handling and reliability</h2>
<p>Optimization runs through a queue with explicit timeout and retry controls:</p>
<ul>
<li><p><code>timeout</code> sets the fetch/optimization budget per job.</p>
</li>
<li><p><code>maxAttempts</code> controls the automatic retry count.</p>
</li>
</ul>
<p>When retries are exhausted, the service returns a <code>502 Bad Gateway</code> response, keeping failure behaviour explicit, observable, and easier to alert on.</p>
<h2>Try it today</h2>
<p>If you are self-hosting Next.js and want predictable image performance under load, this capability gives you a practical path that does not require re-architecting your app:</p>
<ol>
<li><p>keep your frontend app unchanged,</p>
</li>
<li><p>stand up a dedicated optimizer Watt Application,</p>
</li>
<li><p>route only <code>/_next/image</code> through Watt’s Gateway service,</p>
</li>
<li><p>pick the storage backend that matches your deployment model.</p>
</li>
</ol>
<p>This is a small architectural change with a big benefit: better frontend stability, simpler operations, and image performance that scales when you need it.</p>
<p>If you want to deliver faster and more reliable user experiences as your traffic grows, dedicated image optimization is one of the best upgrades you can make with minimal disruption.</p>
<p>Read more:</p>
<ul>
<li><p><a href="https://github.com/platformatic/platformatic/pull/4605">PR #4605: Added image optimizer capability</a></p>
</li>
<li><p><a href="https://github.com/platformatic/platformatic/blob/next-image/docs/guides/next-image-optimizer.md">Run Next.js Image Optimizer as a Dedicated Service</a></p>
</li>
<li><p><a href="https://github.com/platformatic/platformatic/blob/next-image/docs/reference/next/image-optimizer.md">Next.js Image Optimizer reference in Platformatic docs</a></p>
</li>
<li><p><a href="https://github.com/platformatic/image-optimizer">@platformatic/image-optimizer</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[We brought Skew Protection to your Kubernetes]]></title><description><![CDATA[We're excited to share a new experimental feature for Platformatic: Skew Protection in the Intelligent Command Center (ICC). This brings Vercel-style deployment safety to Kubernetes, letting you deplo]]></description><link>https://blog.platformatic.dev/skew-protection-for-kubernetes</link><guid isPermaLink="true">https://blog.platformatic.dev/skew-protection-for-kubernetes</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Marco Piraccini]]></dc:creator><pubDate>Thu, 05 Mar 2026 15:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/c5641a09-c34f-490b-a878-425b317c25b3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We're excited to share a new experimental feature for Platformatic: <strong>Skew Protection</strong> in the Intelligent Command Center (ICC). This brings Vercel-style deployment safety to Kubernetes, letting you deploy without downtime and avoid version-mismatch problems.</p>
<p>You can think of this as akin to Vercel’s Skew Protection functionality, but running right in your existing Kubernetes setup: no migration or changes to your CI/CD pipeline or security policies needed, just out-of-the-box version pinning for your frontend applications.</p>
<h2>The Problem: Version Skew in Kubernetes</h2>
<p>When you update a web application, users who loaded the old frontend might send requests to the new backend. This is called “version skew,” and it can cause problems if APIs, assets, or data schemas have changed. For example, if you rename a form field, old clients might still send data using the old field name.</p>
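<p>A minimal illustration of that renamed-field scenario (the endpoint, field names, and helper are made up):</p>
<pre><code class="language-javascript">// Version N client bundle, still cached in a user's open tab
await fetch('/api/profile', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ name: 'Ada Lovelace' }) // old field name
})

// Version N+1 backend handler, just deployed: it reads the renamed field,
// so requests from old clients arrive with body.fullName === undefined
export async function updateProfile (body) {
  return saveProfile({ fullName: body.fullName }) // saveProfile is a hypothetical helper
}
</code></pre>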
<p>This problem matters even more for modern frontend apps, where the same codebase runs on both the client and server. Frameworks like Next.js, Remix, and monorepos often share TypeScript types, API definitions, or business logic between frontend and backend. If these shared parts change between versions, it can cause serious issues:</p>
<ul>
<li><p><strong>Hydration errors and broken UI</strong>: React Server Components tightly couple client and server in a single deployment; when a new version goes live, the server produces updated RSC payloads that older client bundles still in users' browsers cannot reconcile, causing hydration errors and broken UI.</p>
</li>
<li><p><strong>API contract violations</strong>: OpenAPI or protobuf definitions change between versions, leading to serialization/deserialization failures</p>
</li>
<li><p><strong>Type discrepancies</strong>: Shared TypeScript interfaces or zod schemas break when frontend and backend versions diverge, causing runtime errors.</p>
</li>
<li><p><strong>Codependent features</strong>: Frontend components that rely on backend-specific functionality fail when that functionality changes or is removed</p>
</li>
</ul>
<p>The implications for your users are fairly straightforward: some might see API errors, missing fields, or broken features if their client and server versions don’t match; others might see data loss or corruption when schemas change across app versions. All this ultimately puts a load on support teams, who often need to coordinate across multiple feature teams to effectively untangle and ultimately resolve these issues.</p>
<p>Outside of the obvious impact on users (and revenue), k8s version skew is another example of how distributed systems, if not operated with the proper guardrails, actually impede developer velocity. In a world that is increasingly reliant on using AI to write code, the bottleneck is no longer the ability to write lines of code (if it ever was), but what happens between when your code is written and when it actually gets to production.</p>
<p>Version Skew in Kubernetes is a perfect example of such a problem - you have teams that are capable of shipping much faster, but without the right guardrails, the entire system actually moves slower and fails more often: fear of committing breaking changes leads to larger, less-frequent deployments that carry more risk and slow down your time-to-market.</p>
<h2>The Solution: ICC Skew Protection</h2>
<p>Platformatic’s new skew protection feature, built into the Intelligent Command Center, makes sure users stay on the version they started their session with, even when new versions are deployed. If a user starts a session on version N, all their requests during that session go to version N.</p>
<h3><strong>How It Works</strong></h3>
<p>Skew protection uses the <a href="https://gateway-api.sigs.k8s.io/">Kubernetes Gateway API</a> for version-aware routing, with ICC acting as the control plane. Each application version runs as a separate, immutable Kubernetes Deployment that users create themselves using standard Kubernetes workflows.</p>
<p>When applications run, ICC automatically detects new versions via label-based discovery and manages routing rules. ICC creates and maintains HTTPRoute resources that route requests based on session cookies, using a <code>__plt_dpl</code> cookie to pin users to their deployment version.</p>
<p>When a new version is deployed, the previous version transitions to “draining” mode: existing sessions continue to work, while new sessions go to the active version. ICC monitors traffic activity and automatically cleans up old versions after configured grace periods.</p>
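<p>To make this concrete, here’s a rough sketch of the kind of HTTPRoute ICC maintains. The resource names are illustrative, cookie pinning is expressed here as a regular-expression match on the Cookie header, and the exact rules ICC generates (including how the cookie is set for new visitors) may differ:</p>
<pre><code class="language-plaintext">apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: myapp
  namespace: myapp
spec:
  parentRefs:
    - name: my-gateway
  rules:
    # Draining version: only sessions already pinned by the __plt_dpl cookie land here
    - matches:
        - headers:
            - type: RegularExpression
              name: Cookie
              value: ".*__plt_dpl=dep-v42.*"
      backendRefs:
        - name: myapp-v42
          port: 3042
    # Default rule: everyone else is routed to the active version
    - backendRefs:
        - name: myapp-v43
          port: 3042
</code></pre>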
<h3>Key Platformatic Components</h3>
<p><strong>Platformatic Watt</strong> is the Node.js application server that runs your application as a worker thread inside Kubernetes. This allows for improved performance, resiliency, and compute efficiency, as well as providing out-of-the-box features such as hot reloading, health checks, and metrics collection.</p>
<p><strong>watt-extra</strong> is an extension layer that sits on top of Platformatic Watt and serves as the bridge between your application and ICC. On startup, watt-extra connects to ICC and registers the application with its metadata (pod ID, app name, version). This registration enables ICC to:</p>
<ul>
<li><p>Discover the application’s Kubernetes labels (<code>app.kubernetes.io/name, plt.dev/version</code>)</p>
</li>
<li><p>Manage autoscaling using real-time, Node.js-specific metrics</p>
</li>
<li><p>Implement version-aware routing for skew protection</p>
</li>
<li><p>Monitor health and performance</p>
</li>
</ul>
<p><strong>System Architecture</strong></p>
<p>The skew protection system consists of four layers. Each application version is a completely separate K8s Deployment, and the Kubernetes Gateway API handles routing at the ingress level based on <code>HTTPRoute</code> rules managed by ICC.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/36fc310d-5bdb-475a-9ca9-f6597b1e440c.png" alt="" style="display:block;margin:0 auto" />

<h3>Component Breakdown</h3>
<p><strong>Client Layer</strong></p>
<ul>
<li><p><strong>Browser Session A (cookie: __plt_dpl=dep-v42)</strong>: A user who started their session on version 42. The <code>__plt_dpl</code> cookie pins their requests to that version, making sure the requests are routed to the correct backend even after newer versions are deployed.</p>
</li>
<li><p><strong>Browser Session B (cookie: __plt_dpl=dep-v43)</strong>: A user who started their session on version 43. Their requests are routed to the active version based on their cookie.</p>
</li>
<li><p><strong>New Visitor (no deployment cookie)</strong>: A first-time user or someone without a version cookie. Their first request is routed to the current active version, and they receive a cookie that pins them to that version.</p>
</li>
</ul>
<p><strong>Gateway API Layer</strong></p>
<ul>
<li><p><strong>GatewayClass</strong>: Defines a template or class of gateways (e.g., Envoy Gateway, Contour, or Cilium) that can process Gateway API resources. Each cluster operator configures this with their preferred controller.</p>
</li>
<li><p><strong>Gateway Resource</strong>: The actual gateway instance that listens on HTTP/HTTPS ports and processes incoming traffic. It contains listener configurations for TLS termination and routing.</p>
</li>
<li><p><strong>HTTPRoute</strong>: Managed by ICC, this is the key routing rule that implements version-aware routing. It contains multiple rules: cookie-based matches for draining versions and a default rule that sets a cookie for new visitors and routes to the active version.</p>
</li>
</ul>
<p><strong>ICC (Intelligent Command Center) - Namespace: platformatic</strong></p>
<ul>
<li><p><strong>Control Plane Service</strong>: The core component responsible for version detection, HTTPRoute management, and lifecycle decisions. When watt-extra registers a new pod, the control plane discovers the application name and version. It holds the version registry and creates/updates/deletes HTTPRoute resources as needed.</p>
</li>
<li><p><strong>PostgreSQL</strong>: Stores the persistent state for skew protection, including the version registry with full metadata about each deployment (version string, timestamps, K8s resources), deployment history for audit trails, and per-application skew protection policies.</p>
</li>
</ul>
<p><strong>App Versions - Namespace: myapp</strong></p>
<ul>
<li><p><strong>Deployment: myapp-v42 (draining)</strong>: A Kubernetes Deployment for the previous version (42) that is being phased out. It has its own Service and pods running Watt with watt-extra. Traffic only routes here for users whose cookies match this version.</p>
</li>
<li><p><strong>Deployment: myapp-v43 (active)</strong>: The current active version deployment. It has multiple replicas for high availability. New visitors and users without matching cookies are routed here. ICC’s autoscaler works across all deployed versions, provisioning the correct amount of resources for each version based on actual traffic.</p>
</li>
<li><p><strong>Service</strong>: Each version has its own Kubernetes Service that selects pods with the corresponding <code>plt.dev/version</code> label. These Services are referenced by the HTTPRoute’s backendRefs.</p>
</li>
<li><p><strong>Pods (Watt + watt-extra)</strong>: Each pod runs the application container (Platformatic Watt runtime) plus watt-extra. watt-extra is the ICC agent that connects to ICC on startup and registers the pod. It sends the pod ID, and ICC discovers the version and deployment metadata through Kubernetes APIs. watt-extra also reports metrics to ICC for autoscaling and health monitoring.</p>
</li>
</ul>
<p><strong>Observability Layer</strong></p>
<p><strong>Prometheus</strong>: Collects metrics from all pods and services. ICC queries Prometheus to monitor traffic patterns for each version, track request rates for draining versions, and uses that data to determine when versions should be transitioned to Expired status (meaning services that received no traffic for the pre-configured grace period).</p>
<h2>How It All Works Together</h2>
<p>When a new application version is deployed:</p>
<ol>
<li><p>You deploy a new version of your app with the same <code>app.kubernetes.io/name</code> label and a new <code>plt.dev/version</code> label.</p>
</li>
<li><p>watt-extra registers the new pods with ICC, which detects the new version from the labels.</p>
</li>
<li><p>ICC makes the new version Active and moves the previous one to Draining. It updates the Gateway routing rules so that new sessions go to the active version, while existing sessions with a version cookie keep going to the draining version.</p>
</li>
<li><p>ICC monitors traffic on draining versions. Once there is no traffic, or the grace period elapses, ICC expires the old version — removing its routing rules and scaling it to zero, and optionally deleting the old Deployment and Service.</p>
</li>
</ol>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/3241ed07-c284-4b94-8ce7-8fbb10041116.png" alt="" style="display:block;margin:0 auto" />

<h2>The Deployment Lifecycle in Detail</h2>
<p>When managing multiple versions, skew protection uses a well-defined state machine to keep transitions predictable:</p>
<ul>
<li><p><strong>Active</strong> → The current version serving new sessions. Exactly one version per application is Active at a time. The HTTPRoute’s default rule points to the Active version’s Service, and new visitors receive a cookie pinning them to this version.</p>
</li>
<li><p><strong>Draining</strong> → When a newer version is detected and becomes Active, the previous version transitions to Draining. No new sessions are assigned to it, but existing sessions with version-pinning cookies continue to be served. ICC monitors traffic activity for draining versions to determine when they can be safely retired.</p>
</li>
<li><p><strong>Expired</strong> → A version transitions to Expired when it has zero traffic over the traffic window (default: 30 minutes) or when the grace period elapses (default: 24 hours), whichever comes first. ICC then removes the version’s matching rules from the HTTPRoute, scales the Deployment to zero replicas via the autoscaler, and optionally deletes the Deployment and Service (if auto-cleanup is enabled).</p>
</li>
</ul>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/88c464b9-183f-4548-8b0c-44a57eadcfc7.png" alt="" style="display:block;margin:0 auto" />

<p>The ICC uses version labels to determine state. Version labels are opaque strings and can be numbers, semver, git SHAs, or any identifier that fits your workflow. ICC does not parse or compare them; it just treats the most recently detected version as Active.</p>
<p><strong>How users deploy a new version:</strong></p>
<ol>
<li><p>Build a new container image with the updated application code (e.g., <code>myapp:v43</code>)</p>
</li>
<li><p>Create a new K8s Deployment and Service with:</p>
<ul>
<li><p>Same <code>app.kubernetes.io/name</code> label (e.g., <code>myapp</code>) — this tells ICC it’s the same application</p>
</li>
<li><p>New <code>plt.dev/version</code> label (e.g., <code>43</code>) — this tells ICC it’s a new version</p>
</li>
<li><p>New Deployment name (e.g., <code>myapp-v43</code>) and matching Service name</p>
</li>
</ul>
</li>
<li><p>Apply the manifest: <code>kubectl apply -f myapp-v43.yaml</code> (a minimal example manifest is sketched after this list)</p>
</li>
<li><p>ICC automatically detects the new version when pods start and watt-extra registers with ICC. The new version becomes Active, and the previous version begins draining.</p>
</li>
</ol>
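<p>For reference, here’s a minimal sketch of what <code>myapp-v43.yaml</code> might contain; the image, port, and ICC address are illustrative:</p>
<pre><code class="language-plaintext">apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-v43
  namespace: myapp
  labels:
    app.kubernetes.io/name: myapp
    plt.dev/version: "43"
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp
      plt.dev/version: "43"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: myapp
        plt.dev/version: "43"
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v43
          ports:
            - containerPort: 3042
          env:
            # Lets watt-extra register this pod with ICC (example address)
            - name: PLT_ICC_URL
              value: "http://icc.platformatic.svc.cluster.local:3042"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-v43
  namespace: myapp
spec:
  selector:
    app.kubernetes.io/name: myapp
    plt.dev/version: "43"
  ports:
    - port: 3042
      targetPort: 3042
</code></pre>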
<h3><strong>Getting Started with ICC</strong></h3>
<p>Platformatic’s skew protection is built into the Intelligent Command Center (ICC), a complete control plane for managing Node.js applications or agents running in Kubernetes, with autoscaling, monitoring, and version-aware routing.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/1298be19-1dd1-47aa-98df-45a1163e7afb.png" alt="" style="display:block;margin:0 auto" />

<p><strong>To get started with ICC:</strong></p>
<ul>
<li><p><strong>Install ICC</strong> on your Kubernetes cluster. Follow our <a href="https://icc.platformatic.dev/installation/">Installation Guide</a> for step-by-step instructions, covering infrastructure requirements (Kubernetes, PostgreSQL, Valkey, Prometheus) and installation options.</p>
</li>
<li><p><strong>Deploy your first application</strong> using the standard ICC workflow:</p>
<ul>
<li><p>Add <code>@platformatic/watt-extra</code> to your app</p>
</li>
<li><p>Set <code>PLT_ICC_URL</code> so your app can register with ICC</p>
</li>
<li><p>Deploy with <code>kubectl apply</code> or your existing CI/CD pipeline</p>
</li>
</ul>
</li>
<li><p><strong>Enable Skew Protection</strong>:</p>
<ul>
<li><p>Enable <code>PLT_FEATURE_SKEW_PROTECTION</code></p>
</li>
<li><p>Ensure Gateway API CRDs are installed (Kubernetes 1.27+)</p>
</li>
<li><p>Deploy a Gateway API-compatible controller (Envoy Gateway, Contour, Cilium, Traefik, NGINX Gateway Fabric or Kong). See the <a href="https://icc.platformatic.dev/skew-protection/prerequisites/#compatible-controllers">Compatible Gateways in ICC documentation</a></p>
</li>
<li><p>Configure deployment labels:</p>
</li>
</ul>
</li>
</ul>
<pre><code class="language-plaintext">labels:
  app.kubernetes.io/name: myapp
   plt.dev/version: "43"
   # Optional: custom path prefix (default: /myapp)
   # plt.dev/path: "/api/leads"
   # Optional: hostname for HTTPRoute
   # plt.dev/hostname: "myapp.example.com"
</code></pre>
<h3>Bring Vercel-Grade Deployment Safety to Your Kubernetes Environment</h3>
<p>Platformatic’s skew protection is now available in ICC. It provides zero-downtime deployments and version-aware routing that keep each user session consistent.</p>
<p>If your team wants to try it in a real enterprise setup, send a message to <a href="https://www.linkedin.com/in/lucamaraschi/">Luca Maraschi</a> or <a href="https://www.linkedin.com/in/matteocollina/">Matteo Collina</a> via DMs on LinkedIn, or contact <a href="mailto:info@platformatic.dev">info@platformatic.dev</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Building an Auditable AI Gateway with Platformatic Watt]]></title><description><![CDATA[Every engineering team that adopts AI quickly hits the same wall: a simple provider integration that worked for a demo turns into an operational bottleneck at scale. Tracking usage, containing costs, ]]></description><link>https://blog.platformatic.dev/auditable-ai-gateway</link><guid isPermaLink="true">https://blog.platformatic.dev/auditable-ai-gateway</guid><category><![CDATA[Node.js]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Paolo Insogna]]></dc:creator><pubDate>Wed, 04 Mar 2026 15:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/84034626-b07c-4b0e-b329-0af73a74b5b9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every engineering team that adopts AI quickly hits the same wall: a simple provider integration that worked for a demo turns into an operational bottleneck at scale. Tracking usage, containing costs, and keeping an audit trail across growing models and teams can slip out of reach fast. AI features are moving fast, but production teams still need the same thing they have always needed: not just control, but auditability.</p>
<p>That is exactly what ai-gateway-auditable delivers: an OpenAI-compatible gateway built with <a href="https://docs.platformatic.dev/">Platformatic Watt</a> that combines provider routing, fallback resiliency, and durable audit logging to S3.</p>
<p>For production teams, this translates directly into risk reduction and regulatory readiness: your audit trail is always preserved, and resilient routing keeps incidents contained. In real terms, this leads to fewer lost logs or broken provider integrations (and fewer 3 a.m. pages as a result), and reliable evidence when you need to answer compliance or security reviews.</p>
<p>This architecture is not only production-ready, but already operating at scale for one of our early adopters. One application (proxy) serves traffic, while another (audit worker) persists audits, and a durable queue between them keeps latency low while preserving records, using the filesystem to provide durability. This same early adopter halved its application latency using this pattern with Watt. With clear audit trails and resilient traffic handling, they were able to trace errors quickly and keep their on-call load under control, while giving their LLM-enabled end-users performance that approached parity with direct API calls, which was critical for serving their real-time use cases.</p>
<p>Source code: <a href="https://github.com/platformatic/ai-gateway-auditable">github.com/platformatic/ai-gateway-auditable</a></p>
<h2><strong>Why this matters now</strong></h2>
<p>The direct integration pattern is usually the first stop for teams, but it often leads to audit-trail gaps. Finance needs clean attribution by key or team, security needs auditable traces of model interactions, and product needs stronger uptime when upstream providers degrade.</p>
<p>As a real-world example, our same early adopter saw this with their initial production rollout, which missed up to 15% of request logs during peak volume and caused request latency to spike by more than 2x when provider response times flared. At the same time, you want a single, stable integration surface instead of scattering provider-specific logic across multiple services. An AI gateway is where all your needs converge into a single, manageable control point.</p>
<p>With ai-gateway-auditable, every request has a clear path, every response is traceable, and fallback behavior is visible instead of opaque.</p>
<h2><strong>Why Watt</strong></h2>
<p>Platformatic Watt is well-suited to this pattern because it lets us run the API-facing proxy and the audit worker as separate applications with a shared operational model, using them as worker threads. That separation is the foundation of reliability here: the proxy can stay focused on low-latency responses, while the worker can focus on durable queue consumption, batching, and S3 shipping.</p>
<p>Most importantly, this design is tolerant of worker crashes. Watt supervises applications (worker threads), so if an audit worker crashes, it is automatically restarted, and unhealthy workers are automatically replaced. During that window, the proxy can keep accepting requests and persisting audit jobs in FileStorage. When the replacement worker is up, it resumes consuming from the same queue path and drains pending jobs.</p>
<p>The result is graceful degradation rather than data loss: temporary worker failures increase audit lag but do not break the request path or discard audit events. This distinction is critical from a business perspective. Losing audit data can put regulatory compliance at risk and expose the company to possible fines or a loss of trust, while a short delay in audit processing only postpones analysis or reporting. In other words, our design trades brief insight delays for the certainty that no evidence is lost.</p>
<h2><strong>Why filesystem-based storage</strong></h2>
<p>We use filesystem-backed queue storage on purpose. Writing audit jobs to local disk is crash-tolerant because queued data survives process failures and restarts, unlike in-memory buffers.</p>
<p>It also keeps resource usage and request-path performance under control. We do not need to retain full audit payloads in memory while awaiting remote writes, and we do not put every request on the critical path of an external storage service. That removes network latency and remote availability as immediate blockers to request handling, while still providing durable buffering before batches are shipped to S3.</p>
<h2><strong>Architecture at a glance</strong></h2>
<p>The system runs as two applications (threads) inside of <a href="https://docs.platformatic.dev/">Platformatic Watt</a>, the Node.js application server.</p>
<img src="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/f631ae19-c62d-4d5a-b2e5-fef16f32d9d6.png" alt="" style="display:block;margin:0 auto" />

<p>The proxy is optimized for low-latency request/response flow, while the audit-worker is optimized for durability, retries, and batch shipping. Keeping these concerns separate avoids a common failure mode: heavy audit I/O slowing down user-facing traffic.</p>
<p>How do the two applications communicate? Through the same FileStorage queue path on disk. proxy writes audit jobs to ./data/queue, paying only the cost of a local queue operation, and audit-worker consumes those jobs independently in the background. This gives you explicit producer/consumer decoupling: the request path does not wait for S3 uploads, retries, or batch rotation. If the worker restarts, queued jobs remain on disk and are resumed when it comes back. If S3 is slow or temporarily unavailable, jobs continue to accumulate durably in the queue instead of being lost or pushing latency back to callers.</p>
<p>In other words, even when storage is under pressure or S3 is temporarily unavailable, the gateway can keep serving requests while the audit pipeline catches up safely in the background.</p>
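<p>To make the producer side concrete, here is a minimal sketch of the enqueue step, assuming the <code>@platformatic/job-queue</code> API (<code>Queue</code>, <code>FileStorage</code>, <code>enqueue</code>) and an illustrative <code>enqueueAudit</code> helper; it is not the repository's exact code:</p>
<pre><code class="language-javascript">import { Queue, FileStorage } from '@platformatic/job-queue'

// Shared on-disk queue path, consumed independently by the audit-worker
const storage = new FileStorage('./data/queue')
const auditQueue = new Queue({ storage })
await auditQueue.start()

// Fire-and-forget: the request path never waits for S3 uploads or batch
// rotation. Using the request ID as the job ID lets duplicate enqueues
// collapse instead of creating duplicate audit records.
export async function enqueueAudit (requestId, auditRecord) {
  await auditQueue.enqueue(requestId, auditRecord)
}
</code></pre>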
<h2><strong>What the gateway gives you</strong></h2>
<p>At a product level, this gateway provides four strong guarantees:</p>
<ol>
<li><p><strong>OpenAI Completions compatible endpoint</strong> (/v1/chat/completions) for clients and SDKs.</p>
</li>
<li><p><strong>Model-based routing with fallback</strong> across providers.</p>
</li>
<li><p><strong>Complete request/response audit records</strong> for every successful exchange.</p>
</li>
<li><p><strong>Durable archival to S3</strong> with batched JSONL files partitioned by time (JSON Lines is a text file format where each line is a valid, independent JSON object, separated by newline characters).</p>
</li>
</ol>
<p>This means reduced provider lock-in, minimized operational risks, and heightened observability.</p>
<h2><strong>Service responsibilities</strong></h2>
<p>The key behavior is role decoupling: proxy only produces queue jobs, while audit-worker handles all downstream storage and shipping work.</p>
<h3><strong>proxy (external entrypoint)</strong></h3>
<p>proxy exposes:</p>
<ul>
<li><p><code>GET /health</code></p>
</li>
<li><p><code>POST /v1/chat/completions</code></p>
</li>
</ul>
<p>For each request, it:</p>
<ol>
<li><p>Selects a provider chain based on model routing rules.</p>
</li>
<li><p>Executes upstream calls with fallback on retryable failures.</p>
</li>
<li><p>Returns the upstream response to the client.</p>
</li>
<li><p>Enqueues an audit payload into the shared durable queue.</p>
</li>
</ol>
<h3><strong>audit-worker (internal service)</strong></h3>
<p>audit-worker is an internal Node application with no HTTP API (hasServer = false).</p>
<p>It owns the full audit persistence path:</p>
<ul>
<li><p>queue consumption with @platformatic/job-queue</p>
</li>
<li><p>durable local buffering with FileStorage</p>
</li>
<li><p>batched JSONL writing</p>
</li>
<li><p>S3 uploads signed with AWS SigV4.</p>
</li>
</ul>
<p>Queue settings used in the current implementation:</p>
<ul>
<li><p><code>concurrency: 1</code></p>
</li>
<li><p><code>maxRetries: 3</code></p>
</li>
<li><p><code>resultTTL: 60_000</code></p>
</li>
<li><p><code>visibilityTimeout: 30_000</code></p>
</li>
</ul>
<p>This is optimized for predictable sequential writes and safe retry semantics. Filesystem queue storage is chosen because it needs no external setup (no Redis/Valkey), making local development and single-node production rollouts much simpler. At the same time, it still provides crash resilience: queue state is persisted to disk, so in-flight and pending audit jobs survive process restarts.</p>
<p>That combination is the key trade-off here: you gain operational simplicity and zero external dependencies while keeping a durable audit trail. Note that relying on the local file system still leaves a residual risk of data loss, since queued records that have not yet shipped to S3 live on a single disk. The alternative, moving the audit trail back into the main response cycle, would add latency and turn any failed audit into a hard request failure. The trade-off, as always, is in the hands of engineers: availability or consistency?</p>
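<p>As a rough sketch of the consumer side (not the repository's exact code), the audit-worker could wire those settings up roughly like this. Option names mirror the list above; whether they all sit on the Queue constructor or on a companion Reaper is an implementation detail, and the handler body is a placeholder:</p>
<pre><code class="language-javascript">import { appendFile } from 'node:fs/promises'
import { Queue, FileStorage } from '@platformatic/job-queue'

const storage = new FileStorage('./data/queue')
const queue = new Queue({
  storage,
  concurrency: 1,            // predictable sequential writes
  maxRetries: 3,             // retry transient disk or S3 errors
  resultTTL: 60_000,         // keep job results for one minute
  visibilityTimeout: 30_000  // requeue work from a crashed worker
})

queue.execute(async job =&gt; {
  // Placeholder handler: append one JSON line to the current batch file.
  // The real worker also handles batch rotation and S3 upload.
  await appendFile('./data/batch.jsonl', JSON.stringify(job) + '\n')
  return { persisted: true }
})

await queue.start()
</code></pre>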
<h2><strong>Routing and fallback configuration</strong></h2>
<p>Routing lives in providers.json and uses two lists:</p>
<ul>
<li><p>providers: upstream connection and adapter definitions</p>
</li>
<li><p>routing: per-model routing rules with ordered provider chains</p>
</li>
</ul>
<pre><code class="language-javascript">{
 "providers": [
   {
     "id": "openai",
     "type": "openai",
     "baseUrl": "https://api.openai.com",
     "apiKey": "{OPENAI_API_KEY}"
   },
   {
     "id": "anthropic",
     "type": "anthropic",
     "baseUrl": "https://api.anthropic.com",
     "apiKey": "{ANTHROPIC_API_KEY}"
   }
 ],
 "routing": [
   {
     "id": "gpt-4o",
     "providers": ["openai"],
     "strategy": "fallback"
   },
   {
     "id": "claude-sonnet-4-6",
     "providers": ["anthropic"],
     "strategy": "fallback"
   },
   {
     "id": "*",
     "providers": ["openai"],
     "strategy": "fallback"
   }
 ]
}
</code></pre>
<p>Environment variables like <code>{OPENAI_API_KEY}</code> are resolved from process env at startup.</p>
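<p>Conceptually, that substitution is just a string replacement over the config at startup. A minimal illustration of the idea (the gateway's real resolver may differ):</p>
<pre><code class="language-javascript">// Replace {VAR_NAME} placeholders with values from process.env.
// Purely illustrative; error handling and naming are assumptions.
function resolvePlaceholders (value) {
  return value.replace(/\{([A-Z0-9_]+)\}/g, (_, name) =&gt; {
    const resolved = process.env[name]
    if (resolved === undefined) {
      throw new Error(`Missing environment variable: ${name}`)
    }
    return resolved
  })
}

// resolvePlaceholders('{OPENAI_API_KEY}') returns the value of process.env.OPENAI_API_KEY
</code></pre>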
<p>Fallback behavior is explicit and policy-driven: by exposing a clearly configurable list of retryable statuses, teams can align gateway failover with internal governance or incident playbooks. For example, you can tune which upstream failures (such as 429, 500, 502, 503, 504) trigger fallback based on your own risk, compliance, or incident response thresholds. This mapping between config and governance means compliance and security teams can review and pre-approve response handling in line with internal standards—a step that accelerates approval and audit-readiness.</p>
<ul>
<li><p>Retryable statuses: 429, 500, 502, 503, 504.</p>
</li>
<li><p>Connection failures are retryable.</p>
</li>
<li><p>Non-retryable responses (400, 401, 403) are returned immediately.</p>
</li>
</ul>
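<p>In code, that policy boils down to a small predicate. A sketch (constant and function names are illustrative, not the gateway's actual identifiers):</p>
<pre><code class="language-javascript">// Mirror of the fallback policy listed above.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503, 504])

function shouldFallback (connectionError, response) {
  if (connectionError) return true                    // connection failures are retryable
  if (RETRYABLE_STATUSES.has(response.status)) return true
  return false                                        // e.g. 400, 401, 403 are returned as-is
}
</code></pre>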
<p>If you want delegated provider orchestration, you can configure OpenRouter as an openai-type provider and route * traffic to it.</p>
<h2><strong>Adapter model: one external contract, many upstreams</strong></h2>
<p>The gateway keeps a single OpenAI-compatible API surface, while adapters normalize provider differences behind the scenes.</p>
<ul>
<li><p>The OpenAI adapter supports OpenAI-compatible endpoints, including Azure- and OpenRouter-compatible APIs.</p>
</li>
<li><p>The Anthropic adapter translates OpenAI chat requests and responses to Anthropic Messages API semantics.</p>
</li>
</ul>
<p>This removes provider-specific branching logic from your application layer.</p>
<h2><strong>Streaming support with full audit fidelity</strong></h2>
<p>Streaming UX matters, so the proxy preserves token-by-token delivery.</p>
<p>For stream: true requests, the proxy:</p>
<ol>
<li><p>Pipes SSE chunks to the client in real time.</p>
</li>
<li><p>Buffers chunks internally.</p>
</li>
<li><p>Reconstructs a complete Chat Completions response.</p>
</li>
<li><p>Emits a single audit record with streamed set to true.</p>
</li>
</ol>
<p>Users get low-latency streaming, and operators still get complete records for replay and analysis.</p>
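<p>A heavily simplified sketch of the buffering idea is shown below; the real proxy also handles SSE framing, tool calls, usage chunks, and multiple choices, and the chunk shape is assumed here to be an already-parsed Chat Completions stream event:</p>
<pre><code class="language-javascript">// Accumulate streamed deltas while they are forwarded to the client,
// then rebuild a complete response body for the audit record.
function createStreamAuditBuffer () {
  let content = ''
  let lastChunk = null

  return {
    onChunk (chunk) {
      lastChunk = chunk
      const delta = chunk.choices?.[0]?.delta?.content
      if (delta) content += delta
    },
    toAuditResponse () {
      return {
        id: lastChunk?.id,
        choices: [{ message: { role: 'assistant', content } }]
      }
    }
  }
}
</code></pre>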
<h2><strong>Audit record shape</strong></h2>
<p>Each JSONL line is a complete record with request, response, latency, caller hash, status, and routing metadata:</p>
<pre><code class="language-json">{
 "id": "a8f3b2c1-...",
 "timestamp": "2026-03-03T11:44:00.000Z",
 "duration_ms": 1243,
 "request": {
   "model": "gpt-4o",
   "messages": [{ "role": "user", "content": "Hello" }]
 },
 "response": {
   "id": "chatcmpl-...",
   "choices": [{ "message": { "role": "assistant", "content": "Hi!" } }]
 },
 "upstream_status": 200,
 "caller": "7a3f2b1c",
 "streamed": false,
 "routing": {
   "model": "gpt-4o",
   "planned_providers": [{ "id": "openai", "status": 200, "duration_ms": 1200 }],
   "used_provider": "openai"
 }
}
</code></pre>
<p>The caller is an 8-character SHA-256 prefix of the bearer token value, so attribution is possible without storing raw API keys.</p>
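<p>For reference, a prefix like that can be derived with <code>node:crypto</code> in a couple of lines (a sketch of the idea, not necessarily the gateway's exact implementation):</p>
<pre><code class="language-javascript">import { createHash } from 'node:crypto'

// Hash the bearer token and keep the first 8 hex characters, so requests
// can be attributed to a caller without ever persisting the raw API key.
function callerId (bearerToken) {
  return createHash('sha256').update(bearerToken).digest('hex').slice(0, 8)
}

// callerId('sk-your-key') returns an 8-character value such as '7a3f2b1c'
// (the actual value depends on the token).
</code></pre>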
<h2><strong>Durable audit pipeline in detail</strong></h2>
<p>Inside the request path, proxy enqueues each payload using the request ID as the job ID, which naturally supports deduplication when IDs repeat.</p>
<p>audit-worker consumes those jobs and writes them into local JSONL batches before upload.</p>
<p>The writer then:</p>
<ol>
<li><p>Appends each record as one JSON line to a local batch file using flush semantics.</p>
</li>
<li><p>Rotates to a new batch when the size or time threshold is reached.</p>
</li>
<li><p>Uploads the batch file to S3 using undici and SigV4 headers.</p>
</li>
<li><p>Deletes local batch files only after successful upload.</p>
</li>
</ol>
<p>Current thresholds:</p>
<ul>
<li><p><code>BATCH_SIZE = 100</code></p>
</li>
<li><p><code>FLUSH_INTERVAL_MS = 5000</code></p>
</li>
</ul>
<p>S3 object keys are hour-partitioned for downstream querying:</p>
<p><code>audits/2026/03/03/11/batch-1741003090000-3bb7....jsonl</code></p>
<p>This structure works well with tools like Athena and other data lake pipelines.</p>
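<p>The rotation check and the key layout are both simple to express. A sketch using the thresholds and key format above (helper names are illustrative):</p>
<pre><code class="language-javascript">import { randomUUID } from 'node:crypto'

const BATCH_SIZE = 100
const FLUSH_INTERVAL_MS = 5000

// Rotate when the batch is full or has been open long enough.
function shouldRotate (recordCount, openedAt, now = Date.now()) {
  return recordCount &gt;= BATCH_SIZE || now - openedAt &gt;= FLUSH_INTERVAL_MS
}

// Build an hour-partitioned key like audits/2026/03/03/11/batch-1741003090000-3bb7....jsonl
function batchKey (date = new Date()) {
  const pad = n =&gt; String(n).padStart(2, '0')
  const prefix = [
    'audits',
    date.getUTCFullYear(),
    pad(date.getUTCMonth() + 1),
    pad(date.getUTCDate()),
    pad(date.getUTCHours())
  ].join('/')
  return `${prefix}/batch-${date.getTime()}-${randomUUID()}.jsonl`
}
</code></pre>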
<h2><strong>Operating under failure</strong></h2>
<p>The gateway is intentionally designed to degrade gracefully.</p>
<p>Typical architectural components here include the file-backed queue directory (such as ./data/queue), which serves as the communication bridge between the proxy and the audit-worker; single-node deployment support via Platformatic Watt's supervised applications; and a default S3 bucket for audit archives. Core configuration files like providers.json define routing logic and provider chains, while runtime environment variables control credentials and logging. All of these components work together as the durable, fault-tolerant foundation that keeps this architecture reliable at scale. This keeps user-facing availability high while preserving eventual audit consistency.</p>
<h2><strong>Run it locally</strong></h2>
<pre><code class="language-plaintext">git clone https://github.com/platformatic/ai-gateway-auditable.git
cd ai-gateway-auditable
npx wattpm-utils install
docker compose up
</code></pre>
<p>Then call the gateway with any OpenAI-compatible client or a simple curl:</p>
<pre><code class="language-plaintext">curl http://localhost:3042/v1/chat/completions \
 -H 'Content-Type: application/json' \
 -H 'Authorization: Bearer sk-your-key' \
 -d '{
   "model": "gpt-4o",
   "messages": [{"role": "user", "content": "Hello"}]
 }'
</code></pre>
<h2><strong>Final take</strong></h2>
<p>ai-gateway-auditable is a practical pattern for teams that need to move fast with AI and still satisfy the operational norms of production software. It gives you:</p>
<ul>
<li><p>one consistent API surface with clear fallback behavior,</p>
</li>
<li><p>complete and queryable audit trails, and</p>
</li>
<li><p>a clean separation between serving traffic and persisting evidence.</p>
</li>
</ul>
<p>If your roadmap includes multi-provider AI, compliance requirements, or strict SRE expectations, this architecture is ready to adopt and extend.</p>
<p>The easiest way to get started is to fork the repo, run the quick-start commands, and see the gateway in action with your own test requests. Try spinning up the service locally and sending a sample call: this practical step will show you right away how auditable AI operations can be within your own workflow.</p>
<p>Happy building!</p>
]]></content:encoded></item><item><title><![CDATA[Introducing @platformatic/job-queue ]]></title><description><![CDATA[Every backend developer knows the frustration: a key job disappears during a server restart, or duplicate tasks pile up when a client retries a request. Lost work, repeated emails, missing reports: th]]></description><link>https://blog.platformatic.dev/job-queue-reliable-background-jobs</link><guid isPermaLink="true">https://blog.platformatic.dev/job-queue-reliable-background-jobs</guid><category><![CDATA[Node.js]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Tue, 03 Mar 2026 15:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62bc139e9c913efac56c8de3/6c7c7af8-cdf9-4c65-8472-fcba452e2ca9.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every backend developer knows the frustration: a key job disappears during a server restart, or duplicate tasks pile up when a client retries a request. Lost work, repeated emails, missing reports: these breakdowns always seem to happen when reliability matters most.</p>
<p><a href="https://github.com/platformatic/job-queue">@platformatic/job-queue</a> is a new queue library from Platformatic focused on reliability and operational simplicity. This library is built on a workflow that lets you enqueue jobs and wait for results when needed, making background processing feel just as smooth as calling a function. Alongside this, it provides Node.js teams with a modern API that includes built-in caching, deduplication, retries, and pluggable storage.</p>
<p>In practice, this means you can start with a tiny local setup and then move to a distributed, production-grade deployment without rewriting your application code.</p>
<h2><strong>What makes it different</strong></h2>
<p>Most queue setups force you to stitch together multiple patterns and handle edge cases yourself. @platformatic/job-queue includes those patterns out of the box:</p>
<ul>
<li><p><strong>Deduplication by job id</strong> so repeated enqueue attempts do not create duplicate work.</p>
</li>
<li><p><strong>Request/response support</strong> with enqueueAndWait() when you need async processing but still want a result.</p>
</li>
<li><p><strong>Reliable retries</strong> with configurable attempts and backoff behavior.</p>
</li>
<li><p><strong>Stalled job recovery</strong> via a Reaper that requeues jobs from crashed workers.</p>
</li>
<li><p><strong>Graceful shutdown</strong> ensures in-flight jobs complete before the service stops, reducing lost work during deploys and restarts.</p>
</li>
<li><p><strong>Move fast with safety:</strong> The API is TypeScript-native with typed payloads and results, so you catch errors at compile time and move confidently.</p>
</li>
</ul>
<p>This makes it appropriate for both classic fire-and-forget workloads and RPC-style workloads that require a response. You do not have to pick one model globally: many teams use both in the same system, depending on endpoint and latency requirements. For example, in use cases such as sending emails and notifications, fire-and-forget jobs make sense because results are often not needed immediately and occasional retries can be handled gracefully. On the other hand, workflows such as generating invoices or processing payments may require the caller to wait for a result, making the request/response pattern with enqueueAndWait() a better fit.</p>
<h2><strong>A quick look at the API</strong></h2>
<p>You can use the queue as a producer and consumer in the same process, or split them across services. The API is intentionally small, so the same primitives are easy to apply in monoliths, microservices, and worker pools.</p>
<pre><code class="language-javascript">import { Queue, MemoryStorage } from '@platformatic/job-queue'

const storage = new MemoryStorage()
const queue = new Queue&lt;{ email: string }, { sent: boolean }&gt;({
 storage,
 concurrency: 5
})

queue.execute(async job =&gt; {
 // your business logic
 return { sent: true }
})

await queue.start()

// fire-and-forget
await queue.enqueue('email-1', { email: 'user@example.com' })

// request/response
const result = await queue.enqueueAndWait('email-2', { email: 'another@example.com' }, { timeout: 30_000 })
console.log(result)

await queue.stop()
</code></pre>
<h3><strong>Architecture description</strong></h3>
<p>When you call enqueue(), the producer checks if the job already exists in the storage. If it’s a new job, it's added to the queue with the state “queued,” and the method returns immediately. If the job is a duplicate, the storage returns a duplicate status without creating a new entry.</p>
<img src="https://cdn.hashnode.com/uploads/covers/62bc139e9c913efac56c8de3/2a9a780e-e0bd-47e7-ad7a-7a6e72a11941.png" alt="" style="display:block;margin:0 auto" />

<p>When you call enqueueAndWait(), the producer first subscribes to a notification for that job, then enqueues it. If the job was already processed, it returns the cached result immediately. Otherwise, it waits for a notification from the worker when the job completes (or fails), then fetches the result and returns it.</p>
<img alt="" style="display:block;margin:0 auto" />

<p>The consumer continuously dequeues jobs from the storage using a blocking move operation. When it receives a job, it marks it as “processing” and executes the handler. On success, it stores the result with TTL and marks the job as completed. On failure, it either retries (if attempts remain) or marks the job as failed.</p>
<img alt="" style="display:block;margin:0 auto" />

<p>The producer API supports per-job options such as maxAttempts and resultTTL, which are useful when not all jobs have the same retention or retry requirements. For example, you might keep invoice-generation results longer than low-value notification results, even if they run on the same queue.</p>
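<p>A sketch of what that looks like at the call site, using a queue like the one from the earlier example (exactly where these options are passed may differ from the real API):</p>
<pre><code class="language-javascript">// Invoices: keep the result around longer and allow more attempts.
await queue.enqueue('invoice-2026-001', { customerId: 'c-42' }, {
  maxAttempts: 5,
  resultTTL: 86_400_000 // 24 hours
})

// Low-value notification: fewer retries, short-lived result.
await queue.enqueue('notify-user-42', { email: 'user@example.com' }, {
  maxAttempts: 1,
  resultTTL: 60_000 // one minute
})
</code></pre>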
<h2><strong>Storage backends for different environments</strong></h2>
<p>@platformatic/job-queue ships with three storage adapters:</p>
<h3><strong>MemoryStorage</strong></h3>
<p>MemoryStorage keeps all queue states in process memory. This makes it ideal for local development, testing, and simple single-instance services where data can be ephemeral.</p>
<pre><code class="language-javascript">import { Queue, MemoryStorage } from '@platformatic/job-queue'
const storage = new MemoryStorage()
const queue = new Queue({ storage })
</code></pre>
<p>Jobs are stored in JavaScript Maps and Sets within the same process. This gives you the lowest latency possible, but means jobs are lost if the process restarts. For development workflows where you restart frequently, this is usually not a concern.</p>
<h3><strong>FileStorage</strong></h3>
<p>FileStorage persists the queue state to the filesystem in JSON format. It works well for simple deployments on a single node where you need persistence but do not want external dependencies like Redis.</p>
<pre><code class="language-javascript">import { Queue, FileStorage } from '@platformatic/job-queue'

const storage = new FileStorage('./queue-data')
const queue = new Queue({ storage })
</code></pre>
<p>The storage writes atomically to prevent corruption, and it maintains separate files for jobs, metadata, and locks. Since it relies on file system locks, it is not suitable for multi-node deployments.</p>
<h3><strong>RedisStorage</strong></h3>
<p>RedisStorage uses Redis (7+) or Valkey (8+) for distributed queue operations. This is the recommended choice for production workloads that require horizontal scaling, leader election, or cross-instance coordination.</p>
<pre><code class="language-javascript">import { Queue, RedisStorage } from '@platformatic/job-queue'
const storage = new RedisStorage({ connectionString: 'redis://localhost:6379' })
const queue = new Queue({ storage })
</code></pre>
<p>RedisStorage leverages Redis data structures for atomic operations:</p>
<ul>
<li><p>Lists for job queues</p>
</li>
<li><p>Sorted sets for delayed job scheduling</p>
</li>
<li><p>Pub/sub for notifications across instances</p>
</li>
<li><p>Lua scripts for atomic state changes</p>
</li>
</ul>
<p>For high availability, RedisStorage also supports Sentinel and Cluster modes for failover and sharding.</p>
<h3>Choosing the right backend</h3>
<img src="https://cdn.hashnode.com/uploads/covers/62bc139e9c913efac56c8de3/303d84e8-9bd9-423f-a90a-19b7d032527d.png" alt="" style="display:block;margin:0 auto" />

<p>Start with MemoryStorage for development, use FileStorage for simple single-node deployments, and choose RedisStorage for production systems that need horizontal scaling.</p>
<h2><strong>Reliability features that matter in production</strong></h2>
<p>The library is designed around the real failure modes of job processing systems.</p>
<p>Visualize this: you deploy a routine patch, and one of your job workers crashes unnoticed. By the next day, 5,000 critical jobs have piled up and could have vanished forever. Thanks to built-in recovery, every one of them is rescued automatically instead. Situations like this are exactly where strong safeguards in a background processing system prove their worth.</p>
<h3><strong>Recovering stalled jobs</strong></h3>
<p>If a worker crashes while processing a job, the Reaper can detect the stalled work and requeue it after visibilityTimeout.</p>
<pre><code class="language-javascript">import { Reaper } from '@platformatic/job-queue'
const reaper = new Reaper({
 storage,
 visibilityTimeout: 30_000
})
await reaper.start()
</code></pre>
<p>For high availability, the Reaper also supports leader election (with Redis storage), so multiple instances can run safely while only one acts as leader at a time. If the leader goes away, another instance takes over, which helps avoid manual intervention during incidents.</p>
<h3><strong>Controlled retries and terminal states</strong></h3>
<p>Failed jobs can retry automatically up to maxRetries. When retries are exhausted, errors are persisted as a terminal state so producers can inspect or react programmatically.</p>
<p>This gives you reliable behavior for flaky dependencies, such as third-party APIs: transient failures recover automatically, while permanent failures remain visible and actionable.</p>
<h3><strong>Graceful shutdown</strong></h3>
<p>When stopping a worker, queue.stop() waits for in-flight jobs to finish. This reduces dropped work during deploys and restarts and helps keep queue state consistent across gradual updates. In practice, this means you can safely perform blue/green or canary deployments without worrying about losing in-progress work. Teams can ship changes faster, with the confidence that jobs will complete and customer data will not go missing, even as new versions are rolled out.</p>
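<p>In practice, that usually means wiring <code>queue.stop()</code> into your shutdown signals, roughly like this (using the queue from the earlier example):</p>
<pre><code class="language-javascript">// Let in-flight jobs finish before the process exits.
async function shutdown (signal) {
  console.log(`Received ${signal}, draining in-flight jobs...`)
  await queue.stop() // resolves once running jobs have completed
  process.exit(0)
}

process.on('SIGTERM', () =&gt; { shutdown('SIGTERM') })
process.on('SIGINT', () =&gt; { shutdown('SIGINT') })
</code></pre>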
<h2><strong>Request/response without building custom plumbing</strong></h2>
<p>One particularly useful capability is enqueueAndWait(). Teams often build this pattern manually on top of queues, but it is already integrated here, including timeout handling and typed errors.</p>
<pre><code class="language-javascript">try {
 const result = await queue.enqueueAndWait('invoice-123', payload, { timeout: 10_000 })
 return result
} catch (error) {
 // handle TimeoutError / JobFailedError, etc.
}
</code></pre>
<p>This is a good fit when work should run in a worker context, but the caller still needs a bounded response path, such as document generation, webhook fan-out, or expensive validation that should not run on an HTTP thread.</p>
<p>You also get explicit queue errors (TimeoutError, JobFailedError, and others), so your application can distinguish among transport problems, worker failures, and business-level errors.</p>
<h2><strong>Getting started</strong></h2>
<p>Install the package:</p>
<pre><code class="language-plaintext">npm install @platformatic/job-queue
</code></pre>
<p>Then choose a backend based on your environment:</p>
<ol>
<li><p>Start with MemoryStorage for local development.</p>
</li>
<li><p>Move to RedisStorage (Redis 7+ or Valkey 8+) for production.</p>
</li>
<li><p>Add Reaper when running multiple workers or when stalled-job recovery is required.</p>
</li>
</ol>
<p>If you already have queue infrastructure in place, one good migration approach is to move one bounded workflow first (for example, email delivery or report generation), validate behavior and observability, and then expand usage across other jobs.</p>
<p>We recommend separating responsibilities into dedicated processes:</p>
<ul>
<li><p><strong>Producer services</strong> enqueue jobs from HTTP handlers or internal events.</p>
</li>
<li><p><strong>Worker services</strong> execute jobs with tuned concurrency.</p>
</li>
<li><p><strong>A Reaper instance</strong> handles stalled-job recovery (or multiple instances with leader election).</p>
</li>
</ul>
<p>This setup lets you scale producers and workers independently. If incoming traffic spikes, add producers; if processing backlog grows, add workers.</p>
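<p>A minimal sketch of that split, assuming both processes share the Redis-backed storage shown earlier; whether a pure producer also needs <code>start()</code> is an assumption, and the worker's handler is a placeholder:</p>
<pre><code class="language-javascript">// producer-service.js (sketch): enqueue jobs from an HTTP handler or event
import { Queue, RedisStorage } from '@platformatic/job-queue'

const queue = new Queue({
  storage: new RedisStorage({ connectionString: 'redis://localhost:6379' })
})
await queue.start()

await queue.enqueue('report-2026-03', { month: '2026-03' })
</code></pre>
<pre><code class="language-javascript">// worker-service.js (sketch): execute jobs with tuned concurrency
import { Queue, RedisStorage } from '@platformatic/job-queue'

const queue = new Queue({
  storage: new RedisStorage({ connectionString: 'redis://localhost:6379' }),
  concurrency: 10
})

queue.execute(async job =&gt; {
  // placeholder business logic
  return { ok: true, month: job.month }
})

await queue.start()
</code></pre>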
<h2><strong>Final thoughts</strong></h2>
<p><code>@platformatic/job-queue</code> is a practical option for Node.js teams that want reliable background processing without having to assemble every reliability feature from scratch. The combination of deduplication, request/response semantics, retries, and pluggable storage makes it flexible enough for both simple jobs and more demanding production workloads. Most importantly, it lets you focus on what matters most: building features and generating value, knowing your background tasks are handled with care. Imagine deployments where you can sleep soundly, confident that every job is accounted for and that no critical work is lost, even during outages. With the right foundation, you are set up not just for peace of mind, but for lasting success as your systems and team continue to grow.</p>
<p>If you are evaluating queue systems for your next service, this is a good time to try it and share feedback with the team (us). Real-world feedback is especially valuable while the project is still young and evolving quickly. If you run into an unexpected edge case or a strange retry failure, please open an issue describing your scenario: we love to fix hard problems. Concrete examples help us improve reliability for everyone!</p>
]]></content:encoded></item><item><title><![CDATA[OpenClaw Proved the Demand. Now Enterprises Need the Infrastructure.]]></title><description><![CDATA[Over the weekend, OpenAI beat out Meta by snagging Peter Steinberger, the creator of OpenClaw, to help build out OpenAI’s story for running agentic workflows in the enterprise. It will be interesting ]]></description><link>https://blog.platformatic.dev/from-openclaw-to-enterprise-agents</link><guid isPermaLink="true">https://blog.platformatic.dev/from-openclaw-to-enterprise-agents</guid><category><![CDATA[Node.js]]></category><category><![CDATA[openai]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Luca Maraschi]]></dc:creator><pubDate>Fri, 27 Feb 2026 15:44:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/63f78b3e207712e9dab049ad/8ef2518e-0fd4-4a4e-ac0c-3f77d2068bb6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the weekend, OpenAI beat out Meta by snagging Peter Steinberger, the creator of OpenClaw, to help build out OpenAI’s story for running agentic workflows in the enterprise. It will be interesting to see how OpenAI and Steinberger translate the ideas that made OpenClaw a viral sensation for developers into the very different world of the enterprise. </p>
<p>In this article, I want to break down some of the key design choices that made OpenClaw such a sensation, what the biggest friction points are for enterprises trying to adopt and run agents at scale, and how the open-source work we’ve been doing at Platformatic can bridge that gap. </p>
<h2><strong>Developers and the Path of Least Resistance</strong></h2>
<p>OpenClaw racked up 196,000 GitHub stars, caught the eye of Meta and OpenAI, and got flagged by Gartner as an “unacceptable cybersecurity risk” for enterprises. So what’s really going on?</p>
<p>Let’s first take a look at what made OpenClaw so appealing to everyday developers. Namely, it brought the world of LLMs and agents to where developers were most excited to apply them, i.e., the data and apps on their own machines. (This, interestingly enough, is a common thread between consumers and enterprise teams, which I’ll touch on later.)</p>
<p>Second, it came with a fantastic developer experience out of the box. Because it was built on Node.js, OpenClaw shipped with a rich ecosystem that let developers hook their agents up to … well, pretty much whatever they wanted, with just a few simple lines of code.</p>
<h2><strong>Your Agents, Your System</strong></h2>
<p>So what does OpenClaw’s viral appeal teach us about bringing agents to the enterprise?</p>
<p>Well, it turns out, agents are most useful when you run them where they can do useful things. Again, your data, your files, all that good stuff. That’s greatly simplified if your agent runs on your own system, and it’s this simplicity that's largely been missing from most cloud-based Agentic Platforms. </p>
<p>This is because enterprises need something that integrates with the infrastructure they’ve already invested in. </p>
<p>When we talk about the sometimes ambiguous notion of “the enterprise”, what we are really referring to are teams that have invested years of engineering effort and millions of dollars (both in engineering hours and in commercial licences) building heavily customized Kubernetes platforms for their teams, replete with observability stacks, CI/CD pipelines, security policies, and compliance systems, all tuned to the ergonomics of their developers and domain. So you can imagine how platform teams respond when a new vendor says,</p>
<blockquote>
<p><em>“Great news, agentic AI is here. You just need to adopt this entirely new platform to run it.”</em></p>
</blockquote>
<p>Here’s where Watt comes in: making your existing stack agent-ready.</p>
<h2><strong>Why Node.js Is the Runtime for Agents</strong></h2>
<p>OpenClaw’s architecture is a 390,000-line TypeScript codebase running on Node.js 22 or higher. Its Gateway, the control plane that manages every agent interaction across WhatsApp, Telegram, Slack, Discord, iMessage, and more, is written entirely in JavaScript and TypeScript. It works anywhere Node.js works. </p>
<p>If you’ve ever looked closely at how agents work, this makes a lot of sense. Agents aren’t batch jobs; they are persistent, event-driven processes that keep long WebSocket connections open, respond to messages across multiple channels at once, call external APIs, and manage conversations over time. This is exactly what Node.js was built for. The event loop, the main feature that makes Node.js great for high-concurrency I/O, lets an agent handle many conversations, tool calls, and streaming LLM responses at the same time without needing a separate thread for each connection.</p>
<p>OpenClaw chose Node.js because no other runtime handles this pattern as smoothly. Python would struggle with concurrency. Go could work, but it lacks the rich ecosystem that let Steinberger build integrations for every major messaging platform in just weeks. The npm ecosystem, with packages such as Baileys for WhatsApp, grammY for Telegram, discord.js, and Slack’s Bolt SDK, is why a single developer could build something in weeks that would take an enterprise team months.</p>
<h2><strong>Watt: The Primitive that makes your existing stack Agent-Ready</strong></h2>
<p>At its core, Watt implements ideas that are simple to grasp but challenging to execute (elegantly) from an engineering perspective. Namely, we wanted to 1) truly unlock the power of multi-threading for Node.js by running your application as a worker thread within Watt, and  2) provide a universal primitive to run your app across any infrastructure, while making all the NFRs (observability, thread management, etc) “out of the box”.</p>
<p>So - what are the benefits of using tools like Watt to run and manage your agents as isolated worker threads? Let’s do a quick reality check.</p>
<ul>
<li><p>Can you see every long-running, event-driven process in your stack right now? </p>
</li>
<li><p>Do you have automated visibility into which connections are open, what messages are moving, or how your agents scale during spikes in requests?</p>
</li>
</ul>
<p>If you hesitate, you’re not alone. Most enterprise stacks aren’t built for persistent, event-driven workloads. That’s exactly where agentic AI exposes the cracks.</p>
<p><strong>Long-running operations for agents.</strong> Agents are stateful, as they inherently operate in a “loop”. They must remain active for hours, days, or even longer, maintaining state, holding connections, and reacting to events across multiple channels. Sub-agents can be spawned on demand to adapt the system on the fly. Watt lets your application do all of this in isolated worker threads, and it manages the full lifecycle of long-running Node.js agents on Kubernetes, including smooth restarts, health monitoring, and resource management, without losing agent state.</p>
<p>For enterprise teams, this brings real improvements: Watt's ability to recycle and self-heal threads means agentic workflows keep running without interruption. </p>
<p>Put another way, if your agent is in the middle of a conversation with a customer, coordinating across Slack and email, and your pod is rescheduled on Kubernetes, you lose your state and frustrate your users. With Watt, we automatically detect service degradation and act accordingly, gracefully hot-swapping threads before Kubernetes (or your customer) notices anything has gone awry. </p>
<p><strong>Out-of-the-Box Observability for Node.js.</strong> The OpenClaw security nightmare was as much about bad defaults as anything. Let’s be honest - configuring security and observability is going to be perceived as a distracting sidequest for an excited developer who wants to <em>just ship</em> (they are called ‘NFRs’ for a reason, after all).  Our workaround was to provide all of this “out-of-the-box” for both devs and the platform teams that look after them.</p>
<p>To this end, Watt’s Intelligent Command Center (and its companion Admin service) provides continuous profiling, event loop monitoring, and application-level metrics, giving DevOps teams and security leaders a clear view of every Node.js process in their cluster. You can’t secure what you can’t see.</p>
<p><strong>Intelligent autoscaling tied to Node.js internals.</strong> Agents often have unpredictable workloads. One agent might be idle for hours, then suddenly need to handle dozens of LLM calls when a user starts a complex workflow. </p>
<p>Watt’s autoscaler understands Node.js event loop metrics, not just CPU and memory, and scales based on real application-level demand. This kind of event-loop aware scaling can deliver strong business results. Application-level autoscaling strategies like this can cut cloud compute costs by 25 percent or more by avoiding overprovisioning during slow periods and preventing slowdowns during traffic spikes. </p>
<p>Put another way: autoscaling on the wrong metrics is expensive, both financially and in terms of performance SLOs.</p>
<p><strong>Enterprise-grade operations without rewrites.</strong> A big driver of adoption for us has been the fact that we don’t ask teams to rewrite their Node.js applications or give up their current infrastructure. </p>
<p>Watt wraps your Node.js app and adds operational features such as profiling, logging, tracing, and scaling, all without code changes. It integrates with your current Kubernetes setup, works with your observability tools, and fits into your deployment workflows. If your team has been building agent features on Node.js, Watt makes those agents ready for production on the infrastructure you already have.</p>
<h3>Watt and the Multi-agent-verse</h3>
<p>Let’s imagine a multi-agent workflow you could put into production next quarter:</p>
<ol>
<li><p>A sales agent gets a message from a customer about a delayed order. </p>
</li>
<li><p>Instead of forwarding the ticket manually, the sales agent automatically works with a logistics-tracking agent to check the shipment status. </p>
</li>
<li><p>If there’s a problem, an incident response agent opens a case in the ITSM system and notifies the customer proactively, all without human intervention. </p>
</li>
<li><p>Your teams see faster response times, fewer dropped tickets, and a better customer experience, and the whole process is auditable from start to finish.</p>
</li>
</ol>
<p>At its core, this is a distributed systems problem, and one that ties back to Node’s core strengths, with its event-driven architecture, streaming capabilities, and unmatched ecosystem for live communication. </p>
<p>But distributed systems also need operational infrastructure. They require monitoring, lifecycle management, security boundaries, and proven operational tools that Matteo and I have spent the last decade building.</p>
<p>OpenClaw showed that a single developer using Node.js can build an agent platform that excites hundreds of thousands of people. Imagine what happens when enterprises bring the same capabilities and add proper security, observability, and operational controls.</p>
<p>What if you could deploy AI agents the same way you deploy microservices, with worker isolation, auto-scaling, health checks, and hot-reload, on infrastructure you already own? Watt could run each agent type as an isolated application with its own worker pool, <a href="https://github.com/nodejs/node/pull/61478">sandboxed filesystem</a>, and tool policy, while a single gateway handles authentication, role-based access control, and routing across Slack, Teams, Telegram, or any HTTP client through an OpenAI-compatible API. No vendor lock-in, no data leaving your network, and the same Node.js runtime your team already knows, just pointed at a harder problem.</p>
<p>That’s the world Watt is making real.</p>
<h2><strong>Time to take the lobster by the claws.</strong></h2>
<p>If you’re leading an enterprise and watching OpenClaw unfold, here’s my take:</p>
<p>Don’t ban agentic AI. The demand is real, and your teams will find workarounds if you try. Instead, invest in the infrastructure that ensures safety. The pull of the ecosystem is strong. Your agent strategy is really a Node.js strategy.</p>
<p>Get your operations in order. You need visibility into long-running Node.js processes, autoscaling that understands the event loop, and lifecycle management for processes that aren’t just stateless web servers.</p>
<p>Start with what you already have. If your teams are running Node.js (and most likely they are), the path to production-ready agents is shorter than you think. Watt is built to meet you where you are.</p>
<p> The OpenClaw moment is just the beginning. Enterprises that build the right infrastructure now will be the ones to take advantage of agentic AI. Those who respond with bans and blocks will spend years trying to catch up.</p>
<p>Node.js made OpenClaw possible. Your cloud investment made your infrastructure real. Watt connects the two, turning them into enterprise-grade platforms that run secure, scalable, and durable agents.</p>
]]></content:encoded></item><item><title><![CDATA[We cut Node.js' Memory in half]]></title><description><![CDATA[V8, the C++ engine under the proverbial hood of JavaScript, includes a feature many Node.js developers aren’t familiar with. This feature, pointer compression, is a method for using smaller memory references (pointers) in the JavaScript heap, reducin...]]></description><link>https://blog.platformatic.dev/we-cut-nodejs-memory-in-half</link><guid isPermaLink="true">https://blog.platformatic.dev/we-cut-nodejs-memory-in-half</guid><category><![CDATA[Node.js]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Tue, 17 Feb 2026 17:00:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771353158818/c7aeeeea-51dd-4243-a2b7-7dc98f4dcb21.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>V8, the C++ engine under the proverbial hood of JavaScript, includes a feature many Node.js developers aren’t familiar with. This feature, pointer compression, is a method for using smaller memory references (pointers) in the JavaScript heap, reducing each pointer from 64 bits to 32 bits. The net is that you wind up using about 50% less memory for the same app, without changing any code. Pretty great, right?</p>
<p>Well, almost. Node.js does not enable pointer compression by default, for two historical reasons.</p>
<p>First, there was the '4 GB cage' limitation, which meant that enabling pointer compression required the entire Node.js process to share a single 4 GB memory space between the main thread and all the worker threads. This was a significant issue. <a target="_blank" href="https://www.cloudflare.com/">Cloudflare</a> and <a target="_blank" href="https://www.igalia.com/">Igalia</a> partnered to solve it so that the cage could be per-isolate (an individual instance of the V8 engine).</p>
<p>Next, some worried that compressing and decompressing pointers on each heap access would introduce performance overhead. Cloudflare, Igalia, and the Node.js project collaborated to determine exactly what kind of overhead existed and assess whether it would impact real-world applications.</p>
<p>To test this, we created <a target="_blank" href="https://hub.docker.com/r/platformatic/node-caged">node-caged</a>, a Node.js 25 Docker image with pointer compression turned on, and ran production-level benchmarks on AWS EKS.</p>
<p>In short, we achieved <strong>50% memory savings with only a 2-4% increase in average latency across real-world workloads and reduced P99 latency by 7%</strong>. For most teams, this trade-off is an easy choice.</p>
<h2 id="heading-how-pointer-compression-works"><strong>How Pointer Compression Works</strong></h2>
<p>Every JavaScript object is stored on V8’s heap. Inside, objects point to each other using 64-bit memory addresses on a 64-bit system. For example, an object like { name: "Alice", age: 30 } has several internal pointers: one to its hidden class (shape), one to where its properties are stored, and one to the string “Alice” on the heap.</p>
<p>As you might imagine, all these pointers can add up in a typical Node.js app, taking up a lot of valuable heap space. On a 64-bit system, each pointer uses 8 bytes, even though most V8 heaps are much smaller than the huge address space they could use.</p>
<p>Pointer compression takes advantage of this. Instead of saving full 64-bit memory addresses, V8 stores 32-bit offsets (relative distances from a fixed starting point, called the base address). When reading from the heap (the section of memory where objects are stored), it rebuilds the full pointer by adding the base and the offset. When writing, it compresses the pointer by subtracting the base from the full address.</p>
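<p>To make the arithmetic concrete, here is a toy illustration in JavaScript; V8 does this in optimized machine code on real addresses, so this only shows the idea:</p>
<pre><code class="language-javascript">// Toy model of pointer compression: store 32-bit offsets from a cage base.
const cageBase = 0x00007f3a00000000n // 64-bit base address of the cage (made up)

function compress (fullPointer) {
  return Number(fullPointer - cageBase) // 32-bit offset, stored on the heap
}

function decompress (offset) {
  return cageBase + BigInt(offset)      // rebuild the full 64-bit address
}

const full = 0x00007f3a00001a40n
console.log(compress(full))                        // 6720
console.log(decompress(compress(full)) === full)   // true
</code></pre>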
<p>The trade-off is simple:</p>
<ul>
<li><p><strong>Memory</strong>: Each pointer goes from 8 bytes to 4 bytes. For structures with many pointers—such as objects, arrays, closures, Maps, and Sets—this can reduce memory consumption by around 50%</p>
</li>
<li><p><strong>CPU</strong>: Each heap access now needs one extra addition (for reads) or subtraction (for writes). To put it in perspective, this extra operation costs about as much as a Level 1 cache hit. These operations are extremely fast, so even though millions of them occur every second, their overall impact is minimal.</p>
</li>
<li><p><strong>Heap limit</strong>: 32-bit offsets can only reach 4GB of memory per V8 isolate (a separate instance of the JavaScript engine with its own memory and execution state). For most Node.js services, which usually use less than 1GB, this isn’t a problem.</p>
</li>
</ul>
<p>Chrome has used pointer compression since 2020, but Node.js hasn't. Previously, using this feature required setting a flag (--experimental-enable-pointer-compression) at compile time, which often felt like an 'expert-only' option for many developers. However, the introduction of node-caged has transformed this, enabling pointer compression with a simple one-line Docker image swap. This substantial simplification opens the door for a much broader audience to experiment with the feature more immediately.</p>
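<p>If your Dockerfile currently builds from an official Node.js 25 image, the swap really is a single line; the tags below are illustrative, so check the <a target="_blank" href="https://hub.docker.com/r/platformatic/node-caged">node-caged Docker Hub page</a> for the ones that are actually published:</p>
<pre><code class="language-plaintext"># Before
FROM node:25-bookworm-slim

# After: Node.js 25 built with pointer compression enabled
FROM platformatic/node-caged
</code></pre>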
<h2 id="heading-what-changed-isolategroups"><strong>What Changed: IsolateGroups</strong></h2>
<p>Pointer compression has been part of V8 for years. Node.js didn’t use it before, not because of CPU overhead, but because of the memory cage limitation.</p>
<p>Originally, V8’s pointer compression made every isolate in a process share a single “pointer cage”—a 4GB block of memory for all compressed pointers. This meant the main thread and all worker threads had to fit into the same 4GB. In Chrome, where each tab runs in its own process, this worked fine. But for Node.js, where workers share a process, it was a big problem.</p>
<p>In November 2024, <a target="_blank" href="https://github.com/jasnell">James Snell</a> (Cloudflare, Node.js TSC) initiated the effort to address this challenge. Cloudflare sponsored Igalia engineers <a target="_blank" href="https://github.com/wingo">Andy Wingo</a> and <a target="_blank" href="https://github.com/dbezhetskov">Dmitry Bezhetskov</a> to introduce a new V8 feature, <strong>IsolateGroups</strong>, which gives each isolate group its own pointer compression cage. (You can read more about this feature and work at <a target="_blank" href="https://dbezhetskov.dev/multi-sandboxes/">https://dbezhetskov.dev/multi-sandboxes/</a>.)</p>
<p>The pivotal modification is that multiple IsolateGroups can now exist within a <em>single process</em>, each having its own 4GB cage, thus eliminating the process-wide memory constraint. This work symbolizes a significant collaboration between organizations, showcasing the strength of the open-source ecosystem. Thanks to this work, enabling pointer compression in <a target="_blank" href="http://Node.js">Node.js</a> changed from (shared cage):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771256639194/10aa2521-409b-494c-9d6e-ff34e1de0c7a.png" alt class="image--center mx-auto" /></p>
<p>to (IsolateGroups):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771256625844/bea3c174-c34c-4155-a651-720c34ce5cbd.png" alt class="image--center mx-auto" /></p>
<p>In V8, the C++ change is simple. Use v8::Isolate::New(group, ...) instead of v8::Isolate::New(...). Now, each worker thread gets its own 4GB heap. The only limit is the system’s available memory.</p>
<p>Snell’s <a target="_blank" href="https://github.com/nodejs/node/pull/60254">Node.js integration</a> landed in October 2025: 62 lines across 8 files. This represents less than one commit's worth of changes across most modules, underscoring the update's maintainability. The code was reviewed and approved by <a target="_blank" href="https://github.com/joyeecheung">Joyee Cheung</a> [Igalia], <a target="_blank" href="https://github.com/targos">Michael Zasso</a> [Zakodium], <a target="_blank" href="https://github.com/Qard">Stephen Belanger</a> [Platformatic], and me [Platformatic]. Cheung also fixed the pointer compression build itself, which had been broken since Node.js 22. I tested with real-world Next.js SSR applications and confirmed a ~50% reduction in heap usage before approving.</p>
<p>This feature still requires a compile-time flag and isn’t in official Node.js builds yet. That’s why we made node-caged.</p>
<h2 id="heading-the-experiment"><strong>The Experiment</strong></h2>
<p>Two of our four configurations use <a target="_blank" href="https://docs.platformatic.dev/watt">Platformatic Watt</a>, our open-source Node.js application server. Watt runs multiple Node.js applications as worker threads (separate execution threads) within a single process, using the Linux kernel's 'SO_REUSEPORT' (a system feature that allows multiple processes to listen on the same network port) to distribute connections directly to workers. No master process, no IPC (Inter-Process Communication) coordination. In previous benchmarks, this eliminated the ~30% performance tax imposed by PM2 and the 'cluster' module through IPC-based load balancing.</p>
<p>We set up a Next.js e-commerce app—a trading card marketplace with 10,000 cards, 100,000 listings, server-side rendering, search, and simulated database delays—on a Kubernetes cluster. We tested four setups, all using the same hardware and app code:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771256657312/29236c71-7620-4419-ad4c-9e99c20ab761.png" alt class="image--center mx-auto" /></p>
<p><strong>Infrastructure</strong>: We used AWS EKS with m5.2xlarge nodes (8 vCPUs, 32GB RAM), 6 replicas for plain Node and 3 replicas for Watt (each with 2 workers, for a total of 6 processes). Both images used the same Debian bookworm-slim base and Node.js 25, so the only difference was the use of pointer compression.</p>
<p><strong>Workload</strong>: We used k6 with a ramping-arrival-rate executor, running 400 requests per second for 120 seconds after a 60-second ramp-up. The traffic was mixed as follows:</p>
<ul>
<li><p>20% homepage (SSR with featured cards, recent listings)</p>
</li>
<li><p>25% search (full-text search with pagination)</p>
</li>
<li><p>20% card detail (individual product page SSR)</p>
</li>
<li><p>15% game category pages</p>
</li>
<li><p>10% games listing</p>
</li>
<li><p>5% sellers listing</p>
</li>
<li><p>5% set detail pages</p>
</li>
</ul>
<p>Each request follows the server-side rendering path. It loads JSON data from disk, applies query filters, renders React components to HTML, and sends the response. We added a simulated 1-5ms database delay to mimic real data access.</p>
<h2 id="heading-the-results"><strong>The Results</strong></h2>
<h3 id="heading-plain-nodejs-standard-vs-pointer-compression"><strong>Plain Node.js: Standard vs Pointer Compression</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771256733959/f9ba4ceb-e681-49a0-ae11-c16b624758d8.png" alt class="image--center mx-auto" /></p>
<p>The average overhead was 2.5%. That translates to approximately 1 ms additional latency on our 40 ms median latency. This is a minor trade-off for cutting memory use in half. But if you look at p99 and max latency, they’re actually <em>lower</em> with pointer compression. A smaller heap means the garbage collector has less work to do, so there are fewer and shorter GC pauses. In these cases, pointer compression doesn’t just keep up—it performs better.</p>
<h3 id="heading-platformatic-watt-2-workers-standard-vs-pointer-compression"><strong>Platformatic Watt (2 workers): Standard vs Pointer Compression</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771256689212/4f8c15fd-9f97-42d7-9fb8-21307286f24b.png" alt class="image--center mx-auto" /></p>
<p>A similar outcome appears here. Average overhead is slightly higher (4.2%), the median remains unchanged, and maximum latency drops by 20% due to reduced garbage collection pressure.</p>
<h3 id="heading-the-full-picture-watt-pointer-compression-vs-baseline"><strong>The Full Picture: Watt + Pointer Compression vs Baseline</strong></h3>
<p>This is the comparison that matters for production decisions. What do you get if you adopt both Watt and pointer compression?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1771256747676/0f6e3940-e6e7-4c76-a242-905818b453a0.png" alt class="image--center mx-auto" /></p>
<p>Consider this: on average, it’s 15% faster, delivering significant speed gains without requiring code adjustments. This kind of improvement could be likened to the gains typically achieved by rewriting key parts of a system in a more optimized language, such as C++. Not only does it improve p99 latency by 43%, it also halves memory usage, essentially for free.</p>
<h2 id="heading-why-the-hello-world-benchmarks-were-misleading"><strong>Why the Hello-World Benchmarks Were Misleading</strong></h2>
<p>Initial tests of pointer compression on a basic Next.js starter app showed a 56% overhead. This outcome was unexpected.</p>
<p>But a simple hello-world SSR page mostly does V8 internal work: compiling templates, diffing the virtual DOM, and joining strings. There’s no I/O, no data loading, and no real app logic. Every operation goes through pointer decompression.</p>
<p>Real applications are different. A typical request spends most of its time on:</p>
<ol>
<li><p><strong>I/O wait</strong>: database queries, cache lookups, API calls to downstream services</p>
</li>
<li><p><strong>Data marshaling</strong>: JSON parsing, response body construction</p>
</li>
<li><p><strong>Framework overhead</strong>: routing, middleware chains, header processing</p>
</li>
<li><p><strong>OS/network</strong>: TCP handling, TLS, kernel scheduling</p>
</li>
</ol>
<p>The V8 heap access that triggers pointer decompression is only one component of the total request time. As the ratio of “real work” to “pure V8 pointer chasing” increases, the overhead of pointer compression shrinks proportionally.</p>
<p>Our e-commerce app includes simulated database delays of 1-5ms, JSON parsing of datasets with 10,000+ records, search filtering, pagination, and full SSR rendering with React. In that context, the pointer decompression overhead rounds to noise.</p>
<p><strong>The takeaway: always use realistic workloads for benchmarking, as microbenchmarks can give you the wrong idea. To validate these findings, we invite you to try your heaviest endpoint and share your results. Turning observation into participation builds trust and helps the community validate the effectiveness of pointer compression.</strong></p>
<h2 id="heading-the-technical-details-why-gc-gets-better"><strong>The Technical Details: Why GC Gets Better</strong></h2>
<p>The improved tail latencies deserve a deeper explanation. V8’s garbage collector (Orinoco) performs several types of collection:</p>
<ul>
<li><p><strong>Minor GC (Scavenge)</strong>: Copies live objects from the young generation. Time is proportional to the number of live objects and their size.</p>
</li>
<li><p><strong>Major GC (Mark-Sweep-Compact)</strong>: Marks all reachable objects, sweeps dead ones, and optionally compacts. Time depends on the total heap size and the level of fragmentation.</p>
</li>
</ul>
<p>With pointer compression, every object is smaller. This has domino effects:</p>
<ol>
<li><p><strong>Objects fit in fewer cache lines.</strong> A compressed object that fits in a single 64-byte cache line instead of two means the GC’s marking phase generates half as many cache misses while traversing the object graph.</p>
</li>
<li><p><strong>The young generation fills more slowly.</strong> Smaller objects mean more allocations before a minor GC is triggered. Fewer minor GCs per unit of work.</p>
</li>
<li><p><strong>Major GC has less to scan.</strong> A 1GB heap with compressed pointers contains the same logical data as a 2GB heap without. The GC scans half the bytes to process the same application state.</p>
</li>
<li><p><strong>Compaction moves fewer bytes.</strong> When the GC compacts the heap to reduce fragmentation, smaller objects mean less data to copy.</p>
</li>
</ol>
<p>The end result is that GC pauses are both shorter and less frequent. This corresponds to what we saw in the p99 and max latency numbers. When a long-tail request lines up with a GC pause, the pause is now shorter.</p>
<h2 id="heading-what-this-means-for-your-business"><strong>What This Means for Your Business</strong></h2>
<h3 id="heading-cut-your-kubernetes-bill"><strong>Cut Your Kubernetes Bill</strong></h3>
<p>If you run Node.js on Kubernetes with 2GB memory limits per pod, pointer compression lets you cut that to 1GB. You get the same app and performance, but can run twice as many pods per node or use half as many nodes. What would halving pod memory do to your cluster bill? Take a moment to calculate the potential savings based on your current setup and see how much your organization could benefit from implementing pointer compression.</p>
<p>A 6-node m5.2xlarge EKS cluster (at $0.384 per hour per node) costs about $16,600 a year. Dropping to 3 nodes saves $8,300 a year. In a real production fleet with 50 or more nodes, the savings can reach $80,000 to $100,000 a year, all without changing your code.</p>
<p>For platform teams running hundreds of Node.js microservices, these savings add up. Each service has a baseline memory load from the V8 heap, framework, and modules. Pointer compression reduces the baseline across all services simultaneously.</p>
<h3 id="heading-double-your-tenant-density"><strong>Double Your Tenant Density</strong></h3>
<p>Multi-tenant SaaS platforms, where each tenant runs in an isolated Node.js process, hit memory as the binding constraint for density. If each tenant’s worker uses 512 MB, pointer compression reduces it to ~256 MB. That’s 2x tenants per host.</p>
<p>At scale, this changes your costs. If each tenant costs $5 per month for infrastructure and you have 10,000 tenants, cutting memory in half saves $25,000 a month, or $300,000 a year.</p>
<h3 id="heading-unlock-edge-deployment"><strong>Unlock Edge Deployment</strong></h3>
<p>Edge runtimes like Lambda@Edge, Cloudflare Workers, and Deno Deploy have strict memory limits, typically 128MB to 512MB per isolate. Cloudflare sponsored the IsolateGroups work in V8 because their Workers runtime needed pointer compression to support more isolates. Pointer compression can be the difference between your app running at the edge or needing to go back to the origin server.</p>
<p>That matters for revenue. Every 100ms of latency measurably reduces conversion rates. An e-commerce site moving SSR to the edge shaves 50-200ms off TTFB, depending on user location. For a $50M/year business, that latency improvement can translate to hundreds of thousands in incremental annual revenue.</p>
<h3 id="heading-handle-more-concurrent-connections"><strong>Handle More Concurrent Connections</strong></h3>
<p>For WebSocket-based applications (chat, collaboration, live dashboards, gaming), each persistent connection holds state in memory. A server handling 50,000 connections at ~10KB heap per connection uses 500MB. With pointer compression, that drops to ~250MB, allowing the same server to handle 100,000 connections, or halving your WebSocket server fleet.</p>
<h2 id="heading-compatibility-constraints"><strong>Compatibility Constraints</strong></h2>
<p>There is one strict limit: each V8 IsolateGroup’s pointer cage is 4GB. 32-bit compressed pointers can only address 4GB. With IsolateGroups, this limit applies to each isolate, not the whole process. Your main thread gets 4GB, each worker thread gets 4GB, and the total is only limited by your system’s memory.</p>
<p>For most Node.js services, 4GB per isolate is irrelevant. The vast majority of production processes run well under 1GB of heap. If your service genuinely requires more than 4GB of heap per isolate (e.g., large ML model inference, massive in-memory caches, or heavy ETL pipelines), pointer compression is not an option. Note that only the V8 JavaScript heap lives inside the cage; native add-on allocations and ArrayBuffer backing stores do not count against the 4GB limit.</p>
<p>There is one more compatibility constraint: native addons built with the legacy <a target="_blank" href="https://github.com/nodejs/nan">NAN</a> (Native Abstractions for Node.js) won't work with pointer compression enabled. NAN exposes V8 internals directly, and pointer compression changes the internal representation of V8 objects, so the ABI changes once you recompile. Addons built on <a target="_blank" href="https://nodejs.org/api/n-api.html">Node-API</a> (formerly N-API) are unaffected, because Node-API abstracts away V8's pointer layout entirely. The most popular native packages have already migrated: <code>sharp</code>, <code>bcrypt</code>, <code>canvas</code>, <code>sqlite3</code>, <code>leveldown</code>, <code>bufferutil</code>, and <code>utf-8-validate</code> all use Node-API today. The main holdout is <code>nodegit</code>, which still depends on NAN. If you're unsure, check your dependency tree with <code>npm ls nan</code>. If nothing shows up, you're good.</p>
<p>For everyone else—which is most Node.js deployments—there’s nothing to lose.</p>
<h2 id="heading-try-it"><strong>Try It</strong></h2>
<p>It’s a drop-in replacement. You don’t need to change any code.</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Before</span>
<span class="hljs-string">FROM</span> <span class="hljs-string">node:25-bookworm-slim</span>

<span class="hljs-comment"># After</span>
<span class="hljs-string">FROM</span> <span class="hljs-string">platformatic/node-caged:25-slim</span>
</code></pre>
<p>The <code>platformatic/node-caged</code> image is built from the Node.js v25.x branch with <code>--experimental-enable-pointer-compression</code>. It’s the same Node.js with the same APIs and everything else; the only difference is the smaller heap.</p>
<p>Available tags: <code>latest</code>, <code>slim</code>, <code>25</code>, <code>25-slim</code>.</p>
<p>Start by testing in staging. Watch your memory usage go down. Make sure your p99 latency stays within your SLO. Then deploy it.</p>
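<p>One simple way to quantify the difference is to log heap statistics from inside the process. Here is a minimal sketch using Node's built-in <code>v8</code> and <code>process</code> APIs (the interval and log format are arbitrary); run the same workload on both images and compare the numbers:</p>
<pre><code class="lang-javascript">const v8 = require('node:v8');

// Periodically log heap usage, the per-isolate heap limit, and RSS so the
// same workload can be compared on node:25 vs platformatic/node-caged.
setInterval(() =&gt; {
  const { used_heap_size, total_heap_size, heap_size_limit } = v8.getHeapStatistics();
  const { rss } = process.memoryUsage();
  const mb = (bytes) =&gt; (bytes / 1024 / 1024).toFixed(1);

  console.log(
    `heap ${mb(used_heap_size)}/${mb(total_heap_size)} MB ` +
    `(limit ${mb(heap_size_limit)} MB), rss ${mb(rss)} MB`
  );
}, 10000).unref();
</code></pre>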
<p>As always, we want to hear from you! Share your results and experience by dropping us a note at <a target="_blank" href="mailto:hello@platformatic.dev">hello@platformatic.dev</a> or by engaging on social media if you’d like to chat about anything you’re building.</p>
<hr />
<p><em>Benchmarks were run on AWS EKS (m5.2xlarge nodes, us-west-2) using k6 with ramping-arrival-rate at 400 req/s sustained. The application is a Next.js 16 e-commerce marketplace with server-side rendering and a JSON-based data layer. Full benchmark infrastructure and results are available in the</em> <a target="_blank" href="https://github.com/platformatic/node-caged"><em>node-caged repository</em></a>. The upstream V8 IsolateGroups feature was implemented by Igalia, sponsored by Cloudflare. Node.js integration by <a target="_blank" href="https://github.com/nodejs/node/pull/60254">James Snell</a>, with build fixes by <a target="_blank" href="https://github.com/joyeecheung">Joyee Cheung</a>. See the <a target="_blank" href="https://github.com/nodejs/node/issues/55735">tracking issue</a> for the full history.</p>
]]></content:encoded></item><item><title><![CDATA[Watt Now Supports TanStack Start]]></title><description><![CDATA[TL;DR
Watt 3.32 introduces first-class support for TanStack Start, the full-stack React framework from the creators of TanStack Query and TanStack Router. We benchmarked TanStack Start on AWS EKS under extreme load (10,000 req/s) and found that Watt ...]]></description><link>https://blog.platformatic.dev/watt-now-supports-tanstack-start</link><guid isPermaLink="true">https://blog.platformatic.dev/watt-now-supports-tanstack-start</guid><category><![CDATA[tanstack]]></category><category><![CDATA[Node.js]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Thu, 29 Jan 2026 15:00:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769794378188/3f1d1a60-d2dc-4ef1-a478-8602cae87396.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-tldr"><strong>TL;DR</strong></h2>
<p>Watt 3.32 introduces first-class support for <a target="_blank" href="https://tanstack.com/start">TanStack Start</a>, the full-stack React framework from the creators of TanStack Query and TanStack Router. We benchmarked TanStack Start on AWS EKS under extreme load (10,000 req/s) and found that Watt matches single-process Node.js throughput and consistently improves tail latency by roughly 10%.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769626611818/55d90d06-457e-4f4e-a7fa-0a15a840ef22.png" alt class="image--center mx-auto" /></p>
<p>Both configurations were tested under identical conditions at a 10,000 req/s target load. The following section details the full methodology and raw data.</p>
<hr />
<p>We’re excited to announce that Watt 3.32 adds native support for TanStack Start, bringing the same performance benefits that Next.js users have enjoyed to this rapidly growing full-stack React framework.</p>
<h2 id="heading-what-is-tanstack-start"><strong>What is TanStack Start?</strong></h2>
<p>TanStack Start is a modern full-stack React framework built on top of TanStack Router, Vinxi, and Nitro. It offers:</p>
<ul>
<li><p><strong>Type-safe routing</strong> with first-class TypeScript support</p>
</li>
<li><p><strong>Server functions</strong> for seamless client-server communication</p>
</li>
<li><p><strong>SSR and streaming</strong> out of the box</p>
</li>
<li><p><strong>File-based routing</strong> with nested layouts</p>
</li>
<li><p><strong>Built-in data loading</strong> patterns from the TanStack Query team</p>
</li>
</ul>
<p>For teams already using TanStack Query and TanStack Router, TanStack Start provides a natural progression to full-stack development with familiar patterns and an excellent developer experience. So why run it with Watt?</p>
<h2 id="heading-why-watt-for-tanstack-start"><strong>Why Watt for TanStack Start?</strong></h2>
<p>Like Next.js, TanStack Start uses server-side rendering (SSR), which is CPU-bound and poses familiar scaling challenges:</p>
<ol>
<li><p><strong>Single-core execution</strong>: Node.js runs on a single CPU core by default, underutilizing multi-core servers.</p>
</li>
<li><p><strong>Late load shedding</strong>: SSR frameworks need the full request context before they can gauge load, so overloaded requests can’t be rejected early.</p>
</li>
<li><p><strong>Event loop blocking</strong>: CPU-intensive rendering can block the event loop, causing latency spikes.</p>
</li>
</ol>
<p>Watt addresses these with SO_REUSEPORT, distributing connections across workers at the kernel level and removing IPC overhead. To validate this approach, we ran the benchmark described below.</p>
<h2 id="heading-benchmark-methodology"><strong>Benchmark Methodology</strong></h2>
<h3 id="heading-infrastructure"><strong>Infrastructure</strong></h3>
<p>All benchmarks ran on AWS EKS (Elastic Kubernetes Service) with the following infrastructure:</p>
<ul>
<li><p><strong>EKS Cluster</strong>: 4 nodes running m5.2xlarge instances (8 vCPUs, 32GB RAM each)</p>
</li>
<li><p><strong>Region</strong>: us-west-2</p>
</li>
<li><p><strong>Load Testing Instance</strong>: c7gn.2xlarge (8 vCPUs, 16GB RAM, network-optimized)</p>
</li>
<li><p><strong>Load Testing Tool</strong>: Grafana k6</p>
</li>
</ul>
<p>The environment was ephemeral, created on demand via shell scripts and the AWS CLI, then torn down after each test run.</p>
<h3 id="heading-software-versions"><strong>Software Versions</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769626649425/81031ef5-0ceb-4bcd-b437-241a6bafc082.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-resource-allocation"><strong>Resource Allocation</strong></h3>
<p>Each configuration received identical total CPU resources:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769626664902/43028a05-994d-457e-95c9-f0bff4b500c9.png" alt class="image--center mx-auto" /></p>
<p>Pods were distributed evenly across all 4 cluster nodes using topologySpreadConstraints.</p>
<h3 id="heading-load-test-configuration"><strong>Load Test Configuration</strong></h3>
<p>We tested under extreme load to stress-test both configurations:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> options = {
 <span class="hljs-attr">scenarios</span>: {
   <span class="hljs-attr">ramping_load</span>: {
     <span class="hljs-attr">executor</span>: <span class="hljs-string">'ramping-arrival-rate'</span>,
     <span class="hljs-attr">startRate</span>: <span class="hljs-number">100</span>,
     <span class="hljs-attr">timeUnit</span>: <span class="hljs-string">'1s'</span>,
     <span class="hljs-attr">preAllocatedVUs</span>: <span class="hljs-number">1000</span>,
     <span class="hljs-attr">maxVUs</span>: <span class="hljs-number">10000</span>,
     <span class="hljs-attr">stages</span>: [
       { <span class="hljs-attr">duration</span>: <span class="hljs-string">'20s'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">2000</span> },   <span class="hljs-comment">// Ramp to 2,000 req/s</span>
       { <span class="hljs-attr">duration</span>: <span class="hljs-string">'20s'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">5000</span> },   <span class="hljs-comment">// Ramp to 5,000 req/s</span>
       { <span class="hljs-attr">duration</span>: <span class="hljs-string">'20s'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">8000</span> },   <span class="hljs-comment">// Ramp to 8,000 req/s</span>
       { <span class="hljs-attr">duration</span>: <span class="hljs-string">'20s'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">10000</span> },  <span class="hljs-comment">// Ramp to 10,000 req/s</span>
       { <span class="hljs-attr">duration</span>: <span class="hljs-string">'100s'</span>, <span class="hljs-attr">target</span>: <span class="hljs-number">10000</span> }, <span class="hljs-comment">// Hold at 10,000 req/s</span>
     ],
   },
 },
};
</code></pre>
<p>This configuration ramps up to 10,000 requests per second and holds for 100 seconds, deliberately exceeding the capacity of both configurations to observe behavior under stress.</p>
<h3 id="heading-test-protocol"><strong>Test Protocol</strong></h3>
<ol>
<li><p><strong>NLB Warm-up Phase</strong>: All endpoints received a 60-second warm-up (ramping from 10 to 500 req/s) to ensure AWS Network Load Balancers were properly scaled</p>
</li>
<li><p><strong>Pre-test Warm-up</strong>: Each runtime received a 20-second warm-up before its test</p>
</li>
<li><p><strong>Test Execution</strong>: 180 seconds total (80s ramp + 100s hold at 10k req/s)</p>
</li>
<li><p><strong>Cooldown</strong>: 480 seconds between each test to allow system recovery</p>
</li>
</ol>
<h2 id="heading-results"><strong>Results</strong></h2>
<h3 id="heading-performance-summary"><strong>Performance Summary</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769626701516/5eed5c21-cfb9-40b3-b25d-c0a78c912e15.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-latency-successful-requests-only"><strong>Latency (Successful Requests Only)</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769626718132/e8e181d1-a4cc-48d8-be6d-94eed26e04c3.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-key-observations"><strong>Key Observations</strong></h3>
<p>1. Equivalent Throughput Under Extreme Load</p>
<p>Both Watt and single-process Node.js achieved nearly identical throughput (~5,958 req/s) under the 10,000 req/s target load. This demonstrates that Watt’s multi-worker architecture introduces no overhead compared to running Node.js directly.</p>
<p>2. Better Tail Latency with Watt</p>
<p>While average latencies were equivalent, Watt showed measurably better tail latency:</p>
<ul>
<li><p><strong>p99</strong>: 263ms (Watt) vs 289ms (Node.js) - <strong>9% improvement</strong></p>
</li>
<li><p><strong>p95</strong>: 221ms (Watt) vs 250ms (Node.js) - <strong>12% improvement</strong></p>
</li>
<li><p><strong>p90</strong>: 196ms (Watt) vs 216ms (Node.js) - <strong>9% improvement</strong></p>
</li>
</ul>
<p>This improvement comes from SO_REUSEPORT’s kernel-level load distribution, which prevents request pileup on any single worker.</p>
<p>3. Slightly Higher Success Rate</p>
<p>Watt achieved a 79.3% success rate compared to Node.js’s 78.6% - a small but consistent improvement under stress. Both configurations were pushed well beyond their sustainable capacity (the target was 10k req/s, but actual throughput was ~6k req/s), so the high failure rates are expected.</p>
<p>4. Test Was Deliberately Extreme</p>
<p>The 20%+ failure rate across both configurations indicates we successfully stress-tested beyond capacity. Under normal production loads (staying within throughput limits), both configurations would achieve near-100% success rates, as demonstrated in our Next.js benchmarks at 1,000 req/s.</p>
<h2 id="heading-getting-started-with-tanstack-start-on-watt"><strong>Getting Started with TanStack Start on Watt</strong></h2>
<p>Adding Watt support to your TanStack Start application requires minimal configuration:</p>
<h3 id="heading-1-install-dependencies"><strong>1. Install Dependencies</strong></h3>
<p><code>npm install wattpm @platformatic/tanstack</code></p>
<h3 id="heading-2-create-wattjson"><strong>2. Create watt.json</strong></h3>
<pre><code class="lang-json">{
 <span class="hljs-attr">"$schema"</span>: <span class="hljs-string">"https://schemas.platformatic.dev/@platformatic/tanstack/3.32.0.json"</span>,
 <span class="hljs-attr">"application"</span>: {
   <span class="hljs-attr">"outputDirectory"</span>: <span class="hljs-string">".output"</span>
 },
 <span class="hljs-attr">"runtime"</span>: {
   <span class="hljs-attr">"logger"</span>: {
     <span class="hljs-attr">"level"</span>: <span class="hljs-string">"info"</span>
   },
   <span class="hljs-attr">"server"</span>: {
     <span class="hljs-attr">"hostname"</span>: <span class="hljs-string">"0.0.0.0"</span>,
     <span class="hljs-attr">"port"</span>: <span class="hljs-number">3000</span>
   },
   <span class="hljs-attr">"workers"</span>: {
     <span class="hljs-attr">"static"</span>: <span class="hljs-number">2</span>
   }
 }
}
</code></pre>
<h3 id="heading-3-update-packagejson-scripts"><strong>3. Update package.json Scripts</strong></h3>
<pre><code class="lang-javascript">{
 <span class="hljs-string">"scripts"</span>: {
   <span class="hljs-string">"build"</span>: <span class="hljs-string">"vite build"</span>,
   <span class="hljs-string">"build:watt"</span>: <span class="hljs-string">"NODE_ENV=production wattpm build"</span>,
   <span class="hljs-string">"start:watt"</span>: <span class="hljs-string">"wattpm start"</span>
 }
}
</code></pre>
<h3 id="heading-4-build-and-run"><strong>4. Build and Run</strong></h3>
<p><code>npm run build:watt</code></p>
<p><code>npm run start:watt</code></p>
<p>That’s it. Watt will automatically detect your TanStack Start application and configure the appropriate build and runtime settings.</p>
<h2 id="heading-kubernetes-deployment"><strong>Kubernetes Deployment</strong></h2>
<p>For Kubernetes deployments, the same principles from our Next.js guide apply. Here’s a sample deployment configuration:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
 <span class="hljs-attr">name:</span> <span class="hljs-string">tanstack-watt</span>
<span class="hljs-attr">spec:</span>
 <span class="hljs-attr">replicas:</span> <span class="hljs-number">4</span>
 <span class="hljs-attr">template:</span>
   <span class="hljs-attr">spec:</span>
     <span class="hljs-attr">topologySpreadConstraints:</span>
       <span class="hljs-bullet">-</span> <span class="hljs-attr">maxSkew:</span> <span class="hljs-number">1</span>
         <span class="hljs-attr">topologyKey:</span> <span class="hljs-string">kubernetes.io/hostname</span>
         <span class="hljs-attr">whenUnsatisfiable:</span> <span class="hljs-string">DoNotSchedule</span>
         <span class="hljs-attr">labelSelector:</span>
           <span class="hljs-attr">matchLabels:</span>
             <span class="hljs-attr">app:</span> <span class="hljs-string">tanstack-watt</span>
     <span class="hljs-attr">containers:</span>
       <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">tanstack-watt</span>
         <span class="hljs-attr">image:</span> <span class="hljs-string">your-registry/tanstack-app:latest</span>
         <span class="hljs-attr">env:</span>
           <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">WORKERS</span>
             <span class="hljs-attr">value:</span> <span class="hljs-string">"2"</span>
         <span class="hljs-attr">resources:</span>
           <span class="hljs-attr">requests:</span>
             <span class="hljs-attr">cpu:</span> <span class="hljs-string">'2000m'</span>
             <span class="hljs-attr">memory:</span> <span class="hljs-string">'4Gi'</span>
           <span class="hljs-attr">limits:</span>
             <span class="hljs-attr">cpu:</span> <span class="hljs-string">'2000m'</span>
             <span class="hljs-attr">memory:</span> <span class="hljs-string">'4Gi'</span>
         <span class="hljs-attr">ports:</span>
           <span class="hljs-bullet">-</span> <span class="hljs-attr">containerPort:</span> <span class="hljs-number">3000</span>
</code></pre>
<p>Key points:</p>
<ul>
<li><p>Use topologySpreadConstraints to distribute pods evenly across nodes.</p>
</li>
<li><p>Set WORKERS to match your CPU allocation (2 workers for 2 CPUs).</p>
</li>
<li><p>Watt’s health monitoring will automatically restart unhealthy workers without terminating the pod.</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Watt 3.32 brings the same performance benefits to TanStack Start that Next.js users have enjoyed: kernel-level load distribution via SO_REUSEPORT, zero-overhead multi-worker scaling, and external health monitoring to improve throughput and tail latency.</p>
<p>Our benchmarks show that under extreme load (10,000 req/s), Watt matches Node.js throughput while delivering measurably better tail latency (p99 improved by 9%, p95 by 12%). In production deployments constrained by capacity, both approaches achieve near-complete reliability.</p>
<p>If you’re building with TanStack Start and deploying to Kubernetes or any multi-core environment, Watt provides a straightforward path to better resource utilization and improved tail latency with minimal configuration changes.</p>
<p>The complete benchmark code is available at: <a target="_blank" href="https://github.com/platformatic/k8s-watt-performance-demo">https://github.com/platformatic/k8s-watt-performance-demo</a>.</p>
<p>To get started with Watt, visit: <a target="_blank" href="https://docs.platformatic.dev">https://docs.platformatic.dev</a>.</p>
<p>For questions or enterprise support, reach out to <a target="_blank" href="mailto:info@platformatic.dev">info@platformatic.dev</a> or connect with us on <a target="_blank" href="https://discord.gg/platformatic">Discord</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Debugging Node.js Performance with AI]]></title><description><![CDATA[I’ve been improving the performance of Node.js applications for the last decade. I know for a fact that performance debugging is hard, and I’ve often ended up creating my own tools. This is one of those times.
How often have you captured a CPU profil...]]></description><link>https://blog.platformatic.dev/debugging-nodejs-performance-with-ai</link><guid isPermaLink="true">https://blog.platformatic.dev/debugging-nodejs-performance-with-ai</guid><category><![CDATA[Node.js]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Thu, 22 Jan 2026 15:00:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769099059277/b9091bc9-9a41-460b-9106-0acb276c6b65.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I’ve been improving the performance of Node.js applications for the last decade. I know for a fact that performance debugging is hard, and I’ve often ended up creating my own tools. This is one of those times.</p>
<p>How often have you captured a CPU profile, stared at a flamegraph, and tried to make sense of thousands of stack frames? What if your AI assistant could help you understand exactly where your application is spending time?</p>
<p>Today, we’re releasing a new feature in <a target="_blank" href="https://github.com/platformatic/flame">@platformatic/flame</a> that generates LLM-friendly markdown analysis alongside your flamegraphs. Now, when you profile your Node.js application, you get three outputs:</p>
<ul>
<li><p>Binary pprof data (.pb) - for tooling compatibility</p>
</li>
<li><p>Interactive HTML flamegraph (.html) - for visual exploration</p>
</li>
<li><p><strong>Markdown analysis</strong> (.md) - for AI-assisted debugging</p>
</li>
</ul>
<p>This means you can drop your profile analysis directly into Cursor, Claude Code, OpenCode, or any AI assistant and get intelligent insights about your application’s performance characteristics.</p>
<h2 id="heading-the-problem-with-traditional-profiling"><strong>The Problem with Traditional Profiling</strong></h2>
<p>Flamegraphs are incredibly powerful visualization tools, but they have limitations:</p>
<ol>
<li><p><strong>They require expertise to interpret</strong> - Understanding which stack frames matter takes experience.</p>
</li>
<li><p><strong>They don’t prioritize hotspots</strong> - You see everything, but the critical bottlenecks aren’t highlighted.</p>
</li>
<li><p><strong>They’re not searchable by AI</strong> - You can’t paste an SVG into ChatGPT and ask “what’s slow?”</p>
</li>
</ol>
<p>We built Flame to make profiling accessible, and this update takes it a step further by making profile data consumable by AI assistants.</p>
<h2 id="heading-how-it-works"><strong>How It Works</strong></h2>
<p>When you run Flame, it now automatically generates a markdown file with structured hotspot analysis:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Profile your application</span>
flame run server.js

<span class="hljs-comment"># When you stop the app (Ctrl-C), you'll see:</span>
<span class="hljs-comment"># 🔥 CPU profile written to: cpu-profile-2025-01-21T12-00-00-000Z.pb</span>
<span class="hljs-comment"># 🔥 CPU flamegraph generated: cpu-profile-2025-01-21T12-00-00-000Z.html</span>
<span class="hljs-comment"># 🔥 CPU markdown generated: cpu-profile-2025-01-21T12-00-00-000Z.md</span>
<span class="hljs-comment"># 🔥 Heap profile written to: heap-profile-2025-01-21T12-00-00-000Z.pb</span>
<span class="hljs-comment"># 🔥 Heap flamegraph generated: heap-profile-2025-01-21T12-00-00-000Z.html</span>
<span class="hljs-comment"># 🔥 Heap markdown generated: heap-profile-2025-01-21T12-00-00-000Z.md</span>
</code></pre>
<p>The markdown output contains a structured analysis of your profile:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># CPU Profile Analysis: cpu-profile-2025-01-21T12-00-00-000Z.pb</span>

<span class="hljs-comment">## Summary</span>
- Total samples: 1,234
- Duration: 10.5s
- Sample rate: 99 Hz

<span class="hljs-comment">## Top Hotspots</span>

| Rank | Function | File | Self Time | Total Time |
|------|----------|------|-----------|------------|
| 1 | processRequest | src/handler.js:45 | 23.5% | 45.2% |
| 2 | parseJSON | node_modules/... | 12.3% | 12.3% |
| 3 | renderTemplate | src/views.js:123 | 8.7% | 15.4% |
...
</code></pre>
<p>This format is perfect for AI consumption. You can paste it directly into your AI assistant and ask questions like:</p>
<ul>
<li><p>“What are the main performance bottlenecks in this profile?”</p>
</li>
<li><p>“How can I optimize the processRequest function?”</p>
</li>
<li><p>“Is there anything unusual about this CPU usage pattern?”</p>
</li>
<li><p>“Optimize all hot spots.”</p>
</li>
</ul>
<h2 id="heading-three-markdown-formats"><strong>Three Markdown Formats</strong></h2>
<p>We’ve included three output formats optimized for different use cases:</p>
<h3 id="heading-summary-default"><strong>Summary (Default)</strong></h3>
<p>The summary format produces a compact hotspots table - ideal for quick AI triage:</p>
<pre><code class="lang-shell">flame run server.js
# or explicitly:
flame run --md-format=summary server.js
</code></pre>
<p>This is perfect for dropping into an AI chat and asking, “What should I focus on?” or even “Improve the performance of my application”.</p>
<h3 id="heading-detailed"><strong>Detailed</strong></h3>
<p>The detailed format includes full stack traces and comprehensive statistics:</p>
<pre><code class="lang-bash">flame run --md-format=detailed server.js
</code></pre>
<p>Use this when you need the AI to understand the complete call hierarchy and suggest architectural improvements.</p>
<h3 id="heading-adaptive"><strong>Adaptive</strong></h3>
<p>The adaptive format automatically chooses based on profile complexity:</p>
<pre><code class="lang-bash">flame run --md-format=adaptive server.js
</code></pre>
<p>Simple profiles get the summary treatment; complex profiles get detailed analysis.</p>
<h2 id="heading-works-with-both-cpu-and-heap-profiles"><strong>Works with Both CPU and Heap Profiles</strong></h2>
<p>Flame captures both CPU and heap profiles concurrently, and markdown analysis is generated for both:</p>
<pre><code class="lang-bash">flame run server.js
<span class="hljs-comment"># Generates:</span>
<span class="hljs-comment"># cpu-profile-*.pb, cpu-profile-*.html, cpu-profile-*.md</span>
<span class="hljs-comment"># heap-profile-*.pb, heap-profile-*.html, heap-profile-*.md</span>
</code></pre>
<p>For heap profiles, the markdown highlights memory allocation hotspots - perfect for asking your AI assistant to help identify memory leaks or excessive allocations.</p>
<h2 id="heading-generate-from-existing-profiles"><strong>Generate from Existing Profiles</strong></h2>
<p>Already have pprof files? Generate markdown analysis from them:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Generate HTML and markdown from existing profile</span>
flame generate cpu-profile.pb

<span class="hljs-comment"># Use detailed format for comprehensive analysis</span>
flame generate --md-format=detailed cpu-profile.pb
</code></pre>
<h2 id="heading-programmatic-api"><strong>Programmatic API</strong></h2>
<p>The new generateMarkdown function is also available in the programmatic API:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> { generateMarkdown } = <span class="hljs-built_in">require</span>(<span class="hljs-string">'@platformatic/flame'</span>)

<span class="hljs-comment">// Generate LLM-friendly markdown analysis</span>
<span class="hljs-keyword">await</span> generateMarkdown(<span class="hljs-string">'profile.pb'</span>, <span class="hljs-string">'analysis.md'</span>, { <span class="hljs-attr">format</span>: <span class="hljs-string">'summary'</span> })
</code></pre>
<h2 id="heading-ai-debugging-workflow"><strong>AI Debugging Workflow</strong></h2>
<p>Here’s the workflow we recommend for AI-assisted performance debugging:</p>
<ol>
<li><p><strong>Profile your application</strong> during a realistic workload: <code>flame run server.js</code></p>
</li>
<li><p><strong>Generate traffic</strong> that exercises the slow code paths.</p>
</li>
<li><p><strong>Stop profiling</strong> (Ctrl-C) to generate all output files.</p>
</li>
<li><p><strong>Open the markdown file</strong> and paste its contents into your AI assistant.</p>
</li>
<li><p><strong>Ask</strong></p>
<ul>
<li><p>“What are the top 3 things I should optimize?”</p>
</li>
<li><p>“Is this JSON parsing overhead normal?”</p>
</li>
<li><p>“How can I reduce the time spent in renderTemplate?”</p>
</li>
<li><p>“Improve the performance of all the hotspots.”</p>
</li>
</ul>
</li>
<li><p><strong>Iterate</strong>: apply the suggested changes and re-profile to verify the improvements.</p>
</li>
</ol>
<h2 id="heading-requirements"><strong>Requirements</strong></h2>
<p>This feature requires Node.js 22.6.0 or later. We’ve bumped the minimum version to take advantage of ES module interoperability improvements needed for the pprof-to-md integration.</p>
<p>Update flame to the latest version:</p>
<pre><code class="lang-shell">npm install -g @platformatic/flame@latest
</code></pre>
<h2 id="heading-llm-performance-optimization-evals"><strong>LLM Performance Optimization Evals</strong></h2>
<p>We didn’t just build this feature and hope it works - we ran systematic evaluations to measure how well LLMs can identify and fix performance bottlenecks using pprof-to-md output.</p>
<p>The eval process used <strong>Claude Code with Claude Opus 4.5</strong>: an orchestrating agent ran benchmarks, collected profiles, spawned optimization subagents with the markdown analysis, applied suggested fixes, and measured results.</p>
<h3 id="heading-results-summary"><strong>Results Summary</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769082193317/e83d7fa9-8a85-431e-9503-e7400a5e72e1.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-key-metrics"><strong>Key metrics:</strong></h3>
<ul>
<li><p><strong>Correct fix identified:</strong> 5/5 (100%)</p>
</li>
<li><p><strong>Significant improvement achieved:</strong> 4/5 (80%)</p>
</li>
</ul>
<h3 id="heading-highlights"><strong>Highlights</strong></h3>
<p><strong>json-bottleneck (144x improvement):</strong> The app was parsing a 1MB JSON config file on every request. The profile showed the route handler at 71.1% and readFileSync at 10.7%. Claude immediately identified the issue and moved JSON parsing out of the request handler. Result: 8 req/s → 1,122 req/s.</p>
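<p>For illustration, this is the shape of that fix (a hypothetical handler, not the eval code; <code>config.json</code> and the <code>featured</code> field are made up):</p>
<pre><code class="lang-javascript">const fs = require('node:fs');

// Before: re-read and re-parse the config file on every request.
function getFeaturedSlow () {
  const config = JSON.parse(fs.readFileSync('./config.json', 'utf8'));
  return config.featured;
}

// After: read and parse once at startup, then reuse the parsed object.
const config = JSON.parse(fs.readFileSync('./config.json', 'utf8'));
function getFeaturedFast () {
  return config.featured;
}
</code></pre>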
<p><strong>n-plus-one (12.8x improvement):</strong> Sequential async calls in a loop - the classic N+1 query pattern. Claude recognized this from code analysis (CPU profiles don’t capture async wait time) and parallelized with Promise.all(). Result: 41 req/s → 526 req/s.</p>
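<p>Again for illustration, the shape of the fix (<code>fetchUser</code> is a stand-in for whatever async call the loop was making):</p>
<pre><code class="lang-javascript">// Before: the classic N+1 pattern - one awaited call per item, in sequence.
async function loadUsersSequentially (ids, fetchUser) {
  const users = [];
  for (const id of ids) {
    users.push(await fetchUser(id)); // each call waits for the previous one
  }
  return users;
}

// After: start every call immediately and wait for them together.
async function loadUsersInParallel (ids, fetchUser) {
  return Promise.all(ids.map((id) =&gt; fetchUser(id)));
}
</code></pre>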
<p><strong>quadratic-algo (127x latency improvement):</strong> O(n²) deduplication using nested loops. Claude suggested using Set for O(1) lookups. Latency dropped from 4,686ms to 37ms with zero errors.</p>
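<p>In miniature, the same idea (illustrative, not the eval code):</p>
<pre><code class="lang-javascript">// Before: O(n²) - for every item, scan the result array for duplicates.
function dedupeQuadratic (items) {
  const result = [];
  for (const item of items) {
    if (!result.includes(item)) {
      result.push(item);
    }
  }
  return result;
}

// After: O(n) - Set membership checks are effectively constant time.
function dedupeWithSet (items) {
  return [...new Set(items)];
}
</code></pre>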
<p><strong>memory-churn (84x latency improvement):</strong> Creating 4 intermediate arrays with spread copies. Claude combined all operations into a single loop pass. Latency dropped from 5,239ms to 62ms.</p>
<h3 id="heading-what-we-learned"><strong>What We Learned</strong></h3>
<ol>
<li><p><strong>Claude correctly identified all 5 performance issues</strong> by reviewing the analysis and then applying the necessary patches.</p>
</li>
<li><p><strong>All fixes were idiomatic and correct</strong> - caching parsed config, pre-compiling regex, parallelizing async operations, single-pass array processing, using Set for O(1) lookups.</p>
</li>
<li><p><strong>Latency is often a better success indicator than throughput</strong> for optimization evals.</p>
</li>
<li><p><strong>The markdown format provides enough information</strong> for Claude to understand call paths and identify hotspots in the codebase.</p>
</li>
</ol>
<p>The one failure (regex-hotpath) wasn’t because Claude suggested the wrong fix - it correctly moved the regex pattern outside the loop. The bottleneck was simply masked by I/O operations in that particular workload.</p>
<h2 id="heading-from-profile-to-actionable-fix"><strong>From Profile to Actionable Fix</strong></h2>
<p>The real power of LLM-friendly profiles is turning raw data into specific, prioritized recommendations. In one example, we profiled an application and the AI identified that URL constructor calls accounted for 14.8% of CPU time, garbage collection overhead for 6.7%, and route matching for another 7.1%.</p>
<p>But it didn't stop at identifying hotspots. It provided concrete fixes ranked by impact: replace expensive abstractions with simpler alternatives where possible, pre-compute values in loops instead of recalculating them, initialize resources at startup rather than on demand, and memoize repeated computations. The estimated result? A 20-25% reduction in CPU time from straightforward changes.</p>
<p>This is the workflow we envisioned: profile your app, paste the markdown into your AI assistant, and get back a prioritized list of exactly what to fix and how to fix it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1769082165121/985b726f-713b-45b4-96cf-71475764a5c9.png" alt class="image--center mx-auto" /></p>
<p>Specifically, the application in this example is TanStack Start. We notified Tanner immediately - the team is already on it!</p>
<h2 id="heading-built-on-pprof-to-md"><strong>Built on pprof-to-md</strong></h2>
<p>The markdown generation is powered by our new <a target="_blank" href="https://github.com/platformatic/pprof-to-md">pprof-to-md</a> library, which we’ve also open-sourced. If you’re building profiling tools and want to add AI-friendly output, check it out.</p>
<h2 id="heading-get-started"><strong>Get Started</strong></h2>
<p>Update to the latest flame and start profiling:</p>
<pre><code class="lang-bash">npm install -g @platformatic/flame@latest

flame run your-app.js
</code></pre>
<p>Then paste your markdown analysis into your favorite AI assistant and start asking questions. Performance debugging just got a whole lot easier.</p>
<hr />
<p>Have questions or feedback? Open an issue on <a target="_blank" href="https://github.com/platformatic/flame">GitHub</a> or reach out to us via DM.</p>
]]></content:encoded></item><item><title><![CDATA[Bun Is Fast, Until Latency Matters for Next.js Workloads]]></title><description><![CDATA[As the JavaScript runtime ecosystem expands beyond Node.js, developers now have multiple options for running Next.js in production. These, of course, include more established runtimes like Node.js, newer alternatives such as Bun and Deno, and multi-t...]]></description><link>https://blog.platformatic.dev/bun-is-fast-until-latency-matters-for-nextjs-workloads</link><guid isPermaLink="true">https://blog.platformatic.dev/bun-is-fast-until-latency-matters-for-nextjs-workloads</guid><category><![CDATA[Node.js]]></category><category><![CDATA[Deno]]></category><category><![CDATA[Bun]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Thu, 15 Jan 2026 15:00:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768492027461/aac998d6-7a2b-41da-8b87-5afca151087c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As the JavaScript runtime ecosystem expands beyond Node.js, developers now have multiple options for running Next.js in production. These, of course, include more established runtimes like Node.js, newer alternatives such as Bun and Deno, and multi-threaded solutions like <a target="_blank" href="https://github.com/platformatic/platformatic">Platformatic Watt</a>, which is an application server we built on top of Node.js. This report presents benchmark results comparing these four approaches on AWS EKS under identical conditions.</p>
<p>While evaluating these options and the benchmarks that follow, it’s important to keep in mind what matters most for your context and use case, as there are no “one-size fits all” solutions in software: latency, consistency, or ease of adoption.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768300777055/801abeee-92cb-4136-997c-b49749f6a24b.png" alt class="image--center mx-auto" /></p>
<p>All runtimes completed the benchmarks without any errors. You can find the complete methodology we followed below.</p>
<h2 id="heading-benchmark-methodology"><strong>Benchmark Methodology</strong></h2>
<p>We benchmarked Next.js 15.5 on AWS EKS across four JavaScript runtimes, each allocated six CPU cores. The results should interest any engineer building or maintaining performance-sensitive server-side JavaScript applications.</p>
<p>Three test runs were conducted, rotating the test order, at 1,000 requests per second for 120 seconds each, to illustrate the practical demands these runtimes might face under heavy traffic (think a flash sale in eCommerce, etc).</p>
<p><strong>Infrastructure</strong></p>
<p>All benchmarks ran on AWS EKS (Elastic Kubernetes Service) with the following infrastructure:</p>
<ul>
<li><p><strong>EKS Cluster</strong>: 4 nodes running m5.2xlarge instances (8 vCPUs, 32GB RAM each)</p>
</li>
<li><p><strong>Region</strong>: us-west-2</p>
</li>
<li><p><strong>Load Testing Instance</strong>: c7gn.large (2 vCPUs, 4GB RAM, network-optimized)</p>
</li>
<li><p><strong>Load Testing Tool</strong>: Grafana k6</p>
</li>
</ul>
<p>Two critical but often overlooked aspects of effective benchmarking are 1) clean, reproducible conditions for each test run, and 2) a reliable setup that others can use to replicate the experiment and verify the results for themselves.</p>
<p>To this end, we used shell scripts and the AWS CLI to create on-demand, ephemeral environments for each testing round:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768300791858/ff79a8ee-d2a0-4d50-873c-a97b399b2dee.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-software-versions"><strong>Software Versions</strong></h3>
<p>The benchmarks used the following software versions:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768300801686/ca31e640-fc0f-4fa9-9c6d-4d7803c56ced.png" alt class="image--center mx-auto" /></p>
<p>All software versions were specified in the Dockerfile to ensure reproducible benchmarks.</p>
<h3 id="heading-resource-allocation"><strong>Resource Allocation</strong></h3>
<p>Each runtime received identical total CPU resources (6 cores) with the following distribution:</p>
<p>Node.js, Bun, and Deno, which each run as single-threaded processes, were distributed across six single-CPU pods. Watt, our multi-threaded application server built on Node.js, was configured with two workers per pod across three 2-CPU pods.</p>
<p>Considering AWS infrastructure costs, these six cores (6 of the 8 vCPUs on an m5.2xlarge at $0.384 per hour) work out to roughly $0.29 per hour. Knowing this cost makes it easier to translate latency improvements into budget impact: a runtime that handles the same load (in requests per second) with fewer instances saves money directly.</p>
<h3 id="heading-load-test-configuration"><strong>Load Test Configuration</strong></h3>
<p>Each runtime was tested with the following k6 configuration:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> http <span class="hljs-keyword">from</span> <span class="hljs-string">'k6/http'</span>;
<span class="hljs-keyword">import</span> { check } <span class="hljs-keyword">from</span> <span class="hljs-string">'k6'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> options = {
 <span class="hljs-attr">scenarios</span>: {
   <span class="hljs-attr">constant_arrival_rate</span>: {
     <span class="hljs-attr">executor</span>: <span class="hljs-string">'constant-arrival-rate'</span>,
     <span class="hljs-attr">duration</span>: <span class="hljs-string">'120s'</span>,
     <span class="hljs-attr">rate</span>: <span class="hljs-number">1000</span>,
     <span class="hljs-attr">timeUnit</span>: <span class="hljs-string">'1s'</span>,
     <span class="hljs-attr">preAllocatedVUs</span>: <span class="hljs-number">1000</span>,
     <span class="hljs-attr">maxVUs</span>: <span class="hljs-number">20000</span>,
   },
 },
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params"></span>) </span>{
 <span class="hljs-keyword">const</span> res = http.get(__ENV.TARGET, {
   <span class="hljs-attr">timeout</span>: <span class="hljs-string">"5s"</span>,
 });
 check(res, {
   <span class="hljs-string">'status is 200'</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> r.status === <span class="hljs-number">200</span>,
   <span class="hljs-string">'response has body'</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> r.body &amp;&amp; r.body.length &gt; <span class="hljs-number">0</span>,
 });
}
</code></pre>
<p>This configuration maintained a constant arrival rate of 1,000 requests per second for 120 seconds, resulting in approximately 120,000 requests per test.</p>
<h3 id="heading-test-protocol"><strong>Test Protocol</strong></h3>
<p>Given that our benchmark harness runs on live cloud services, there is some inherent variability in the data we collected. To ensure a fair comparison and boost confidence in the results, we ran the tests multiple times, rotating the order in which each service was tested, and we warmed up each environment as part of every test run.</p>
<p>To start, the Network Load Balancer (NLB) went through a warm-up phase in which all four endpoints received a 60-second warm-up, starting at 10 and reaching up to 500 requests per second, ensuring that AWS Network Load Balancers were properly scaled. Each runtime also received a 20-second pre-test warm-up to stabilize the environment before its respective test.</p>
<p>Test execution spanned 120 seconds at a constant arrival rate of 1,000 requests per second, providing robust data for analysis. A cooldown period of 480 seconds was implemented between each test to allow the system to return to baseline conditions, further ensuring that subsequent tests commenced without residual impact from prior runs.</p>
<p>Finally, the tests were executed in three complete runs with different execution orders to detect positional bias and ensure that each runtime’s performance was assessed fairly.</p>
<h3 id="heading-test-orders"><strong>Test Orders</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768300860787/2f67541c-b7a1-408b-9b88-5f71b3a04c88.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-runtime-configurations"><strong>Runtime Configurations</strong></h3>
<p><strong>Node.js</strong>: Standard Next.js standalone server</p>
<pre><code class="lang-shell">next start
</code></pre>
<p><strong>Bun</strong>: Next.js with Bun runtime (requires --bun flag to override shebang)</p>
<pre><code class="lang-shell">bun run --bun next start
</code></pre>
<p>Without the --bun flag, Bun respects the shebang (#!/usr/bin/env node) in the Next.js binary and executes it with Node.js instead. The --bun flag overrides this behavior to use the Bun runtime.</p>
<p><strong>Deno</strong>: Next.js via npm compatibility layer</p>
<pre><code class="lang-shell">deno run -A npm:next start
</code></pre>
<p>Deno runs Next.js via its npm compatibility layer (npm:next), which allows running npm packages in the Deno runtime.</p>
<p><strong>Watt</strong>: Platformatic Watt with 2 workers per pod</p>
<pre><code class="lang-shell">wattpm start  # with WORKERS=2
</code></pre>
<p>Watt uses SO_REUSEPORT to distribute connections across multiple Node.js worker threads at the kernel level, eliminating the IPC overhead present in traditional cluster-based approaches. Each worker operates with its own event loop while sharing the same listening socket.</p>
<h2 id="heading-results"><strong>Results</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768300873034/791731b4-a5d5-41b0-9459-d2b8f2eee911.png" alt class="image--center mx-auto" /></p>
<p><strong>Success Rate</strong></p>
<p>All runtimes achieved a 100% success rate, with zero failed requests across all three runs. Each test processed approximately 120,000 requests at the target rate of 1,000 requests per second.</p>
<h2 id="heading-observations"><strong>Observations</strong></h2>
<h3 id="heading-latency-distribution"><strong>Latency Distribution</strong></h3>
<p>The runtimes fell into distinct performance tiers based on average latency:</p>
<ul>
<li><p><strong>Tier 1 (~11-14ms)</strong>: Deno and Watt</p>
</li>
<li><p><strong>Tier 2 (~20ms)</strong>: Node.js</p>
</li>
<li><p><strong>Tier 3 (~246ms)</strong>: Bun</p>
</li>
</ul>
<h3 id="heading-consistency-across-runs"><strong>Consistency Across Runs</strong></h3>
<p>Deno demonstrated the most consistent performance across test positions, with a standard deviation of ±1.19ms. Watt was similarly consistent at ±1.03ms. Node.js showed moderate variance at ±2.42ms, which is worth weighing when evaluating stability. Bun’s absolute variance was higher at ±4.72ms, although relative to its much higher average latency this still represents consistent behavior. For teams assessing operational risk, these variance figures are a useful proxy for how predictable each runtime will be in production.</p>
<h3 id="heading-test-order-impact"><strong>Test Order Impact</strong></h3>
<p>Rotating the test order across three runs helped identify whether position affected the results. All of the runtimes performed consistently regardless of where they fell in the testing order, with the notable exception of Node.js, which performed best when tested last (see "Run 3" above).</p>
<h3 id="heading-tail-latency-p99"><strong>Tail Latency (p99)</strong></h3>
<p>The p99 latency provides insight into the worst-case user experience:</p>
<ul>
<li><p>Deno: 101.27ms average p99</p>
</li>
<li><p>Watt: 114.78ms average p99</p>
</li>
<li><p>Node.js: 173.84ms average p99</p>
</li>
<li><p>Bun: 974ms average p99</p>
</li>
</ul>
<h3 id="heading-throughput"><strong>Throughput</strong></h3>
<p>All runtimes successfully handled the target load of 1,000 requests per second with negligible dropped requests. The slight variations in reported requests per second, ranging from 997.94 to 999.96, are within normal measurement variance.</p>
<h2 id="heading-reproducing-these-benchmarks">Reproducing these benchmarks</h2>
<p>The complete benchmark infrastructure is available at: <a target="_blank" href="https://github.com/platformatic/runtimes-benchmarks">https://github.com/platformatic/runtimes-benchmarks</a>.</p>
<p>To run the benchmarks:</p>
<pre><code class="lang-bash">AWS_PROFILE=&lt;profile-name&gt; ./benchmark.sh
</code></pre>
<p>The script creates an ephemeral EKS cluster, deploys all four runtime configurations, executes the load tests, and automatically tears down the infrastructure. Easy as that!</p>
<p>Let us know how this works for you (and perhaps more importantly, if anything doesn’t work for you or if you see results that surprise you…).</p>
<h2 id="heading-conclusions"><strong>Conclusions</strong></h2>
<p>The benchmarks showed three distinct performance tiers: Deno and Watt had the lowest average latencies, at approximately 11 to 14 milliseconds; Node.js averaged 20 milliseconds; and Bun exhibited significantly higher latency at approximately 246 milliseconds. (I’m sure Bun’s showing here will surprise many - it surprised us as well.)</p>
<p>All configurations successfully handled the target throughput of 1,000 requests per second, achieving a 100% success rate. These results reflect performance characteristics under the specified test conditions and may vary depending on application workload, infrastructure configuration, and runtime versions. Teams prioritizing sub-15ms latency may shortlist Deno and Watt, with Watt being the natural choice for those who want to stay within the Node.js ecosystem.</p>
<h2 id="heading-what-next"><strong>What Next?</strong></h2>
<p>As we reflect on these results, we’re considering what direction to take with our next round of experiments. For example, which memory-intensive workloads might flip these rankings?</p>
<p>Part of our aim in our open source practice is not just to build products, but to build community, and we’d like to hear from you all: what frameworks and scenarios are most relevant to your work today that you think we should investigate next?</p>
<p>Don’t be shy - do drop us a comment here or on LinkedIn (DMs always open!) about what you’d like to see.</p>
]]></content:encoded></item><item><title><![CDATA[10,000 requests, 2 approaches to multi-threading, 1 React-Router]]></title><description><![CDATA[When evaluating the feedback we received after publishing our Next.js x Kubernetes benchmarks (showed 93% faster latency with Watt), a natural question emerged:
Could this approach apply to other Node.js frameworks as well? If so, does a general patt...]]></description><link>https://blog.platformatic.dev/10000-requests-2-approaches-to-multi-threading-1-react-router</link><guid isPermaLink="true">https://blog.platformatic.dev/10000-requests-2-approaches-to-multi-threading-1-react-router</guid><category><![CDATA[Node.js]]></category><category><![CDATA[React]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Wed, 07 Jan 2026 15:00:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767808169628/a6fcf9ae-f08b-48b9-9b14-fb8d50967f36.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When evaluating the feedback we received after publishing our <a target="_blank" href="https://blog.platformatic.dev/93-faster-nextjs-in-your-kubernetes">Next.js x Kubernetes benchmarks</a> (showed 93% faster latency with Watt), a natural question emerged:</p>
<p>Could this approach apply to other Node.js frameworks as well? If so, does a general pattern begin to emerge that performance-sensitive <a target="_blank" href="https://nodejs.org">Node.js</a> applications benefit from being run with <a target="_blank" href="https://github.com/platformatic/platformatic/">Watt</a>?</p>
<p>While the jury is still out on if Watt can help <em>every</em> performance-sensitive <a target="_blank" href="http://Node.js">Node.js</a> app out there, we’ve done another round of benchmarks, this time with <a target="_blank" href="https://reactrouter.com/">React Router</a>, to see if our previous results with Next could be replicated with other frameworks. The answer?</p>
<p><strong>Why yes, of course. I wouldn’t be writing an article about it if I didn’t have some compelling numbers to share 🙂</strong></p>
<h2 id="heading-methodology"><strong>Methodology</strong></h2>
<p>We ran React Router (framework mode), the go-to library for handling server and client-side navigations in React applications, through an extreme load test: 10,000 HTTP requests per second for 120 seconds. The results confirm that Watt delivers notable performance improvements across the Node.js ecosystem.</p>
<h3 id="heading-why-10x-the-load">Why 10x the Load?</h3>
<p>Our Next.js benchmarks used 1,000 requests per second. For React Router, we cranked it up to 10,000.</p>
<p>Why? Because React Router is significantly more efficient than Next.js for server-side rendering (“SSR”) workloads. SSR with Next.js involves heavier processing: full React server components, complex routing logic, and more extensive middleware chains. React Router’s server rendering is leaner and faster. At 1,000 req/s, all three configurations (PM2, Watt, Single Node) handled React Router without breaking a sweat, and we couldn’t see meaningful differences.</p>
<p>To properly stress-test the systems and expose the architectural differences, we needed to crank up the dial by an order of magnitude. This extreme load is where the cracks appear and Watt’s advantages become stark.</p>
<h3 id="heading-the-benchmark-setup">The Benchmark Setup</h3>
<p>We tested three deployment strategies on AWS EKS, all using identical total CPU resources (6 CPUs):</p>
<ol>
<li><p><strong>Single-CPU pods</strong> (6 replicas x 1000m CPU limit each)</p>
</li>
<li><p><strong>PM2 multi-worker pods</strong> (3 replicas x 2000m CPU limit with 2 PM2 workers each)</p>
</li>
<li><p><strong>Watt multi-worker pods</strong> (3 replicas x 2000m CPU limit with 2 Watt workers each)</p>
</li>
</ol>
<p><strong>Infrastructure:</strong></p>
<ul>
<li><p><strong>EKS Cluster:</strong> 3 nodes running m5.2xlarge instances (8 vCPUs, 32GB RAM each)</p>
</li>
<li><p><strong>Load Testing:</strong> c7gn.2xlarge instance (8 vCPUs, 16GB RAM, network-optimized)</p>
</li>
<li><p><strong>Load Pattern:</strong> k6 with a constant arrival rate of 10,000 requests/second for 120 seconds</p>
</li>
<li><p><strong>Virtual Users:</strong> Up to 20,000 VUs</p>
</li>
<li><p><strong>Request Timeout:</strong> 5 seconds</p>
</li>
</ul>
<p>This is an extreme stress test - 10x the load we used in our Next.js benchmarks. All three configurations hit the 20,000 VU ceiling, confirming we pushed each system to its absolute limits.</p>
<h3 id="heading-results-summary">Results Summary</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767723473207/0c88a8cb-c4a0-4b8d-8e0f-6f72f8dfbdc5.png" alt /></p>
<p>Note that the average latency is lower in the single-node deployment because Watt has a lower error rate: the servers are at saturation point, so every additional request that makes it through raises the average latency.</p>
<h2 id="heading-key-findings"><strong>Key Findings</strong></h2>
<h3 id="heading-watt-vs-pm2-a-dramatic-difference"><strong>Watt vs PM2: A Dramatic Difference</strong></h3>
<p>Under extreme load, Watt consistently outshines PM2:</p>
<ul>
<li><p><strong>45% higher throughput</strong> (6,032 vs 4,154 req/s)</p>
</li>
<li><p><strong>45% lower failure rate</strong> (37.9% vs 69.2%)</p>
</li>
<li><p><strong>2.9x more successful responses</strong> (467K vs 160K)</p>
</li>
<li><p><strong>21% lower average latency</strong> (866ms vs 1.1s)</p>
</li>
</ul>
<p>Moreover, Watt showed better tail latency performance. The P95 latency for Watt was significantly lower than that for PM2, reinforcing Watt's reliability under heavy traffic. Looking at both average and tail latencies gives a more detailed view of Watt's advantage in real-world conditions.</p>
<p>Throughout our testing, PM2 struggled with the load we were putting it under, dropping over 680,000 iterations and failing on nearly 70% of requests. Watt, using the same CPU resources, maintained only a 37.9% failure rate while processing nearly 3x more successful requests.</p>
<h3 id="heading-the-unsurprising-single-node-performance"><strong>The (Un)surprising Single Node Performance</strong></h3>
<p>The results from this benchmark support our study from the Next.js benchmarks: PM2’s cluster module architecture (which creates child processes and routes all incoming connections through a single master process via inter-process communication, or IPC) brings substantial overhead (30%) that becomes overwhelming under heavy load.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767723514963/01591d96-195f-4d93-a109-8629aea15a4c.png" alt /></p>
<p>This observation challenges conventional assumptions about classic Node.js deployments, which often treat multiple processes as necessary to effectively handle high volumes of tasks. It encourages us to reexamine our deployment strategies in similar high-load situations, potentially simplifying architectures that may be plagued by process-management overhead.</p>
<h3 id="heading-watt-vs-single-node"><strong>Watt vs Single Node</strong></h3>
<p>Watt also edges out a Single Node running React Router, most notably when it comes to resilience:</p>
<ul>
<li><p><strong>3% higher throughput</strong> (6,032 vs 5,838 req/s)</p>
</li>
<li><p><strong>10% lower failure rate</strong> (37.9% vs 42.1%)</p>
</li>
<li><p><strong>11% more successful responses</strong> (467K vs 420K)</p>
</li>
</ul>
<p>While the margin is smaller than what we’ve seen with Next, we will take a cool 10% reduction in failure rate back to the lab for the next sprint.</p>
<h3 id="heading-why-these-results-matter"><strong>Why These Results Matter</strong></h3>
<p>It’s worth taking a moment to ground these numbers in what matters to your business (managers, shareholders, etc.) and users (presumably, customers). Most developers writing Node.js spend a lot of time thinking about how quickly they can load a given page for a user (“latency”, measured in milliseconds), how many user requests they can serve simultaneously (requests per second), and how often those requests fail (failure rate).</p>
<p>Latency, requests per second, and failure rate all have very real business consequences, from user churn to abandoned carts, that have a very real and material cost on the bottom line for your business, especially for teams working at any sort of scale.</p>
<h3 id="heading-why-watt-thrives-at-scale"><strong>Why Watt Thrives at Scale</strong></h3>
<p>Watt provides the benefits of running multiple workers across multiple cores while avoiding the management overhead of node:cluster and PM2, using SO_REUSEPORT to let the Linux kernel distribute connections directly to workers:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1767723524262/f206d563-2e99-4a31-9e3d-8a7481ed70ea.png" alt /></p>
<p>With Watt, there’s no need for master-worker coordination, IPC overhead, or serialization. Instead, worker processes accept connections directly from the OS, using the Linux kernel’s fast, hash-based algorithm to distribute connections evenly across workers.</p>
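<p>To make the SO_REUSEPORT idea concrete, here is a conceptual sketch. This is not Watt's internal code; it assumes a recent Node.js version where the <code>listen()</code> options accept <code>reusePort</code>. With that option, several processes can bind the same port and the kernel spreads accepted connections across them.</p>
<pre><code class="lang-javascript">// Conceptual illustration of SO_REUSEPORT, not Watt's implementation.
// Assumes a recent Node.js where listen() accepts the `reusePort` option.
import { createServer } from 'node:http'

const server = createServer(function (req, res) {
  res.end(`handled by pid ${process.pid}\n`)
})

// Start several copies of this process: each binds the same port and the
// Linux kernel distributes incoming connections across them directly,
// with no master process or IPC in the accept path.
server.listen({ port: 3000, reusePort: true }, function () {
  console.log(`worker ${process.pid} listening on port 3000`)
})
</code></pre>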
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>This latest round of benchmarks with React Router confirms what we saw with our previous Next.js benchmarks: Watt’s SO_REUSEPORT-based architecture delivers significant performance gains over classic Node.js scaling approaches, particularly when compared to PM2 in this instance.</p>
<p>Under extreme load of 10,000 requests per second:</p>
<ul>
<li><p><strong>Watt delivered 45% higher throughput than PM2</strong> (6,032 vs 4,154 req/s)</p>
</li>
<li><p><strong>Watt processed 2.9x more successful responses than PM2</strong> (467K vs 160K)</p>
</li>
<li><p><strong>Watt delivered 3% higher throughput than Single Node</strong> (6,032 vs 5,838 req/s)</p>
</li>
<li><p><strong>Watt achieved a 10% lower failure rate than Single Node</strong> (37.9% vs 42.1%)</p>
</li>
</ul>
<p>Single Node outperformed PM2, confirming our assessment of cluster module overhead. Watt surpassed both by efficiently using multiple CPU cores without PM2's management overhead, offering multi-core parallelism, operational benefits, and effective load distribution. Additionally, Watt's architecture complements existing service-mesh and autoscaling patterns, making it a strategic fit for modern distributed systems: it integrates smoothly, improves performance, fits the mental models architects already have, and simplifies deployment strategies.</p>
<p>Crucially, Watt’s main thread continuously monitors all worker threads, detecting and recovering from catastrophic event loop situations - blocked loops, memory leaks, or unresponsive workers. When a worker becomes unhealthy, Watt gracefully restarts it without affecting other workers or requiring pod termination. With a Single Node, a blocked event loop means your entire pod is down until Kubernetes notices and restarts it. With Watt, the main thread catches the problem early, restarts just that worker, and your service stays available.</p>
<p>The takeaway: at scale, PM2 and single-process deployments underperform. Watt delivers multi-core power without traditional tradeoffs for Node.js apps.</p>
<h2 id="heading-getting-started-with-watt"><strong>Getting Started with Watt</strong></h2>
<p>Watt is open-source and easy to adopt, and you don’t need to be serving 10,000 requests per second to benefit from it, either. In fact, much of our time is spent on good, old-fashioned developer experience. One team cut its P95 latency by 30% in a single afternoon simply by integrating Watt into its existing system during an onsite architecture workshop with us. (If you have a particularly business-critical or thorny app you think Watt could help with, we love doing these - drop me a message on LinkedIn!)</p>
<p>When configuring Watt with your project, we recommend setting the number of workers to match your CPU allocation to start with. From there, you simply deploy using the same Kubernetes setup you're already using and you’re off to the races.</p>
<p>For a complete guide, see our documentation: <a target="_blank" href="https://docs.platformatic.dev/docs/guides/deployment/nextjs-in-k8s">https://docs.platformatic.dev/docs/guides/deployment/nextjs-in-k8s</a></p>
<p>The benchmark code is available at: <a target="_blank" href="https://github.com/platformatic/k8s-watt-performance-demo">https://github.com/platformatic/k8s-watt-performance-demo.</a></p>
<p>If you have questions or want help getting Watt set-up in your environment, contact us at <a target="_blank" href="mailto:info@platformatic.dev">info@platformatic.dev</a> or connect with <a target="_blank" href="https://www.linkedin.com/in/lucamaraschi/">Luca</a> or <a target="_blank" href="https://www.linkedin.com/in/matteocollina/">Matteo</a> on LinkedIn. We always love hearing from the community!</p>
]]></content:encoded></item><item><title><![CDATA[watt-admin 1.0.0: Capture, Profile, and Share Your Node.js Performance Data]]></title><description><![CDATA[We're excited to announce a powerful new feature in Watt Admin: Recording Mode with CPU and Heap Profiling. Now you can capture your application's performance data, generate flame graphs, and share everything in a single, self-contained HTML file—no ...]]></description><link>https://blog.platformatic.dev/watt-admin-100-capture-profile-and-share-your-nodejs-performance-data</link><guid isPermaLink="true">https://blog.platformatic.dev/watt-admin-100-capture-profile-and-share-your-nodejs-performance-data</guid><category><![CDATA[Node.js]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Tue, 16 Dec 2025 14:29:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767810499431/508f6b49-18e6-4ed3-b522-25f3c8fa2fa5.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We're excited to announce a powerful new feature in Watt Admin: <strong>Recording Mode with CPU and Heap Profiling</strong>. Now you can capture your application's performance data, generate flame graphs, and share everything in a single, self-contained HTML file—no internet connection required.</p>
<h2 id="heading-the-challenge-sharing-performance-insights">The Challenge: Sharing Performance Insights</h2>
<p>As Node.js developers, we've all been there: you've discovered a performance issue, captured some data, and now need to share it with your team. But how do you effectively communicate what you're seeing? Screenshots don't tell the whole story, and setting up monitoring dashboards for everyone isn't always practical.</p>
<p>Previously, Watt Admin provided real-time monitoring—perfect for live debugging. But what about post-mortem analysis? What if you need to capture a performance profile during a specific scenario, or share detailed metrics with a colleague who isn't running your application?</p>
<p>That's where Recording Mode comes in.</p>
<h2 id="heading-whats-new-record-profile-analyze">What's New: Record, Profile, Analyze</h2>
<p>With the new recording capabilities, Watt Admin can now:</p>
<ul>
<li><p><strong>📹 Record complete sessions</strong>: Capture all metrics and performance data over time</p>
</li>
<li><p><strong>🔥 Generate flame graphs</strong>: Profile CPU usage or heap allocation to identify bottlenecks</p>
</li>
<li><p><strong>📦 Create offline bundles</strong>: Package everything into a single HTML file</p>
</li>
<li><p><strong>🤝 Share effortlessly</strong>: Send the bundle to anyone—no setup required</p>
</li>
</ul>
<p><img src="https://github.com/platformatic/watt-admin/raw/fece5afd9de223d3ad3158b6ed92ac8e4ffa1d39/screenshot-metrics-dashboard.png" alt="Watt Admin dashboard with metrics" /></p>
<h2 id="heading-how-it-works">How It Works</h2>
<h3 id="heading-metrics-recording">Metrics Recording</h3>
<p>At the core of every recording session is comprehensive metrics collection. Watt Admin captures a complete picture of your application's health and performance:</p>
<p><strong>Memory Metrics</strong></p>
<ul>
<li><p><strong>RSS (Resident Set Size)</strong>: Total process memory usage</p>
</li>
<li><p><strong>Heap Usage</strong>: Total heap, used heap, new space, and old space—essential for tracking memory leaks and garbage collection behavior</p>
</li>
</ul>
<p><strong>CPU &amp; Event Loop</strong></p>
<ul>
<li><p><strong>CPU Usage</strong>: Per-thread CPU utilization percentage</p>
</li>
<li><p><strong>Event Loop Utilization (ELU)</strong>: How busy your event loop is, which is <em>the</em> key indicator of Node.js application health</p>
</li>
</ul>
<p><strong>HTTP Performance</strong></p>
<ul>
<li><p><strong>Request Count &amp; RPS</strong>: Total requests and throughput over time</p>
</li>
<li><p><strong>Latency Percentiles</strong>: P90, P95, and P99 response times to understand your tail latency</p>
</li>
</ul>
<p><strong>HTTP Client (Undici)</strong></p>
<ul>
<li><p><strong>Connection Pool Stats</strong>: Idle, open, pending, queued, and active sockets</p>
</li>
<li><p>Track how your application communicates with external services</p>
</li>
</ul>
<p><strong>Additional Metrics</strong></p>
<ul>
<li><p><strong>WebSocket Connections</strong>: Active WebSocket connection count</p>
</li>
<li><p><strong>Kafka Metrics</strong>: Produced/consumed messages, producers, consumers, and DLQ stats (if using Kafka)</p>
</li>
<li><p><strong>Event Loop Resources</strong>: Active handles and requests in the event loop</p>
</li>
</ul>
<p>All metrics are sampled every second and stored for the duration of your recording session (up to 600 data points, or roughly ten minutes of history). When you stop recording, this entire metrics history is bundled into the HTML file, giving you a complete timeline to analyze.</p>
<p><img src="https://github.com/platformatic/watt-admin/raw/fece5afd9de223d3ad3158b6ed92ac8e4ffa1d39/screenshot-services-metrics.png" alt="Metrics dashboard showing memory, CPU, and other charts" /></p>
<h3 id="heading-cpu-profiling">CPU Profiling</h3>
<p>Identify performance bottlenecks by visualizing where your application spends CPU time:</p>
<pre><code class="lang-bash">watt-admin --record --profile cpu
</code></pre>
<p>Run your application through the scenario you want to analyze, then press <code>Ctrl+C</code>. Watt Admin will:</p>
<ol>
<li><p>Stop profiling all services in your runtime</p>
</li>
<li><p>Collect CPU flame graph data</p>
</li>
<li><p>Bundle all metrics and the flame graph into a single HTML file</p>
</li>
<li><p>Automatically open it in your browser</p>
</li>
</ol>
<p>The resulting flame graph shows you exactly which functions are consuming CPU cycles, making it easy to spot optimization opportunities.</p>
<p><img src="https://github.com/platformatic/watt-admin/raw/fece5afd9de223d3ad3158b6ed92ac8e4ffa1d39/screenshot-cpu-flamegraph.png" alt="CPU flame graph showing function call hierarchy" /></p>
<h3 id="heading-heap-profiling">Heap Profiling</h3>
<p>Track down memory leaks and understand allocation patterns:</p>
<pre><code class="lang-bash">watt-admin --record --profile heap
</code></pre>
<p>Heap profiling reveals:</p>
<ul>
<li><p>Which parts of your code allocate the most memory</p>
</li>
<li><p>Memory allocation patterns over time</p>
</li>
<li><p>Potential memory leak sources</p>
</li>
<li><p>Object retention paths</p>
</li>
</ul>
<p>Perfect for debugging those mysterious memory issues that only appear under specific conditions.</p>
<p><img src="https://github.com/platformatic/watt-admin/raw/fece5afd9de223d3ad3158b6ed92ac8e4ffa1d39/screenshot-heap-flamegraph.png" alt="Heap allocation flame graph" /></p>
<h2 id="heading-real-world-use-cases">Real-World Use Cases</h2>
<h3 id="heading-debugging-production-issues-locally">Debugging Production Issues Locally</h3>
<p>Reproduce a production issue in your local environment, record a session with CPU profiling, and share the complete analysis with your team. No need for everyone to set up the same environment—they can explore the flame graph and metrics directly from the HTML bundle.</p>
<h3 id="heading-performance-reviews">Performance Reviews</h3>
<p>Before merging a significant change, record a profiling session to demonstrate its performance characteristics. Attach the HTML file to your pull request so reviewers can see the real-world impact of your optimizations.</p>
<h3 id="heading-team-knowledge-sharing">Team Knowledge Sharing</h3>
<p>Found an interesting performance pattern? Record it and share the bundle in Slack or your team chat. Your colleagues can explore the interactive flame graph and metrics without any setup.</p>
<h3 id="heading-client-reporting">Client Reporting</h3>
<p>Need to show a client why their application is slow? Generate a recording with clear flame graphs that visually demonstrate the bottlenecks. The self-contained HTML makes it easy to share professional performance analysis.</p>
<h2 id="heading-the-technical-details">The Technical Details</h2>
<p>Recording mode leverages Platformatic's built-in profiling capabilities:</p>
<ul>
<li><p><strong>Automatic service discovery</strong>: Profiles all applications in your Platformatic runtime</p>
</li>
<li><p><strong>Standards-based profiling</strong>: Uses V8's built-in CPU and heap profilers</p>
</li>
<li><p><strong>pprof format</strong>: Stores profiling data in the industry-standard pprof format</p>
</li>
<li><p><strong>Interactive visualization</strong>: Uses react-pprof for exploring flame graphs</p>
</li>
<li><p><strong>Complete capture</strong>: Embeds metrics, logs, and profiling data in <code>window.LOADED_JSON</code></p>
</li>
</ul>
<p>The generated HTML bundle is truly self-contained—it includes:</p>
<ul>
<li><p>All JavaScript, CSS, and assets inlined</p>
</li>
<li><p>Complete metrics history from the recording session</p>
</li>
<li><p>Flame graph data for all profiled services</p>
</li>
<li><p>Interactive UI for exploring the data</p>
</li>
</ul>
<p>No external dependencies. No network requests. Just open and explore.</p>
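<p>Since the bundle embeds its data in <code>window.LOADED_JSON</code>, you can also inspect the raw capture from the browser's devtools console once the file is open. The exact shape of the object isn't documented here, so treat this as exploratory:</p>
<pre><code class="lang-javascript">// In the devtools console, with the generated HTML bundle open:
const data = window.LOADED_JSON
console.log(Object.keys(data))                          // top-level sections of the capture
console.log(JSON.stringify(data).length, 'characters of embedded data')
</code></pre>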
<p><img src="https://github.com/platformatic/watt-admin/raw/fece5afd9de223d3ad3158b6ed92ac8e4ffa1d39/screenshot-metrics-dashboard.png" alt="Self-contained HTML bundle with metrics and flame graph" /></p>
<h2 id="heading-getting-started">Getting Started</h2>
<h3 id="heading-prerequisites">Prerequisites</h3>
<p><strong>Important</strong>: Watt Admin is designed specifically for applications running on <a target="_blank" href="https://docs.platformatic.dev">Platformatic Watt</a>. If you don't have a Watt application yet, follow the <a target="_blank" href="https://docs.platformatic.dev/getting-started/quick-start">Quick Start Guide</a> to create one.</p>
<h3 id="heading-installation">Installation</h3>
<p>Install Watt Admin globally:</p>
<pre><code class="lang-bash">npm i @platformatic/watt-admin -g
</code></pre>
<p>Or use it directly with npx:</p>
<pre><code class="lang-bash">npx wattpm admin --record --profile cpu
</code></pre>
<h3 id="heading-basic-workflow">Basic Workflow</h3>
<ol>
<li><p><strong>Start a recording session</strong>:</p>
<pre><code class="lang-bash"> watt-admin --record --profile cpu
</code></pre>
</li>
<li><p><strong>Run your application through the scenario</strong> you want to analyze</p>
</li>
<li><p><strong>Stop recording</strong> by pressing <code>Ctrl+C</code></p>
</li>
<li><p><strong>Analyze</strong> the automatically-opened HTML bundle</p>
</li>
<li><p><strong>Share</strong> the bundle file with your team</p>
</li>
</ol>
<h2 id="heading-under-the-hood-how-recording-works">Under the Hood: How Recording Works</h2>
<p>When you start Watt Admin with <code>--record</code>, here's what happens:</p>
<ol>
<li><p><strong>Discovery</strong>: Watt Admin discovers your Platformatic runtime using the RuntimeApiClient</p>
</li>
<li><p><strong>Connection</strong>: Connects to the runtime's admin API</p>
</li>
<li><p><strong>Profiling start</strong>: Calls <code>startApplicationProfiling()</code> on each application</p>
</li>
<li><p><strong>Metrics collection</strong>: Continuously collects metrics at regular intervals</p>
</li>
<li><p><strong>User interaction</strong>: You use your application while profiling runs</p>
</li>
<li><p><strong>SIGINT handling</strong>: When you press Ctrl+C, graceful shutdown begins</p>
</li>
<li><p><strong>Profiling stop</strong>: Calls <code>stopApplicationProfiling()</code> to get the profiling data</p>
</li>
<li><p><strong>Data bundling</strong>: Writes profiling data (.pb files) and embeds everything in HTML</p>
</li>
<li><p><strong>Auto-launch</strong>: Opens the generated bundle in your default browser</p>
</li>
</ol>
<p>The resulting file lives in <code>web/frontend/dist/index.html</code> and contains everything needed for offline analysis.</p>
<h2 id="heading-whats-next">What's Next</h2>
<p>Recording mode is just the beginning. We're exploring additional profiling capabilities:</p>
<ul>
<li><p><strong>Custom time ranges</strong>: Record specific time windows</p>
</li>
<li><p><strong>Comparison mode</strong>: Compare multiple recordings side-by-side</p>
</li>
<li><p><strong>Export formats</strong>: Additional export options for different tools</p>
</li>
<li><p><strong>Annotations</strong>: Mark specific events during recording</p>
</li>
</ul>
<p>We'd love to hear your feedback! Try recording mode and let us know what you think.</p>
<h2 id="heading-try-it-today">Try It Today</h2>
<p>Recording and profiling are available now in Watt Admin. Update to the latest version:</p>
<pre><code class="lang-bash">npm i @platformatic/watt-admin -g
</code></pre>
<p>Then start profiling:</p>
<pre><code class="lang-bash">watt-admin --record --profile cpu
</code></pre>
<p>Happy profiling! 🔥</p>
<hr />
<h3 id="heading-resources">Resources</h3>
<ul>
<li><p><a target="_blank" href="https://github.com/platformatic/watt-admin">Watt Admin on GitHub</a></p>
</li>
<li><p><a target="_blank" href="https://docs.platformatic.dev">Platformatic Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://blog.platformatic.dev/introducing-watt-admin">Watt Admin Introduction</a></p>
</li>
</ul>
<h3 id="heading-get-involved">Get Involved</h3>
<p>Have questions or feedback? Connect with us:</p>
<ul>
<li><p><a target="_blank" href="https://github.com/platformatic/watt-admin/issues">GitHub Issues</a></p>
</li>
<li><p><a target="_blank" href="https://discord.gg/platformatic">Discord Community</a></p>
</li>
<li><p><a target="_blank" href="https://twitter.com/platformatic">Twitter</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Streaming and WebSocket Support Now Available in @platformatic/python-node]]></title><description><![CDATA[@platformatic/python-node v2.0.0 introduces full support for HTTP response streaming and bidirectional WebSocket communication. This release enables full-stack teams to build real-time, high-performance applications by bridging Python's async ecosyst...]]></description><link>https://blog.platformatic.dev/streaming-and-websocket-support-now-available-in-platformaticpython-node</link><guid isPermaLink="true">https://blog.platformatic.dev/streaming-and-websocket-support-now-available-in-platformaticpython-node</guid><category><![CDATA[Node.js]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Stephen Belanger]]></dc:creator><pubDate>Tue, 09 Dec 2025 15:00:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767808263781/0658155c-6f56-40b8-a88d-00f86402b8ad.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>@platformatic/python-node v2.0.0</strong> introduces full support for HTTP response streaming and bidirectional WebSocket communication. This release enables full-stack teams to build real-time, high-performance applications by bridging Python's async ecosystem with Node.js.</p>
<p>For teams running Python ASGI applications alongside Node.js services, this release enables new application types, including real-time dashboards, live data feeds, WebSocket-powered chat systems, progressive file uploads, and server-sent events—all while maintaining the Python-Node.js integration you've come to expect.</p>
<p>If you're new to @platformatic/python-node, it's a module that lets you run Python ASGI applications (such as FastAPI, Starlette, or Django) directly in Node.js processes. No separate Python server, no HTTP proxy overhead, no complex deployment setup.</p>
<h2 id="heading-whats-new-in-v200">What's New in v2.0.0</h2>
<p>This release brings four major enhancements that align @platformatic/python-node with modern ASGI server capabilities:</p>
<h3 id="heading-http-response-streaming">HTTP Response Streaming</h3>
<p>The new <code>handleStream()</code> method enables streaming of HTTP responses. Instead of buffering the entire response body before returning it to Node.js, chunks are processed incrementally as they arrive from your Python application. This reduces memory usage for large responses and provides immediate access to response headers before the body completes.</p>
<p>Each chunk is pulled from Python only when Node.js is ready to operate on it, and Python waits until a chunk is requested before continuing to process the ASGI handler. This architecture provides proper backpressure between the two languages and stays fully async on both ends, so either side can work on other things while waiting for the other.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> python.handleStream(req)

<span class="hljs-comment">// Headers available immediately</span>
<span class="hljs-built_in">console</span>.log(res.status) <span class="hljs-comment">// 200</span>
<span class="hljs-built_in">console</span>.log(res.headers.get(<span class="hljs-string">'content-type'</span>))

<span class="hljs-comment">// Body consumed via AsyncIterator as chunks arrive</span>
<span class="hljs-keyword">for</span> <span class="hljs-keyword">await</span> (<span class="hljs-keyword">const</span> chunk <span class="hljs-keyword">of</span> res) {
  <span class="hljs-built_in">console</span>.log(chunk.toString())
}
</code></pre>
<h3 id="heading-http-request-streaming">HTTP Request Streaming</h3>
<p>In addition to streaming responses, you can now stream request bodies to Python. This is essential for handling large file uploads, processing data progressively, or implementing custom streaming protocols.</p>
<p>Each write returns a promise to provide backpressure from Python, so Node.js doesn't write too much data if Python is not consuming it fast enough. It uses an internal buffer: if the buffer has enough space for the write, the promise resolves immediately; otherwise it waits for space to become available.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> req = <span class="hljs-keyword">new</span> Request({
  <span class="hljs-attr">method</span>: <span class="hljs-string">'POST'</span>,
  <span class="hljs-attr">url</span>: <span class="hljs-string">'/upload'</span>,
  <span class="hljs-attr">headers</span>: { <span class="hljs-string">'Content-Type'</span>: <span class="hljs-string">'application/octet-stream'</span> }
})

<span class="hljs-comment">// Dispatch request and write body concurrently</span>
<span class="hljs-keyword">const</span> [res] = <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.all([
  python.handleStream(req),
  (<span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-comment">// Stream chunks to Python</span>
    <span class="hljs-keyword">await</span> req.write(chunk1)
    <span class="hljs-keyword">await</span> req.write(chunk2)
    <span class="hljs-keyword">await</span> req.write(chunk3)
    <span class="hljs-keyword">await</span> req.end()
  })()
])
</code></pre>
<h3 id="heading-bidirectional-websocket-support">Bidirectional WebSocket Support</h3>
<p>Full WebSocket support means your Python ASGI applications can now handle persistent, bidirectional connections. Whether you're building a chat application, live dashboard, or multiplayer game, you can implement the WebSocket logic in Python while integrating seamlessly with your Node.js infrastructure.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> req = <span class="hljs-keyword">new</span> Request({
  <span class="hljs-attr">url</span>: <span class="hljs-string">'/ws'</span>,
  <span class="hljs-attr">websocket</span>: <span class="hljs-literal">true</span>
})

<span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> python.handleStream(req)

<span class="hljs-comment">// Send messages to Python</span>
<span class="hljs-keyword">await</span> req.write(<span class="hljs-string">'Hello from Node.js!'</span>)

<span class="hljs-comment">// Receive messages from Python</span>
<span class="hljs-keyword">for</span> <span class="hljs-keyword">await</span> (<span class="hljs-keyword">const</span> chunk <span class="hljs-keyword">of</span> res) {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Received:'</span>, chunk.toString())
}
</code></pre>
<h3 id="heading-asgi-30-protocol-implementation">ASGI 3.0 Protocol Implementation</h3>
<p>Under the hood, v2.0.0 implements the ASGI 3.0 protocol specification for both HTTP and WebSocket communication. This ensures compatibility with the entire Python async ecosystem, including FastAPI's <code>StreamingResponse</code>, Starlette's WebSocket endpoints, and any other ASGI-compliant framework.</p>
<h3 id="heading-key-benefits">Key Benefits</h3>
<ul>
<li><strong>Lower Memory Footprint</strong>: Stream large responses without buffering everything in memory</li>
<li><strong>Faster Time-to-First-Byte</strong>: Access response headers immediately, before body writing even begins</li>
<li><strong>Real-Time Capabilities</strong>: Build WebSocket applications with bidirectional communication</li>
<li><strong>Better Resource Utilization</strong>: Process chunks as they arrive instead of waiting for completion</li>
<li><strong>Backward Compatible</strong>: Existing code using <code>handleRequest()</code> continues to work almost completely unchanged; the single breaking change is that the <code>req.body</code> setter/getter has been removed</li>
</ul>
<h2 id="heading-http-streaming-in-action-server-sent-events-with-fastapi">HTTP Streaming in Action: Server-Sent Events with FastAPI</h2>
<p>One of the most powerful use cases for HTTP streaming is Server-Sent Events (SSE), which enable servers to push real-time updates to clients over a standard HTTP connection. Let's build a live monitoring dashboard that streams system metrics from Python to Node.js.</p>
<p>Here's a FastAPI application that generates streaming metrics:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI
<span class="hljs-keyword">from</span> fastapi.responses <span class="hljs-keyword">import</span> StreamingResponse
<span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> random
<span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime

app = FastAPI()

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_metrics</span>():</span>
    <span class="hljs-string">"""Generate fake system metrics as server-sent events"""</span>
    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        <span class="hljs-comment"># Simulate collecting system metrics</span>
        metrics = {
            <span class="hljs-string">'timestamp'</span>: datetime.now().isoformat(),
            <span class="hljs-string">'cpu_usage'</span>: random.uniform(<span class="hljs-number">20</span>, <span class="hljs-number">80</span>),
            <span class="hljs-string">'memory_usage'</span>: random.uniform(<span class="hljs-number">40</span>, <span class="hljs-number">90</span>),
            <span class="hljs-string">'active_connections'</span>: random.randint(<span class="hljs-number">10</span>, <span class="hljs-number">100</span>),
            <span class="hljs-string">'requests_per_second'</span>: random.randint(<span class="hljs-number">50</span>, <span class="hljs-number">500</span>)
        }

        <span class="hljs-comment"># Format as SSE event</span>
        data = <span class="hljs-string">f'data: <span class="hljs-subst">{json.dumps(metrics)}</span>\n\n'</span>
        <span class="hljs-keyword">yield</span> data.encode()

        <span class="hljs-comment"># Send update every second</span>
        <span class="hljs-keyword">await</span> asyncio.sleep(<span class="hljs-number">1</span>)

<span class="hljs-meta">@app.get('/metrics/stream')</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">stream_metrics</span>():</span>
    <span class="hljs-string">"""Endpoint that streams real-time metrics"""</span>
    <span class="hljs-keyword">return</span> StreamingResponse(
        generate_metrics(),
        media_type=<span class="hljs-string">'text/event-stream'</span>,
        headers={
            <span class="hljs-string">'Cache-Control'</span>: <span class="hljs-string">'no-cache'</span>,
            <span class="hljs-string">'Connection'</span>: <span class="hljs-string">'keep-alive'</span>
        }
    )

<span class="hljs-meta">@app.get('/health')</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">health_check</span>():</span>
    <span class="hljs-string">"""Standard health check endpoint"""</span>
    <span class="hljs-keyword">return</span> {<span class="hljs-string">'status'</span>: <span class="hljs-string">'healthy'</span>, <span class="hljs-string">'version'</span>: <span class="hljs-string">'1.0.0'</span>}
</code></pre>
<p>Now let's consume this stream from Node.js:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { Python, Request } <span class="hljs-keyword">from</span> <span class="hljs-string">'@platformatic/python-node'</span>

<span class="hljs-keyword">const</span> python = <span class="hljs-keyword">new</span> Python({
  <span class="hljs-attr">docroot</span>: <span class="hljs-string">'./python-apps'</span>,
  <span class="hljs-attr">appTarget</span>: <span class="hljs-string">'metrics_app:app'</span>
})

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">monitorMetrics</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> req = <span class="hljs-keyword">new</span> Request({
    <span class="hljs-attr">method</span>: <span class="hljs-string">'GET'</span>,
    <span class="hljs-attr">url</span>: <span class="hljs-string">'http://localhost/metrics/stream'</span>
  })

  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Connecting to metrics stream...'</span>)
  <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> python.handleStream(req)

  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Status: <span class="hljs-subst">${res.status}</span>`</span>)
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Content-Type: <span class="hljs-subst">${res.headers.get(<span class="hljs-string">'content-type'</span>)}</span>`</span>)
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'\nReceiving metrics:\n'</span>)

  <span class="hljs-comment">// Process metrics as they arrive</span>
  <span class="hljs-keyword">for</span> <span class="hljs-keyword">await</span> (<span class="hljs-keyword">const</span> chunk <span class="hljs-keyword">of</span> res) {
    <span class="hljs-keyword">const</span> lines = chunk.toString().split(<span class="hljs-string">'\n'</span>)

    <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> line <span class="hljs-keyword">of</span> lines) {
      <span class="hljs-keyword">if</span> (line.startsWith(<span class="hljs-string">'data: '</span>)) {
        <span class="hljs-keyword">const</span> data = <span class="hljs-built_in">JSON</span>.parse(line.slice(<span class="hljs-number">6</span>))

        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`[<span class="hljs-subst">${data.timestamp}</span>]`</span>)
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`  CPU: <span class="hljs-subst">${data.cpu_usage.toFixed(<span class="hljs-number">1</span>)}</span>%`</span>)
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`  Memory: <span class="hljs-subst">${data.memory_usage.toFixed(<span class="hljs-number">1</span>)}</span>%`</span>)
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`  Connections: <span class="hljs-subst">${data.active_connections}</span>`</span>)
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`  RPS: <span class="hljs-subst">${data.requests_per_second}</span>`</span>)
        <span class="hljs-built_in">console</span>.log()
      }
    }
  }
}

monitorMetrics().catch(<span class="hljs-built_in">console</span>.error)
</code></pre>
<h3 id="heading-how-it-works">How It Works</h3>
<p>The streaming implementation leverages the interaction between Python's async generators and Node.js's AsyncIterator pattern:</p>
<ol>
<li><p><strong>Python Side</strong>: FastAPI's <code>StreamingResponse</code> accepts an async generator that yields chunks. Each <code>yield</code> dispatches data to Rust via a tokio DuplexStream.</p>
</li>
<li><p><strong>ASGI Bridge</strong>: The Rust-based ASGI implementation receives <code>http.response.body</code> events with the <code>more_body</code> flag, queuing chunks as they arrive.</p>
</li>
<li><p><strong>Node.js Side</strong>: The <code>handleStream()</code> method returns a Response object that implements the AsyncIterator protocol. Each iteration of <code>for await...of</code> receives the next chunk.</p>
</li>
</ol>
<p>This architecture means Node.js can start processing data the moment Python sends the first chunk—no waiting for the complete response. But it also means bidirectional backpressure, so each side can only operate as fast as the other and will yield back to its respective event loop when there's no work to be done.</p>
<h3 id="heading-real-world-use-cases-for-http-streaming">Real-World Use Cases for HTTP Streaming</h3>
<p>Beyond metrics dashboards, HTTP streaming enables:</p>
<ul>
<li><strong>Large File Downloads</strong>: Stream files from Python (e.g., generated reports, media files) without loading them entirely into memory</li>
<li><strong>AI/ML Model Outputs</strong>: Stream generated content from language models or other AI systems</li>
<li><strong>Progressive Data Processing</strong>: Stream database query results or CSV processing as rows are processed</li>
<li><strong>Video/Audio Streaming</strong>: Deliver media content with Python processing (transcoding, filtering) and Node.js delivery</li>
</ul>
<h2 id="heading-websocket-support-building-real-time-applications">WebSocket Support: Building Real-Time Applications</h2>
<p>While HTTP streaming works well for server-to-client communication, WebSockets provide full bidirectional real-time channels. This is useful for chat applications, collaborative editing, live gaming, and scenarios where both client and server need to send messages independently.</p>
<p>Let's build a conversational AI assistant with FastAPI WebSockets:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI, WebSocket, WebSocketDisconnect
<span class="hljs-keyword">import</span> json

app = FastAPI()

<span class="hljs-meta">@app.websocket('/ws/assistant')</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">assistant_endpoint</span>(<span class="hljs-params">websocket: WebSocket</span>):</span>
    <span class="hljs-keyword">await</span> websocket.accept()

    <span class="hljs-comment"># Send welcome message</span>
    <span class="hljs-keyword">await</span> websocket.send_text(<span class="hljs-string">'Hello! I am your AI assistant. Ask me anything or try /help for commands.'</span>)

    <span class="hljs-keyword">try</span>:
        <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
            <span class="hljs-comment"># Receive message from client</span>
            message = <span class="hljs-keyword">await</span> websocket.receive_text()

            <span class="hljs-comment"># Simple routing based on message content</span>
            <span class="hljs-keyword">if</span> message.startswith(<span class="hljs-string">'/help'</span>):
                response = <span class="hljs-string">'Available commands: /help, /status, /about, or ask any question'</span>
            <span class="hljs-keyword">elif</span> message.startswith(<span class="hljs-string">'/status'</span>):
                response = <span class="hljs-string">'System status: All services operational'</span>
            <span class="hljs-keyword">elif</span> message.startswith(<span class="hljs-string">'/about'</span>):
                response = <span class="hljs-string">'AI Assistant v1.0 - Powered by Python and Node.js'</span>
            <span class="hljs-keyword">elif</span> message.lower() <span class="hljs-keyword">in</span> [<span class="hljs-string">'hi'</span>, <span class="hljs-string">'hello'</span>, <span class="hljs-string">'hey'</span>]:
                response = <span class="hljs-string">'Hello! How can I help you today?'</span>
            <span class="hljs-keyword">elif</span> <span class="hljs-string">'time'</span> <span class="hljs-keyword">in</span> message.lower():
                <span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
                response = <span class="hljs-string">f'The current time is <span class="hljs-subst">{datetime.now().strftime(<span class="hljs-string">"%H:%M:%S"</span>)}</span>'</span>
            <span class="hljs-keyword">else</span>:
                <span class="hljs-comment"># Echo back with a simulated AI response</span>
                response = <span class="hljs-string">f'You said: "<span class="hljs-subst">{message}</span>". I am processing your request...'</span>

            <span class="hljs-comment"># Send response back to client</span>
            <span class="hljs-keyword">await</span> websocket.send_text(response)

    <span class="hljs-keyword">except</span> WebSocketDisconnect:
        <span class="hljs-keyword">pass</span>  <span class="hljs-comment"># Client disconnected</span>
</code></pre>
<p>Now let's interact with this assistant from Node.js:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> { Python, Request } <span class="hljs-keyword">from</span> <span class="hljs-string">'@platformatic/python-node'</span>

<span class="hljs-keyword">const</span> python = <span class="hljs-keyword">new</span> Python({
  <span class="hljs-attr">docroot</span>: <span class="hljs-string">'./python-apps'</span>,
  <span class="hljs-attr">appTarget</span>: <span class="hljs-string">'assistant_app:app'</span>
})

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">runAssistant</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-comment">// Create WebSocket request</span>
  <span class="hljs-keyword">const</span> req = <span class="hljs-keyword">new</span> Request({
    <span class="hljs-attr">url</span>: <span class="hljs-string">'http://localhost/ws/assistant'</span>,
    <span class="hljs-attr">websocket</span>: <span class="hljs-literal">true</span>
  })

  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Connecting to AI Assistant...\n'</span>)
  <span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> python.handleStream(req)

  <span class="hljs-comment">// Messages to send to the assistant</span>
  <span class="hljs-keyword">const</span> messages = [
    <span class="hljs-string">'Hello'</span>,
    <span class="hljs-string">'/help'</span>,
    <span class="hljs-string">'/status'</span>,
    <span class="hljs-string">'What is the time?'</span>,
    <span class="hljs-string">'Tell me about yourself'</span>
  ]

  <span class="hljs-keyword">let</span> messageIndex = <span class="hljs-number">0</span>

  <span class="hljs-comment">// Clean for-await loop: read response, then write next message</span>
  <span class="hljs-keyword">for</span> <span class="hljs-keyword">await</span> (<span class="hljs-keyword">const</span> chunk <span class="hljs-keyword">of</span> res) {
    <span class="hljs-keyword">const</span> response = chunk.toString()
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Assistant: <span class="hljs-subst">${response}</span>\n`</span>)

    <span class="hljs-comment">// Send next message if we have more</span>
    <span class="hljs-keyword">if</span> (messageIndex &lt; messages.length) {
      <span class="hljs-keyword">const</span> nextMessage = messages[messageIndex]
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`You: <span class="hljs-subst">${nextMessage}</span>`</span>)
      <span class="hljs-keyword">await</span> req.write(nextMessage)
      messageIndex++
    } <span class="hljs-keyword">else</span> {
      <span class="hljs-comment">// No more messages, close the connection</span>
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Closing connection...'</span>)
      <span class="hljs-keyword">await</span> req.end()
      <span class="hljs-keyword">break</span>
    }
  }
}

runAssistant().catch(<span class="hljs-built_in">console</span>.error)
</code></pre>
<h3 id="heading-how-websocket-communication-works">How WebSocket Communication Works</h3>
<p>The example above demonstrates the clean request-response pattern enabled by WebSockets:</p>
<ol>
<li><p><strong>Connection Establishment</strong>: Node.js creates a Request with <code>websocket: true</code>. Python receives a scope with <code>type: 'websocket'</code> and sends <code>websocket.accept</code> to establish the connection.</p>
</li>
<li><p><strong>Bidirectional Messaging</strong>: The for-await loop reads from the Python server, and each iteration writes a new message back:</p>
<ul>
<li>Python → Node.js: Python's <code>await websocket.send_text()</code> delivers data to Node.js via the AsyncIterator</li>
<li>Node.js → Python: <code>await req.write(data)</code> sends data to Python via <code>websocket.receive_text()</code></li>
</ul>
</li>
<li><p><strong>Clean Loop Pattern</strong>: Unlike the background task pattern often seen in WebSocket examples, this approach uses a single synchronous-style loop where each receive is followed by a send, making the flow easy to understand and debug.</p>
</li>
<li><p><strong>Connection Lifecycle</strong>: Either side can close the connection. Python receives <code>websocket.disconnect</code> events, while Node.js closes via <code>req.end()</code>.</p>
</li>
</ol>
<h3 id="heading-real-world-websocket-use-cases">Real-World WebSocket Use Cases</h3>
<p>WebSocket support enables full-stack teams to build:</p>
<ul>
<li><strong>Conversational AI Assistants</strong>: Build chatbots and AI assistants in Python, exposing them via WebSocket for real-time conversations</li>
<li><strong>Real-Time Chat and Messaging</strong>: Interactive chat backends where Python handles message routing and business logic</li>
<li><strong>Live Data Feeds</strong>: Stream stock prices, sports scores, IoT sensor data, or live metrics with bidirectional control</li>
<li><strong>Interactive Commands</strong>: Command-line style interfaces where users send commands and receive structured responses</li>
<li><strong>Gaming</strong>: Real-time multiplayer game state synchronization with player actions and server updates</li>
<li><strong>Live Customer Support</strong>: Real-time support chat with Python AI integration for automated responses</li>
</ul>
<h2 id="heading-real-world-integration-scenarios-for-full-stack-teams">Real-World Integration Scenarios for Full-Stack Teams</h2>
<p>The combination of streaming and WebSocket support enables several integration patterns for teams using both Python and Node.js:</p>
<h3 id="heading-python-ml-models-with-real-time-inference">Python ML Models with Real-Time Inference</h3>
<p>Run machine learning inference in Python (using PyTorch, TensorFlow, or transformers) and expose results via WebSocket for real-time predictions. Your Node.js API gateway can manage authentication and routing while Python handles the heavy computation.</p>
<pre><code class="lang-python"><span class="hljs-meta">@app.websocket('/ws/inference')</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">ml_inference</span>(<span class="hljs-params">websocket: WebSocket</span>):</span>
    <span class="hljs-keyword">await</span> websocket.accept()
    model = load_model()  <span class="hljs-comment"># Load your ML model</span>

    <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:
        data = <span class="hljs-keyword">await</span> websocket.receive_json()
        result = model.predict(data[<span class="hljs-string">'input'</span>])
        <span class="hljs-keyword">await</span> websocket.send_json({<span class="hljs-string">'prediction'</span>: result})
</code></pre>
<h3 id="heading-progressive-data-processing">Progressive Data Processing</h3>
<p>Stream large dataset processing results back to Node.js as they're computed, enabling progress bars, partial result display, or early termination:</p>
<pre><code class="lang-python"><span class="hljs-meta">@app.get('/process/dataset')</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_dataset</span>():</span>
    <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_stream</span>():</span>
        <span class="hljs-keyword">for</span> batch <span class="hljs-keyword">in</span> dataset.batches(size=<span class="hljs-number">1000</span>):
            result = process_batch(batch)
            <span class="hljs-keyword">yield</span> json.dumps(result).encode() + <span class="hljs-string">b'\n'</span>

    <span class="hljs-keyword">return</span> StreamingResponse(process_stream())
</code></pre>
<h3 id="heading-hybrid-api-gateway">Hybrid API Gateway</h3>
<p>Use Node.js as your API gateway for authentication, rate limiting, and routing, while leveraging Python's rich ecosystem for specific endpoints that need streaming or WebSocket capabilities.</p>
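<p>As a sketch of that pattern, the snippet below uses only the APIs shown earlier in this post to route one path to the embedded Python app via <code>handleStream()</code> and answer everything else natively. The paths, port, and app target are placeholders, not a definitive implementation.</p>
<pre><code class="lang-javascript">// Hypothetical gateway sketch: Node.js handles routing, Python handles /metrics.
import { createServer } from 'node:http'
import { Python, Request } from '@platformatic/python-node'

const python = new Python({
  docroot: './python-apps',          // placeholder paths
  appTarget: 'metrics_app:app'
})

const server = createServer(async function (nodeReq, nodeRes) {
  // Authentication / rate limiting would run here, before any proxying.
  if (nodeReq.url.startsWith('/metrics')) {
    const req = new Request({ method: nodeReq.method, url: nodeReq.url })
    const res = await python.handleStream(req)

    nodeRes.writeHead(res.status, { 'content-type': res.headers.get('content-type') })
    for await (const chunk of res) {
      nodeRes.write(chunk)           // stream Python's chunks straight to the client
    }
    nodeRes.end()
  } else {
    nodeRes.writeHead(200, { 'content-type': 'application/json' })
    nodeRes.end(JSON.stringify({ ok: true }))
  }
})

server.listen(3000)
</code></pre>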
<h3 id="heading-existing-python-tools-with-websocket-interfaces">Existing Python Tools with WebSocket Interfaces</h3>
<p>Wrap existing Python command-line tools or libraries with FastAPI WebSocket endpoints, making them accessible to your Node.js infrastructure without rewriting them.</p>
<h2 id="heading-getting-started-and-migration-guide">Getting Started and Migration Guide</h2>
<h3 id="heading-installation">Installation</h3>
<p>Upgrade to v2.0.0 via npm:</p>
<pre><code class="lang-bash">npm install @platformatic/python-node@latest
</code></pre>
<p>Or with yarn:</p>
<pre><code class="lang-bash">yarn add @platformatic/python-node@latest
</code></pre>
<h3 id="heading-choosing-between-handlerequest-and-handlestream">Choosing Between handleRequest() and handleStream()</h3>
<p>The API now offers two methods for handling requests:</p>
<p><strong>Use <code>handleRequest()</code> when:</strong></p>
<ul>
<li>Response bodies are small and fit comfortably in memory</li>
<li>You would need to buffer the complete response body anyway</li>
<li>Backward compatibility with existing code is required</li>
</ul>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> python.handleRequest(req)
<span class="hljs-built_in">console</span>.log(res.body.toString()) <span class="hljs-comment">// Body available immediately</span>
</code></pre>
<p><strong>Use <code>handleStream()</code> when:</strong></p>
<ul>
<li>Responses are large or potentially unbounded</li>
<li>You need access to headers before the body completes</li>
<li>Implementing Server-Sent Events or streaming protocols</li>
<li>Building WebSocket applications</li>
<li>Memory efficiency is important</li>
</ul>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> res = <span class="hljs-keyword">await</span> python.handleStream(req)
<span class="hljs-built_in">console</span>.log(res.body) <span class="hljs-comment">// `undefined` for streams!</span>
<span class="hljs-built_in">console</span>.log(res.status) <span class="hljs-comment">// Headers available immediately</span>
<span class="hljs-keyword">for</span> <span class="hljs-keyword">await</span> (<span class="hljs-keyword">const</span> chunk <span class="hljs-keyword">of</span> res) {
  <span class="hljs-comment">// Process chunks incrementally</span>
}
</code></pre>
<h3 id="heading-migration-checklist">Migration Checklist</h3>
<p>Existing applications using <code>handleRequest()</code> should continue to work without changes, with one exception: the <code>req.body</code> setter and getter are no longer available. To adopt streaming:</p>
<ol>
<li>Identify endpoints that would benefit from streaming (large responses, real-time data, WebSockets)</li>
<li>Switch those specific endpoints to use <code>handleStream()</code></li>
<li>Update response handling to use <code>for await...of</code> iteration</li>
<li>Test thoroughly, especially error handling during streaming</li>
<li>Monitor memory usage to verify streaming benefits</li>
</ol>
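<p>For the <code>req.body</code> change specifically, the adjustment might look like the sketch below. The "before" line reconstructs the removed setter from the note above, and the "after" form reuses the request-streaming pattern shown earlier; <code>payload</code> and <code>python</code> are assumed to be in scope, and the endpoint is a placeholder.</p>
<pre><code class="lang-javascript">// Before (v1.x): the body was assigned via the now-removed setter
// req.body = Buffer.from(JSON.stringify(payload))

// After (v2.0.0): dispatch the request and stream the body concurrently
const req = new Request({
  method: 'POST',
  url: '/items',                                    // placeholder endpoint
  headers: { 'Content-Type': 'application/json' }
})

const [res] = await Promise.all([
  python.handleStream(req),
  (async function () {
    await req.write(Buffer.from(JSON.stringify(payload)))
    await req.end()
  })()
])
</code></pre>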
<h3 id="heading-documentation-and-resources">Documentation and Resources</h3>
<ul>
<li>Full documentation: <a target="_blank" href="https://github.com/platformatic/python-node">python-node GitHub repository</a></li>
<li>ASGI specification: <a target="_blank" href="https://asgi.readthedocs.io/">asgi.readthedocs.io</a></li>
<li>FastAPI streaming: <a target="_blank" href="https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse">FastAPI Advanced User Guide</a></li>
<li>FastAPI WebSockets: <a target="_blank" href="https://fastapi.tiangolo.com/advanced/websockets/">FastAPI WebSockets</a></li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The addition of HTTP streaming and WebSocket support in @platformatic/python-node v2.0.0 expands what's possible for full-stack teams building with Python and Node.js. The existing integration now extends to real-time, high-performance use cases that previously required complex workarounds.</p>
<p>Whether you're building live dashboards, chat applications, progressive data processing pipelines, or exposing Python ML models via WebSocket APIs, v2.0.0 provides the necessary foundation while maintaining backward compatibility with existing code.</p>
<p>The implementation follows the ASGI 3.0 specification, ensuring compatibility with the Python async ecosystem, including FastAPI, Starlette, Django Channels, and other ASGI-compliant frameworks. Combined with running Python directly in Node.js processes, this release provides a practical option for teams looking to leverage both ecosystems.</p>
<p>Try out v2.0.0 today and share your feedback. If you encounter issues or have questions, please open an issue on our <a target="_blank" href="https://github.com/platformatic/python-node">GitHub repository</a>.</p>
]]></content:encoded></item><item><title><![CDATA[How We Made @platformatic/kafka 223% Faster (And What We Learned Along the Way)]]></title><description><![CDATA[A few months ago, we wrote about why we built yet another Kafka client for Node.js. The benchmarks looked promising—we were outperforming KafkaJS and holding our own against the native clients. However, something was concerning. The numbers didn't al...]]></description><link>https://blog.platformatic.dev/how-we-made-platformatickafka-223-faster-and-what-we-learned-along-the-way</link><guid isPermaLink="true">https://blog.platformatic.dev/how-we-made-platformatickafka-223-faster-and-what-we-learned-along-the-way</guid><dc:creator><![CDATA[Paolo Insogna]]></dc:creator><pubDate>Thu, 27 Nov 2025 03:00:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767811468661/2ec42337-d9b4-410d-87d5-e04b0a4bae3d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few months ago, we <a target="_blank" href="https://blog.platformatic.dev/why-we-created-another-kafka-client-for-nodejs">wrote about why we built yet another Kafka client for Node.js</a>. The benchmarks looked promising—we were outperforming KafkaJS and holding our own against the native clients. However, something was concerning. The numbers didn't align with what we were observing in production environments.</p>
<p>We continued running the tests and analyzing the results, but they consistently failed to match our production experience. The variance was high, the sample sizes were small, and we lacked confidence that we were measuring what we intended to measure.</p>
<p>We decided to revisit our approach fundamentally. Not just to make <code>@platformatic/kafka</code> faster, but to ensure we were testing it correctly in the first place.</p>
<p>It turned out our methodology was flawed. Correcting that led us down a path that resulted in substantial performance improvements.</p>
<h2 id="heading-performance-summary">Performance Summary</h2>
<p>Here's where we ended up with v1.21.0:</p>
<ul>
<li><p><strong>Producer (Single Message)</strong>: 92,441 op/sec—48% faster than KafkaJS</p>
</li>
<li><p><strong>Producer (Batch)</strong>: 4,465 op/sec—53% faster than KafkaJS</p>
</li>
<li><p><strong>Consumer</strong>: 159,828 op/sec—9% faster than our previous version</p>
</li>
</ul>
<p>That single-message producer number represents a 223% improvement over v1.16.0.</p>
<h2 id="heading-benchmark-methodology-issues">Benchmark Methodology Issues</h2>
<p>When we initially ran benchmarks for our first blog post, we used what appeared to be a standard approach: send messages, measure elapsed time, and calculate operations per second.</p>
<p>The problem was that we were only capturing timing measurements every 100 messages. For the rdkafka-based libraries, we weren't properly waiting for delivery reports. We were essentially sending messages without tracking when they were actually acknowledged. The timing measurements were inconsistent and unreliable.</p>
<p>Our initial results reflected these methodological flaws:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764018829172/4e5d81d1-f048-4399-8ac0-2cf355633da4.png" alt class="image--center mx-auto" /></p>
<pre><code class="lang-plaintext">┌─────────────────────────────────────────────┬─────────┬────────────────┬───────────┐
│ Library                                     │ Samples │         Result │ Tolerance │
├─────────────────────────────────────────────┼─────────┼────────────────┼───────────┤
│ node-rdkafka                                │     100 │  68.30 op/sec  │ ± 67.58 % │
│ @confluentinc/kafka-javascript (rdkafka)    │     100 │ 220.26 op/sec  │ ±  1.24 % │
│ KafkaJS                                     │     100 │ 383.82 op/sec  │ ±  3.91 % │
│ @platformatic/kafka                         │     100 │ 582.59 op/sec  │ ±  3.97 % │
└─────────────────────────────────────────────┴─────────┴────────────────┴───────────┘
</code></pre>
<p>Consider the variance on node-rdkafka: ±67.58%. This level of variance indicates unreliable measurements. Additionally, only 100 samples provided insufficient statistical confidence.</p>
<p>We completely rewrote the benchmark suite with the following improvements:</p>
<p><strong>Per-operation timing</strong>: Instead of sampling every 100 messages, we now measure timing for each individual operation. This provides significantly more granular data and much lower variance.</p>
<p><strong>Proper delivery tracking</strong>: For rdkafka-based libraries, we now send a message and wait for its specific delivery report before timing the next operation. This ensures accurate per-message timing.</p>
<p><strong>Substantially larger sample sizes</strong>: We increased from 100 samples to 100,000 for most tests. While this increases execution time, the results are statistically meaningful.</p>
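<p>To make the corrected methodology concrete, here is a minimal sketch of per-operation timing around an awaited send. The <code>producer.send()</code> call stands in for whichever client is under test and is an assumption for illustration only; the real suite is published in BENCHMARKS.md.</p>
<pre><code class="lang-javascript">// Illustrative sketch only: time each operation individually and include the
// broker acknowledgement in the sample, instead of sampling every 100 messages.
async function benchmark(producer, message, iterations = 100_000) {
  const samples = [];

  for (let i = 0; i &lt; iterations; i++) {
    const start = process.hrtime.bigint();
    await producer.send(message); // hypothetical call into the client under test
    samples.push(Number(process.hrtime.bigint() - start) / 1e6); // ms per operation
  }

  samples.sort((a, b) =&gt; a - b);
  const mean = samples.reduce((a, b) =&gt; a + b, 0) / samples.length;
  const p99 = samples[Math.floor(samples.length * 0.99)];
  console.log(`mean ${mean.toFixed(3)} ms, p99 ${p99.toFixed(3)} ms, ~${Math.round(1000 / mean)} op/sec`);
}
</code></pre>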
<p>When we re-ran the tests with the corrected methodology, the numbers improved dramatically across all libraries—particularly for the rdkafka-based ones:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Library</td><td>Producer Single</td><td>Producer Batch</td><td>Consumer</td></tr>
</thead>
<tbody>
<tr>
<td><strong>@platformatic/kafka v1.21.0</strong></td><td><strong>92,441 op/s</strong></td><td><strong>4,465 op/s</strong></td><td><strong>159,828 op/s</strong></td></tr>
<tr>
<td>@platformatic/kafka v1.16.0</td><td>28,596 op/s</td><td>3,779 op/s</td><td>146,862 op/s</td></tr>
<tr>
<td>KafkaJS</td><td>62,450 op/s</td><td>2,923 op/s</td><td>120,279 op/s</td></tr>
<tr>
<td>node-rdkafka</td><td>16,488 op/s</td><td>701 op/s</td><td>133,526 op/s</td></tr>
<tr>
<td>Confluent KafkaJS</td><td>19,721 op/s</td><td>2,311 op/s</td><td>139,881 op/s</td></tr>
<tr>
<td>Confluent rdkafka</td><td>21,587 op/s</td><td>2,648 op/s</td><td>127,146 op/s</td></tr>
</tbody>
</table>
</div><p>The libraries themselves hadn't changed—we had simply started measuring them accurately.</p>
<p>However, these improved benchmarks revealed performance issues in our own implementation that required attention.</p>
<h2 id="heading-identifying-and-addressing-performance-bottlenecks">Identifying and Addressing Performance Bottlenecks</h2>
<p>With proper measurements in place, we could precisely identify where <code>@platformatic/kafka</code> was spending its time and where optimization opportunities existed.</p>
<p>Our v1.16.0 performance numbers were respectable—28,596 op/sec for single messages—but the ±34.18% variance was concerning. In production environments, variance of this magnitude translates to unpredictable latency spikes, which contradicts our design goals.</p>
<p>We began systematic profiling. The first bottleneck that became apparent was CRC32C computation. We were calculating checksums for every message (as required by the Kafka protocol) using a pure JavaScript implementation. While functional, it exhibited both low throughput and high variance.</p>
<p>We integrated <code>@node-rs/crc32</code>, a native Rust implementation (<a target="_blank" href="https://github.com/platformatic/kafka/pull/126">#126</a>). The improvement was immediate and substantial—not just in throughput, but in consistency. The timing became significantly more predictable.</p>
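<p>As a hedged usage sketch (the export name here is an assumption; check the <code>@node-rs/crc32</code> README for the exact API), the switch looked roughly like this:</p>
<pre><code class="lang-javascript">// Illustrative only: compute a record checksum with the native Rust implementation.
// The `crc32c` export name is an assumption; consult the package documentation.
const { crc32c } = require('@node-rs/crc32');

const payload = Buffer.from('record batch bytes');
const checksum = crc32c(payload); // 32-bit checksum, computed natively
console.log(checksum);
</code></pre>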
<p><a target="_blank" href="https://github.com/baac0">@baac0</a> contributed a pull request that refactored error handling in request serialization (<a target="_blank" href="https://github.com/platformatic/kafka/pull/154">#154</a>). Initially, we viewed this primarily as code cleanup. This assessment proved incorrect. By handling errors asynchronously rather than blocking the serialization path, we eliminated an entire category of event loop blockages. Throughput increased substantially.</p>
<p><a target="_blank" href="https://github.com/jmdev12">@jmdev12</a> identified a subtle bug in our metadata request handling (<a target="_blank" href="https://github.com/platformatic/kafka/pull/144">#144</a>). We were improperly mixing callbacks in <code>kPerformDeduplicated</code>, which occasionally caused requests to hang or retry unnecessarily. Resolving this issue significantly improved connection handling reliability.</p>
<p>We also introduced a <code>handleBackPressure</code> option (<a target="_blank" href="https://github.com/platformatic/kafka/pull/127">#127</a>) to provide users with control over flow control behavior. While the Kafka protocol includes back-pressure mechanisms where brokers can signal clients to slow down, we weren't handling this consistently. The new option allows fine-tuning of how the client responds to back-pressure signals.</p>
<p>After implementing these changes, we re-ran the benchmarks.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764018640092/30a6fbcd-78f5-4946-a552-bc5019513635.png" alt class="image--center mx-auto" /></p>
<p>From 28,596 to 92,441 op/sec—a 223% improvement. More significantly, observe the variance reduction to ±1.05%.</p>
<h2 id="heading-batch-processing-performance">Batch Processing Performance</h2>
<p>Single-message performance is important for real-time event streaming, but many Kafka workloads involve bulk data pipelines sending hundreds or thousands of messages in batches.</p>
<p>Our batch performance was already competitive in v1.16.0—3,779 op/sec for batches of 100 messages. With the same optimizations applied, we observed improvements here as well:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764018602125/d9c26f6d-57d9-43d4-9dd4-a9f7f21f221e.png" alt class="image--center mx-auto" /></p>
<p>This represents an 18% improvement to 4,465 op/sec. More significantly, we now outperform KafkaJS by 53% in batch scenarios. This performance difference becomes substantial when processing millions of messages daily.</p>
<h2 id="heading-consumer-performance-improvements">Consumer Performance Improvements</h2>
<p>Our consumer implementation was already performing well in initial tests, but we discovered several bugs. Partition assignment logic had issues (<a target="_blank" href="https://github.com/platformatic/kafka/pull/138">#138</a>), and lag computation had edge cases that could produce incorrect results (<a target="_blank" href="https://github.com/platformatic/kafka/pull/153">#153</a>).</p>
<p>Addressing these bugs improved performance from 146,862 to 159,828 op/sec:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764018669213/a2f8309a-3406-4b48-95d1-a0f5f9a9eeb0.png" alt class="image--center mx-auto" /></p>
<p>The 9% throughput improvement is valuable, but the ±1.75% variance is more significant. This compares favorably to node-rdkafka's ±19.16% and the Confluent clients' ±18-24% variance. Consistent performance is often more valuable than peak throughput in production environments.</p>
<h2 id="heading-performance-architecture">Performance Architecture</h2>
<p>We frequently receive questions about how a pure JavaScript implementation can outperform native bindings to librdkafka. The answer lies not in a single optimization, but in the cumulative effect of multiple architectural decisions:</p>
<p><strong>Minimal buffer copying</strong>: Every buffer allocation and copy adds garbage collection pressure. We designed the entire protocol handling layer to work with buffer slices and views wherever possible. When processing 90,000+ messages per second, avoiding unnecessary allocations has significant impact on both throughput and latency consistency.</p>
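<p>To illustrate the difference (a generic Node.js example, not the library's actual internals), <code>Buffer.subarray()</code> returns a view over the same memory, while <code>Buffer.from()</code> allocates and copies:</p>
<pre><code class="lang-javascript">// Generic illustration of buffer views vs copies.
const frame = Buffer.alloc(1024); // pretend this is a received network frame

const view = frame.subarray(16, 128);              // shares memory, no allocation
const copy = Buffer.from(frame.subarray(16, 128)); // new allocation plus a memcpy

view[0] = 0xff;
console.log(frame[16]); // 255, the view aliases the original frame
console.log(copy[0]);   // 0, the copy is independent
</code></pre>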
<p><strong>Direct protocol implementation</strong>: There is no abstraction layer between application code and the wire protocol. Less indirection means fewer function calls, reduced stack manipulation, and more predictable performance characteristics. This also allows us to optimize hot paths without architectural constraints.</p>
<p><strong>Non-blocking event loop usage</strong>: Node.js performs optimally when used according to its design principles—specifically, with async operations that don't block. The error handling refactor was particularly impactful. We had been blocking on error serialization in several code paths, and eliminating these blocks substantially reduced latency spikes.</p>
<p><strong>Proper stream implementation</strong>: Node.js streams provide built-in back-pressure management when used correctly. When network sockets fill up, the stream pauses writes. When consumers cannot keep up, the fetch loop pauses. This keeps memory usage predictable and prevents unbounded memory growth.</p>
<p><strong>Hot path optimization</strong>: Operations like CRC32C checksums, murmur2 partition hashing, and varint encoding execute for every single message. We profiled these operations extensively, optimized them, and profiled again. The migration to native CRC32C via Rust was the largest single improvement, but numerous smaller optimizations compound significantly at scale.</p>
<p>It's worth noting that librdkafka implements similar optimizations—it's exceptionally well-optimized C code. However, it must cross the Node.js/C++ boundary for every operation, and that boundary crossing carries measurable overhead. By remaining in JavaScript, we avoid that overhead entirely.</p>
<h2 id="heading-the-journey-continues">The Journey Continues</h2>
<p>What started as a nagging doubt about our benchmark methodology turned into something far more valuable: a comprehensive understanding of our library's performance characteristics and a 223% improvement in single-message throughput.</p>
<p>The lessons from this experience are worth highlighting. First, measurement matters—flawed benchmarks don't just waste time, they obscure real performance issues. By fixing our methodology, we exposed bottlenecks we hadn't even known existed. Second, community contributions matter tremendously. The PRs from our contributors didn't just fix bugs—they fundamentally improved our throughput and reliability. Third, consistency matters as much as peak performance. Reducing variance from ±34% to ±1% means your p99 latencies become predictable, which is what production systems actually need.</p>
<p>The results speak for themselves: <code>@platformatic/kafka</code> v1.21.0 now delivers 92,441 op/sec for single messages and 159,828 op/sec for consumption, with variance under ±2% across all scenarios. It's a 99% pure JavaScript library, and yet it outperforms libraries built on highly optimized C code.</p>
<p>If you're building Node.js applications where Kafka performance matters, we encourage you to evaluate <code>@platformatic/kafka</code>:</p>
<pre><code class="lang-bash">npm install @platformatic/kafka
</code></pre>
<p>Run the benchmarks on your own infrastructure—we've published the complete test suite in <a target="_blank" href="http://BENCHMARKS.md">BENCHMARKS.md</a>. Test it against your workload patterns. And if you find issues or have optimization ideas, we welcome contributions at <a target="_blank" href="http://github.com/platformatic/kafka">github.com/platformatic/kafka</a>. After all, that's how we got here.</p>
<hr />
<p><em>All benchmarks executed on an M2 Max MacBook Pro with Node.js 22.19.0 against a three-broker Kafka cluster. Results may vary based on hardware and network configurations, though relative performance characteristics should remain comparable.</em></p>
]]></content:encoded></item><item><title><![CDATA[93% Faster Next.js in (your) Kubernetes]]></title><description><![CDATA[Next.js has been an incredibly popular project since its launch in 2016, and for good reason: it brings a world of capabilities to developers out-of-the-box. But, for teams looking to run it at scale in their own environments, it can also bring a wor...]]></description><link>https://blog.platformatic.dev/93-faster-nextjs-in-your-kubernetes</link><guid isPermaLink="true">https://blog.platformatic.dev/93-faster-nextjs-in-your-kubernetes</guid><category><![CDATA[Node.js]]></category><category><![CDATA[Next.js]]></category><dc:creator><![CDATA[Matteo Collina]]></dc:creator><pubDate>Tue, 25 Nov 2025 12:00:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1767810050890/4cbd0ac4-7f49-4910-a180-68139d6b5566.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="http://Next.js">Next.js</a> has been an incredibly popular project since its launch in 2016, and for good reason: it brings a world of capabilities to developers out-of-the-box. But, for teams looking to run it at scale in their own environments, it can also bring a world of pain.</p>
<p>We'll start by examining the complications of running this powerful framework in your own environment, and get under the hood (and I mean, down to the kernel) about why they happen.</p>
<p>Then, we'll walk you through the approach we took with <a target="_blank" href="https://github.com/platformatic/platformatic">Watt</a> to solve them, and what it means for you if you happen to run Next.js, or any other CPU-bound Node.js workload, on-prem.</p>
<p>(And if you're just here for the benchmarks, feel free to scroll past all the technical bits 🙂.)</p>
<h3 id="heading-the-fundamental-problems-of-scaling-next-and-node-in-kubernetes">The Fundamental Problems of Scaling Next (and Node) in Kubernetes</h3>
<p>If you're running Node.js at any meaningful scale, specifically in containers, you may know some version of this story:</p>
<p>Traffic spikes hit during peak hours, and suddenly, some pods are maxed out at 100% CPU while others sit at 30% utilization.</p>
<p>Requests start timing out. Your monitoring dashboard lights up red. Users see loading spinners instead of content, and your error rate climbs to 8%.</p>
<p>So you over-provision. You add 50% more pods than you need to handle the uneven load distribution. Your cloud bill grows, but at least <em>most</em> requests succeed.</p>
<p>Except now you're paying for infrastructure that sits idle most of the time, and during the next traffic surge, you're back to the same problem - just with more pods experiencing the same uneven distribution.</p>
<p>Meanwhile, your median latency hovers around 182ms. That doesn't sound terrible until you realize each page load makes multiple API calls. Three sequential calls at 180ms each mean users wait over half a second for basic interactions. In e-commerce, that's the difference between a sale and an abandoned cart. In SaaS, it's the difference between a delighted user and a churned customer.</p>
<p>The point I'm trying to (not so subtly) emphasize is that this isn't just a performance problem. It's a revenue problem. Each failed request during peak traffic is a lost transaction. Every 100ms of latency measurably reduces conversion rates. And all that over-provisioned infrastructure? That's profit margin burning in idle CPU cycles.</p>
<p>And yes, at the end of this, I'm going to tell you how Watt can fix it all, and show you some numbers to prove it. Strap in, it's time to go spelunking into the internals of Node.js and Linux.</p>
<h3 id="heading-under-the-hood">Under the Hood</h3>
<p>Ok. Time to get into the technical details.</p>
<p>For over a decade, the Node.js community has relied on two main approaches for scaling applications across multiple CPU cores:</p>
<h4 id="heading-1-the-cluster-module-and-pm2">1. The <code>cluster</code> module (and PM2)</h4>
<p>When Node.js introduced the <code>cluster</code> module in 2011, it seemed perfect: fork multiple worker processes, share a server port, and let the master process distribute connections using round-robin load balancing. Tools like <a target="_blank" href="https://www.npmjs.com/package/pm2">PM2</a> made this even easier.</p>
<p>But there's a hidden cost. The <code>cluster</code> architecture requires the master process to act as an internal load balancer - accepting every connection and transferring it to workers via IPC (inter-process communication). This introduces approximately <strong>30% overhead</strong> as every request passes through this coordination layer.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1764177938543/d1040360-01d3-4dec-a77c-79e0f506b6b4.png" alt class="image--center mx-auto" /></p>
<h4 id="heading-2-horizontal-scaling-with-single-cpu-pods">2. Horizontal scaling with single-CPU pods</h4>
<p>In Kubernetes environments, the conventional wisdom became: deploy single-CPU pods behind a load balancer. Simple, stateless, easy to scale. But this approach creates a critical problem for frameworks that can't implement early request rejection.</p>
<h4 id="heading-the-early-rejection-problem">The Early Rejection Problem</h4>
<p>Here's the issue: Once a request enters Node.js's event loop queue, it cannot be rejected until processing begins:</p>
<pre><code class="lang-typescript">Request → TCP Accept → Event Loop Queue → [Wait] → Process → <span class="hljs-number">503</span> (Too Late)
                     ↑
                Cannot reject here
</code></pre>
<p>This causes requests to pile up during overload, consuming memory and increasing latency for everyone. An ideal server would reject new requests immediately with a 503 before accepting them, allowing load balancers to route traffic elsewhere. But Node.js's event loop architecture makes this remarkably difficult.</p>
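<p>To make the contrast concrete, here is a hedged sketch of load shedding in a plain Node.js server using <code>monitorEventLoopDelay()</code> from <code>node:perf_hooks</code>. Note that this only approximates the ideal: by the time the handler runs, the socket has already been accepted and queued, which is precisely the limitation described above.</p>
<pre><code class="lang-javascript">// Illustrative sketch: shed load with a 503 when the event loop is already lagging.
const http = require('node:http');
const { monitorEventLoopDelay } = require('node:perf_hooks');

const delay = monitorEventLoopDelay({ resolution: 20 });
delay.enable();

http.createServer((req, res) =&gt; {
  const p99Ms = delay.percentile(99) / 1e6; // histogram reports nanoseconds

  if (p99Ms &gt; 200) {
    // Too late to refuse the connection itself, but at least fail fast.
    res.writeHead(503, { 'retry-after': '1' });
    return res.end();
  }

  res.end('ok'); // normal handling would go here
}).listen(3000);
</code></pre>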
<h3 id="heading-why-nextjs-makes-this-worse">Why Next.js Makes This Worse</h3>
<p>Frameworks like Next.js that rely on React server-side rendering (SSR) fundamentally <strong>cannot implement early 503 responses</strong>. They need the full request context before they can even determine what to do:</p>
<ol>
<li><p><strong>Request Context Required</strong>: SSR needs headers, cookies, and query params before rendering</p>
</li>
<li><p><strong>Dynamic Route Matching</strong>: Next.js must accept the connection to determine which page to execute</p>
</li>
<li><p><strong>Data Fetching Dependencies</strong>: Server components require the request to be in-flight, parallelizing I/O but postponing the CPU-bound activity (learn more about why this causes <a target="_blank" href="https://www.youtube.com/watch?v=81AqwvXqgG0&amp;t=536s">Event Loop blocking</a>).</p>
</li>
<li><p><strong>Middleware Execution</strong>: Next.js middleware runs after request acceptance, not before</p>
</li>
</ol>
<p>By the time Next.js knows it's overloaded, the request is already queued and consuming resources.</p>
<h4 id="heading-the-compounding-effect">The Compounding Effect</h4>
<p>When you combine this with traditional scaling approaches, the problems multiply:</p>
<ul>
<li><p><strong>With</strong> <code>cluster</code>/PM2: Every request pays the ~30% IPC overhead, even when the server isn't overloaded.</p>
</li>
<li><p><strong>With single-CPU pods</strong>: Round-robin distribution creates isolated queues where load imbalances compound. One pod might be drowning while another sits idle, but they can't share work.</p>
</li>
</ul>
<p>What we needed was a way to scale across multiple cores <strong>without</strong> the coordination overhead of <code>cluster</code>, while also enabling better statistical distribution of load than isolated single-CPU pods.</p>
<h2 id="heading-how-we-solved-this-with-watt">How We Solved This (with Watt)</h2>
<p>We built <strong>Watt</strong> as an "application server" for Node.js to fix these fundamental issues of scaling Node.js in containerized environments.</p>
<p>Here's what we achieved running Next.js with Watt on AWS EKS under sustained load of 1,000 requests per second:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Metric</td><td>PM2 (Cluster)</td><td>Single-CPU Pods</td><td>Watt</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Median Latency</strong></td><td>182ms</td><td>155ms</td><td><strong>11.6ms</strong></td></tr>
<tr>
<td><strong>P95 Latency</strong></td><td>1,260ms</td><td>1,000ms</td><td><strong>235ms</strong></td></tr>
<tr>
<td><strong>Success Rate</strong></td><td>91.9%</td><td>93.7%</td><td><strong>99.8%</strong></td></tr>
<tr>
<td><strong>Throughput</strong></td><td>910 req/s</td><td>972 req/s</td><td><strong>997 req/s</strong></td></tr>
</tbody>
</table>
</div><p>That's <strong>93.6% faster median latency</strong> and <strong>near-perfect reliability</strong> compared to the standard approaches for scaling Node.js applications.</p>
<p>These results come from production-grade benchmarks on real Next.js applications running on Kubernetes, comparing three common deployment strategies with identical total CPU resources (6 CPUs each). All configurations were tested under the same sustained load pattern, and Watt consistently outperformed both traditional approaches.</p>
<h3 id="heading-under-the-hood-again">Under the Hood (again)</h3>
<p><strong>Watt</strong> leverages a feature of the Linux kernel that allows us to distribute connections across multiple Node.js processes with zero coordination overhead: SO_REUSEPORT. This <code>listen()</code> option allows us to eliminate the ~30% performance tax that PM2 and the <code>cluster</code> module impose through IPC-based load balancing.</p>
<p>The core idea is elegantly simple: instead of having a master process coordinate workers, let the <strong>Linux kernel</strong> handle load distribution directly via SO_REUSEPORT to run multiple Node.js applications with zero coordination overhead.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763740899094/46d5e789-2397-4bbb-9054-9a2b3ef6c2cd.png" alt class="image--center mx-auto" /></p>
<p>So, for teams running Node with any sort of performance sensitivity, here's what that means for you:</p>
<h5 id="heading-1-zero-overhead-load-balancing"><strong>1. Zero-Overhead Load Balancing</strong></h5>
<p>Each Watt worker accepts connections directly from the OS using <code>SO_REUSEPORT</code>. There's no master process, no IPC coordination, no serialization overhead. The kernel distributes incoming connections using an efficient hash-based algorithm, and workers handle them independently.</p>
<h5 id="heading-2-process-orchestration"><strong>2. Process Orchestration</strong></h5>
<p>While workers run independently, Watt manages the process lifecycle:</p>
<ul>
<li><p>Automatic restart of crashed processes</p>
</li>
<li><p>Graceful shutdown handling</p>
</li>
<li><p>Health monitoring and metrics</p>
</li>
<li><p>Coordinated deployments</p>
</li>
</ul>
<h5 id="heading-3-shared-http-cache"><strong>3. Shared HTTP Cache</strong></h5>
<p>Watt includes a shared HTTP cache layer across workers, reducing redundant work and improving cache hit rates. See our post on <a target="_blank" href="https://blog.platformatic.dev/bringing-http-caching-to-nodejs">bringing HTTP caching to Node.js</a> for details. (Oh, and for you all working in Next.js, <a target="_blank" href="https://blog.platformatic.dev/watt-v318-unlocks-nextjs-16s-revolutionary-use-cache-directive-with-redisvalkey">you can now do component caching inside Watt as well.</a>)</p>
<h5 id="heading-4-automatic-health-restarts"><strong>4. Automatic Health Restarts</strong></h5>
<p>Catastrophic event loop or heap failures trigger graceful worker restarts without pod termination, maintaining service availability. These checks run from the main thread and do not depend on the worker thread's event loop, so we can perform them even if the application's event loop is blocked.</p>
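<p>The general idea, as a simplified sketch rather than Watt's actual implementation, is that the main thread observes each worker from the outside. For example, <code>worker.performance.eventLoopUtilization()</code> can be read from the parent thread even while the worker's own loop is saturated:</p>
<pre><code class="lang-javascript">// Simplified sketch of out-of-band health checks; not Watt's real mechanism.
const { Worker } = require('node:worker_threads');

function supervise(file) {
  const worker = new Worker(file); // hypothetical application entry point
  let last = worker.performance.eventLoopUtilization();

  const timer = setInterval(async () =&gt; {
    // Delta utilization over the last interval, read from the main thread.
    const delta = worker.performance.eventLoopUtilization(last);
    last = worker.performance.eventLoopUtilization();

    if (delta.utilization &gt; 0.98) {
      // The worker is effectively blocked: replace it, keep the pod alive.
      clearInterval(timer);
      await worker.terminate();
      supervise(file);
    }
  }, 5000);
}

supervise('./app-worker.js'); // hypothetical worker script
</code></pre>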
<h5 id="heading-how-it-works"><strong>How It Works</strong></h5>
<p>Watt runs multiple Node.js applications as separate threads within a single Node.js process. Each thread operates independently with its own event loop, but they all share the same listening socket using <code>SO_REUSEPORT</code>:</p>
<pre><code class="lang-js">server.listen({
  <span class="hljs-attr">host</span>: <span class="hljs-string">'127.0.0.1'</span>,
  <span class="hljs-attr">port</span>: <span class="hljs-number">3000</span>,
  <span class="hljs-attr">reusePort</span>: <span class="hljs-literal">true</span>
})
</code></pre>
<p>This single flag - <code>reusePort: true</code> - is what enables the kernel to distribute connections efficiently. But rather than managing this yourself, Watt handles the entire orchestration for you while adding process management, health monitoring, and caching on top.</p>
<p>The result? The same performance as manually using <code>SO_REUSEPORT</code>, but with all the operational features you'd expect from a production application server.</p>
<p>Now, I know we are already 'under the hood' of Watt, but let's get a bit closer to the kernel, shall we?</p>
<h4 id="heading-even-deeper-under-the-hood-how-soreuseport-works">(Even Deeper) Under the Hood:: How SO_REUSEPORT Works</h4>
<p>The magic behind Watt's performance comes from a Linux kernel feature called <code>SO_REUSEPORT</code>, available since kernel 3.9 (April 2013). This socket option fundamentally changes how the operating system distributes incoming connections to processes.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763740381613/f15d6ed7-5254-4caa-960d-9df2440c81d9.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-kernel-level-load-distribution">Kernel-Level Load Distribution</h3>
<p>When you set the <code>reusePort: true</code> option on Node.js's HTTP server, it calls this under the hood:</p>
<pre><code class="lang-c">setsockopt(socket, SOL_SOCKET, SO_REUSEPORT, &amp;opt, <span class="hljs-keyword">sizeof</span>(opt));
</code></pre>
<p>This tells the Linux kernel to distribute incoming connections across all processes listening on the same port using a two-stage hash-based algorithm:</p>
<p><strong>Stage 1: Listen Socket Lookup</strong></p>
<ul>
<li><p>All processes using SO_REUSEPORT on the same port are grouped together in a shared array structure</p>
</li>
<li><p>The destination port determines which bucket in the kernel's LISTEN hash table to search</p>
</li>
</ul>
<p><strong>Stage 2: Connection Distribution</strong></p>
<ul>
<li><p>For each incoming connection, the kernel calculates a hash from the 4-tuple:</p>
<pre><code class="lang-c">  hash(source_ip, source_port, dest_ip, dest_port)
</code></pre>
</li>
<li><p>This hash selects which worker process receives the connection</p>
</li>
<li><p>The worker accepts the connection directly - no coordination needed</p>
</li>
</ul>
<p><strong>Key Properties:</strong></p>
<ul>
<li><p><strong>Connection affinity</strong>: Same client IP:port always reaches the same worker</p>
</li>
<li><p><strong>Even distribution</strong>: Hash function provides balanced load across workers</p>
</li>
<li><p><strong>Zero coordination</strong>: No IPC, no shared state, no serialization</p>
</li>
<li><p><strong>Deterministic</strong>: Based purely on connection parameters, not current load</p>
</li>
</ul>
<p>This is fundamentally different from PM2/cluster, where the master process must accept each connection and then forward it to a worker via Unix domain sockets - adding ~30% overhead.</p>
<p>For more technical details on SO_REUSEPORT, see:</p>
<ul>
<li><p><a target="_blank" href="https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/">Cloudflare: The Sad State of Linux Socket Balancing</a></p>
</li>
<li><p><a target="_blank" href="https://www.f5.com/company/blog/nginx/socket-sharding-nginx-release-1-9-1">NGINX: Socket Sharding in NGINX 1.9.1</a></p>
</li>
</ul>
<h3 id="heading-practical-applications-two-layer-architecture-in-kubernetes">Practical Applications: Two-Layer Architecture in Kubernetes</h3>
<p>When you deploy Watt with multiple workers on a Kubernetes pod with multiple CPUs, you get a two-layer load balancing system (with two layers of resiliency):</p>
<p><strong>Layer 1: Kubernetes Service</strong></p>
<ul>
<li><p>Distributes new TCP connections across pod IPs</p>
</li>
<li><p>Uses round-robin or other configured algorithms</p>
</li>
<li><p>Example: 3 pods become 3 endpoints</p>
</li>
</ul>
<p><strong>Layer 2: Watt and Worker Threads within each pod</strong></p>
<ul>
<li><p>Kernel distributes connections across workers in the same pod using Watt and SO_REUSEPORT</p>
</li>
<li><p>Hash-based selection ensures balanced distribution</p>
</li>
<li><p>Example: Each pod has 2 Watt workers, for a total of 6 workers</p>
</li>
</ul>
<p>This creates better statistical multiplexing (combining multiple signals or data streams to share a single resource) than the traditional approach of 6 single-CPU pods.</p>
<p>Deeper under the hood we go…</p>
<h5 id="heading-1-independent-event-loops"><strong>1. Independent Event Loops</strong></h5>
<p>Each worker has its own event loop on its own CPU core. When Worker 1 is busy processing a slow Next.js SSR request, Worker 2's event loop continues processing its connections independently. Variance in request processing times affects fewer connections.</p>
<h5 id="heading-2-resource-sharing-within-pods"><strong>2. Resource Sharing Within Pods</strong></h5>
<p>Workers in the same pod share:</p>
<ul>
<li><p>Kernel page cache (file system operations)</p>
</li>
<li><p>Memory for binary code (lower memory footprint)</p>
</li>
<li><p>Single network namespace (lower context switching)</p>
</li>
</ul>
<h5 id="heading-3-better-failure-characteristics"><strong>3. Better Failure Characteristics</strong></h5>
<p>With Watt's orchestration:</p>
<ul>
<li><p>Single worker crash: Only 1/6 capacity lost temporarily</p>
</li>
<li><p>Automatic restart without pod termination</p>
</li>
<li><p>Health checks at worker level, not pod level</p>
</li>
<li><p>Graceful failover for catastrophic failures</p>
</li>
</ul>
<h5 id="heading-4-statistical-load-distribution"><strong>4. Statistical Load Distribution</strong></h5>
<p>Hash-based distribution at both layers provides better statistical properties than round-robin to isolated pods. Connections are more evenly distributed, and the two-layer approach reduces the impact of connection-level variance.</p>
<h2 id="heading-production-benchmarks-nextjs-on-aws-eks">Production Benchmarks: Next.js on AWS EKS</h2>
<p>Ok. Now for the fun part. The part where I show you what all this means for your applications, with numbers. (And welcome, to those of you who scrolled here from the beginning of the article.)</p>
<p>To validate the theoretical and simulation results, we ran production-grade benchmarks using Next.js on AWS EKS (Elastic Kubernetes Service), i.e. testing a real-world framework that cannot implement early request rejection.</p>
<p>The tests compared three Kubernetes deployment strategies:</p>
<ol>
<li><p><strong>Single-CPU pods</strong> (6 replicas × 1000m CPU limit, 2GB RAM each = 6 total CPUs)</p>
</li>
<li><p><strong>PM2 multi-worker pods</strong> (3 replicas × 2000m CPU limit with 2 PM2 workers, 4GB RAM each = 6 total CPUs)</p>
</li>
<li><p><strong>Watt multi-worker pods</strong> (3 replicas × 2000m CPU limit with 2 Watt workers, 4GB RAM each = 6 total CPUs)</p>
</li>
</ol>
<p>Given that Next.js Server-Side Rendering is CPU-bound, using 6 CPUs provides a like-for-like comparison.</p>
<p><strong>Infrastructure:</strong></p>
<ul>
<li><p><strong>EKS Cluster:</strong> 3 nodes running m5.2xlarge instances (8 vCPUs, 32GB RAM each)</p>
</li>
<li><p><strong>Load Testing:</strong> c7gn.large instance (2 vCPUs, 4GB RAM, network-optimized)</p>
</li>
<li><p><strong>Load Pattern:</strong> k6 with constant arrival rate of 1,000 requests/second for 120 seconds</p>
</li>
<li><p><strong>Virtual Users:</strong> 1,000 pre-allocated VUs</p>
</li>
</ul>
<p>The environment is totally ephemeral and created on-demand via a shell script and the aws CLI.</p>
<p>All configurations used identical total CPU resources (6 CPUs) for fair comparison. For testing, we used <a target="_blank" href="https://k6.io/">Grafana's K6</a>, configured as follows:</p>
<p><strong>k6 Load Test Configuration:</strong></p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> http <span class="hljs-keyword">from</span> <span class="hljs-string">'k6/http'</span>;
<span class="hljs-keyword">import</span> { check } <span class="hljs-keyword">from</span> <span class="hljs-string">'k6'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> options = {
  <span class="hljs-attr">scenarios</span>: {
    <span class="hljs-attr">constant_arrival_rate</span>: {
      <span class="hljs-attr">executor</span>: <span class="hljs-string">'constant-arrival-rate'</span>,
      <span class="hljs-attr">duration</span>: <span class="hljs-string">'120s'</span>,
      <span class="hljs-attr">rate</span>: <span class="hljs-number">1000</span>,
      <span class="hljs-attr">timeUnit</span>: <span class="hljs-string">'1s'</span>,
      <span class="hljs-attr">preAllocatedVUs</span>: <span class="hljs-number">1000</span>,
    },
  },
};

<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">const</span> res = http.get(__ENV.TARGET, {
    <span class="hljs-attr">timeout</span>: <span class="hljs-string">"5s"</span>
  });
  check(res, {
    <span class="hljs-string">'status is 200'</span>: <span class="hljs-function">(<span class="hljs-params">r</span>) =&gt;</span> r.status === <span class="hljs-number">200</span>,
  });
}
</code></pre>
<p>This configuration maintains a constant arrival rate of 1,000 requests/second for 120 seconds, with 1,000 pre-allocated virtual users and a 5-second timeout per request.</p>
<p>You can find the complete source code for these benchmarks at: <a target="_blank" href="https://github.com/platformatic/k8s-watt-performance-demo">https://github.com/platformatic/k8s-watt-performance-demo</a>.</p>
<h4 id="heading-benchmark-results">Benchmark Results</h4>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Configuration</td><td>Throughput</td><td>Success Rate</td><td>Median Latency</td><td>P95 Latency</td></tr>
</thead>
<tbody>
<tr>
<td>Single-CPU pods (6×1)</td><td>972 req/s</td><td>93.7%</td><td>155 ms</td><td>1,000 ms</td></tr>
<tr>
<td>PM2 (3×2 workers)</td><td>910 req/s</td><td>91.9%</td><td>182 ms</td><td>1,260 ms</td></tr>
<tr>
<td><strong>Watt (3×2 workers)</strong></td><td><strong>997 req/s</strong></td><td><strong>99.8%</strong></td><td><strong>11.6 ms</strong></td><td><strong>235 ms</strong></td></tr>
</tbody>
</table>
</div><h5 id="heading-key-findings">Key Findings</h5>
<h6 id="heading-1-throughput-and-reliability"><strong><em>1. Throughput and Reliability</em></strong></h6>
<ul>
<li><p><strong>Watt vs PM2</strong>: +9.6% more throughput (997 vs 910 req/s)</p>
</li>
<li><p><strong>Watt vs Single-CPU</strong>: +2.5% more throughput (997 vs 972 req/s)</p>
</li>
<li><p><strong>Watt success rate</strong>: 99.8% vs 91.9% (PM2) and 93.7% (single-CPU pods)</p>
</li>
</ul>
<p>Under sustained load of 1,000 req/s, Watt maintained near-perfect reliability while both PM2 and single-CPU pod architectures experienced significant request failures (8.1% and 6.3% failure rates respectively).</p>
<h6 id="heading-2-latency-performance"><strong><em>2. Latency Performance</em></strong></h6>
<p>Watt delivers dramatically better latency across all percentiles:</p>
<ul>
<li><p><strong>Median (P50)</strong>: 11.6ms vs 182ms (PM2) = <strong>93.6% faster</strong></p>
</li>
<li><p><strong>Median (P50)</strong>: 11.6ms vs 155ms (single-CPU) = <strong>92.5% faster</strong></p>
</li>
<li><p><strong>P95</strong>: 235ms vs 1,260ms (PM2) = <strong>81.3% faster</strong></p>
</li>
<li><p><strong>P95</strong>: 235ms vs 1,000ms (single-CPU) = <strong>76.5% faster</strong></p>
</li>
</ul>
<h6 id="heading-3-why-single-cpu-pods-underperform"><strong><em>3. Why Single-CPU Pods Underperform</em></strong></h6>
<p>Single-CPU pods suffer from two compounding issues: blind load distribution and limited self-healing capability.</p>
<p>Kubernetes distributes connections via round-robin without visibility into each pod's actual load. When one pod starts struggling—perhaps processing a slow SSR request—it keeps receiving new connections at the same rate as healthy pods.</p>
<p>The deeper problem: a Node.js process with a blocked event loop cannot effectively monitor itself. Health checks run on the same event loop, so by the time the process can report it's unhealthy, requests have already queued up and timed out. Kubernetes only sees the problem after users have already experienced failures.</p>
<h6 id="heading-4-why-pm2-underperforms"><strong><em>4. Why PM2 Underperforms</em></strong></h6>
<p>PM2's lower throughput and higher latency validate our analysis of the cluster module overhead:</p>
<ul>
<li><p>Master process acts as internal load balancer, adding IPC overhead</p>
</li>
<li><p>Every request requires socket transfer via Unix domain sockets</p>
</li>
<li><p>Lower success rate (91.9%) indicates the coordination overhead impacts reliability under load</p>
</li>
</ul>
<h6 id="heading-5-watts-advantages"><strong><em>5. Watt's Advantages</em></strong></h6>
<p>Watt's key advantage is external health monitoring combined with <code>SO_REUSEPORT</code>.</p>
<p>Because Watt monitors workers from outside their event loops, it can detect when a worker is struggling and restart it before the situation cascades—without terminating the entire pod. This directly addresses the fundamental problem that a blocked Node.js process cannot effectively monitor itself.</p>
<p>The <code>SO_REUSEPORT</code> architecture eliminates the ~30% IPC overhead imposed by PM2 and the cluster module. Workers accept connections directly from the kernel with zero coordination. The benchmarks bear this out: a 99.8% success rate and 93.6% faster median latency under sustained load.</p>
<h5 id="heading-benchmark-configuration">Benchmark Configuration</h5>
<p>The tests used Kubernetes deployments on AWS EKS with explicit resource limits:</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Single-CPU pods - 6 replicas with 1000m CPU limit each</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">next</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">6</span>  <span class="hljs-comment"># 6 pods × 1 CPU = 6 total CPUs</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">next</span>
          <span class="hljs-attr">resources:</span>
            <span class="hljs-attr">requests:</span>
              <span class="hljs-attr">cpu:</span> <span class="hljs-string">'1000m'</span>
              <span class="hljs-attr">memory:</span> <span class="hljs-string">'2Gi'</span>
            <span class="hljs-attr">limits:</span>
              <span class="hljs-attr">cpu:</span> <span class="hljs-string">'1000m'</span>
              <span class="hljs-attr">memory:</span> <span class="hljs-string">'2Gi'</span>

<span class="hljs-comment"># PM2 multi-worker pods - 3 replicas with 2000m CPU limit, 2 workers each</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">next-pm2</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>  <span class="hljs-comment"># 3 pods × 2 CPUs = 6 total CPUs</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">next-pm2</span>
          <span class="hljs-attr">env:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">WORKERS</span>
              <span class="hljs-attr">value:</span> <span class="hljs-string">"2"</span>
          <span class="hljs-attr">resources:</span>
            <span class="hljs-attr">requests:</span>
              <span class="hljs-attr">cpu:</span> <span class="hljs-string">'2000m'</span>
              <span class="hljs-attr">memory:</span> <span class="hljs-string">'4Gi'</span>
            <span class="hljs-attr">limits:</span>
              <span class="hljs-attr">cpu:</span> <span class="hljs-string">'2000m'</span>
              <span class="hljs-attr">memory:</span> <span class="hljs-string">'4Gi'</span>

<span class="hljs-comment"># Watt multi-worker pods - 3 replicas with 2000m CPU limit, 2 workers each</span>
<span class="hljs-attr">apiVersion:</span> <span class="hljs-string">apps/v1</span>
<span class="hljs-attr">kind:</span> <span class="hljs-string">Deployment</span>
<span class="hljs-attr">metadata:</span>
  <span class="hljs-attr">name:</span> <span class="hljs-string">next-watt</span>
<span class="hljs-attr">spec:</span>
  <span class="hljs-attr">replicas:</span> <span class="hljs-number">3</span>  <span class="hljs-comment"># 3 pods × 2 CPUs = 6 total CPUs</span>
  <span class="hljs-attr">template:</span>
    <span class="hljs-attr">spec:</span>
      <span class="hljs-attr">containers:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">next-watt</span>
          <span class="hljs-attr">env:</span>
            <span class="hljs-bullet">-</span> <span class="hljs-attr">name:</span> <span class="hljs-string">WORKERS</span>
              <span class="hljs-attr">value:</span> <span class="hljs-string">"2"</span>
          <span class="hljs-attr">resources:</span>
            <span class="hljs-attr">requests:</span>
              <span class="hljs-attr">cpu:</span> <span class="hljs-string">'2000m'</span>
              <span class="hljs-attr">memory:</span> <span class="hljs-string">'4Gi'</span>
            <span class="hljs-attr">limits:</span>
              <span class="hljs-attr">cpu:</span> <span class="hljs-string">'2000m'</span>
              <span class="hljs-attr">memory:</span> <span class="hljs-string">'4Gi'</span>
</code></pre>
<p>These results confirm that SO_REUSEPORT-based multi-worker pods (Watt) outperform both single-CPU pod scaling and PM2-based multi-worker approaches in real-world production scenarios on Kubernetes.</p>
<h2 id="heading-getting-started-with-watt">Getting Started with Watt</h2>
<p>What's more fun than reading about our results? Replicating them with your own apps in your own environment, of course.</p>
<p>As you might already know, Watt is open source, and straightforward to implement. Simply follow these steps to deploy Next.js in Kubernetes with Watt: <a target="_blank" href="https://docs.platformatic.dev/docs/guides/deployment/nextjs-in-k8s">https://docs.platformatic.dev/docs/guides/deployment/nextjs-in-k8s</a>.</p>
<h4 id="heading-implementation-and-configuration-tips">Implementation and Configuration Tips</h4>
<p><strong>From PM2:</strong></p>
<ul>
<li><p>Remove PM2 ecosystem files</p>
</li>
<li><p>Replace <code>pm2 start</code> with your Watt-enabled entry point</p>
</li>
<li><p>Set <code>workers</code> to match your previous PM2 instance count</p>
</li>
<li><p>Update health checks to target individual workers if needed</p>
</li>
</ul>
<p><strong>From Single-CPU Pods:</strong></p>
<ul>
<li><p>Reduce pod count and increase CPU per pod (maintain total CPU)</p>
</li>
<li><p>Example: 6 × 1-CPU pods → 3 × 2-CPU pods with <code>workers: 2</code> (see the configuration sketch after this list)</p>
</li>
<li><p>Update resource limits to match worker count</p>
</li>
<li><p>Monitor and adjust based on your traffic patterns (and check out our <a target="_blank" href="https://blog.platformatic.dev/the-intelligent-command-center-for-nodejs-is-now-open-source">Intelligent Command Center</a> if you want to make your life even easier 🙂)</p>
</li>
</ul>
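<p>As a rough sketch of that second path (the exact configuration keys vary between Watt versions, so treat this as an assumption and follow the deployment guide linked above for the authoritative setup), the idea is to give each pod 2 CPUs and tell Watt to run 2 workers, for example via a <code>workers</code> setting in <code>watt.json</code> or an environment variable as in the benchmark manifests earlier:</p>
<pre><code class="lang-json">{
  "workers": 2
}
</code></pre>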
<p>For complete documentation and advanced features like shared HTTP caching, visit the Watt GitHub repository.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>The 93% latency improvement we showed at the start of this post isn't magic - it's the result of eliminating unnecessary coordination overhead and letting the Linux kernel do what it does best: efficiently distributing network connections.</p>
<p>Traditional Node.js scaling approaches, whether PM2's cluster module or horizontally scaled single-CPU pods, impose architectural constraints that hurt performance:</p>
<ul>
<li><p><strong>PM2/cluster</strong>: Every new connection pays a ~30% IPC tax for master-worker coordination</p>
</li>
<li><p><strong>Single-CPU pods</strong>: Isolated queues compound load imbalances, leading to higher failure rates</p>
</li>
</ul>
<p>Watt takes a different approach: leverage SO_REUSEPORT to let multiple Node.js workers accept connections directly from the kernel, with zero coordination overhead. Then we add the control systems you need on top, such as health checks. The result is consistent and dramatic:</p>
<ul>
<li><p><strong>93.6% faster median latency</strong> than PM2</p>
</li>
<li><p><strong>99.8% reliability</strong> under sustained load</p>
</li>
<li><p><strong>9.6% more throughput</strong> with the same CPU resources</p>
</li>
</ul>
<p>What makes this especially compelling is the simplicity of implementation. You don't need specialized hardware, complex infrastructure changes, or extensive code refactoring. The path from PM2 or single-CPU pods to Watt is straightforward, and the performance gains are immediate. (Believe me, compared to some of the things I've seen teams do to achieve even a quarter of these improvements, this is pretty incredible ROI for your time here.)</p>
<p>If you're running Node.js applications - especially frameworks like Next.js that can't implement early request rejection - on Kubernetes or with PM2, you now have a proven alternative. The benchmarks speak for themselves, and the implementation is open source.</p>
<p>Give <a target="_blank" href="https://platformatic.dev/">Watt</a> a try on your next deployment and measure the difference yourself. Your p95 latency will thank you.</p>
<p>If you want to have a chat with us about any of this, or are interested in professional support or architecture guidance with Watt or any of our other projects, feel free to send an email to <a target="_blank" href="mailto:info@platformatic.dev">info@platformatic.dev</a> or add either <a target="_blank" href="https://www.linkedin.com/in/lucamaraschi/">Luca</a> or <a target="_blank" href="https://www.linkedin.com/in/matteocollina/">me</a> on LinkedIn.</p>
]]></content:encoded></item><item><title><![CDATA[Watt v3.18 Unlocks Next.js 16's Revolutionary 'use cache' Directive with Redis/Valkey]]></title><description><![CDATA[We're excited to announce that Watt 3.18.0 now supports Next.js 16.0, bringing a transformative shift in how you build performant Next.js applications.
This is a game-changer because Next.js 16 fundamentally reimagines React caching. For the first ti...]]></description><link>https://blog.platformatic.dev/watt-v318-unlocks-nextjs-16s-revolutionary-use-cache-directive-with-redisvalkey</link><guid isPermaLink="true">https://blog.platformatic.dev/watt-v318-unlocks-nextjs-16s-revolutionary-use-cache-directive-with-redisvalkey</guid><category><![CDATA[Node.js]]></category><category><![CDATA[Next.js]]></category><dc:creator><![CDATA[Paolo Insogna]]></dc:creator><pubDate>Tue, 18 Nov 2025 15:00:44 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1763401412670/6d4d5c9b-fc2f-47d2-b53e-f3e099d451e7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We're excited to announce that <a target="_blank" href="https://docs.platformatic.dev/watt">Watt</a> 3.18.0 now supports <a target="_blank" href="https://nextjs.org/blog/next-16">Next.js 16.0</a>, bringing a transformative shift in how you build performant Next.js applications.</p>
<p>This is a game-changer because Next.js 16 fundamentally reimagines React caching. For the first time, you can cache individual React components and functions with a simple <code>use cache</code> directive, giving you surgical precision over what gets cached and when. No more guessing about implicit caching behavior or wrestling with complex cache invalidation strategies. You now have explicit, fine-grained control over your application's performance.</p>
<p>However, here's where it gets even more exciting: while Next.js 16 provides the caching primitives, it requires custom configuration to enable component caching in production environments. That's where Watt 3.18.0 comes in. With Watt's <a target="_blank" href="https://blog.platformatic.dev/introducing-efficient-valkey-based-caching-for-nextjs">Redis/Valkey cache adapter</a>, you get instant, zero-configuration distributed caching. This means your cached React components are automatically shared across all application instances, eliminating cache inconsistencies and dramatically improving performance at scale.</p>
<p>The combination unlocks what was previously difficult to achieve: component-level caching that works seamlessly in distributed, production-scale Next.js deployments. Add a single line (<code>cacheComponents: true</code>) to your config, sprinkle <code>use cache</code> directives where you need them, and Watt handles all the complexity behind the scenes.</p>
<h2 id="heading-the-self-hosting-challenge-why-distributed-caching-matters">The Self-Hosting Challenge: Why Distributed Caching Matters</h2>
<p>When deploying Next.js applications outside of Vercel's platform—whether on your own infrastructure, Kubernetes clusters, or other cloud providers—you face a significant caching challenge that can impact both performance and data consistency. Next.js defaults to file-system-based caching, which works perfectly fine for single-server deployments but creates serious issues when scaling horizontally.</p>
<p>Here's the problem: in a typical self-hosted production environment, such as a Kubernetes deployment with multiple replicas, you run multiple instances of your Next.js application behind a load balancer for high availability and scalability. When each instance relies on local disk caching, every server maintains its own separate cache. This means:</p>
<ul>
<li><p><strong>Cache Inconsistencies</strong>: Different users may see different cached data depending on which server handles their request. User A hits Server 1 and sees cached data from an hour ago, while User B hits Server 2 and sees data cached just seconds ago.</p>
</li>
<li><p><strong>Wasted Resources</strong>: Each server independently fetches and caches the same data, multiplying your database/API load and memory usage across all instances.</p>
</li>
<li><p><strong>Cache Invalidation Complexity</strong>: When data changes, you need to invalidate caches across all servers simultaneously, which becomes a distributed systems problem without proper tooling.</p>
</li>
</ul>
<p>Vercel's platform solves this through its <a target="_blank" href="https://vercel.com/docs/data-cache">Data Cache</a> infrastructure, which provides a distributed cache shared across all function invocations. But when self-hosting, you're on your own to implement this critical piece of infrastructure.</p>
<p>This is precisely why Watt's Redis/Valkey adapter is essential for production Next.js deployments: it gives self-hosted applications the same distributed caching capabilities that Vercel provides as a platform feature. With a centralized cache store, all your application instances share the same cached data, ensuring consistency while dramatically improving performance and resource efficiency.</p>
<h2 id="heading-understanding-nextjs-16s-use-cache-directive">Understanding Next.js 16's use cache Directive</h2>
<p><a target="_blank" href="https://nextjs.org/blog/next-16">Next.js 16</a> introduces a fundamental shift in how caching works. Unlike previous versions, where caching was implicit, Next.js 16 makes caching explicit through the <a target="_blank" href="https://nextjs.org/docs/app/api-reference/directives/use-cache"><code>use cache</code> directive</a>. This gives you fine-grained control over what gets cached and when.</p>
<p>You can use the directive at different levels:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Cache an entire component</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">ProductList</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-string">"use cache"</span>;
  <span class="hljs-keyword">const</span> products = <span class="hljs-keyword">await</span> fetchProducts();
  <span class="hljs-keyword">return</span> <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>{/* render products */}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>;
}

<span class="hljs-comment">// Cache a specific function</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getProductData</span>(<span class="hljs-params">id</span>) </span>{
  <span class="hljs-string">"use cache"</span>;
  <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">`/api/products/<span class="hljs-subst">${id}</span>`</span>);
  <span class="hljs-keyword">return</span> data.json();
}

<span class="hljs-comment">// Cache at the file level</span>
(<span class="hljs-string">"use cache"</span>);
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">Page</span>(<span class="hljs-params"></span>) </span>{
  <span class="hljs-keyword">return</span> <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">div</span>&gt;</span>{/* page content */}<span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span></span>;
}
</code></pre>
<p>This explicit approach provides more flexibility and control compared to the automatic caching behavior in earlier versions. Learn more about <a target="_blank" href="https://nextjs.org/docs/app/guides/caching">Next.js caching strategies</a> in the official documentation.</p>
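<p>When you need on-demand invalidation, cached entries can also be tagged and given a lifetime. The sketch below is illustrative rather than definitive: the <code>cacheTag</code>, <code>cacheLife</code>, and <code>revalidateTag</code> helpers are taken from the Next.js Cache Components documentation, and the URLs and tag names are placeholders, so verify the exact exports against your Next.js 16 release.</p>
<pre><code class="lang-javascript">// Illustrative sketch: tagging cached data so it can be invalidated on demand.
// The cacheTag / cacheLife / revalidateTag helpers are assumed from "next/cache"
// (Cache Components API); the URLs and tag names are placeholders.
import { cacheTag, cacheLife, revalidateTag } from "next/cache";

export async function getProduct(id) {
  "use cache";
  cacheTag(`product-${id}`); // label this entry for targeted invalidation
  cacheLife("hours");        // reuse the built-in "hours" cache profile
  const res = await fetch(`https://api.example.com/products/${id}`);
  return res.json();
}

// Server Action (inline "use server") that updates a product and evicts only its entry
export async function updateProduct(id, payload) {
  "use server";
  await fetch(`https://api.example.com/products/${id}`, {
    method: "PUT",
    body: JSON.stringify(payload),
  });
  revalidateTag(`product-${id}`);
}
</code></pre>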
<h2 id="heading-what-is-watt">What is Watt?</h2>
<p><a target="_blank" href="https://docs.platformatic.dev/watt">Watt</a> is an extensible Node.js application server designed to simplify how you build, deploy, and scale applications. Whether you're creating simple APIs, microservices, or full-stack applications, Watt acts as a powerful orchestration layer that composes multiple services into a single cohesive system.</p>
<p>With Watt, you can seamlessly integrate frontend frameworks like Next.js, Astro, or Remix with backend microservices, all while benefiting from:</p>
<ul>
<li><p><strong>Zero-configuration deployment</strong> - Applications run instantly without complex setup</p>
</li>
<li><p><strong>Service orchestration</strong> - Coordinates multiple applications and services seamlessly</p>
</li>
<li><p><strong>Production-ready features</strong> - Built-in monitoring, logging, and operational best practices</p>
</li>
<li><p><strong>Multi-threading support</strong> - Leverages Node.js worker threads for improved performance</p>
</li>
<li><p><strong>Shared caching</strong> - Centralized caching strategies like the Redis/Valkey adapter for distributed cache across all instances</p>
</li>
</ul>
<p>Watt handles the complexity of inter-service communication, caching strategies, and deployment, allowing you to focus on building your application logic. Learn more in our <a target="_blank" href="https://blog.platformatic.dev/introducing-watt-3">Watt 3 introduction post</a>.</p>
<p>The latest version of Watt adds first-class support for <a target="_blank" href="https://nextjs.org/blog/next-16">Next.js 16.0</a>, including the new <a target="_blank" href="https://nextjs.org/docs/app/getting-started/cache-components">Cache Components</a> feature. When you enable Watt's Redis/Valkey adapter and set the new <code>cacheComponents</code> option to <code>true</code>, you unlock the full potential of Next.js 16's <a target="_blank" href="https://nextjs.org/docs/app/api-reference/directives/use-cache"><code>use cache</code> directive</a>.</p>
<h3 id="heading-key-features">Key Features</h3>
<ul>
<li><p><strong>Component-Level Caching</strong>: Use the <code>use cache</code> directive to cache individual React components, functions, or entire route handlers</p>
</li>
<li><p><strong>Distributed Caching with Valkey</strong>: Share cached components across all your application instances using <a target="_blank" href="https://redis.io/">Redis</a> or <a target="_blank" href="https://valkey.io/">Valkey</a></p>
</li>
<li><p><strong>Simple Configuration</strong>: Enable component caching with a single <code>cacheComponents: true</code> setting in your Watt configuration</p>
</li>
<li><p><strong>Seamless Integration</strong>: Watt handles all the complexity of configuring Next.js 16's cache handler behind the scenes</p>
</li>
</ul>
<h2 id="heading-enabling-component-caching-in-watt">Enabling Component Caching in Watt</h2>
<p>To take advantage of Next.js 16's component caching with Watt, you need to configure the Redis/Valkey adapter and enable component caching in your <code>watt.json</code>:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"cache"</span>: {
    <span class="hljs-attr">"adapter"</span>: <span class="hljs-string">"valkey"</span>,
    <span class="hljs-attr">"url"</span>: <span class="hljs-string">"valkey://redis.example.com:6379"</span>,
    <span class="hljs-attr">"cacheComponents"</span>: <span class="hljs-literal">true</span>
  }
}
</code></pre>
<p>With these settings in place, Watt automatically configures Next.js 16's cache handler to use <a target="_blank" href="https://redis.io/">Redis</a> or <a target="_blank" href="https://valkey.io/">Valkey</a> for storing cached components, making them available across all instances of your application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1763401471462/23ece4ac-3a8e-42b5-96f9-c7ae145c40a7.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-important-isr-cache-behavior">Important: ISR Cache Behavior</h2>
<p>When you enable the Redis/Valkey adapter and set <code>cacheComponents: true</code>, there's an important architectural consideration: <strong>Next.js 16 does not support running both the new component caching and the legacy</strong> <a target="_blank" href="https://nextjs.org/docs/app/guides/incremental-static-regeneration"><strong>Incremental Static Regeneration (ISR)</strong></a> <strong>cache simultaneously</strong>.</p>
<p>As a result, when component caching is enabled in Watt 3.18.0 with Next.js 16, the traditional ISR cache is automatically disabled. This is a limitation of Next.js 16's architecture, not Watt. The new <code>use cache</code> directive provides more explicit and flexible caching capabilities that supersede the older ISR approach.</p>
<p>If your application relies heavily on ISR, you can continue using Next.js 16 with <code>cacheComponents</code> disabled (or omitted) and avoid using the <code>use cache</code> directive. This way, you can benefit from other Next.js 16 features while maintaining your existing ISR-based caching strategy.</p>
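<p>For reference, an ISR-style page under that setup might look like the sketch below. The <code>revalidate</code> route segment config is standard Next.js; the file path, interval, and fetch URL are illustrative only.</p>
<pre><code class="lang-javascript">// app/products/page.js: illustrative ISR-style caching with cacheComponents disabled.
// The `revalidate` segment config asks Next.js to regenerate this page in the background
// at most once per hour; the URL and interval are placeholders.
export const revalidate = 3600;

export default async function ProductsPage() {
  const res = await fetch("https://api.example.com/products");
  const products = await res.json();
  return (
    &lt;ul&gt;
      {products.map((product) =&gt; (
        &lt;li key={product.id}&gt;{product.name}&lt;/li&gt;
      ))}
    &lt;/ul&gt;
  );
}
</code></pre>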
<h2 id="heading-why-redisvalkey-for-nextjs-caching">Why Redis/Valkey for Next.js Caching?</h2>
<p><a target="_blank" href="https://redis.io/">Redis</a> and <a target="_blank" href="https://valkey.io/">Valkey</a> are high-performance, in-memory data stores optimized for caching workloads. When scaling Next.js applications horizontally, local caching solutions fall short because each instance maintains its own cache, leading to inconsistent user experiences.</p>
<p>By using <a target="_blank" href="https://blog.platformatic.dev/introducing-efficient-valkey-based-caching-for-nextjs">Watt's Redis/Valkey adapter</a>, you get:</p>
<ul>
<li><p><strong>Shared cache across instances</strong>: All application replicas access the same cached data</p>
</li>
<li><p><strong>Automatic cache key management</strong>: Watt handles cache key generation using Next.js 16's compiler</p>
</li>
<li><p><strong>Production-ready performance</strong>: Autopipelining support ensures minimal latency</p>
</li>
<li><p><strong>Simple configuration</strong>: No need to manually configure cache handlers or manage cache lifecycle</p>
</li>
</ul>
<p>Learn more about <a target="_blank" href="https://redis.io/">Redis</a> and <a target="_blank" href="https://valkey.io/">Valkey</a> in their official documentation.</p>
<h2 id="heading-getting-started">Getting Started</h2>
<p>To try Next.js 16 component caching with Watt 3.18.0:</p>
<ol>
<li><p>Update to <a target="_blank" href="https://github.com/platformatic/platformatic/pull/4397">Watt 3.18.0</a> or later</p>
</li>
<li><p>Upgrade your Next.js dependency to version 16.0 or higher</p>
</li>
<li><p>Configure the Redis/Valkey adapter in your <code>watt.json</code></p>
</li>
<li><p>Enable <code>cacheComponents: true</code> in your Next.js service configuration</p>
</li>
<li><p>Add <code>use cache</code> directives to the components and functions you want to cache (see the sketch after this list)</p>
</li>
</ol>
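<p>As an example of that last step, here is a minimal sketch of a page that mixes a cached section with a dynamic one. With <code>cacheComponents</code> enabled, uncached async work generally needs a <code>Suspense</code> boundary; the component names and URLs below are placeholders, not part of Watt's or Next.js's API.</p>
<pre><code class="lang-javascript">// Illustrative sketch only: a cached catalog section next to a dynamic stock counter.
// With cacheComponents enabled, the uncached LiveStock component is wrapped in Suspense;
// all component names and URLs are placeholders.
import { Suspense } from "react";

async function CachedCatalog() {
  "use cache"; // this subtree is cached and shared across instances via Valkey
  const res = await fetch("https://api.example.com/catalog");
  const items = await res.json();
  return (
    &lt;ul&gt;
      {items.map((item) =&gt; (
        &lt;li key={item.id}&gt;{item.name}&lt;/li&gt;
      ))}
    &lt;/ul&gt;
  );
}

async function LiveStock() {
  // uncached, so it runs on every request and renders inside a Suspense boundary
  const res = await fetch("https://api.example.com/stock");
  const stock = await res.json();
  return &lt;p&gt;In stock: {stock.count}&lt;/p&gt;;
}

export default function Page() {
  return (
    &lt;main&gt;
      &lt;CachedCatalog /&gt;
      &lt;Suspense fallback={&lt;p&gt;Loading stock…&lt;/p&gt;}&gt;
        &lt;LiveStock /&gt;
      &lt;/Suspense&gt;
    &lt;/main&gt;
  );
}
</code></pre>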
<p>That's it! Watt takes care of all the underlying complexity, allowing you to focus on building great applications.</p>
<h2 id="heading-learn-more">Learn More</h2>
<ul>
<li><p><a target="_blank" href="https://docs.platformatic.dev/watt">Platformatic Watt Documentation</a></p>
</li>
<li><p><a target="_blank" href="https://nextjs.org/blog/next-16">Next.js 16 Announcement</a></p>
</li>
<li><p><a target="_blank" href="https://nextjs.org/docs/app/getting-started/cache-components">Next.js Cache Components Guide</a></p>
</li>
<li><p><a target="_blank" href="https://nextjs.org/docs/app/api-reference/directives/use-cache">use cache Directive Reference</a></p>
</li>
<li><p><a target="_blank" href="https://blog.platformatic.dev/introducing-efficient-valkey-based-caching-for-nextjs">Valkey-Based Caching for Next.js</a></p>
</li>
<li><p><a target="_blank" href="https://valkey.io/">Valkey Project</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/platformatic/platformatic/pull/4397">PR #4397: Next.js 16.0 Support</a></p>
</li>
</ul>
<p>We're committed to making <a target="_blank" href="https://platformatichq.com">Platformatic</a> the best platform for building and deploying modern web applications. This update represents another step forward in providing seamless, production-ready tooling for the latest web technologies.</p>
<p>Try it out and let us know what you think!</p>
]]></content:encoded></item></channel></rss>