10,000 requests, 2 approaches to multi-threading, 1 React Router

When evaluating the feedback we received after publishing our Next.js x Kubernetes benchmarks (which showed 93% lower latency with Watt), a natural question emerged:
Could this approach apply to other Node.js frameworks as well? And if so, does a general pattern emerge: that performance-sensitive Node.js applications benefit from being run with Watt?
While the jury is still out on whether Watt can help every performance-sensitive Node.js app out there, we've done another round of benchmarks, this time with React Router, to see if our previous results with Next.js could be replicated with other frameworks. The answer?
Why yes, of course. I wouldn’t be writing an article about it if I didn’t have some compelling numbers to share 🙂
Methodology
We ran React Router (framework mode), the go-to library for handling server- and client-side navigation in React applications, through an extreme load test: 10,000 HTTP requests per second for 120 seconds. The results confirm that Watt's performance gains are not specific to Next.js but extend across the Node.js ecosystem.
Why 10x the Load?
Our Next.js benchmarks used 1,000 requests per second. For React Router, we cranked it up to 10,000.
Why? Because React Router is significantly more efficient than Next.js for server-side rendering (“SSR”) workloads. SSR with Next.js involves heavier processing: full React server components, complex routing logic, and more extensive middleware chains. React Router’s server rendering is leaner and faster. At 1,000 req/s, all three configurations (PM2, Watt, Single Node) handled React Router without breaking a sweat, and we couldn’t see meaningful differences.
To properly stress-test the systems and expose the architectural differences, we needed to crank up the dial by an order of magnitude. This extreme load is where the cracks appear and Watt’s advantages become stark.
The Benchmark Setup
We tested three deployment strategies on AWS EKS, all using identical total CPU resources (6 CPUs):
Single-CPU pods (6 replicas x 1000m CPU limit each)
PM2 multi-worker pods (3 replicas x 2000m CPU limit with 2 PM2 workers each)
Watt multi-worker pods (3 replicas x 2000m CPU limit with 2 Watt workers each)
Infrastructure:
EKS Cluster: 3 nodes running m5.2xlarge instances (8 vCPUs, 32GB RAM each)
Load Testing: c7gn.2xlarge instance (8 vCPUs, 16GB RAM, network-optimized)
Load Pattern: k6 with a constant arrival rate of 10,000 requests/second for 120 seconds
Virtual Users: Up to 20,000 VUs
Request Timeout: 5 seconds
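The load pattern above maps onto k6's constant-arrival-rate executor. As a rough sketch (the exact script we ran lives in the benchmark repository linked at the end, and the `preAllocatedVUs` value here is an assumption), the scenario options look like this, written as a plain object so the shape is easy to read; in a real k6 script this would be `export const options`:

```javascript
// Sketch of the k6 scenario implied by the numbers above.
// The 5-second request timeout is set per-request in k6
// (e.g. http.get(url, { timeout: '5s' }) in the default function).
const options = {
  scenarios: {
    extreme_load: {
      executor: 'constant-arrival-rate',
      rate: 10000,            // 10,000 new iterations (requests) per second
      timeUnit: '1s',
      duration: '120s',       // sustained for two minutes
      preAllocatedVUs: 10000, // assumption: exact pre-allocation not stated
      maxVUs: 20000,          // the VU ceiling all three configs hit
    },
  },
};
```

With a constant-arrival-rate executor, k6 keeps injecting requests at the target rate regardless of how slowly the system responds, which is what makes it a good saturation test: a struggling server can't slow the load down.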
This is an extreme stress test - 10x the load we used in our Next.js benchmarks. All three configurations hit the 20,000 VU ceiling, confirming we pushed each system to its absolute limits.
Results Summary

Note that the average latency appears lower in the single-node deployment because it fails more requests than Watt does. The servers are at saturation, so every additional request that actually gets processed, rather than rejected, raises the average latency.
Key Findings
Watt vs PM2: A Dramatic Difference
Under extreme load, Watt consistently outshines PM2:
45% higher throughput (6,032 vs 4,154 req/s)
45% lower failure rate (37.9% vs 69.2%)
2.9x more successful responses (467K vs 160K)
21% lower average latency (866ms vs 1.1s)
Moreover, Watt showed better tail latency: its P95 latency was significantly lower than PM2's, reinforcing Watt's reliability under heavy traffic. Looking at both average and tail latencies gives a fuller picture of Watt's advantage in real-world conditions.
Throughout our testing, PM2 struggled with the load we put it under, dropping over 680,000 iterations and failing on nearly 70% of requests. Watt, using the same CPU resources, maintained a 37.9% failure rate while processing nearly 3x more successful requests.
The (Un)surprising Single Node Performance
The results from this benchmark corroborate our findings from the Next.js benchmarks: PM2's cluster-module architecture (which creates child processes and routes all incoming connections through a single master process via inter-process communication, or IPC) imposes substantial overhead (roughly 30%) that becomes overwhelming under heavy load.

This observation challenges a common assumption about classic Node.js deployments: that multiple processes are necessary to handle high request volumes effectively. It's worth reexamining deployment strategies in similar high-load situations, which may let you simplify architectures currently burdened by process-management overhead.
Watt vs Single Node
Watt also edges out a Single Node running React Router, most notably when it comes to resilience:
3% higher throughput (6,032 vs 5,838 req/s)
10% lower failure rate (37.9% vs 42.1%)
11% more successful responses (467K vs 420K)
While the margin is smaller than what we’ve seen with Next, we will take a cool 10% reduction in failure rate back to the lab for the next sprint.
Why These Results Matter
It's worth taking a moment to ground these numbers in what matters to your business (managers, shareholders, etc.) and users (presumably, customers). Most developers writing Node.js spend a lot of time thinking about how quickly they can serve a given page to a user ("latency", measured in milliseconds), how many user requests they can serve simultaneously (requests per second), and how often those requests fail (failure rate).
Latency, requests per second, and failure rate all carry real business consequences, from user churn to abandoned carts, with a material cost to your bottom line, especially for teams working at any sort of scale.
Why Watt Thrives at Scale
Watt provides the benefits of running multiple workers across multiple cores while avoiding node:cluster's and PM2's management overhead: it uses SO_REUSEPORT to let the Linux kernel distribute connections directly to workers:

With Watt, there’s no need for master-worker coordination, IPC overhead, or serialization. Instead, worker processes accept connections directly from the OS, using the Linux kernel’s fast, hash-based algorithm to distribute connections evenly across workers.
Conclusion
This latest round of benchmarks with React Router and PM2 confirms what we saw in our previous benchmarks with Next.js: Watt's SO_REUSEPORT-based architecture delivers significant performance gains over classic Node.js scaling approaches, particularly compared to PM2 in this instance.
Under extreme load of 10,000 requests per second:
Watt delivered 45% higher throughput than PM2 (6,032 vs 4,154 req/s)
Watt processed 2.9x more successful responses than PM2 (467K vs 160K)
Watt delivered 3% higher throughput than Single Node (6,032 vs 5,838 req/s)
Watt achieved a 10% lower failure rate than Single Node (37.9% vs 42.1%)
Single Node outperformed PM2, confirming our assessment of cluster-module overhead. Watt surpassed both by using multiple CPU cores efficiently without PM2's management overhead, offering multi-core parallelism, operational benefits, and effective load distribution. Watt's architecture also complements existing service-mesh and autoscaling patterns, making it a strategic fit for modern distributed systems: it improves performance while matching architects' existing mental models and simplifying deployment strategies.
Crucially, Watt’s main thread continuously monitors all worker threads, detecting and recovering from catastrophic event loop situations - blocked loops, memory leaks, or unresponsive workers. When a worker becomes unhealthy, Watt gracefully restarts it without affecting other workers or requiring pod termination. With a Single Node, a blocked event loop means your entire pod is down until Kubernetes notices and restarts it. With Watt, the main thread catches the problem early, restarts just that worker, and your service stays available.
The takeaway: at scale, PM2 and single-process deployments underperform. Watt delivers multi-core power without traditional tradeoffs for Node.js apps.
Getting Started with Watt
Watt is open-source and easy to implement, and you don't need to be serving 10,000 requests per second to benefit from adopting it, either. In fact, much of our time is spent on good, old-fashioned developer experience. One team cut its P95 latency by 30% in just one afternoon by simply integrating Watt into its existing system during an onsite architecture workshop with us. (If you have a particularly business-critical or thorny app you think Watt could help with, we love doing these - drop me a message on LinkedIn!)
When configuring Watt with your project, we recommend setting the number of workers to match your CPU allocation to start with. From there, you simply deploy using the same Kubernetes setup you're already using and you’re off to the races.
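As a sketch only (the exact schema and `$schema` URL vary by Watt version, and the placement of the `workers` setting here is our assumption; check the documentation for your release), a `watt.json` pinning two workers per pod, matching the 2000m CPU limit used above, might look like:

```json
{
  "$schema": "https://schemas.platformatic.dev/wattpm/2.0.0.json",
  "server": {
    "hostname": "0.0.0.0",
    "port": "{PORT}"
  },
  "workers": 2
}
```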
For a complete guide, see our documentation: https://docs.platformatic.dev/docs/guides/deployment/nextjs-in-k8s
The benchmark code is available at: https://github.com/platformatic/k8s-watt-performance-demo.
If you have questions or want help getting Watt set up in your environment, contact us at info@platformatic.dev or connect with Luca or Matteo on LinkedIn. We always love hearing from the community!