AWS ECS Autoscaling vs Platformatic ICC: A Performance Comparison

If you run Node.js services on AWS ECS, you have few built-in ways to handle changing load. Only two of them react to load as it changes, and both take minutes to do so: Target Tracking Scaling and Step Scaling. Both monitor a CloudWatch metric (usually CPU), compare it to a set threshold, and then add or remove tasks. The scalers will accurately react to changes, but always with a delay of several minutes:

Target Tracking won’t start scaling up until CPU usage has stayed above the threshold for three minutes in a row, and it won’t scale down until it’s been below for fifteen minutes. (These timings are set by AWS and can’t be changed.)
Step Scaling can be set to react after just one minute over the threshold, which is quicker, but it’s still reactive, still based on CPU, and still doesn’t see what’s happening inside the Node.js event loop.

In this article, we’ll dig into the algorithms we’ve designed to achieve more effective auto-scaling responses within our Intelligent Command Center (“ICC”). In our benchmark, the difference in performance was large:

ICC kept the median request time at 20 ms with a 99.99% success rate
Step Scaling’s median was 471 ms (about 23 times slower) with an 85.58% success rate
Target Tracking was even slower, with a 929 ms median and a 74.76% success rate

How AWS-native ECS scalers actually work

ECS doesn’t ship its own autoscaler. It delegates to Application Auto Scaling (AAS), a generic AWS service that scales ECS services, DynamoDB tables, Lambda concurrency, and a dozen other targets. AAS provides four scaling policy types: Target Tracking, Step Scaling, Scheduled Scaling, and Predictive Scaling.

Target Tracking and Step Scaling are the two that engage with dynamic load, the ones the average team configures and the ones we benchmark in this post. Scheduled Scaling and Predictive Scaling solve a different problem; we come back to them in a separate section below. The mechanics of all four are worth understanding because they explain why the benchmark results look the way they do.

Target Tracking Scaling

Target Tracking is the “easy mode” option. You pick a metric (the predefined ECSServiceAverageCPUUtilization is the most common), set a target value (e.g., 50%), and AAS handles the rest. You never write a CloudWatch alarm yourself.

When you register a Target Tracking policy with a 50% target, AAS quietly creates two CloudWatch alarms:

AlarmHigh: CPU > 50% over 3 consecutive 60-second windows → triggers scale-up
AlarmLow: CPU < 45% over 15 consecutive 60-second windows → triggers scale-down

(Scale-down fires at 90% of the target metric, i.e. at 45% if your target is 50%. That offset is also hardcoded.)

These intervals are not configurable. You can change the target value, the cooldowns, and a few other knobs, but the 3-of-3 and 15-of-15 evaluation periods are baked into the AAS implementation. AWS does not document these evaluation periods on the Target Tracking concepts page; you only see them by inspecting the CloudWatch alarms AAS creates after you register a policy. (For comparison, the DynamoDB auto scaling page does state its equivalent numbers explicitly: 2 minutes for scale-up, 15 datapoints for scale-down. AWS documents these alarm parameters per-service, when it does so at all.)

When the alarm does fire, the scaling decision uses the same formula as Kubernetes HPA:

new_desired = ceil(current_tasks × current_cpu / target_cpu)

So if 4 tasks are running at 90% CPU with a 50% target, AAS sets the desired count to ceil(4 × 90 / 50) = 8. The same one-shot snapshot logic as HPA, just with three minutes of mandatory pre-roll added to the front. (You can find more details on how the HPA works and how we’ve improved scaling on Kubernetes with ICC here.)
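As a quick check of the arithmetic, here is a minimal TypeScript sketch of the formula and of the two alarm thresholds described above. The function names are illustrative only; they are not part of any AWS API.

```typescript
// Sketch of the one-shot target-tracking arithmetic from the text.
// Names and signatures are hypothetical, not an AWS SDK interface.
function targetTrackingDesiredCount(
  currentTasks: number,
  currentCpuPercent: number,
  targetCpuPercent: number,
): number {
  return Math.ceil((currentTasks * currentCpuPercent) / targetCpuPercent);
}

// Alarm thresholds as described in the text: scale-up above the target,
// scale-down below 90% of the target.
function targetTrackingAlarmThresholds(targetCpuPercent: number): { high: number; low: number } {
  return { high: targetCpuPercent, low: targetCpuPercent * 0.9 };
}

// 4 tasks at 90% CPU with a 50% target: ceil(7.2) = 8
console.log(targetTrackingDesiredCount(4, 90, 50));
console.log(targetTrackingAlarmThresholds(50));
```

Note that the formula only uses a single snapshot of the metric: nothing in it knows whether CPU is still climbing.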

Step Scaling

Step Scaling is the “advanced” option. You write your own CloudWatch alarms, with your own thresholds and evaluation periods, and you provide AAS with a step adjustment table that maps “how far over the threshold” to “how many tasks to add.” A small breach adds one task; a larger breach adds more. We paired this with a CloudWatch alarm tuned to fire after a single 60-second window rather than three.

Our scale-up configuration:

With a CloudWatch alarm that fires after just 1 evaluation period of 60s above the 50% threshold, this scales up faster than Target Tracking.

However, the graduated step adjustments turned out to be less useful in practice than they look on paper: once a scale-up fires and a task gets added, the next CloudWatch datapoint a minute later often shows CPU back in the lowest band, so the +2 and +3 jumps fire rarely or not at all. In our run, the scaler fired +1 once and +2 once, never +3. The peak observed cluster was 7 desired / 9 running (the extra running tasks were ECS replacing saturated ones, not scaling), despite the policy allowing much faster expansion if the CPU had stayed sufficiently elevated.
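To make the step adjustment table concrete, the sketch below shows how a breach size maps to a task increment. The band boundaries are made up for illustration and are not the values used in the benchmark; the bounds are expressed relative to the alarm threshold, with an inclusive lower bound and an exclusive upper bound.

```typescript
interface StepAdjustment {
  lowerBound: number;        // points above the alarm threshold, inclusive
  upperBound: number | null; // exclusive; null means unbounded
  addTasks: number;
}

// Hypothetical bands for illustration only, NOT the benchmark's configuration.
const exampleStepTable: StepAdjustment[] = [
  { lowerBound: 0, upperBound: 10, addTasks: 1 },
  { lowerBound: 10, upperBound: 20, addTasks: 2 },
  { lowerBound: 20, upperBound: null, addTasks: 3 },
];

// Returns how many tasks to add for a given CPU reading, or 0 if no band matches.
function tasksToAdd(cpuPercent: number, thresholdPercent: number, table: StepAdjustment[]): number {
  const breach = cpuPercent - thresholdPercent;
  const band = table.find(
    (step) => breach >= step.lowerBound && (step.upperBound === null || breach < step.upperBound),
  );
  return band ? band.addTasks : 0;
}

console.log(tasksToAdd(55, 50, exampleStepTable)); // lowest band
console.log(tasksToAdd(90, 50, exampleStepTable)); // highest band
```

With bands like these, a datapoint that falls back to 55% after the first scale-up lands in the lowest band again, which is why the larger jumps rarely fire.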

There’s also a tradeoff with the advanced approach: you have to manage the alarms, thresholds, and step adjustments yourself, and keep them consistent across services. AAS Target Tracking handles this automatically. Step Scaling gives you faster reactions, but you have more settings to maintain, and in the worst cases, it only works a bit better than Target Tracking once the alarms go off.

What about AWS Predictive Scaling?

AAS supports two other policy types: Scheduled Scaling and Predictive Scaling.

Scheduled Scaling is exactly what it sounds like, “scale to N tasks at 9:00 every weekday”, and it has the obvious limitation that it can’t react to anything not on the schedule.

Predictive Scaling is the more interesting one, because it also calls itself “predictive,” and a careful reader should be asking how it relates to what ICC does.

Predictive Scaling uses machine learning to detect cyclical patterns in your historical CloudWatch metrics, typically daily and weekly cycles. It requires a minimum of 14 days of history to produce useful forecasts, looks up to 48 hours ahead, and revises the forecast once an hour. A configurable SchedulingBufferTime (up to 1 hour, default 5 minutes) tells AWS how early to start pre-warming capacity ahead of the forecasted load. It is designed to be used alongside Target Tracking, not as a replacement: Predictive Scaling handles the cyclical baseline, Target Tracking handles deviations.

Both AWS Predictive Scaling and ICC’s predictive scaling are forecast-based, but they operate on completely different timescales and solve different problems:

AWS Predictive Scaling asks, “What does this Tuesday morning usually look like?” It forecasts at the scale of hours and days, needs a historical pattern to detect, and refreshes its plan once an hour.
ICC’s predictive scaling asks, “What is event loop utilization (ELU) about to do in the next 35 seconds?” It forecasts at the scale of seconds, needs a live trend in the current signal, and refreshes its decision every 10 seconds.

In the benchmark in this post, a single 7-minute traffic ramp on a freshly deployed service, AWS Predictive Scaling would do nothing at all. There is no 14-day history to train from, no cyclical pattern to detect, and even if both were present, the hourly forecast cadence is far too coarse for a 2-minute ramp.

Real production traffic doesn’t follow a perfect pattern. A flash sale, a viral post, a partner deployment that doubles your API load, an upstream outage that triples your support traffic: none of these show up in last week’s data.

The benchmark in this post focuses on the AWS scalers that actually engage when load changes within minutes, because for a Node.js application past the latency cliff (i.e., past the point of event loop saturation), that is the only timescale that matters.

The structural problems with reactive scaling on ECS

Even if you choose the best AWS-native option and set it up as aggressively as possible, the reactive approach has built-in problems that no configuration can fix.

The startup gap. Once a reactive scaler decides to add tasks, there is a delay before those tasks actually serve traffic:

1. The CloudWatch alarm evaluates and changes state (60s minimum for Step, 180s for Target Tracking).
2. AAS receives the alarm transition and applies the scaling policy.
3. ECS places the task on an EC2 instance with available capacity.
4. The container image is pulled (or skipped if cached on the host).
5. The container starts, and the application initializes.
6. The ECS health check passes.
7. The ALB registers the task in the target group and begins routing traffic.

In our benchmark, we pre-cached the app image on every EC2 host during deployment, so we skipped step 4. With everything else tuned, it took about 30 to 60 seconds from when the alarm fired to when the task started serving traffic. Without pre-caching, it usually takes two to four minutes. Either way, this delay comes on top of the alarm evaluation time, which is the main source of lag.

If your ECS service runs on EC2 and the underlying Auto Scaling Group needs to add an instance to fit the new task, add another two to five minutes for EC2 boot, ECS agent registration, and the ASG/Capacity Provider machinery. We sized our cluster to avoid this in the benchmark, but it’s the common case in production.

The CloudWatch pipeline lag. Even if you’re not on the AWS-native scalers, CloudWatch itself introduces a delay. Standard metrics aggregate over 60-second windows and arrive with roughly 30 seconds of ingestion lag. If you push a custom ELU metric via PutMetricData, you’ve already added 60 to 90 seconds of staleness before any alarm can evaluate it. There is no version of “fast” that goes through CloudWatch.

The saturation cap. When the metric has a natural ceiling, like ELU at 1.0 or CPU at 100%, the scaler loses visibility into the actual load. A task at 100% CPU might need one more task or ten more, but the formula sees the same number either way. This forces the scaler into a staircase pattern: add tasks based on what it can see, wait for the new tasks to also saturate, and only then realize more are needed. Each step requires a full cycle of task startup and saturation before the next decision can be made.
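The staircase can be reproduced with a toy simulation. The sketch below assumes each task absorbs 1 vCPU of demand, clips observed CPU at 100%, and applies the one-shot formula from the Target Tracking section repeatedly. It ignores alarm delays, cooldowns, task startup time, and the service's maximum task count, so it only illustrates the shape of the problem.

```typescript
// Assumption: each task absorbs 1 vCPU of demand. Observed CPU is clipped
// at 100%, so demand beyond capacity is invisible to the scaler.
function observedCpuPercent(demandVcpu: number, tasks: number): number {
  return Math.min(100, (demandVcpu / tasks) * 100);
}

// The one-shot reactive formula: ceil(tasks * cpu / target).
function reactiveDesired(tasks: number, cpuPercent: number, targetPercent: number): number {
  return Math.ceil((tasks * cpuPercent) / targetPercent);
}

// Task count after each scaling round, until the count stops changing.
function staircase(demandVcpu: number, startTasks: number, targetPercent: number): number[] {
  const counts: number[] = [startTasks];
  let tasks = startTasks;
  for (let round = 0; round < 20; round++) {
    const next = reactiveDesired(tasks, observedCpuPercent(demandVcpu, tasks), targetPercent);
    if (next === tasks) break;
    counts.push(next);
    tasks = next;
  }
  return counts;
}

console.log(staircase(6, 4, 50));  // [4, 8, 12]
console.log(staircase(20, 4, 50)); // [4, 8, 16, 32, 40]
```

Both runs make the same first move (4 to 8 tasks) even though the second one carries more than three times the demand. The scaler only finds out after the new tasks have saturated as well.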

The redistribution problem. Every time AAS adds a task, it creates a temporary distortion in the metric. The new task starts receiving traffic immediately, but the existing tasks don’t shed their load at the same pace: queues take time to drain, in-flight requests must complete, and garbage collection needs to settle. During this transition, the new task’s CPU is rising while the old tasks’ CPU hasn’t dropped yet. The scaler sees the sum go up and interprets it as growing demand, when it’s actually the overlap of old and new tasks, both holding load at the same time. This can lead AAS to add tasks that aren’t needed, then scale them back fifteen minutes later.

All these issues have the same root cause: each scaling decision is made in isolation. The scaler doesn’t remember past values, can’t see trends, and can’t account for the delay between making a decision and when new capacity is actually available.

Platformatic Intelligent Command Center on ECS

Platformatic Intelligent Command Center (ICC) is the control plane for managing, monitoring, and optimizing Node.js applications running on Watt. A single Watt instance can host multiple Node.js applications, each in its own worker thread within the same process. In an ECS deployment, each task runs one Watt instance, which may host one or several applications as worker threads.

A companion module, @platformatic/watt-extra, runs Watt in each task. It collects runtime metrics, including per-application ELU and heap usage, and streams them to ICC.

The ECS integration differs from the Kubernetes one in exactly one place: how ICC applies its scaling decisions. On Kubernetes, ICC updates the replicas field on a Deployment object. On ECS, ICC calls the UpdateService API directly to change desiredCount. Everything between the metric and the decision is identical: the same metric collection, the same prediction algorithm, the same scale-up logic.

Importantly, ICC skips CloudWatch completely. Watt-Extra sends raw ELU samples straight to ICC, which runs its algorithm and calls the ECS API directly. There’s no CloudWatch metric, no alarm, no 60-second aggregation, and no 3-of-3 evaluation period. This one design choice removes over three minutes of built-in delay before the algorithm even starts.

How ICC’s predictive scaling works

A reactive scaler asks: “Is the application overloaded right now?” and acts on the answer. By the time new tasks are ready, the answer has changed, usually for the worse.

Instead, ICC tracks the load trend over time: not just the current value, but whether it is rising, falling, or stable, and how fast. It extrapolates the trend forward by the time it takes a new task to start and begin serving traffic. If the projected load exceeds the capacity of the current task count, ICC adds tasks immediately. The full details of the algorithm are described in the algorithm whitepaper.

The chart shows ELU per task over the last 20 seconds. The solid line ($M_t$) has been rising steadily. Right now it’s at 0.73 ($M_\text{now}$), just below the 0.75 threshold (dashed red line). ICC sees the trend and projects that by the time a new task would be ready (the prediction horizon $H$), the metric will reach 0.78 ($M_H$), above the threshold. So it scales up now, before the overload begins.

The rest of this section explains how the algorithm builds that prediction.

Aggregate, predict, project

The algorithm takes per-task metric values (like ELU on each task), combines them into a single cluster-wide number, predicts where that number is heading, and converts the prediction back into a per-task value to compare against the threshold. This aggregate-predict-project flow is the backbone of the algorithm.

Why predict on an aggregate? Per-task metrics change for two reasons: external traffic changes and the scaler’s own actions. When the scaler adds a task, the ALB starts routing traffic to it, and ELU on the existing tasks drops, even though external traffic hasn’t changed at all. If the algorithm predicted the trend from per-task ELU, it would see this drop as “load is decreasing” and might delay further scaling when it’s actually needed. The algorithm avoids this by summing ELU across all tasks into a cluster-wide aggregate. When a task is added and the load redistributes, individual ELU values shift, but the total stays approximately the same. The aggregate reflects external traffic changes without being distorted by scaling actions.
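A small numeric example shows the effect. The ELU values below are invented: four tasks at 0.8 each, then five tasks sharing the same external load after a scale-up.

```typescript
// Cluster-wide aggregate: the sum of per-task ELU values.
function aggregateElu(perTaskElu: number[]): number {
  return perTaskElu.reduce((sum, elu) => sum + elu, 0);
}

// Per-task average, which is what a per-task view would track.
function averageElu(perTaskElu: number[]): number {
  return aggregateElu(perTaskElu) / perTaskElu.length;
}

// Made-up values: same external traffic before and after adding a fifth task.
const eluBeforeScaleUp = [0.8, 0.8, 0.8, 0.8];
const eluAfterScaleUp = [0.64, 0.64, 0.64, 0.64, 0.64];

// The per-task average drops from about 0.8 to about 0.64 ...
console.log(averageElu(eluBeforeScaleUp), averageElu(eluAfterScaleUp));
// ... while the aggregate stays at about 3.2 in both cases.
console.log(aggregateElu(eluBeforeScaleUp), aggregateElu(eluAfterScaleUp));
```

A trend fitted to the per-task average would read the scale-up as falling load. The aggregate does not move.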

Cleaning the data

Raw metric data is not ready for prediction. Tasks send measurements in batches at different times, so at any given moment, some tasks have reported recent data and others haven’t. After a scale-up, new tasks create temporary distortions in the aggregate. Three preprocessing stages handle this before the data reaches the prediction stage.

Alignment places irregularly-timed samples onto a uniform time grid (e.g., one tick per second) by interpolation, so values from different tasks can be compared at the same points in time.
Imputation estimates values for tasks that haven’t reported yet. At each tick, the algorithm takes the previous total, subtracts the previous values of tasks that have now reported new data, and uses the remainder as the estimated contribution of the tasks still missing. When a late batch arrives, the estimates are replaced by real data and the totals are recomputed.
Redistribution smooths out the metric distortion after a scale-up. New tasks’ values are included gradually (their contribution ramps from zero to full over a stabilization period) rather than appearing all at once. At the same time, the artificial drop on existing tasks as they shed load is absorbed: the algorithm allows the aggregate to rise (to catch real traffic increases) but prevents it from dropping while new tasks are still stabilizing. Redistribution artifacts are filtered out, but real load changes pass through immediately.
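The imputation step can be sketched in a few lines. This is our reading of the description above, not ICC's actual code: the estimated total is the remainder (previous total minus the previous values of tasks that have now reported) plus the new values those tasks just sent. The type and function names are hypothetical.

```typescript
// Map of task ID to metric value at one tick.
type TaskValues = Record<string, number>;

function imputeTotal(
  previousTotal: number,
  previousValues: TaskValues,
  reportedNow: TaskValues,
): number {
  let remainder = previousTotal; // ends up as the estimate for tasks still missing
  let reportedSum = 0;           // real data from tasks that have reported
  for (const taskId of Object.keys(reportedNow)) {
    const previous = previousValues[taskId];
    remainder -= previous === undefined ? 0 : previous;
    reportedSum += reportedNow[taskId];
  }
  // Clamping at zero guards against rounding; this guard is an added assumption.
  return Math.max(0, remainder) + reportedSum;
}

// Four tasks were at 0.6 each (total 2.4). Tasks a and b now report 0.7;
// c and d have not reported yet, so their 1.2 share is carried forward.
const previousTick: TaskValues = { a: 0.6, b: 0.6, c: 0.6, d: 0.6 };
console.log(imputeTotal(2.4, previousTick, { a: 0.7, b: 0.7 })); // about 2.6
```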

Predicting the trend

The cleaned aggregate enters the prediction stage, which uses Holt’s double exponential smoothing. This method maintains two values at each tick: the level (a smoothed estimate of where the aggregate is now) and the trend (how fast the aggregate is changing). Each new data point updates both. The level tracks the signal while filtering single-tick noise. The trend builds gradually over multiple ticks, converging to the actual rate of change. This lets the smoothing be aggressive enough to filter noise while still reacting quickly to sustained changes.
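For readers who have not met Holt's method before, here is a textbook version in TypeScript. It is a generic sketch of double exponential smoothing, not ICC's implementation, and the smoothing factors `alpha` and `beta` in the example are arbitrary.

```typescript
interface HoltState {
  level: number; // smoothed estimate of the current value
  trend: number; // smoothed change per tick
}

// One update step of Holt's double exponential smoothing.
function holtUpdate(state: HoltState, observed: number, alpha: number, beta: number): HoltState {
  const level = alpha * observed + (1 - alpha) * (state.level + state.trend);
  const trend = beta * (level - state.level) + (1 - beta) * state.trend;
  return { level, trend };
}

// Linear extrapolation of the current level and trend.
function holtForecast(state: HoltState, ticksAhead: number): number {
  return state.level + ticksAhead * state.trend;
}

// Initialise from the first two samples, then fold in the rest.
function fitHolt(series: number[], alpha: number, beta: number): HoltState {
  if (series.length < 2) throw new Error("need at least two samples");
  let state: HoltState = { level: series[0], trend: series[1] - series[0] };
  for (let i = 1; i < series.length; i++) {
    state = holtUpdate(state, series[i], alpha, beta);
  }
  return state;
}

// A series rising by 1 per tick; forecasting 3 ticks past the last sample (5).
const fitted = fitHolt([1, 2, 3, 4, 5], 0.5, 0.5);
console.log(holtForecast(fitted, 3));
```

On a perfectly linear series the forecast continues the line. On noisy data, the two smoothing factors control how quickly the level and the trend follow new samples.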

Asymmetric reaction

The algorithm uses different smoothing for increases and decreases. If the metric rises faster than expected, it reacts quickly, since missing a spike can push the app into the latency cliff before the scaler can help. If the metric drops, it responds more slowly, letting the downward trend build before scaling down. A short dip might just be noise, and scaling down too soon could mean scaling right back up. This matches reality: under-provisioning hurts right away, but a little extra capacity just costs resources.
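One common way to get this kind of asymmetry is to switch the smoothing factor depending on the direction of the surprise. The sketch below does that for a single smoothed value. Whether ICC implements it this way is an assumption on our part, and the factors 0.8 and 0.2 are placeholders.

```typescript
// Asymmetric exponential smoothing: follow increases quickly, decreases slowly.
// alphaUp and alphaDown are hypothetical values chosen for illustration.
function asymmetricUpdate(
  previousEstimate: number,
  observed: number,
  alphaUp: number = 0.8,
  alphaDown: number = 0.2,
): number {
  const alpha = observed > previousEstimate ? alphaUp : alphaDown;
  return previousEstimate + alpha * (observed - previousEstimate);
}

// Starting from 0.5: a jump to 0.7 is mostly absorbed in one step,
// while a dip to 0.3 moves the estimate only slightly.
console.log(asymmetricUpdate(0.5, 0.7)); // about 0.66
console.log(asymmetricUpdate(0.5, 0.3)); // about 0.46
```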

The prediction horizon

The horizon $H$ determines how far into the future the algorithm looks when extrapolating the trend. It is derived from observed task startup times: how long it actually takes a new task to be scheduled, the container to start, the health check to pass, and the ALB to begin routing traffic. ICC measures this from real scale-up events in the cluster and adapts over time, so the horizon tracks actual infrastructure conditions rather than relying on a hardcoded constant.

On ECS, our benchmark uses a horizon of 35 seconds, reflecting the typical task startup time we observed with a pre-cached image. ICC measures real startup time over a rolling window of the last five scale-up events and lifts the horizon as needed. A configurable floor and ceiling prevent the horizon from becoming too short (which would reduce the algorithm’s effectiveness) or too long (which would make the extrapolation unreliable).
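A rolling-window horizon with a floor and ceiling could look like the sketch below. Two things are assumptions here: the text does not say which statistic ICC takes over the window (we use the mean), and the floor, ceiling, and fallback values are invented.

```typescript
// Keep only the most recent `windowSize` startup measurements.
function pushStartupSample(samples: number[], startupSec: number, windowSize: number = 5): number[] {
  return [...samples, startupSec].slice(-windowSize);
}

// Horizon = mean of the window (an assumption), clamped to [floorSec, ceilingSec].
function horizonSeconds(
  samples: number[],
  floorSec: number,
  ceilingSec: number,
  fallbackSec: number,
): number {
  const raw =
    samples.length === 0
      ? fallbackSec
      : samples.reduce((sum, s) => sum + s, 0) / samples.length;
  return Math.min(ceilingSec, Math.max(floorSec, raw));
}

// Hypothetical startup times in seconds, with a 15 s floor and 120 s ceiling.
let startupWindow: number[] = [30, 32, 35, 38, 40];
console.log(horizonSeconds(startupWindow, 15, 120, 35)); // 35

// A slow startup (60 s) pushes out the oldest sample and lifts the horizon.
startupWindow = pushStartupSample(startupWindow, 60);
console.log(horizonSeconds(startupWindow, 15, 120, 35)); // 41
```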

The decision loop itself runs every 10 seconds (PLT_SIGNALS_SCALER_PROCESSING_COOLDOWN_MS=10000) — slower than the 5-second metric batch arrival under load, so the algorithm has multiple fresh samples to work with on every decision.

Handling metric saturation

Some metrics have a natural cap. ELU maxes out at 1.0: once the event loop is fully saturated, ELU cannot rise further, no matter how much more traffic arrives. Without special handling, the trend would decay to zero during saturation, and the algorithm would stop scaling even though the load is still growing behind the cap. ICC handles this by preserving the trend during saturation: the trend is allowed to increase but never decrease while the metric is clipped, so the algorithm continues to scale up even when the signal is flat at its maximum.
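The "increase but never decrease" rule is simple to express. In the sketch below, the saturation test (within 0.02 of the cap) is a made-up criterion; the text does not say how ICC detects clipping.

```typescript
// Hypothetical saturation check: the metric is within `tolerance` of its cap.
function isSaturated(perTaskMetric: number, cap: number = 1.0, tolerance: number = 0.02): boolean {
  return perTaskMetric >= cap - tolerance;
}

// While saturated, the trend may rise but never fall; otherwise the freshly
// computed trend is used as-is.
function nextTrend(previousTrend: number, freshTrend: number, saturated: boolean): number {
  return saturated ? Math.max(previousTrend, freshTrend) : freshTrend;
}

// ELU is pinned at 1.0, so the fresh trend has decayed to 0, but the
// previous upward trend of 0.05 per tick is kept.
console.log(nextTrend(0.05, 0, isSaturated(1.0))); // 0.05
```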

The scaling decision

The prediction stage produces a predicted aggregate ($A_H$): the forecasted total load at the horizon. The decision stage converts this back into a per-task value by dividing by the current task count, producing the projected per-task metric at the horizon ($M_H$). If $M_H$ exceeds the threshold $\tau$, the algorithm computes how many tasks are needed to keep the per-task metric below the threshold and scales up immediately. If the trend is flat or falling and the metric is within the threshold, it considers scaling down, with a safety margin to avoid immediately scaling right back up.
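The scale-up half of that decision can be sketched as follows. The formula for the required task count, `ceil(A_H / tau)`, is our straightforward reading of "keep the per-task metric below the threshold"; the whitepaper has the exact formulation. Scale-down and its safety margin are left out.

```typescript
// Projected per-task metric at the horizon: M_H = A_H / current task count.
function projectedPerTask(predictedAggregate: number, currentTasks: number): number {
  return predictedAggregate / currentTasks;
}

// Returns the task count to request. Assumes needed = ceil(A_H / tau),
// clamped to the service's min and max task counts.
function scaleUpTarget(
  predictedAggregate: number,
  currentTasks: number,
  threshold: number,
  minTasks: number,
  maxTasks: number,
): number {
  const mH = projectedPerTask(predictedAggregate, currentTasks);
  if (mH <= threshold) return currentTasks; // no scale-up needed
  const needed = Math.ceil(predictedAggregate / threshold);
  return Math.min(maxTasks, Math.max(minTasks, needed));
}

// With 4 tasks and a predicted aggregate of 3.12, M_H is 0.78, above a
// 0.75 threshold, so one more task is requested: ceil(3.12 / 0.75) = 5.
console.log(scaleUpTarget(3.12, 4, 0.75, 4, 20)); // 5
```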

The full algorithm, including the mathematical formulation and worked examples, is in the algorithm whitepaper.

Signals

Accurate forecasting needs good data. If you average metrics over 15 or 60 seconds, you lose the details that matter for short-term prediction. For example, a sharp spike in the last 5 seconds looks just like a slow climb over 60 seconds. This makes the trend and the forecast less precise.

ICC works with raw metric samples instead. Each task pushes every individual measurement to ICC in batches, with no client-side averaging and no data loss. The batch timing is dynamic: under load, batches are sent frequently (every 5 seconds) to give the scaler fresh data when it matters most. When the application is idle, batches are sent infrequently (every 40 seconds) to save resources. A spike that started 5 seconds ago is visible immediately, not hidden inside a 60-second average that arrives 90 seconds late.
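The information loss from averaging is easy to demonstrate. The two made-up series below (six ELU samples, ten seconds apart) have the same one-minute mean but very different behavior in the final ten seconds.

```typescript
function mean(samples: number[]): number {
  return samples.reduce((sum, s) => sum + s, 0) / samples.length;
}

// Change between the last two samples: a crude view of the most recent trend.
function lastStepSlope(samples: number[]): number {
  return samples[samples.length - 1] - samples[samples.length - 2];
}

// Invented ELU samples for illustration.
const steadyClimb = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8];
const lateSpike = [0.5, 0.5, 0.5, 0.5, 0.5, 0.8];

// Both means are about 0.55, so a 60-second average cannot tell them apart.
console.log(mean(steadyClimb), mean(lateSpike));
// The raw samples can: about 0.1 per step versus about 0.3 per step.
console.log(lastStepSlope(steadyClimb), lastStepSlope(lateSpike));
```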

Benchmarks

To measure what predictive scaling actually buys you on ECS, we ran ICC against Target Tracking and Step Scaling under identical conditions on the same cluster with the same application.

Test setup

Application. A Next.js 16 e-commerce application (App Router, Server Components, SSR) runs on Platformatic Watt with one worker per task. We use the same next-bench application from our prior Kubernetes benchmarks, with the same mix of request types (homepage, search, product detail, cart) at the same weights. Each task is sized at 1 vCPU (1024 CPU units) and 2 GiB of memory.

Cluster. ECS-on-EC2 running on five m7i.xlarge instances (4 vCPU / 16 GiB each, 20 vCPU / 80 GiB total) in us-east-1. The Auto Scaling Group is locked at min=max=desired=5, so the EC2 layer doesn’t add its own scaling lag mid-benchmark. The 20 vCPU cluster capacity is sized to fit the full task ceiling (20 tasks × 1 vCPU) with headroom, so no run is constrained by infrastructure.

Image pre-caching. The application image is pre-pulled to every EC2 host at deployment time, eliminating ECR pull latency on task start. This is a real-world optimization any production team can do, and it benefits the AWS-native scalers more than it benefits ICC (since AAS sees its 3-minute alarm delay regardless).

Traffic redistribution. An Application Load Balancer with a 30-second slow start on the target group (newly-healthy tasks ramp traffic linearly over 30 s rather than getting full round-robin share instantly) and a 10-second deregistration delay on task shutdown. The slow start prevents V8 JIT compilation on cold code paths from skewing the comparison; the short deregistration delay keeps the picture clean during scale-down events.

Scalers. All three operate on the same service definition (min 4, max 20 tasks):

ICC: predictive scaling on average ELU with a 0.7 threshold
ECS Step Scaling: CPU utilization with a 50% threshold (1-of-1 × 60s alarm, +1/+2/+3 graduated step adjustments)
ECS Target Tracking: predefined ECSServiceAverageCPUUtilization metric with a 50% target

ICC scales on ELU because that is the metric that actually tracks Node.js application load. The AWS-native scalers cannot scale on ELU without a custom CloudWatch pipeline, so they scale on CPU at the equivalent threshold. This reflects how each scaler is realistically deployed in production.

Load generator. Grafana k6 running on a dedicated t3.medium EC2 instance in the same VPC, using a ramping-arrival-rate executor (fixed RPS target regardless of server response time) with noConnectionReuse: true to ensure each request requires a fresh TCP and TLS handshake, preventing artificially low latency from connection pooling. The client-side timeout is set to 10 seconds; any request that exceeds it is counted as an error.

Traffic profile. A single ramp scenario:

10 seconds at 100 req/s (baseline)
120 seconds linear ramp from 100 to 800 req/s
300 seconds sustained at 800 req/s

This is a common real-world pattern: traffic grows over a couple of minutes as users arrive, then holds at the new level. The total test duration is 7 minutes and 10 seconds. We did not benchmark a sudden zero-to-peak spike on ECS: it tells you almost nothing about scaler quality on infrastructure with multi-minute task startup, because no scaler can fix it.
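For reference, the traffic profile can be written as a small function of time. This is a plain TypeScript model of the three stages listed above, interpolating linearly toward each stage's target rate. It is not the k6 script used in the benchmark, and the type and function names are ours.

```typescript
interface Stage {
  durationSec: number;
  targetRps: number; // rate reached at the end of the stage
}

// The three stages from the traffic profile above.
const rampProfile: Stage[] = [
  { durationSec: 10, targetRps: 100 },  // baseline
  { durationSec: 120, targetRps: 800 }, // linear ramp
  { durationSec: 300, targetRps: 800 }, // sustained peak
];

// Target request rate at time tSec, interpolating linearly within each stage.
function targetRpsAt(tSec: number, startRps: number, profile: Stage[]): number {
  let fromRps = startRps;
  let elapsed = 0;
  for (const stage of profile) {
    if (tSec <= elapsed + stage.durationSec) {
      const fraction = (tSec - elapsed) / stage.durationSec;
      return fromRps + (stage.targetRps - fromRps) * fraction;
    }
    elapsed += stage.durationSec;
    fromRps = stage.targetRps;
  }
  return fromRps; // after the last stage, hold the final rate
}

function totalDurationSec(profile: Stage[]): number {
  return profile.reduce((sum, stage) => sum + stage.durationSec, 0);
}

console.log(totalDurationSec(rampProfile));     // 430 seconds, i.e. 7:10
console.log(targetRpsAt(70, 100, rampProfile)); // 450, halfway up the ramp
```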

Results

All three scalers responded. ICC acted well before the latency cliff, while Step Scaling and Target Tracking only kicked in after the problem started. The latencies below show the results:

ICC kept the median request time at 20 ms with a 99.99% success rate. Step Scaling’s median was 471 ms—about 23 times slower—and it lost roughly one out of every seven requests. Target Tracking was even slower, with a 929 ms median and one in four requests dropped.

Each chart below plots how the cluster behaved over time during one scaler’s run: the average ELU across all tasks (purple), the task count (black step line), and the target request rate (blue shaded area). The dashed red line marks the ELU threshold of 0.7. For Step Scaling and Target Tracking, we also overlay CloudWatch CPU (orange) — the metric they actually scale on.

ICC.

The first scale-up happens before ELU crosses the threshold. This shows the predictive part in action: ICC looks about 35 seconds ahead and acts based on the forecast, not just the current value. When peak load arrives, the cluster already has 9 tasks running, added before the demand hits, not after overload.

ECS Step Scaling.

Step Scaling has the same reactive weaknesses we discussed earlier: it only acts after a threshold is crossed, scales in fixed steps instead of matching demand, and uses CPU instead of ELU. On Kubernetes, these differences set ICC apart from reactive scalers. On ECS, though, the main issue is the built-in delay in AWS’s alarm system. Even the best-tuned Step Scaling alarm can’t fire in less than two to three minutes. By the time the first scale-up happens, the app has already been overloaded for almost four minutes.

ECS Target Tracking.

Target Tracking has the same reactive weaknesses, plus an even longer alarm delay: the hardcoded 3-of-3 × 60s rule pushes the first scale-up to the very end of the 7:10 test. The app stays overloaded for almost the entire run, and the only scale-up happens just two seconds before the test ends.

Why does the alarm take so long to fire?

The multi-minute delays are the result of how CloudWatch alarm evaluation is structured. Every CloudWatch alarm runs on three parameters:

Period: how long CloudWatch aggregates raw data into a single data point. For ECS service CPU metrics, this is 60 seconds.
Evaluation Periods: how many recent data points the alarm looks at when deciding whether to change state.
Datapoints to Alarm: how many of those data points must breach the threshold to trigger ALARM.

But the alarm doesn’t read these data points directly from your service. Three separate delays stack up before it can transition:

Metric reporting lag. ECS reports CPU in 1-minute periods. A data point for the window 12:00–12:01 doesn’t appear in CloudWatch the instant 12:01 arrives; there’s roughly a minute of pipeline lag before it becomes available for the alarm to read. AWS does not document this latency precisely, and it varies by service.
Alarm evaluation frequency. CloudWatch alarms with a Period of 60 seconds or longer evaluate once per minute. Even if a data point becomes available between ticks, the alarm won’t act on it until the next evaluation cycle.
The evaluation window itself. For 1-of-1, the window is 60 seconds — one period must breach. For 3-of-3, three consecutive periods must breach. That’s 3 minutes of sustained breach data, just to satisfy the alarm condition.

These three stack. An AWS re:Post article on alarm-evaluation timing walks through an example timeline for a 3-of-3 × 1-min alarm: roughly 4 minutes from “metric first breaches the threshold” to “alarm fires.”
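The evaluation-window part of this can be modeled directly. The sketch below is a simplified model of the "M out of N" rule using the three alarm parameters described above. It ignores missing-data handling, reporting lag, and evaluation frequency, so the index it returns is a lower bound on when a real alarm would fire. The per-minute CPU values are invented.

```typescript
// True when at least `datapointsToAlarm` of the most recent
// `evaluationPeriods` datapoints exceed the threshold.
function alarmConditionMet(
  datapoints: number[],
  threshold: number,
  evaluationPeriods: number,
  datapointsToAlarm: number,
): boolean {
  if (datapoints.length < evaluationPeriods) return false;
  const recent = datapoints.slice(-evaluationPeriods);
  const breaching = recent.filter((value) => value > threshold).length;
  return breaching >= datapointsToAlarm;
}

// Index of the first datapoint at which the condition holds, or -1 if never.
function firstAlarmIndex(
  series: number[],
  threshold: number,
  evaluationPeriods: number,
  datapointsToAlarm: number,
): number {
  for (let i = 0; i < series.length; i++) {
    if (alarmConditionMet(series.slice(0, i + 1), threshold, evaluationPeriods, datapointsToAlarm)) {
      return i;
    }
  }
  return -1;
}

// Made-up per-minute CPU averages; CPU crosses 50% in the second minute.
const cpuPerMinute = [40, 60, 70, 80, 90];
console.log(firstAlarmIndex(cpuPerMinute, 50, 1, 1)); // 1 (1-of-1, Step Scaling style)
console.log(firstAlarmIndex(cpuPerMinute, 50, 3, 3)); // 3 (3-of-3, Target Tracking style)
```

Even in this idealized model the 3-of-3 rule fires two datapoints later than 1-of-1. The reporting lag and evaluation cadence are added on top of that.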

The diagram traces each scaler on the same time axis, one lane each.

ICC’s lane is dense with green ticks, one decision-loop evaluation every 10 seconds. The first scale-up fires at T+1:24, before a 1-minute CloudWatch period would even have a complete data point to evaluate.

Step Scaling spends the first minute filling one over-threshold period, then sits in the orange band: the reporting, alarm-evaluation, and AAS-handoff delays listed above. The first scale-up lands at T+5:20.

Target Tracking goes through the same orange band, but its yellow band (the hardcoded 3-of-3 requirement) runs three full minutes before the orange pipeline even begins. The first scale-up lands at T+7:08.

Using high-resolution custom metrics or shorter evaluation windows can help a bit, but every CloudWatch-based scaler on ECS still works on a minute-by-minute basis.

Conclusion

Predictive scaling with ICC removes the built-in delays associated with standard methods of scaling ECS by reading ELU straight from each task, predicting where load is going, and scaling before demand hits.

In our benchmark, ICC had a 99.99% success rate and 20 ms median latency, while Step Scaling lost 14% of requests and Target Tracking lost 25%, with both hitting the 10-second client timeout for the slowest requests.

If you’re running high-traffic Node.js on ECS and want to talk through how this would fit your workload, drop us a note at hello@platformatic.dev or reach out on LinkedIn.

Thanks for reading, and happy building.
