Introducing @platformatic/job-queue
Reliable Background Jobs for Node.js

Node.js TSC Member, Principal Engineer at Platformatic, polyglot developer. RPG and LARP addict, and a nerd about much more. Surrounded by lovely chubby cats.
Every backend developer knows the frustration: a key job disappears during a server restart, or duplicate tasks pile up when a client retries a request. Lost work, repeated emails, missing reports: these breakdowns always seem to happen when reliability matters most.
@platformatic/job-queue is a new queue library from Platformatic focused on reliability and operational simplicity. This library is built on a workflow that lets you enqueue jobs and wait for results when needed, making background processing feel just as smooth as calling a function. Alongside this, it provides Node.js teams with a modern API that includes built-in caching, deduplication, retries, and pluggable storage.
In practice, this means you can start with a tiny local setup and then move to a distributed, production-grade deployment without rewriting your application code.
What makes it different
Most queue setups force you to stitch together multiple patterns and handle edge cases yourself. @platformatic/job-queue includes those patterns out of the box:
Deduplication by job id so repeated enqueue attempts do not create duplicate work.
Request/response support with enqueueAndWait() when you need async processing but still want a result.
Reliable retries with configurable attempts and backoff behavior.
Stalled job recovery via a Reaper that requeues jobs from crashed workers.
Graceful shutdown that lets in-flight jobs finish before the service stops, reducing lost work during deploys and restarts.
TypeScript-native API with typed payloads and results, so many mistakes are caught at compile time rather than in production.
This makes it appropriate for both classic fire-and-forget workloads and RPC-style workloads that require a response. You do not have to pick one model globally: many teams use both in the same system, depending on endpoint and latency requirements. For example, in use cases such as sending emails and notifications, fire-and-forget jobs make sense because results are often not needed immediately and occasional retries can be handled gracefully. On the other hand, workflows such as generating invoices or processing payments may require the caller to wait for a result, making the request/response pattern with enqueueAndWait() a better fit.
A quick look at the API
You can use the queue as a producer and consumer in the same process, or split them across services. The API is intentionally small, so the same primitives are easy to apply in monoliths, microservices, and worker pools.
import { Queue, MemoryStorage } from '@platformatic/job-queue'

const storage = new MemoryStorage()

const queue = new Queue<{ email: string }, { sent: boolean }>({
  storage,
  concurrency: 5
})

queue.execute(async job => {
  // your business logic
  return { sent: true }
})

await queue.start()

// fire-and-forget
await queue.enqueue('email-1', { email: 'user@example.com' })

// request/response
const result = await queue.enqueueAndWait('email-2', { email: 'another@example.com' }, { timeout: 30_000 })
console.log(result)

await queue.stop()
Architecture overview
When you call enqueue(), the producer checks if the job already exists in the storage. If it’s a new job, it's added to the queue with the state “queued,” and the method returns immediately. If the job is a duplicate, the storage returns a duplicate status without creating a new entry.
When you call enqueueAndWait(), the producer first subscribes to a notification for that job, then enqueues it. If the job was already processed, it returns the cached result immediately. Otherwise, it waits for a notification from the worker when the job completes (or fails), then fetches the result and returns it.
The consumer continuously dequeues jobs from the storage using a blocking move operation. When it receives a job, it marks it as “processing” and executes the handler. On success, it stores the result with TTL and marks the job as completed. On failure, it either retries (if attempts remain) or marks the job as failed.
The producer API supports per-job options such as maxAttempts and resultTTL, which are useful when not all jobs have the same retention or retry requirements. For example, you might keep invoice-generation results longer than low-value notification results, even if they run on the same queue.
Storage backends for different environments
@platformatic/job-queue ships with three storage adapters:
MemoryStorage
MemoryStorage keeps all queue states in process memory. This makes it ideal for local development, testing, and simple single-instance services where data can be ephemeral.
import { Queue, MemoryStorage } from '@platformatic/job-queue'
const storage = new MemoryStorage()
const queue = new Queue({ storage })
Jobs are stored in JavaScript Maps and Sets within the same process. This gives you the lowest latency possible, but means jobs are lost if the process restarts. For development workflows where you restart frequently, this is usually not a concern.
FileStorage
FileStorage persists the queue state to the filesystem in JSON format. It works well for simple deployments on a single node where you need persistence but do not want external dependencies like Redis.
import { Queue, FileStorage } from '@platformatic/job-queue'
const storage = new FileStorage('./queue-data')
const queue = new Queue({ storage })
The storage writes atomically to prevent corruption, and it maintains separate files for jobs, metadata, and locks. Since it relies on file system locks, it is not suitable for multi-node deployments.
RedisStorage
RedisStorage uses Redis (7+) or Valkey (8+) for distributed queue operations. This is the recommended choice for production workloads that require horizontal scaling, leader election, or cross-instance coordination.
import { Queue, RedisStorage } from '@platformatic/job-queue'
const storage = new RedisStorage({ connectionString: 'redis://localhost:6379' })
const queue = new Queue({ storage })
RedisStorage leverages Redis data structures for atomic operations:
Lists for job queues
Sorted sets for delayed job scheduling
Pub/sub for notifications across instances
Lua scripts for atomic state changes
For high availability, RedisStorage also supports Sentinel and Cluster modes for failover and sharding.
Choosing the right backend
Start with MemoryStorage for development, use FileStorage for simple single-node deployments, and choose RedisStorage for production systems that need horizontal scaling.
Reliability features that matter in production
The library is designed around the real failure modes of job processing systems.
Visualize this: you deploy a routine patch, and one of your job workers crashes unnoticed. By the next morning, thousands of critical jobs have stalled; without recovery, they could have vanished for good. With built-in stalled-job recovery, they are instead detected and requeued automatically. Situations like this are exactly where a background processing system proves its worth.
Recovering stalled jobs
If a worker crashes while processing a job, the Reaper can detect the stalled work and requeue it after visibilityTimeout.
import { Reaper } from '@platformatic/job-queue'

const reaper = new Reaper({
  storage,
  visibilityTimeout: 30_000
})

await reaper.start()
For high availability, the Reaper also supports leader election (with Redis storage), so multiple instances can run safely while only one acts as leader at a time. If the leader goes away, another instance takes over, which avoids manual intervention during incidents.
Controlled retries and terminal states
Failed jobs can retry automatically up to the configured maximum number of attempts (maxAttempts). When retries are exhausted, the error is persisted as a terminal state so producers can inspect it or react programmatically.
This gives you reliable behavior for flaky dependencies, such as third-party APIs: transient failures recover automatically, while permanent failures remain visible and actionable.
Graceful shutdown
When stopping a worker, queue.stop() waits for in-flight jobs to finish. This reduces dropped work during deploys and restarts and helps keep queue state consistent across gradual updates. In practice, this means you can safely perform blue/green or canary deployments without worrying about losing in-progress work. Teams can ship changes faster, with the confidence that jobs will complete and customer data will not go missing, even as new versions are rolled out.
Request/response without building custom plumbing
One particularly useful capability is enqueueAndWait(). Teams often build this pattern manually on top of queues, but it is already integrated here, including timeout handling and typed errors.
try {
  const result = await queue.enqueueAndWait('invoice-123', payload, { timeout: 10_000 })
  return result
} catch (error) {
  // handle TimeoutError / JobFailedError, etc.
}
This is a good fit when work should run in a worker context, but the caller still needs a bounded response path, such as document generation, webhook fan-out, or expensive validation that should not run on an HTTP thread.
You also get explicit queue errors (TimeoutError, JobFailedError, and others), so your application can distinguish among transport problems, worker failures, and business-level errors.
Getting started
Install the package:
npm install @platformatic/job-queue
Then choose a backend based on your environment:
Start with MemoryStorage for local development.
Move to RedisStorage (Redis 7+ or Valkey 8+) for production.
Add Reaper when running multiple workers or when stalled-job recovery is required.
If you already have queue infrastructure in place, one good migration approach is to move one bounded workflow first (for example, email delivery or report generation), validate behavior and observability, and then expand usage across other jobs.
We recommend separating responsibilities into dedicated processes:
Producer services enqueue jobs from HTTP handlers or internal events.
Worker services execute jobs with tuned concurrency.
A Reaper instance handles stalled-job recovery (or multiple instances with leader election).
This setup lets you scale producers and workers independently. If incoming traffic spikes, add producers; if processing backlog grows, add workers.
Final thoughts
@platformatic/job-queue is a practical option for Node.js teams that want reliable background processing without assembling every reliability feature from scratch. The combination of deduplication, request/response semantics, retries, and pluggable storage makes it flexible enough for both simple jobs and demanding production workloads. Most importantly, it lets you focus on building features, knowing that background tasks are accounted for even during restarts and outages.
If you are evaluating queue systems for your next service, this is a good time to try it and share feedback with the team. Real-world feedback is especially valuable while the project is young and evolving quickly. If you run into an unexpected edge case or a strange retry failure, please open an issue describing your scenario: concrete examples help us improve reliability for everyone.